Introduction

This report describes an analysis of exoplanet data taken from NASA and Caltech’s website (https://exoplanetarchive.ipac.caltech.edu/). Amazingly, over 3500 exoplants outside of our solar system have been found by astronomers so far. In this analysis, there will be several areas of investigation: 1) A summary of the number of counts of each discovery method used to find the planets, 2) A print-out of the least and most massive planet in the data, along with mean, median and mode mass, 3) A histogram-density plot of planet masses, 4) A bar plot of the orbital periods of 10 planets, with error bars included, 5) A correlation plot of measured planet masses vs. their density-volume products.

Load NASA/Caltech exoplanet data:

planet_data <- read.csv("compositepars.csv")

1. Discovery Method Analysis

In this section, I will investigate the number of times each discovery method was used to find the exoplanets in the data. This will be summarized in a bar plot. Since some methods were used a lot more than others, a log scale will be used for the y-axis to be able to show everything clearly.

discovery_summary <- as.data.frame(table(planet_data$fpl_discmethod))
discovery_summary <- arrange(discovery_summary,desc(Freq))
colnames(discovery_summary) <- c("Discovery_Method","Counts")

Create and print a bar plot of the number of counts of each planet discovery method.

discovery_plot <- ggplot(data=discovery_summary, aes(reorder(Discovery_Method,-Counts),
                         Counts), na.rm=T) + 
geom_bar(stat="identity", fill = 'cyan1') +
labs(title = "Planet Discovery Method Counts",x = "Discovery Method",
y= "Counts (log10 scale)") + theme(text = element_text(size=16, 
                                family="OCR A Extended",color="yellow")) +
theme(panel.background = element_rect(fill = "black",
                                      colour = "yellow",
                                      size = 1, linetype = "solid"),
        panel.grid.major = element_line(size = 1, linetype = 'solid',
                                        colour = "yellow"), 
        panel.grid.minor = element_line(size = 1, linetype = 'solid',
                                        colour = "yellow")
  )+ theme(axis.text.x = element_text(angle = 45, hjust = 1, color="yellow"))+
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x))+
  theme(plot.background = element_rect(fill = "black"))+
  theme(axis.text.y = element_text(color="yellow"))
print(discovery_plot)

The transit method appears to be the most frequently used exoplanet discovery technique, followed by the radial velocity method.

2. Planet Mass Analysis

In this section, the planets with the largest and smallest masses are determined, along with the mean, median and mode mass.

Determine the least and most massive planets.

#Reorder the planet data based on increasing mass
planet_mass_data <- arrange(planet_data, fpl_bmasse)

#Convert planet names to a character vector.
planet_mass_data$fpl_name <- as.character(planet_mass_data$fpl_name)
#Get the mass of the most massive planet and the row index where it is in the
#data.
max_mass<- max(planet_mass_data$fpl_bmasse, na.rm=T)
max_index <- which(planet_mass_data$fpl_bmasse == max_mass)

#Get the mass of the smallest planet and the corresponding row index.
min_mass <- min(planet_mass_data$fpl_bmasse, na.rm=T)
min_index <- which(planet_mass_data$fpl_bmasse == min_mass)

Print out the planets with the minimum and maximum masses.

#Print out the least and most massive exoplanets.
cat("The most massive exoplanet is: ", planet_mass_data$fpl_name[max_index],"\n")
## The most massive exoplanet is:  HD 100546 b
cat("This planet has a mass ",planet_mass_data$fpl_bmasse[max_index],
    "times that of the Earth according to the NASA data.\n\n")
## This planet has a mass  239000 times that of the Earth according to the NASA data.
cat("The least massive exoplanet is: ", planet_mass_data$fpl_name[min_index],"\n")
## The least massive exoplanet is:  PSR B1257+12 b
cat("This planet has a mass ",planet_mass_data$fpl_bmasse[min_index],
    "times that of the Earth according to the NASA data.\n\n")
## This planet has a mass  0.02 times that of the Earth according to the NASA data.

Calculate the mean, median and mode planet masses.

#Calculate the mean planet mass.
mean_mass <- mean(planet_mass_data$fpl_bmasse, na.rm=T)

#Calculate the median planet mass.
median_mass <- median(planet_mass_data$fpl_bmasse, na.rm=T)

#Calculate the mode of the planet masses.  First, remove NAs from planet
#masses and put them in a separate vector.
mass_values <- planet_mass_data$fpl_bmasse[!is.na(planet_mass_data$fpl_bmasse)]

#Calculate the mode
mass_counts <- table(planet_mass_data$fpl_bmasse)
mode_mass <- mass_counts[which(mass_counts == max(mass_counts))]
mass_value <- as.numeric(names(mode_mass))

Print out the mean, median and mode mass value.

cat("The mean exoplanet mass is ", mean_mass,"times that of the Earth. \n")
## The mean exoplanet mass is  468.0092 times that of the Earth.
cat("The median exoplanet mass is ", median_mass,"times that of the Earth. \n")
## The median exoplanet mass is  7.75 times that of the Earth.
cat("The mode exoplanet mass is ", mass_value, "times that of the Earth.  \n")
## The mode exoplanet mass is  3.33 times that of the Earth.

3. Planet Mass Histogram-Density Plot

In this section, a histogram density plot of the planet masses is generated and plotted. Most of the planets fall into the bin of 2-4 times the mass of the Earth.

mass_hist <-ggplot(data=planet_mass_data, aes(planet_mass_data$fpl_bmasse, na.rm=T)) + 
  geom_histogram(aes(y =..density..), 
                 breaks=seq(0, 40, by = 2), 
                 col="darkblue", 
                 fill="#FFA420") + 
  geom_density(col="red",size=1.5) + 
  labs(title="Exoplanet mass distribution") +
  labs(x="Earth Masses", y="Density")+ xlim(c(0,40))+
  theme(text = element_text(size=16, family="OCR A Extended", face="bold", color="yellow"))+
  theme(plot.background = element_rect(fill = "black"))+
  theme(panel.background = element_rect(fill = "black",
                                      colour = "yellow",
                                      size = 1, linetype = "solid"))+
  theme(panel.grid.major = element_line(size = 1, linetype = 'solid',
                                        colour = "yellow"), 
        panel.grid.minor = element_line(size = 1, linetype = 'solid',
                                        colour = "yellow"))+
  theme(axis.text = element_text(color="yellow"))
print(mass_hist)

4. Orbital Period Analysis

In this section, the orbital period of 10 planets is shown on a bar plot, with error bars. The error bars show the uncertainty in the orbital period measurements.

Get the orbital period data of 10 planets chosen from the dataset.

#Create a sub frame of data that takes only orbital period and its error bar values. 
orbit_data <- select(planet_mass_data,fpl_name,fpl_orbper,fpl_orbpererr1,fpl_orbpererr2)

#Remove any rows that have NAs in them.  Want completed cases only.
orbit_data <- orbit_data[complete.cases(orbit_data), ]

#Arrange the orbital period data in order of increasing orbital period.
orbit_data <- arrange(orbit_data,fpl_orbper)

#Take a subset of 10 planets that have large orbital periods.
orbit_plot_data <- orbit_data[3515:3524,]

Show a bar plot with error bars of the orbital period of the ten chosen planets.

orbit_plot<- ggplot(orbit_plot_data, aes(x=reorder(fpl_name, fpl_orbper), y=fpl_orbper)) + 
  geom_bar(stat="identity", color="black", fill="springgreen") +
  theme(axis.text.x = element_text(angle = 70, hjust = 1))+
  labs(x="Planet name", y="Orbital Period (days)")+
  geom_errorbar(aes(ymin= fpl_orbper+fpl_orbpererr2, 
                    ymax=fpl_orbper+fpl_orbpererr1), width=.2, 
                color="hotpink", size=2) +
  theme(text = element_text(size=14, family="OCR A Extended", face="bold", color="yellow"))+
  theme(plot.background = element_rect(fill = "black"))+
  theme(panel.background = element_rect(fill = "black",
                                      colour = "yellow",
                                      size = 1, linetype = "solid"))+
  theme(panel.grid.major = element_line(size = 1, linetype = 'solid',
                                        colour = "yellow"), 
        panel.grid.minor = element_line(size = 1, linetype = 'solid',
                                        colour = "yellow"))+
  theme(axis.text = element_text(color="yellow"))
print(orbit_plot)

There is quite a large variation in the uncertainty in the measurements of the orbital periods of these planets.

5. Correlation between measured planet masses and measured density/volume products

In this section a linear correlation model is fit to determine the relationship between the measured planet masses and the measured density/volume products. In theory, this relationship between these two quantities should be very strong, since planet mass should have a value close to its mean density times its volume. This is also a good test of the accuracy of the measurements of these quantities for the planets.

#Get data for a correlation plot between planet mass and density/volume product.
cor_data <- select(planet_data, fpl_name, fpl_bmasse, fpl_rade, fpl_dens)
cor_data <- cor_data[complete.cases(cor_data),]

#Determine the radius of each planet in km.  Convert
#all radii to km by multiplying by the radius of the Earth (in km)
cor_data <- mutate(cor_data, radius = fpl_rade*6371)

#convert the earth mass values to actual masses in kg.
cor_data <- mutate(cor_data, mass = fpl_bmasse*5.972e24)

#Volume of each planet in km^3
cor_data <- mutate(cor_data, vol = (4/3)*pi*radius^3)

#convert densities to units of kg/km^3.
cor_data <- mutate(cor_data, density = fpl_dens*1e12)

#Calculate density/volume product.
cor_data <- mutate(cor_data, densvol = density*vol)

Produce and summarize the linear model comparing planet masses to density/volume products.

mass_fit <- lm(mass~densvol, data=cor_data)
summary(mass_fit)
## 
## Call:
## lm(formula = mass ~ densvol, data = cor_data)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.334e+28  1.342e+25  1.344e+25  1.363e+25  7.752e+27 
## 
## Coefficients:
##               Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) -1.342e+25  9.540e+24   -1.406     0.16    
## densvol      1.002e+00  3.589e-04 2792.483   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.737e+26 on 3652 degrees of freedom
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9995 
## F-statistic: 7.798e+06 on 1 and 3652 DF,  p-value: < 2.2e-16

From the results of this fit, the relationship between planet mass and density-volume product is very strong, indicating that the quantities of planet mass, radius and density were accurately measured by astronomers.

Now I will show the correlation plot, highlighting the strong relationship between planet mass and density-volume product.

cor_data <- cor_data[!(cor_data$mass == max(cor_data$mass)),]
mass_fit_plot <- ggplot(data = cor_data, aes(x = densvol, y = mass)) + 
  geom_point(color='magenta', size=3) +
  geom_smooth(method = "lm", se = FALSE, color="green")+
  labs(x="Density/volume product (kg)", y="Planet Mass (kg)" )+
  labs(title = "Correlation plot")+
  theme(text = element_text(size=16, family="OCR A Extended", face="bold", color="yellow"))+
  theme(plot.background = element_rect(fill = "black"))+
  theme(panel.background = element_rect(fill = "black",
                                      colour = "yellow",
                                      size = 1, linetype = "solid"))+
  theme(panel.grid.major = element_line(size = 1, linetype = 'solid',
                                        colour = "yellow"), 
        panel.grid.minor = element_line(size = 1, linetype = 'solid',
                                        colour = "yellow"))+
  theme(axis.text = element_text(color="yellow"))
print(mass_fit_plot)

Conclusion

This report highlighted some of the properties of the exoplanet data obtained from NASA/Caltech sources. There is a very large range of behavior and properties among these planets, and it’s really quite remarkable that exoplanets have been discovered in our lifetimes (no aliens yet, though :)).