Introduction
In this tutorial we will take a look at measurements of Hubble (the telescope). Besides taking beautiful pictures, it measured speed and distance of Super-Novae. Similar data was used in 1929 by Hubble (the person) and he found out that there is a linear relationship.
We will create a linear model based on observations and create predictions.
- Objectives: Learn to create a linear model and predictions
- Requirements: R-Basics
Data Preparation
We need to load the package that includes the data, and then load the data.
# Load libraries
suppressPackageStartupMessages(library("lava")) # includes data
# Load data
data("hubble")
Let’s take a look at the data.
There seems to be a linear relationship. So we will create a linear model. This is easily done with lm. As parameters it requires a formula, which defines dependent and independent variable, as well as a dataframe assigned to data.
fit_lm <- lm(formula = D ~ v, data = hubble)
fit_lm
## ## Call: ## lm(formula = D ~ v, data = hubble) ## ## Coefficients: ## (Intercept) v ## -6.44022 0.01449
In the output you see the intercept and linear parameter for v.
More details are presented with summary.
summary(fit_lm)
## ## Call: ## lm(formula = D ~ v, data = hubble) ## ## Residuals: ## Min 1Q Median 3Q Max ## -43.806 -7.917 2.801 8.529 35.218 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -6.4402207 5.2777338 -1.22 0.231 ## v 0.0144852 0.0003702 39.13 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 14.19 on 34 degrees of freedom ## Multiple R-squared: 0.9783, Adjusted R-squared: 0.9776 ## F-statistic: 1531 on 1 and 34 DF, p-value: < 2.2e-16
Here, you find also significance of parameters, but also p-value or R**2. These can be extracted from the model.
lm_intercept <- round(summary(fit_lm)$coeff[1], 4)
lm_v <- round(summary(fit_lm)$coeff[2], 4)
lm_R2 <- round(summary(fit_lm)$r.squared, 3)
lm_intercept
## [1] -6.4402
lm_v
## [1] 0.0145
lm_R2
## [1] 0.978
A prediction based on this model can be produced with predict. predict function requires an object, the linear model, and newdata, a dataframe with exactly the same names of dependant variables used in the model, here v.
First, a new dataframe, called newdata is created. It includes only one variable v, which is a sequence. This will be used for creating the prediction. The prediction will be save in new variable D_predicted.
newdata <- data.frame(v = seq(0, 30000, 1000))
newdata$D_predicted <- predict(object = fit_lm, newdata = newdata)
Finally, we want to compare baseline data, which was used for creation of model, with linear prediction. We use ggplot to get the result. Measurements are represented by black, predictions by blue points.
g <- ggplot(newdata, aes(v, D_predicted))
g <- g + geom_point(color = "blue")
g <- g + geom_point(data = hubble, aes(v, D))
g <- g + theme_bw()
g <- g + xlab ("Speed v [km/s]")
g <- g + ylab ("Distance D [MPc]")
g <- g + ggtitle("Hubble Super-Novae Type Ia Measurements and Predictions")
g
Conclusion
You learned to create a linear (and univariate) model, which is the most basic model you can create. But if the basics are understood, more complicated models can be created with little effort.
Bibliography
Hubble telescope: https://en.wikipedia.org/wiki/Hubble_Space_Telescope
Hubble’s law: https://en.wikipedia.org/wiki/Hubble%27s_law
Lava package: https://cran.r-project.org/web/packages/lava/index.html