Training a simple linear model

Introduction

In this tutorial we will take a look at measurements of Hubble (the telescope). Besides taking beautiful pictures, it measured speed and distance of Super-Novae. Similar data was used in 1929 by Hubble (the person) and he found out that there is a linear relationship.

We will create a linear model based on observations and create predictions.

  • Objectives: Learn to create a linear model and predictions
  • Requirements: R-Basics

Data Preparation

We need to load the package that includes the data, and then load the data.

# Load libraries
suppressPackageStartupMessages(library("lava"))  # includes data

# Load data
data("hubble")

Let’s take a look at the data.

plot of chunk plot_simple

There seems to be a linear relationship. So we will create a linear model. This is easily done with lm. As parameters it requires a formula, which defines dependent and independent variable, as well as a dataframe assigned to data.

fit_lm <- lm(formula = D ~ v, data = hubble)
fit_lm
## 
## Call:
## lm(formula = D ~ v, data = hubble)
## 
## Coefficients:
## (Intercept)            v  
##    -6.44022      0.01449

In the output you see the intercept and linear parameter for v.

More details are presented with summary.

summary(fit_lm)
## 
## Call:
## lm(formula = D ~ v, data = hubble)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -43.806  -7.917   2.801   8.529  35.218 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.4402207  5.2777338   -1.22    0.231    
## v            0.0144852  0.0003702   39.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.19 on 34 degrees of freedom
## Multiple R-squared:  0.9783, Adjusted R-squared:  0.9776 
## F-statistic:  1531 on 1 and 34 DF,  p-value: < 2.2e-16

Here, you find also significance of parameters, but also p-value or R**2. These can be extracted from the model.

lm_intercept <- round(summary(fit_lm)$coeff[1], 4)
lm_v <- round(summary(fit_lm)$coeff[2], 4)
lm_R2 <- round(summary(fit_lm)$r.squared, 3)

lm_intercept
## [1] -6.4402
lm_v
## [1] 0.0145
lm_R2
## [1] 0.978

A prediction based on this model can be produced with predict. predict function requires an object, the linear model, and newdata, a dataframe with exactly the same names of dependant variables used in the model, here v.

First, a new dataframe, called newdata is created. It includes only one variable v, which is a sequence. This will be used for creating the prediction. The prediction will be save in new variable D_predicted.

newdata <- data.frame(v = seq(0, 30000, 1000))
newdata$D_predicted <- predict(object = fit_lm, newdata = newdata)

Finally, we want to compare baseline data, which was used for creation of model, with linear prediction. We use ggplot to get the result. Measurements are represented by black, predictions by blue points.

g <- ggplot(newdata, aes(v, D_predicted))
g <- g + geom_point(color = "blue")
g <- g + geom_point(data = hubble, aes(v, D))
g <- g + theme_bw()
g <- g + xlab ("Speed v [km/s]")
g <- g + ylab ("Distance D [MPc]")
g <- g + ggtitle("Hubble Super-Novae Type Ia Measurements and Predictions")
g

plot of chunk plot_final

Conclusion

You learned to create a linear (and univariate) model, which is the most basic model you can create. But if the basics are understood, more complicated models can be created with little effort.

Bibliography

Hubble telescope: https://en.wikipedia.org/wiki/Hubble_Space_Telescope

Hubble’s law: https://en.wikipedia.org/wiki/Hubble%27s_law

Lava package: https://cran.r-project.org/web/packages/lava/index.html

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close