ggplot2 Introduction

ggplot2 is a plotting environment for R that produces appealing graphs with compact, readable code, and it is easy to learn. Small changes to the code are often enough to build complex visualisations. To me it is by far the best plotting environment in R.

Data Understanding and Preparation

We will use the “iris” dataset, a multivariate data set introduced by Fisher in 1936. It consists of 150 samples, 50 from each of three Iris species. The measured features are the lengths and widths of sepals and petals, in centimeters.

First, we load the ggplot2 package; please make sure it is installed before loading it. We also load dplyr, which is used here only to print the data as a tibble. The data is loaded with the data() function. “iris” is part of the datasets package, which is attached at R startup.
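
If you are not sure whether the packages are installed, a small guard like the following can be run once beforehand. This is only a sketch: it assumes a CRAN mirror is reachable, and the mirror URL used here is one common choice rather than a requirement.

```r
# Install ggplot2 and dplyr only if they are missing.
# Assumption: the CRAN cloud mirror is reachable from this machine.
for (pkg in c("ggplot2", "dplyr")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
}
```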

suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
data(iris)
as_tibble(iris)  # tbl_df() is deprecated in current dplyr; as_tibble() replaces it
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # ... with 140 more rows

ggplot2 Components

Each ggplot2 graph has the following components:

  • data: a data frame is used as input data
  • aesthetics: define the axes (x, y), color, size, shape, text, fill, …
  • geometry: type of plot (e.g. line, bar, histogram)
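
To see how these three components end up in one plot object, the following sketch builds a plot without printing it and then inspects the object. It reads fields of the plot object directly ("data", "mapping", "layers"), which is meant for illustration only; these internals may change between ggplot2 versions.

```r
library(ggplot2)

# Build a plot object without printing it.
g <- ggplot(data = iris, aes(x = Species)) + geom_bar()

dim(g$data)        # data component: dimensions of the iris data frame
names(g$mapping)   # aesthetics that were mapped (here only "x")
length(g$layers)   # number of geometry layers added so far
```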

Bar Plot

We now create our first plot: a bar plot showing the number of observations per species. In general, a ggplot is built up in steps. First, we create a variable “g”, which holds all plot information. We call the ggplot() function and pass the data (here: “iris”) and the aesthetics, which map the column “Species” to the x-axis. In the second step we define the geometry (here: geom_bar()). Last, we show the plot by printing the variable “g”.

g <- ggplot (data = iris, aes(x = Species))
g <- g + geom_bar()
g

The result shows that there are three species and that the data is balanced: each group has 50 elements.
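
The bar heights can be cross-checked without a plot. The sketch below counts the rows per species with count() from dplyr; the variable name "species_counts" is only chosen for this illustration.

```r
library(dplyr)

# Count the rows per species; geom_bar() draws these counts by default.
species_counts <- count(iris, Species)
species_counts
```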

Histogram

A histogram shows the distribution of a single variable. It is created with geom_histogram().

g <- ggplot (data = iris, aes(x = Sepal.Length))
g <- g + geom_histogram()
g

Point Plot

We continue with a point plot (scatter plot). For this we need an “x” column and a “y” column. As an additional feature, we color the points according to their group with “color”. If the graph is printed in black and white, the colors might not be distinguishable, so it helps to also vary the point shape by species. All of this is defined in the aesthetics.

Since we want a point plot, we define the geometry with geom_point(). The default point size is rather small, so we change it with “size = 2”.

As a bonus, a smoothed line is added with geom_smooth(). The parameter method = “lm” requests a linear regression line; because color is mapped to Species, one line is fitted per species.

g <- ggplot (iris, aes(x = Sepal.Length, y = Petal.Length, color = Species, shape = Species))
g <- g + geom_point(size = 2)
g <- g + geom_smooth(method = "lm")
g
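
To get a feeling for what the regression lines represent, the same kind of model can be fitted by hand. The sketch below fits one linear model per species with lm() and collects intercept and slope. It is meant as a rough cross-check, not as a description of how geom_smooth() works internally, and the names "fits" and "coefs" are only used for this illustration.

```r
# Fit Petal.Length ~ Sepal.Length separately for each species.
fits <- lapply(split(iris, iris$Species), function(d) {
  lm(Petal.Length ~ Sepal.Length, data = d)
})

# One column per species, rows are intercept and slope.
coefs <- sapply(fits, coef)
coefs
```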

Box Plot

A boxplot summarises the distribution of a variable per group: it shows the median, the quartiles, and outliers.

g <- ggplot (iris, aes(x = Species, y = Sepal.Length))
g <- g + geom_boxplot()
g
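
The central values of each box can also be computed directly. The sketch below calculates the quartiles and the median of Sepal.Length per species with dplyr. The boxes drawn by geom_boxplot() should be close to these values, although the exact quantile method is an assumption here.

```r
library(dplyr)

# Quartiles and median of Sepal.Length for each species.
box_stats <- iris %>%
  group_by(Species) %>%
  summarise(
    q1  = quantile(Sepal.Length, 0.25),
    med = median(Sepal.Length),
    q3  = quantile(Sepal.Length, 0.75)
  )
box_stats
```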

Faceting

One of the most impressive features of ggplot2 is faceting: a separate subplot is created for each group. This is achieved with facet_grid(). The parameter “. ~ Species” means that the plots for the different species are arranged horizontally.

g <- ggplot (iris, aes(x = Sepal.Length, y = Petal.Length))
g <- g + geom_point()
g <- g + geom_smooth(method = "lm")
g <- g + facet_grid(. ~ Species)
g
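
Swapping the two sides of the formula changes the layout. As a small sketch, “Species ~ .” puts the species in rows, so the subplots are stacked vertically instead:

```r
library(ggplot2)

# Species on the left-hand side of the formula: one row per species.
g <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length))
g <- g + geom_point()
g <- g + facet_grid(Species ~ .)
g
```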

Axes and Scales

Axis labels and scales can be modified. We extend the previous plot and add an x-label and a y-label with xlab() and ylab(). The scales are modified with scale_x_continuous() and scale_y_continuous().

g <- ggplot (iris, aes(x = Sepal.Length, y = Petal.Length))
g <- g + geom_point()
g <- g + geom_smooth(method = "lm")
g <- g + facet_grid(. ~ Species)
g <- g + xlab ("Sepal Length [cm]")
g <- g + ylab ("Petal Length [cm]")
g <- g + scale_x_continuous(breaks = seq(4, 8, .5))
g <- g + scale_y_continuous(breaks = seq(0, 7, .5))
g
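
The “breaks” argument simply takes a numeric vector of tick positions, so it can help to look at that vector on its own first. The sketch below does this and, as an alternative to xlab() and ylab(), sets both labels in one call with labs(). The variable name "x_breaks" is only used for this illustration.

```r
library(ggplot2)

# Tick positions from 4 to 8 in steps of 0.5.
x_breaks <- seq(4, 8, by = 0.5)
x_breaks

g <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length))
g <- g + geom_point()
# labs() sets several labels in a single call.
g <- g + labs(x = "Sepal Length [cm]", y = "Petal Length [cm]")
g <- g + scale_x_continuous(breaks = x_breaks)
g
```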

Themes

Themes define the general look of a plot. You can use a predefined theme, e.g. theme_bw(). You can also set individual theme components with theme(). Here, “legend.position” is changed from the default (right) to bottom.

g <- ggplot (iris, aes(x = Sepal.Length, y = Petal.Length, color = Species))
g <- g + geom_point()
g <- g + theme_bw()
g <- g + theme(legend.position = "bottom")
g
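
Several components can be changed in one theme() call. The following sketch uses theme_minimal() as the base theme and adjusts a few components; the chosen components and the font size are arbitrary examples. Note that the order matters: a complete theme such as theme_minimal() replaces earlier theme() settings, so the individual adjustments come last.

```r
library(ggplot2)

g <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species))
g <- g + geom_point()
g <- g + theme_minimal()
# Individual adjustments go after the complete theme.
g <- g + theme(
  legend.position = "bottom",
  panel.grid.minor = element_blank(),   # hide minor grid lines
  axis.title = element_text(size = 12)  # example font size
)
g
```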

Saving a Plot

A ggplot can be saved with the ggsave() function. Many parameters can be defined, e.g. height, width, dpi, or units. The file type is determined implicitly by the extension of “filename”.

ggsave(filename = "my_first_ggplot.png", plot = g, height = 20, width = 20, units = "cm", dpi = 300)
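
To illustrate that the extension selects the file type, the sketch below saves a plot as a PDF instead. It writes to R's temporary directory so that nothing in the working directory is overwritten; the name "out_file" is only used for this illustration.

```r
library(ggplot2)

g <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species))
g <- g + geom_point()

# The ".pdf" extension makes ggsave() write a PDF file.
out_file <- file.path(tempdir(), "my_first_ggplot.pdf")
ggsave(filename = out_file, plot = g, height = 20, width = 20, units = "cm")

file.exists(out_file)  # check that the file was written
```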

More Information

For a quick overview you can use the “Data visualization with ggplot2” cheatsheet (RStudio –> Help –> Cheatsheets).
