Wide and Tidy Data

In this tutorial we will learn what wide and tidy data is, why it is used and how you can transform it from one format in the other. Transforming data from one format in the other is a very common task in data analysis.

What is wide and tidy data?

Wide data means that each row contains more than one observation, whereas with tidy data there is exactly one observation for each row. Let’s take a look at an example.

We have a small school class with three students (Stuart, Bob, Kevin), who got grades in three different subjects (math, sport, art). The shown table would be a typical representation of Excel, SPSS or other programs.

school_wide <- data.frame(name = c("Stuart", "Bob", "Kevin"),
             math = c(2, 3, 4),
             sport = c(3, 1, 2),
             art = c(3, 2, 1))
school_wide
##     name math sport art
## 1 Stuart    2     3   3
## 2    Bob    3     1   2
## 3  Kevin    4     2   1

From wide to tidy

If you want to transform data from wide to tidy you can use functions from package tidyr. The relevant function is gather(). You need to provide some parameters

  • a key parameter name, which represents the columns. I will call the resulting column subject
  • a value parameter, which will be the name of the columns holding the values
  • Finally, you need to define which columns to use for transformation. Here this is 2:4.
library(tidyr)
school_tidy <- school_wide %>% 
    gather(key = "subject", value = "grade", 2:4)
school_tidy
##     name subject grade
## 1 Stuart    math     2
## 2    Bob    math     3
## 3  Kevin    math     4
## 4 Stuart   sport     3
## 5    Bob   sport     1
## 6  Kevin   sport     2
## 7 Stuart     art     3
## 8    Bob     art     2
## 9  Kevin     art     1

This is the same data, in a tidy-representation.

From tidy to wide

The function with the opposite effect is spread(). You need to pass the exact same parameters:

  • key: here you define the column name of the key-column
  • value: you pass the chosen name of the column with the values
school_tidy %>% 
  spread(key = "subject", value = "grade")
##     name art math sport
## 1    Bob   2    3     1
## 2  Kevin   1    4     2
## 3 Stuart   3    2     3

You see, that you get the original wide-data.

Why are there two different formats?

Both have their right to exist. Wide data typically comes from other programs. Excel or SPSS usually work with wide-data.

For specific tasks it is better to transform wide-data (also called messy data) into tidy-data. I will show two examples.

Example 1: Calculating group averages

Assume you want to calculate average class-grades per subject. With wide data you would do something like this:

mean(school_wide$math)
## [1] 3
mean(school_wide$sport)
## [1] 2
mean(school_wide$art)
## [1] 2

You repeat your code several times, which should be avoided. If there are only three columns, that is ok, but imagine you want to calculate the average for 50 columns.

Here is the solution based on tidy-data. You need to load dplyr package and make use of group_by() and summarise(). The code is much cleaner.

dplyr and tidyr are part of the tidyverse, which is a bunch of different packages, which rely on tidy-data.

library(dplyr)
school_tidy %>% 
  group_by(subject) %>% 
  summarise(mean_grade = mean(grade))
## # A tibble: 3 x 2
##   subject mean_grade
##   <chr>        <dbl>
## 1 art              2
## 2 math             3
## 3 sport            2

Example 2: Plotting with ggplot2

Assume you want to plot the results with ggplot2. This is done with three lines of code. Trying to get a similar representation based on wide-data will result in much more complex code.

library(ggplot2)
g <- ggplot(school_tidy, aes(name, grade, fill = subject))
g <- g + geom_bar(stat = "identity", position = "dodge")
g

plot of chunk unnamed-chunk-6

Conclusion

You have seen how easy it is to transform data from wide to tidy, and reverse. You also saw some examples in which tidy-data has advantages over wide-data.

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close