Functions with dplyr

In this tutorial we will learn how to use dplyr within functions. For this we will use movie data.

Learning Objectives: use dplyr within functions

Level: advanced

Data Preparation

The data we will analyse is part of ggplot2movies. We load the packages dplyr and ggplot2movies with pacman.

library(pacman)
p_load(dplyr, ggplot2movies)
data("movies")

Check with packageVersion() that your dplyr version is 0.7 or above, because quo(), enquo() and !! were introduced with that version.

packageVersion("dplyr")
## [1] '0.7.8'

To keep the output readable, we keep only the following variables and drop the rest.

movies_filt <- movies %>% 
    select(title, rating, length, Action, Animation, Comedy, Drama, Documentary, Romance, Short)
knitr::kable(movies_filt %>% head(5))
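|title                    | rating| length| Action| Animation| Comedy| Drama| Documentary| Romance| Short|
|:------------------------|------:|------:|------:|---------:|------:|-----:|-----------:|-------:|-----:|
|$                        |    6.4|    121|      0|         0|      1|     1|           0|       0|     0|
|$1000 a Touchdown        |    6.0|     71|      0|         0|      1|     0|           0|       0|     0|
|$21 a Day Once a Month   |    8.2|      7|      0|         1|      0|     0|           0|       0|     1|
|$40,000                  |    8.2|     70|      0|         0|      1|     0|           0|       0|     0|
|$50,000 Climax Show, The |    3.4|     71|      0|         0|      0|     0|           0|       0|     0|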

Problem

Now assume you want to know the average length or rating for each of these 7 genres. That makes 14 combinations (7 genres times 2 variables), which means a lot of repeated code. How can you avoid this? By using dplyr within a function.

Solution

The problem is that we want to pass a column name that should only be evaluated at a later step, inside the dplyr verb.

The first intuitive approach is this:

genre <- "Action"
movies_filt %>% 
    group_by(genre) %>% 
    summarise (rating = mean(rating))

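This does not give the intended result, because dplyr does not treat the string stored in genre as a reference to the column Action. Instead, you have to make a quosure of the column name with quo() and unquote it with !! where it is used.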

genre <- quo(Comedy)
movies_filt %>% 
    filter((!!genre) == 1) %>% 
    summarise (rating = mean(rating))
## # A tibble: 1 x 1
##   rating
##    <dbl>
## 1   5.96
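
If the column name is only available as a string, as in the first attempt, one alternative is the .data pronoun, which dplyr makes available inside its verbs. The following is a minimal sketch, not part of the original tutorial; the data frame toy is a hypothetical stand-in for movies_filt so that the block runs on its own.

```r
library(dplyr)

# Hypothetical toy data standing in for movies_filt
toy <- data.frame(
  rating = c(6, 8, 4, 7),
  Comedy = c(1, 0, 1, 1),
  Action = c(0, 1, 0, 0)
)

# The column name is stored as a string
genre <- "Comedy"

# .data[[genre]] looks up the column whose name is stored in genre
res <- toy %>%
  filter(.data[[genre]] == 1) %>%
  summarise(rating = mean(rating))

res
```

With quo() you pass a bare column name, with .data[[ ]] you pass a string. Which one fits better depends on how the function is meant to be called.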

Now we use this in a function.

genre_stat_mean <- function(df, 
             group_var = c("Action", "Animation", "Comedy", "Drama", "Documentary", "Romance", "Short"), 
             stat_var = c("rating", "length") ) {
  group_var <- enquo (group_var)
  stat_var <- enquo (stat_var)
  col_name <- paste0("median_", 
           as.character(stat_var)[2])
  df %>% 
    filter((!!group_var) == 1) %>% 
    summarise(!!col_name := median((!!stat_var)))
}

How does it work:

  • The parameters are a data frame df, a grouping variable group_var and a variable stat_var for which the statistic is calculated. The default vectors list the intended values (the seven genre columns, and “rating” or “length”); they document the options, but the function does not check them. Note that, despite its name, the function calculates the median and not the mean.
  • group_var and stat_var are transformed into quosures with enquo(). Note that we use enquo() instead of quo(): to capture an argument that was passed to a function, you have to use enquo() and not quo()!
  • col_name is the name of the column that is returned, e.g. “median_rating”. Important: in the dplyr version used here, converting stat_var into a character returns two values, which is why the second one has to be selected with [2].
  • Now we take the data frame df and pipe it to the filter() function. Here we keep the rows in which the genre column equals 1. Important: you have to put parentheses around !!group_var, otherwise the comparison with 1 is evaluated before the unquoting!
  • Within the summarise() call, the name of the returned column is defined first. Important: the assignment is not made with “=”, but with “:=”, because R does not allow an unquoted name such as !!col_name on the left-hand side of “=”!
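
The steps above can also be sketched in a variant that builds the column name with quo_name() instead of as.character()[2] and checks the arguments explicitly. This is only an illustration under assumptions: the name genre_stat_median, the stopifnot() check and the data frame toy are hypothetical and not part of the original function.

```r
library(dplyr)

# Hypothetical variant of the function above
genre_stat_median <- function(df, group_var, stat_var) {
  # Capture the bare column names passed by the caller
  group_var <- enquo(group_var)
  stat_var  <- enquo(stat_var)

  # quo_name() turns the captured expression into a string
  group_name <- quo_name(group_var)
  stat_name  <- quo_name(stat_var)

  # Explicit check of the arguments (an addition, not in the original)
  stopifnot(group_name %in% names(df),
            stat_name %in% c("rating", "length"))

  col_name <- paste0("median_", stat_name)

  df %>%
    filter((!!group_var) == 1) %>%
    summarise(!!col_name := median(!!stat_var))
}

# Hypothetical toy data standing in for the movie data
toy <- data.frame(
  rating = c(6, 8, 4, 7),
  length = c(90, 120, 80, 100),
  Comedy = c(1, 0, 1, 1)
)

out <- genre_stat_median(toy, Comedy, rating)
out
```

The explicit check makes the function stop early with an error when a column is passed that it was not designed for, instead of silently returning a statistic.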

Now we can put the function to the test. We calculate the median rating of animated films and the median length of comedies.

genre_stat_mean(df = movies, 
         group_var = Animation, 
         stat_var = rating)
## # A tibble: 1 x 1
##   median_rating
##           <dbl>
## 1           6.7
genre_stat_mean(df = movies, 
         group_var = Comedy, 
         stat_var = length)
## # A tibble: 1 x 1
##   median_length
##           <int>
## 1            89
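
To cover all combinations of genre and variable without writing each call by hand, the column names can be kept as strings and converted to symbols at call time. The following sketch assumes a helper stat_by_genre() (a hypothetical name, built in the spirit of the function above) and a small hypothetical data frame toy with only two genres, so that it runs on its own. rlang::sym() converts a string to a symbol, and !! inserts it into the call.

```r
library(dplyr)

# Hypothetical helper: median of stat_var for rows where group_var equals 1
stat_by_genre <- function(df, group_var, stat_var) {
  group_var <- enquo(group_var)
  stat_var  <- enquo(stat_var)
  df %>%
    filter((!!group_var) == 1) %>%
    summarise(value = median(!!stat_var))
}

# Hypothetical toy data standing in for the movie data
toy <- data.frame(
  rating = c(6, 8, 4, 7),
  length = c(90, 120, 80, 100),
  Comedy = c(1, 0, 1, 1),
  Drama  = c(0, 1, 1, 0)
)

# All combinations of genre and variable
combos <- expand.grid(
  genre = c("Comedy", "Drama"),
  stat  = c("rating", "length"),
  stringsAsFactors = FALSE
)

# Turn each string into a symbol and unquote it in the call
combos$median <- mapply(
  function(g, s) {
    res <- stat_by_genre(toy, !!rlang::sym(g), !!rlang::sym(s))
    res$value
  },
  combos$genre, combos$stat,
  USE.NAMES = FALSE
)

combos
```

With the real data, the two vectors passed to expand.grid() would hold the 7 genre columns and the 2 variables, which gives the 14 combinations from the problem statement.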

Summary

We learned to use dplyr within functions. Now we know when to use quo() and when to use enquo(), as well as when to use “:=” instead of “=”.

This knowledge will enable you to write more powerful code that is less error-prone, because you can avoid repeating your code many times.
