8 dplyr

In the introductory vignette we learned that creating tidy eval functions boils down to a single pattern: quote and unquote. In this vignette we’ll apply this pattern in a series of recipes for dplyr.

This vignette is organised so that you can quickly find your way to a copy-paste solution when you face an immediate problem.

8.1 Patterns for single arguments

8.1.1 enquo() and !! - Quote and unquote arguments

We start with a quick recap of the introductory vignette. Creating a function around dplyr pipelines involves three steps: abstraction, quoting, and unquoting.

We end up with a function that automatically quotes its arguments group_var and summary_var and unquotes them when they are passed to other quoting functions:
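A minimal sketch of such a function follows. The built-in mtcars data and the output column name mean are illustrative assumptions, not part of the text above.

```r
library(dplyr)
library(rlang)

grouped_mean <- function(data, group_var, summary_var) {
  # Quote the expressions supplied by the caller
  group_var <- enquo(group_var)
  summary_var <- enquo(summary_var)

  # Unquote them inside other quoting functions
  data %>%
    group_by(!!group_var) %>%
    summarise(mean = mean(!!summary_var))
}

# Bare column names are passed without quotes
result <- grouped_mean(mtcars, cyl, mpg)
```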

8.1.2 as_label() - Create default column names

Use as_label() to transform a quoted expression to a column name:
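For instance, with a simple quoted symbol (the variable name height is only an illustration):

```r
library(rlang)

simple_var <- quote(height)
label <- as_label(simple_var)
label
#> [1] "height"
```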

These names are only a default stopgap. For more complex uses, you’ll probably want to let the user override the default. Here is a case where the default name is clearly suboptimal:
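As a sketch, consider a longer expression (chosen here purely for illustration). The default label is the deparsed code rather than a tidy column name:

```r
library(rlang)

complex_var <- quote(mean(height, na.rm = TRUE) + 2)

# The label is a single string built from the whole expression
complex_label <- as_label(complex_var)
complex_label
```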

8.1.3 := and !! - Unquote column names

In expressions like c(name = NA), the argument name is quoted. Because of this quoting, it’s not possible to refer indirectly to a variable that contains the name:
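A small base R illustration, where the string stored in name is an arbitrary example:

```r
name <- "the real name"

# The argument name is taken literally; the variable is not looked up
x <- c(name = NA)
names(x)
#> [1] "name"
```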

In tidy eval functions it is possible to unquote argument names with !!. However, you need the special := operator:
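A brief sketch using list2() from rlang, picked here only as a convenient function that supports name unquoting:

```r
library(rlang)

name <- "the real name"

# `!!name` on the left of `:=` is replaced by the string stored in `name`
x <- list2(!!name := NA)
names(x)
#> [1] "the real name"
```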

This unusual operator is needed because using ! on the left-hand side of = is not valid R code:
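One way to see this without crashing a script is to parse the code from strings. This is an illustrative check, not something from the original text:

```r
# `=` cannot follow `!!name` in R's grammar, so parsing fails
invalid <- tryCatch(
  parse(text = "list(!!name = NA)"),
  error = function(e) "parse error"
)
invalid
#> [1] "parse error"

# The same code with `:=` is accepted by the R parser
valid <- parse(text = "list(!!name := NA)")
```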

Let’s use this !! technique to pass custom column names to group_by() and summarise():
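A possible version is sketched below. The function name grouped_mean2, the prefixes group_ and mean_, and the mtcars data are illustrative assumptions.

```r
library(dplyr)
library(rlang)

grouped_mean2 <- function(data, group_var, summary_var) {
  group_var <- enquo(group_var)
  summary_var <- enquo(summary_var)

  # Create default column names from the quoted expressions
  group_nm <- paste0("group_", as_label(group_var))
  summary_nm <- paste0("mean_", as_label(summary_var))

  # Unquote the names on the left of `:=`, the expressions on the right
  data %>%
    group_by(!!group_nm := !!group_var) %>%
    summarise(!!summary_nm := mean(!!summary_var))
}

result <- grouped_mean2(mtcars, cyl, mpg)
names(result)
#> [1] "group_cyl" "mean_mpg"
```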

8.2 Patterns for multiple arguments

8.2.1 ... - Forward multiple arguments

We have created a function that takes one grouping variable and one summary variable. It would make sense to take multiple grouping variables instead of just one. Let’s adjust our function with a ... argument.

  1. Replace group_var by ...:

  2. Swap ... and summary_var, because arguments on the right-hand side of ... are harder to pass. They can only be passed with their full name explicitly specified, while arguments on the left-hand side can be passed without a name:

  3. It’s good practice to prefix named arguments with a . to reduce the risk of conflicts between your arguments and the arguments passed to ...:

Because of the magic of dots forwarding we don’t have to use the quote-and-unquote pattern. We can just pass ... to other quoting functions like group_by():
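Putting the three steps together, a sketch of the adjusted function might look like this (mtcars and the output column name mean are assumptions made for the example):

```r
library(dplyr)
library(rlang)

grouped_mean <- function(.data, .summary_var, ...) {
  # Only the single summary variable needs quote-and-unquote
  .summary_var <- enquo(.summary_var)

  .data %>%
    group_by(...) %>%  # the dots are forwarded as is
    summarise(mean = mean(!!.summary_var))
}

# One summary variable, then any number of grouping variables
result <- grouped_mean(mtcars, disp, cyl, am)
```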

Forwarding ... is straightforward but has the downside that you can’t modify the arguments or their names.

8.2.2 enquos() and !!! - Quote and unquote multiple arguments

Quoting and unquoting multiple variables with ... is pretty much the same process as for single arguments:

  • Quoting multiple arguments can be done in two ways: internal quoting with the plural variant enquos() and external quoting with vars(). Use internal quoting when your function takes expressions with ... and external quoting when your function takes a list of expressions.

  • Unquoting multiple arguments requires a variant of !!, the unquote-splice operator !!! which unquotes each element of a list as an independent argument in the surrounding function call.

Quote the dots with enquos() and unquote-splice them with !!!:
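A minimal sketch, assuming the same argument layout as the forwarding version (the name grouped_mean2 and the mtcars data are illustrative):

```r
library(dplyr)
library(rlang)

grouped_mean2 <- function(.data, .summary_var, ...) {
  summary_var <- enquo(.summary_var)
  group_vars <- enquos(...)  # a list of quoted grouping expressions

  .data %>%
    group_by(!!!group_vars) %>%  # each element becomes one argument
    summarise(mean = mean(!!summary_var))
}

result <- grouped_mean2(mtcars, disp, cyl, am)
```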

The quote-and-unquote pattern does more work than simply forwarding ..., yet the result is functionally identical. Don’t do this extra work unless you need to modify the arguments or their names.

8.2.3 expr() - Modify quoted arguments

Modifying quoted expressions is often necessary when dealing with multiple arguments. Say we’d like a grouped_mean() variant that takes multiple summary variables rather than multiple grouping variables. We need to somehow take the mean() of each summary variable.

One easy way is to use the quote-and-unquote pattern with expr(). This function is just like quote() from base R. It plainly returns your argument, quoted:
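A quick comparison with base R, using an arbitrary example expression:

```r
library(rlang)

base_quoted <- quote(mean(height, na.rm = TRUE))
rlang_quoted <- expr(mean(height, na.rm = TRUE))

# Without unquoting, both return the same unevaluated call
identical(base_quoted, rlang_quoted)
#> [1] TRUE
```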

But expr() has a twist. It fully supports unquoting:
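For example, with a small list of quoted symbols (the names height and mass are illustrative):

```r
library(rlang)

vars <- list(quote(height), quote(mass))

# `!!` inserts a single expression
one <- expr(mean(!!vars[[1]]))
one
#> mean(height)

# `!!!` inserts each element of the list as a separate argument
many <- expr(group_by(!!!vars))
many
#> group_by(height, mass)
```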

You can loop over a list of arguments and modify each of them:
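A sketch of such a loop, using base lapply() to avoid extra dependencies; a mapping function from another package would work the same way:

```r
library(rlang)

vars <- list(quote(mass), quote(height))

# Wrap each quoted variable in a call to mean()
wrapped <- lapply(vars, function(var) expr(mean(!!var, na.rm = TRUE)))
wrapped[[1]]
#> mean(mass, na.rm = TRUE)
```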

This makes it easy to take multiple summary variables, wrap each of them in a call to mean(), and then unquote-splice them within summarise():
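One way this could look, with the name grouped_mean3 and the mtcars data as assumptions, and lapply() standing in for any mapping function:

```r
library(dplyr)
library(rlang)

grouped_mean3 <- function(.data, .group_var, ...) {
  group_var <- enquo(.group_var)
  summary_vars <- enquos(...)  # list of quoted summary variables

  # Wrap each summary variable in a call to mean()
  summary_vars <- lapply(summary_vars, function(var) {
    expr(mean(!!var, na.rm = TRUE))
  })

  .data %>%
    group_by(!!group_var) %>%
    summarise(!!!summary_vars)  # unquote-splice the list
}

result <- grouped_mean3(mtcars, cyl, disp, hp)
```

The summary columns have no explicit names here, so their names are derived from the expressions themselves.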

8.2.4 vars() - Quote multiple arguments externally

How could we take multiple summary variables in addition to multiple grouping variables? Internal quoting with ... has a major disadvantage: the arguments in ... can only have one purpose. If you need to quote multiple sets of variables you have to delegate the quoting to another function. That’s the purpose of vars() which quotes its arguments and returns a list:
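A small illustration, where the column names species and name are placeholders that are never evaluated:

```r
library(dplyr)
library(rlang)

quoted <- vars(species, name)

# The result is a list with one quoted expression per argument
is.list(quoted)
#> [1] TRUE
length(quoted)
#> [1] 2
as_label(quoted[[1]])
#> [1] "species"
```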

The arguments can be complex expressions and have names:
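For instance, with illustrative expressions:

```r
library(dplyr)
library(rlang)

quoted <- vars(height = height / 100, mass + 50)

# Named arguments keep their name; unnamed ones get an empty name
names(quoted)
#> [1] "height" ""
as_label(quoted[[2]])
#> [1] "mass + 50"
```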

When the quoting is external you don’t use enquos(). Simply take lists of expressions in your function and forward the lists to other quoting functions with !!!:
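A sketch of such a function follows. The name grouped_mean3, the extra n column, and the mtcars data are assumptions made for the example.

```r
library(dplyr)
library(rlang)

grouped_mean3 <- function(data, group_vars, summary_vars) {
  # Both arguments are expected to be lists of quoted expressions
  stopifnot(is.list(group_vars), is.list(summary_vars))

  # Wrap each summary variable in a call to mean()
  summary_vars <- lapply(summary_vars, function(var) {
    expr(mean(!!var, na.rm = TRUE))
  })

  data %>%
    group_by(!!!group_vars) %>%
    summarise(n = n(), !!!summary_vars)
}

# The caller quotes each set of variables with vars()
result <- grouped_mean3(mtcars, vars(cyl, am), vars(mpg, disp))
```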

One advantage of vars() is that it lets users specify their own names:
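A compact, self-contained illustration with a hypothetical helper summarise_by() and the mtcars data:

```r
library(dplyr)
library(rlang)

# Hypothetical helper taking two externally quoted lists
summarise_by <- function(data, group_vars, summary_vars) {
  summary_vars <- lapply(summary_vars, function(var) {
    expr(mean(!!var, na.rm = TRUE))
  })
  data %>%
    group_by(!!!group_vars) %>%
    summarise(!!!summary_vars)
}

# Names given inside vars() become the column names
result <- summarise_by(mtcars, vars(cylinders = cyl), vars(avg_mpg = mpg))
names(result)
#> [1] "cylinders" "avg_mpg"
```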

8.2.5 enquos(.named = TRUE) - Automatically add default names

If you pass .named = TRUE to enquos() the unnamed expressions are automatically given default names:
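A short sketch with a hypothetical helper default_names() that just returns the names:

```r
library(rlang)

# Hypothetical helper: quote the dots and report their names
default_names <- function(...) names(enquos(..., .named = TRUE))

nms <- default_names(height, mean(mass))
nms
#> [1] "height"     "mean(mass)"
```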

User-supplied names are never overridden:
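For example, again with a hypothetical default_names() helper:

```r
library(rlang)

# Hypothetical helper: quote the dots and report their names
default_names <- function(...) names(enquos(..., .named = TRUE))

# `avg` is kept; only the unnamed argument gets a default name
nms <- default_names(avg = mean(mass), height)
nms
#> [1] "avg"    "height"
```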

This is handy when you need to modify the names of quoted expressions. In this example we’ll ensure the list is named before adding a prefix:
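One possible implementation is sketched here. The prefix avg_, the name grouped_mean3, and the mtcars data are illustrative assumptions.

```r
library(dplyr)
library(rlang)

grouped_mean3 <- function(.data, .group_var, ...) {
  group_var <- enquo(.group_var)

  # Ensure every quoted argument has a name
  summary_vars <- enquos(..., .named = TRUE)

  summary_vars <- lapply(summary_vars, function(var) {
    expr(mean(!!var, na.rm = TRUE))
  })

  # Add a prefix to all names
  names(summary_vars) <- paste0("avg_", names(summary_vars))

  .data %>%
    group_by(!!group_var) %>%
    summarise(!!!summary_vars)
}

result <- grouped_mean3(mtcars, cyl, disp, hp)
names(result)
#> [1] "cyl"      "avg_disp" "avg_hp"
```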

One big downside of this technique is that all arguments get a prefix, including the arguments that were given specific names by the user:
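The effect can be seen on the names alone. The helper prefix_names() below is hypothetical and only mimics the naming step:

```r
library(rlang)

# Hypothetical helper: name the dots, then prefix every name
prefix_names <- function(...) {
  quos <- enquos(..., .named = TRUE)
  paste0("avg_", names(quos))
}

# The user asked for `average`, but the prefix is added anyway
nms <- prefix_names(average = disp, hp)
nms
#> [1] "avg_average" "avg_hp"
```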

In general it’s better to preserve the names explicitly passed by the user. To do that we can’t automatically add default names with enquos(), because once the list is fully named we have no way of detecting which arguments were passed with explicit names. We’ll have to add default names manually with quos_auto_name().

8.2.6 quos_auto_name() - Manually add default names

It can be helpful to add default names to the list of quoted dots manually:

  • We can detect which arguments were explicitly named by the user.
  • The default names can be applied to lists returned by vars().

Let’s add default names manually with quos_auto_name() to lists of externally quoted variables. We’ll detect unnamed arguments and only add a prefix to this subset of arguments. This way we preserve user-supplied names:
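A sketch of this approach, assuming a mean_ prefix, an extra n column, and the mtcars data; names2() from rlang is used so that a missing names attribute is treated as empty names:

```r
library(dplyr)
library(rlang)

grouped_mean3 <- function(.data, .group_vars, .summary_vars) {
  stopifnot(is.list(.group_vars), is.list(.summary_vars))

  # Detect which arguments the user left unnamed
  unnamed <- names2(.summary_vars) == ""

  # Add default names, then prefix only the defaults
  .summary_vars <- quos_auto_name(.summary_vars)
  names(.summary_vars)[unnamed] <-
    paste0("mean_", names(.summary_vars)[unnamed])

  # Wrap in mean() only after the names are settled
  .summary_vars <- lapply(.summary_vars, function(var) {
    expr(mean(!!var, na.rm = TRUE))
  })

  .data %>%
    group_by(!!!.group_vars) %>%
    summarise(n = n(), !!!.summary_vars)
}

result <- grouped_mean3(mtcars, vars(cyl), vars(mpg, avg = disp))
names(result)
#> [1] "cyl"      "n"        "mean_mpg" "avg"
```

The unnamed mpg gets a prefixed default name, while the user-supplied name avg is left untouched.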

Note how we add the default names before wrapping the arguments in a mean() call. This way we avoid including mean() in the name:
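The difference can be seen by labelling an expression before and after wrapping it. The variable mass is only an illustration:

```r
library(rlang)

var <- quo(mass)

# Label taken before wrapping: just the variable
before <- as_label(var)
before
#> [1] "mass"

# Label taken after wrapping: the mean() call is part of it
wrapped <- expr(mean(!!var, na.rm = TRUE))
after <- as_label(wrapped)
```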

We get nicely prefixed default names:

And the user is able to fully override the names:

8.3 select()

TODO

8.4 filter()

TODO

8.5 case_when()

TODO