21.1 Introduction

In functions, we talked about how important it is to reduce duplication in your code by creating functions instead of copying-and-pasting. Reducing code duplication has three main benefits:

It’s easier to see the intent of your code, because your eyes are drawn to what’s different, not what stays the same.


It’s easier to respond to changes in requirements. As your needs change, you only need to make changes in one place, rather than remembering to change every place that you copied-and-pasted the code.

You’re likely to have fewer bugs because each line of code is used in more places.

One tool for reducing duplication is functions, which reduce duplication by identifying repeated patterns of code and extracting them out into independent pieces that can be easily reused and updated. Another tool for reducing duplication is iteration, which helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets. In this chapter you’ll learn about two important iteration paradigms: imperative programming and functional programming. On the imperative side you have tools like for loops and while loops, which are a great place to start because they make iteration very explicit, so it’s obvious what’s happening. However, for loops are quite verbose, and require quite a bit of bookkeeping code that is duplicated for every for loop. Functional programming (FP) offers tools to extract out this duplicated code, so each common for loop pattern gets its own function. Once you master the vocabulary of FP, you can solve many common iteration problems with less code, more ease, and fewer errors.


21.1.1 Prerequisites

Once you’ve mastered the for loops provided by base R, you’ll learn some of the powerful programming tools provided by purrr, one of the tidyverse core packages.


library(tidyverse)


21.2 For loops

Every for loop has three components:

The output: output <- vector("double", ncol(df)). Before you start the loop, you must always allocate sufficient space for the output. This is very important for efficiency: if you grow the output at each iteration using c() (for example), your for loop will be very slow.

A general way of creating an empty vector of given length is the vector() function. It has two arguments: the type of the vector (“logical”, “integer”, “double”, “character”, etc) and the length of the vector.

The sequence: i in seq_along(df). This determines what to loop over: each run of the for loop will assign i to a different value from seq_along(df). It’s useful to think of i as a pronoun, like “it”.

You might not have seen seq_along() before. It’s a safe version of the familiar 1:length(l), with an important difference: if you have a zero-length vector, seq_along() does the right thing:


y <- vector("double", 0)
seq_along(y)
#> integer(0)
1:length(y)
#> [1] 1 0

You probably won’t create a zero-length vector deliberately, but it’s easy to create them accidentally. If you use 1:length(x) instead of seq_along(x), you’re likely to get a confusing error message.

The body: output[[i]] <- median(df[[i]]). This is the code that does the work. It’s run repeatedly, each time with a different value for i. The first iteration will run output[[1]] <- median(df[[1]]), the second will run output[[2]] <- median(df[[2]]), and so on.
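
Putting the three components together gives a loop like the sketch below. The data frame df and the choice of median() as the per-column summary are assumptions made for illustration; a plain data.frame is used so the example needs no packages.

```r
# Hypothetical example data: four columns of random normals
df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

output <- vector("double", ncol(df))  # 1. output: preallocate one slot per column
for (i in seq_along(df)) {            # 2. sequence: one iteration per column
  output[[i]] <- median(df[[i]])      # 3. body: compute and store the summary
}
output
```

Because the output is preallocated, each iteration only fills in one slot rather than copying the whole vector.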

That’s all there is to the for loop! Now is a good time to practice creating some basic (and not so basic) for loops using the exercises below. Then we’ll move on to some variations of the for loop that help you solve other problems that will crop up in practice.


21.2.1 Exercises

Write for loops to:

Compute the mean of every column in mtcars.

Determine the type of each column in nycflights13::flights.

Compute the number of unique values in each column of iris.

Generate 10 random normals for each of \(\mu = -10\), \(0\), \(10\), and \(100\).

Think about the output, sequence, and body before you start writing the loop.

Eliminate the for loop in each of the following examples by taking advantage of an existing function that works with vectors:


out <- ""
for (x in letters) {
  out <- stringr::str_c(out, x)
}

x <- sample(100)
sd <- 0
for (i in seq_along(x)) {
  sd <- sd + (x[i] - mean(x)) ^ 2
}
sd <- sqrt(sd / (length(x) - 1))

x <- runif(100)
out <- vector("numeric", length(x))
out[1] <- x[1]
for (i in 2:length(x)) {
  out[i] <- out[i - 1] + x[i]
}

Combine your function writing and for loop skills:

Write a for loop that prints() the lyrics to the children’s song “Alice the camel”.

Convert the nursery rhyme “ten in the bed” to a function. Generalise it to any number of people in any sleeping structure.

Convert the song “99 bottles of beer on the wall” to a function. Generalise to any number of any vessel containing any liquid on any surface.

It’s common to see for loops that don’t preallocate the output and instead increase the length of a vector at each step:


output <- vector("integer", 0)
for (i in seq_along(x)) {
  output <- c(output, lengths(x[[i]]))
}
output

How does this affect performance? Design and execute an experiment.


21.3 For loop variations

Once you have the basic for loop under your belt, there are some variations that you should be aware of. These variations are important regardless of how you do iteration, so don’t forget about them once you’ve mastered the FP techniques you’ll learn about in the next section.

There are four variations on the basic theme of the for loop:

Modifying an existing object, instead of creating a new object.

Looping over names or values, instead of indices.

Handling outputs of unknown length.

Handling sequences of unknown length.


21.3.1 Modifying an existing object

Sometimes you want to use a for loop to modify an existing object. For example, remember our challenge from functions. We wanted to rescale every column in a data frame:


df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

df$a <- rescale01(df$a)
df$b <- rescale01(df$b)
df$c <- rescale01(df$c)
df$d <- rescale01(df$d)

To solve this with a for loop we again think about the three components:

Output: we already have the output; it’s the same as the input!

Sequence: we can think about a data frame as a list of columns, so we can iterate over each column with seq_along(df).

Body: apply rescale01().

This gives us:

for (i in seq_along(df)) {
  df[[i]] <- rescale01(df[[i]])
}

Typically you’ll be modifying a list or data frame with this sort of loop, so remember to use [[, not [. You might have spotted that I used [[ in all my for loops: I think it’s better to use [[ even for atomic vectors because it makes it clear that I want to work with a single element.
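
The difference between [ and [[ is easy to check on a small data frame. The sketch below uses made-up data; df here is a hypothetical two-column data.frame, not the tibble from the example above.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))  # hypothetical data

class(df["a"])    # single bracket keeps the container
#> [1] "data.frame"
class(df[["a"]])  # double bracket extracts the column itself
#> [1] "numeric"

# Assigning through [[ replaces exactly one column
df[["a"]] <- df[["a"]] * 10
df$a
#> [1] 10 20 30
```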


21.3.2 Looping patterns

There are three basic ways to loop over a vector. So far I’ve shown you the most general: looping over the numeric indices with for (i in seq_along(xs)), and extracting the value with x[[i]]. There are two other forms:

Loop over the elements: for (x in xs). This is most useful if you only care about side-effects, like plotting or saving a file, because it’s difficult to save the output efficiently.

Loop over the names: for (nm in names(xs)). This gives you the name, which you can use to access the value with x[[nm]]. This is useful if you want to use the name in a plot title or a file name. If you’re creating named output, make sure to name the results vector like so:

results <- vector("list", length(x))
names(results) <- names(x)

Iteration over the numeric indices is the most general form, because given the position you can extract both the name and the value:

for (i in seq_along(x)) {
  name <- names(x)[[i]]
  value <- x[[i]]
}
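
As a minimal sketch of the loop-over-names form, the example below doubles each element of a named vector and stores the result under the same name. The vector x and the doubling step are assumptions chosen only for illustration.

```r
x <- c(a = 1.5, b = 2.5, c = 4)  # hypothetical named vector

# Preallocate a named list so results can be filled in by name
results <- vector("list", length(x))
names(results) <- names(x)

for (nm in names(x)) {
  results[[nm]] <- x[[nm]] * 2   # nm is the name, x[[nm]] the value
}
str(results)
```

This only works reliably when every element has a unique name, which is one reason the numeric-index form is the more general one.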

21.3.3 Unknown output length

Sometimes you might not know how long the output will be. For example, imagine you want to simulate some random vectors of random lengths. You might be tempted to solve this problem by progressively growing the vector:

means <- c(0, 1, 2)

output <- double()
for (i in seq_along(means)) {
  n <- sample(100, 1)
  output <- c(output, rnorm(n, means[[i]]))
}
str(output)
#>  num [1:202] 0.912 0.205 2.584 -0.789 0.588 ...

But this is not very efficient because in each iteration, R has to copy all the data from the previous iterations. In technical terms you get “quadratic” (\(O(n^2)\)) behaviour, which means that a loop with three times as many elements would take nine (\(3^2\)) times as long to run.

A better solution is to save the results in a list, and then combine into a single vector after the loop is done:

out <- vector("list", length(means))
for (i in seq_along(means)) {
  n <- sample(100, 1)
  out[[i]] <- rnorm(n, means[[i]])
}
str(out)
#> List of 3
#>  $ : num [1:83] 0.367 1.13 -0.941 0.218 1.415 ...
#>  $ : num [1:21] -0.485 -0.425 2.937 1.688 1.324 ...
#>  $ : num [1:40] 2.34 1.59 2.93 3.84 1.3 ...
str(unlist(out))
#>  num [1:144] 0.367 1.13 -0.941 0.218 1.415 ...

Here I’ve used unlist() to flatten a list of vectors into a single vector. A stricter option is to use purrr::flatten_dbl(): it will throw an error if the input isn’t a list of doubles.

This pattern occurs in other places too:

You might be generating a long string. Instead of paste()ing together each iteration with the previous, save the output in a character vector and then combine that vector into a single string with paste(output, collapse = "").

You might be generating a big data frame. Instead of sequentially rbind()ing in each iteration, save the output in a list, then use dplyr::bind_rows(output) to combine the output into a single data frame.

Watch out for this pattern. Whenever you see it, switch to a more complex result object, and then combine in one step at the end.
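
Both cases can be sketched with base R alone. The inputs below (words and the three small data frames) are made up, and do.call(rbind, ...) is used as a dependency-free stand-in for dplyr::bind_rows().

```r
words <- c("for", "loops", "are", "fine")  # hypothetical input

# Fill a preallocated character vector inside the loop ...
pieces <- vector("character", length(words))
for (i in seq_along(words)) {
  pieces[[i]] <- toupper(words[[i]])
}
# ... and combine once at the end
sentence <- paste(pieces, collapse = " ")
sentence
#> [1] "FOR LOOPS ARE FINE"

# Same idea for data frames: collect the pieces in a list, bind once
rows <- vector("list", 3)
for (i in seq_along(rows)) {
  rows[[i]] <- data.frame(id = i, square = i^2)
}
combined <- do.call(rbind, rows)  # stand-in for dplyr::bind_rows(rows)
nrow(combined)
#> [1] 3
```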


21.3.4 Unknown sequence length

Sometimes you don’t even know how long the input sequence should run for. This is common when doing simulations. For example, you might want to loop until you get three heads in a row. You can’t do that sort of iteration with the for loop. Instead, you can use a while loop. A while loop is simpler than a for loop because it only has two components, a condition and a body:

while (condition) {
  # body
}

A while loop is also more general than a for loop, because you can rewrite any for loop as a while loop, but you can’t rewrite every while loop as a for loop:

for (i in seq_along(x)) {
  # body
}

# Equivalent to
i <- 1
while (i <= length(x)) {
  # body
  i <- i + 1
}

Here’s how we could use a while loop to find how many tries it takes to get three heads in a row:

flip <- function() sample(c("T", "H"), 1)

flips <- 0
nheads <- 0

while (nheads < 3) {
  if (flip() == "H") {
    nheads <- nheads + 1
  } else {
    nheads <- 0
  }
  flips <- flips + 1
}
flips
#> [1] 3

I mention while loops only briefly, because I hardly ever use them. They’re most often used for simulation, which is outside the scope of this book. However, it is good to know they exist so that you’re prepared for problems where the number of iterations is not known in advance.


21.3.5 Exercises

Imagine you have a directory full of CSV files that you want to read in. You have their paths in a vector, files, and now want to read each one with read_csv(). Write the for loop that will load them into a single data frame.

What happens if you use for (nm in names(x)) and x has no names? What if only some of the elements are named? What if the names are not unique?

Write a function that prints the mean of each numeric column in a data frame, along with its name. For example, show_mean(iris) would print:

show_mean(iris)
#> Sepal.Length: 5.84
#> Sepal.Width:  3.06
#> Petal.Length: 3.76
#> Petal.Width:  1.20

(Extra challenge: what function did I use to make sure that the numbers lined up nicely, even though the variable names had different lengths?)

What does this code do? How does it work?

trans <- list(
  disp = function(x) x * 0.0163871,
  am = function(x) {
    factor(x, labels = c("auto", "manual"))
  }
)
for (var in names(trans)) {
  mtcars[[var]] <- trans[[var]](mtcars[[var]])
}


21.4 For loops vs. functionals

The idea of passing a function to another function is an extremely powerful idea, and it’s one of the behaviours that makes R a functional programming language. It might take you a while to wrap your head around the idea, but it’s worth the investment. In the rest of the chapter, you’ll learn about and use the purrr package, which provides functions that eliminate the need for many common for loops. The apply family of functions in base R (apply(), lapply(), tapply(), etc) solve a similar problem, but purrr is more consistent and thus is easier to learn.
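
To make “passing a function to another function” concrete, here is a minimal sketch of a col_summary() helper of the kind this chapter refers to. The body shown is an assumption (a for loop that applies fun to each column and returns doubles), and the data frame is made up.

```r
# Hypothetical helper: apply a summary function `fun` to every column of `df`
col_summary <- function(df, fun) {
  out <- vector("double", length(df))
  for (i in seq_along(df)) {
    out[[i]] <- fun(df[[i]])
  }
  out
}

df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))  # made-up data

col_summary(df, mean)    # the function itself is the argument, without ()
col_summary(df, median)
```

The loop bookkeeping is written once; only the summary function changes between calls.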

The goal of using purrr functions instead of for loops is to allow you to break common list manipulation challenges into independent pieces:

How can you solve the problem for a single element of the list? Once you’ve solved that problem, purrr takes care of generalising your solution to every element in the list.

If you’re solving a complex problem, how can you break it down into bite-sized pieces that allow you to advance one small step towards a solution? With purrr, you get lots of small pieces that you can compose together with the pipe.

This structure makes it easier to solve new problems. It also makes it easier to understand your solutions to old problems when you re-read your old code.


21.4.1 Exercises

Read the documentation for apply(). In the 2d case, what two for loops does it generalise?

Adapt col_summary() so that it only applies to numeric columns. You might want to start with an is_numeric() function that returns a logical vector that has a TRUE corresponding to each numeric column.


21.5 The map functions

The pattern of looping over a vector, doing something to each element and saving the results is so common that the purrr package provides a family of functions to do it for you. There is one function for each type of output:

map() makes a list.

map_lgl() makes a logical vector.

map_int() makes an integer vector.

map_dbl() makes a double vector.

map_chr() makes a character vector.

Each function takes a vector as input, applies a function to each piece, and then returns a new vector that’s the same length (and has the same names) as the input. The type of the vector is determined by the suffix to the map function.

Once you master these functions, you’ll find it takes much less time to solve iteration problems. But you should never feel bad about using a for loop instead of a map function. The map functions are a step up a tower of abstraction, and it can take a long time to get your head around how they work. The important thing is that you solve the problem that you’re working on, not write the most concise and elegant code (although that’s definitely something you want to strive towards!).

Some people will tell you to avoid for loops because they are slow. They’re wrong! (Well, at least they’re rather out of date, as for loops haven’t been slow for many years.) The chief benefit of using functions like map() is not speed, but clarity: they make your code easier to write and to read.

We can use these functions to perform the same computations as the last for loop. Those summary functions returned doubles, so we need to use map_dbl():


map_dbl(df, mean)
#>       a       b       c       d 
#>  0.2026 -0.2068  0.1275 -0.0917
map_dbl(df, median)
#>      a      b      c      d 
#>  0.237 -0.218  0.254 -0.133
map_dbl(df, sd)
#>     a     b     c     d 
#> 0.796 0.759 1.164 1.062

Compared to using a for loop, the focus is on the operation being performed (i.e. mean(), median(), sd()), not the bookkeeping required to loop over every element and store the output. This is even more apparent if we use the pipe:

df %>% map_dbl(mean)
#>       a       b       c       d 
#>  0.2026 -0.2068  0.1275 -0.0917
df %>% map_dbl(median)
#>      a      b      c      d 
#>  0.237 -0.218  0.254 -0.133
df %>% map_dbl(sd)
#>     a     b     c     d 
#> 0.796 0.759 1.164 1.062

There are a few differences between map_*() and col_summary():

All purrr functions are implemented in C. This makes them a little faster at the expense of readability.

The second argument, .f, the function to apply, can be a formula, a character vector, or an integer vector. You’ll learn about those handy shortcuts in the next section.

map_*() uses ... (dot dot dot) to pass along additional arguments to .f each time it’s called:

map_dbl(df, mean, trim = 0.5)
#>      a      b      c      d 
#>  0.237 -0.218  0.254 -0.133

The map functions also preserve names:

z <- list(x = 1:3, y = 4:5)
map_int(z, length)
#> x y 
#> 3 2

21.5.1 Shortcuts

There are a few shortcuts that you can use with .f in order to save a little typing. Imagine you want to fit a linear model to each group in a dataset. The following toy example splits up the mtcars dataset into three pieces (one for each value of cylinder) and fits the same linear model to each piece:

models <- mtcars %>% 
  split(.$cyl) %>% 
  map(function(df) lm(mpg ~ wt, data = df))

The syntax for creating an anonymous function in R is quite verbose so purrr provides a convenient shortcut: a one-sided formula.

models <- mtcars %>% 
  split(.$cyl) %>% 
  map(~lm(mpg ~ wt, data = .))

Here I’ve used . as a pronoun: it refers to the current list element (in the same way that i referred to the current index in the for loop).

When you’re looking at many models, you might want to extract a summary statistic like the \(R^2\). To do that we need to first run summary() and then extract the component called r.squared. We could do that using the shorthand for anonymous functions:
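
The code for this step is missing from this copy of the text, so the following is a sketch of what the formula shorthand looks like here. It refits the models so that it runs on its own; the name r2 is introduced only for this example.

```r
library(purrr)

# Fit one linear model per cylinder group, as in the example above
models <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .))

# Run summary() on each model, then pull out r.squared with a formula
r2 <- models %>%
  map(summary) %>%
  map_dbl(~ .$r.squared)
r2
```

The result is a named double vector with one entry per cylinder group.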


But extracting named components is a common operation, so purrr provides an even shorter shortcut: you can use a string.
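
The code for the string shortcut is also missing from this copy, so here is a self-contained sketch. It compares the string form with the equivalent formula form; the names by_name and by_formula are introduced only for this example.

```r
library(purrr)

models <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .))

# A string passed as .f extracts the component with that name
by_name <- models %>%
  map(summary) %>%
  map_dbl("r.squared")

# The same extraction written with the formula shorthand
by_formula <- models %>%
  map(summary) %>%
  map_dbl(~ .$r.squared)

identical(by_name, by_formula)
```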


You can also use an integer to select elements by position:

x <- list(list(1, 2, 3), list(4, 5, 6), list(7, 8, 9))
x %>% map_dbl(2)
#> [1] 2 5 8

21.5.2 Base R

If you’re familiar with the apply family of functions in base R, you might have noticed some similarities with the purrr functions:

lapply() is basically identical to map(), except that map() is consistent with all the other functions in purrr, and you can use the shortcuts for .f.

Base sapply() is a wrapper around lapply() that automatically simplifies the output. This is useful for interactive work but is problematic in a function because you never know what sort of output you’ll get:

x1 <- list(
  c(0.27, 0.37, 0.57, 0.91, 0.20),
  c(0.90, 0.94, 0.66, 0.63, 0.06),
  c(0.21, 0.18, 0.69, 0.38, 0.77)
)
x2 <- list(
  c(0.50, 0.72, 0.99, 0.38, 0.78),
  c(0.93, 0.21, 0.65, 0.13, 0.27),
  c(0.39, 0.01, 0.38, 0.87, 0.34)
)

threshold <- function(x, cutoff = 0.8) x[x > cutoff]
x1 %>% sapply(threshold) %>% str()
#> List of 3
#>  $ : num 0.91
#>  $ : num [1:2] 0.9 0.94
#>  $ : num(0)
x2 %>% sapply(threshold) %>% str()
#>  num [1:3] 0.99 0.93 0.87

vapply() is a safe alternative to sapply() because you supply an additional argument that defines the type. The only problem with vapply() is that it’s a lot of typing: vapply(df, is.numeric, logical(1)) is equivalent to map_lgl(df, is.numeric). One advantage of vapply() over purrr’s map functions is that it can also produce matrices: the map functions only ever produce vectors.
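
The equivalence, and the matrix case, can be checked on a small example. The data frame and the lo/hi list below are made up for illustration.

```r
library(purrr)

df <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c(0.5, 1, 1.5))

# Same question asked two ways: is each column numeric?
by_vapply <- vapply(df, is.numeric, logical(1))
by_map    <- map_lgl(df, is.numeric)
identical(by_vapply, by_map)

# vapply() returns a matrix when each result has length > 1:
# here range() gives two values per element
ranges <- vapply(list(lo = c(1, 5, 3), hi = c(40, 20)), range, numeric(2))
dim(ranges)
#> [1] 2 2
```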

I focus on purrr functions here because they have more consistent names and arguments, helpful shortcuts, and in the future will provide easy parallelism and progress bars.


21.5.3 Exercises

Write code that uses one of the map functions to:

Compute the mean of every column in mtcars.

Determine the type of each column in nycflights13::flights.

Compute the number of unique values in each column of iris.

Generate 10 random normals for each of \(\mu = -10\), \(0\), \(10\), and \(100\).

How can you create a single vector that for each column in a data frame indicates whether or not it’s a factor?

What happens when you use the map functions on vectors that aren’t lists? What does map(1:5, runif) do? Why?

What does map(-2:2, rnorm, n = 5) do? Why? What does map_dbl(-2:2, rnorm, n = 5) do? Why?

Rewrite map(x, function(df) lm(mpg ~ wt, data = df)) to eliminate the anonymous function.


21.6 Dealing with failure

When you use the map functions to repeat many operations, the chances are much higher that one of those operations will fail. When this happens, you’ll get an error message, and no output. This is annoying: why does one failure prevent you from accessing all the other successes? How do you ensure that one bad apple doesn’t ruin the whole barrel?

In this section you’ll learn how to deal with this situation with a new function: safely(). safely() is an adverb: it takes a function (a verb) and returns a modified version. In this case, the modified function will never throw an error. Instead, it always returns a list with two elements:

result is the original result. If there was an error, this will be NULL.

error is an error object. If the operation was successful, this will be NULL.

(You might be familiar with the try() function in base R. It’s similar, but because it sometimes returns the original result and it sometimes returns an error object it’s more difficult to work with.)

Let’s illustrate this with a simple example: log():

safe_log <- safely(log)
str(safe_log(10))
#> List of 2
#>  $ result: num 2.3
#>  $ error : NULL
str(safe_log("a"))
#> List of 2
#>  $ result: NULL
#>  $ error :List of 2
#>   ..$ message: chr "non-numeric argument to mathematical function"
#>   ..$ call   : language .f(...)
#>   ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

When the function succeeds, the result element contains the result and the error element is NULL. When the function fails, the result element is NULL and the error element contains an error object.

safely() is designed to work with map:

x <- list(1, 10, "a")
y <- x %>% map(safely(log))
str(y)
#> List of 3
#>  $ :List of 2
#>   ..$ result: num 0
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: num 2.3
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: NULL
#>   ..$ error :List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language .f(...)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

This would be easier to work with if we had two lists: one of all the errors and one of all the output. That’s easy to get with purrr::transpose():

y <- y %>% transpose()
str(y)
#> List of 2
#>  $ result:List of 3
#>   ..$ : num 0
#>   ..$ : num 2.3
#>   ..$ : NULL
#>  $ error :List of 3
#>   ..$ : NULL
#>   ..$ : NULL
#>   ..$ :List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language .f(...)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"

It’s up to you how to deal with the errors, but typically you’ll either look at the values of x where y is an error, or work with the values of y that are ok:

is_ok <- y$error %>% map_lgl(is_null)
x[!is_ok]
#> [[1]]
#> [1] "a"
y$result[is_ok] %>% flatten_dbl()
#> [1] 0.0 2.3

Purrr provides two other useful adverbs:

Like safely(), possibly() always succeeds. It’s simpler than safely(), because you give it a default value to return when there is an error.

x <- list(1, 10, "a")
x %>% map_dbl(possibly(log, NA_real_))
#> [1] 0.0 2.3  NA

quietly() performs a similar role to safely(), but instead of capturing errors, it captures printed output, messages, and warnings:

x <- list(1, -1)
x %>% map(quietly(log)) %>% str()
#> List of 2
#>  $ :List of 4
#>   ..$ result  : num 0
#>   ..$ output  : chr ""
#>   ..$ warnings: chr(0) 
#>   ..$ messages: chr(0) 
#>  $ :List of 4
#>   ..$ result  : num NaN
#>   ..$ output  : chr ""
#>   ..$ warnings: chr "NaNs produced"
#>   ..$ messages: chr(0)

21.7 Mapping over multiple arguments

So far we’ve mapped along a single input. But often you have multiple related inputs that you need to iterate along in parallel. That’s the job of the map2() and pmap() functions. For example, imagine you want to simulate some random normals with different means. You know how to do that with map():
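
The code for this step is missing from this copy of the text. A minimal sketch, assuming the means are 5, 10 and -3 (the values that appear in the parameter table later in this section), would be:

```r
library(purrr)

mu <- list(5, 10, -3)  # assumed means, taken from the later parameter table

# n = 5 is passed by name, so each element of mu fills the next
# unmatched argument of rnorm(), which is mean
draws <- mu %>% map(rnorm, n = 5)
str(draws)
```

Each element of the result is a vector of five draws centred on the corresponding mean.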


What if you also want to vary the standard deviation? One way to do that would be to iterate over the indices and index into vectors of means and sds:

sigma <- list(1, 5, 10)
seq_along(mu) %>% 
  map(~rnorm(5, mu[[.]], sigma[[.]])) %>% 
  str()
#> List of 3
#>  $ : num [1:5] 4.94 2.57 4.37 4.12 5.29
#>  $ : num [1:5] 11.72 5.32 11.46 10.24 12.22
#>  $ : num [1:5] 3.68 -6.12 22.24 -7.2 10.37

But that obfuscates the intent of the code. Instead we could use map2() which iterates over two vectors in parallel:

map2(mu, sigma, rnorm, n = 5) %>% str()
#> List of 3
#>  $ : num [1:5] 4.78 5.59 4.93 4.3 4.47
#>  $ : num [1:5] 10.85 10.57 6.02 8.82 15.93
#>  $ : num [1:5] -1.12 7.39 -7.5 -10.09 -2.7

map2() generates this series of function calls:

[Figure: the rnorm() calls generated by map2(mu, sigma, rnorm, n = 5)]

Note that the arguments that vary for each call come before the function; arguments that are the same for every call come after.
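
The series of calls can also be written out by hand. The sketch below assumes mu is list(5, 10, -3) and sigma is list(1, 5, 10), and uses set.seed() so that the two versions draw the same random numbers and can be compared.

```r
library(purrr)

mu <- list(5, 10, -3)    # assumed means
sigma <- list(1, 5, 10)  # standard deviations from the example above

set.seed(1)
via_map2 <- map2(mu, sigma, rnorm, n = 5)

# The calls map2() makes: the varying arguments are matched
# positionally to mean and sd, while n = 5 is fixed for every call
set.seed(1)
by_hand <- list(
  rnorm(5, 1, n = 5),
  rnorm(10, 5, n = 5),
  rnorm(-3, 10, n = 5)
)

identical(via_map2, by_hand)
```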

Like map(), map2() is just a wrapper around a for loop:

map2 <- function(x, y, f, ...) {
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    out[[i]] <- f(x[[i]], y[[i]], ...)
  }
  out
}

You could also imagine map3(), map4(), map5(), map6() etc, but that would get tedious quickly. Instead, purrr provides pmap() which takes a list of arguments. You might use that if you wanted to vary the mean, standard deviation, and number of samples:

n <- list(1, 3, 5)
args1 <- list(n, mu, sigma)
args1 %>%
  pmap(rnorm) %>% 
  str()
#> List of 3
#>  $ : num 4.55
#>  $ : num [1:3] 13.4 18.8 13.2
#>  $ : num [1:5] 0.685 10.801 -11.671 21.363 -2.562

That looks like:

[Figure: the rnorm() calls generated by pmap() with unnamed arguments]

If you don’t name the elements of the list, pmap() will use positional matching when calling the function. That’s a little fragile, and makes the code harder to read, so it’s better to name the arguments:

args2 <- list(mean = mu, sd = sigma, n = n)
args2 %>% 
  pmap(rnorm) %>% 
  str()

That generates longer, but safer, calls:

[Figure: the rnorm() calls generated by pmap() with named arguments]

Since the arguments are all the same length, it makes sense to store them in a data frame:

params <- tribble(
  ~mean, ~sd, ~n,
    5,     1,  1,
   10,     5,  3,
   -3,    10,  5
)
params %>% 
  pmap(rnorm)
#> [[1]]
#> [1] 4.68
#> 
#> [[2]]
#> [1] 23.44 12.85  7.28
#> 
#> [[3]]
#> [1]  -5.34 -17.66   0.92   6.06   9.02

As soon as your code gets complicated, I think a data frame is a good approach because it ensures that each column has a name and is the same length as all the other columns.


21.7.1 Invoking different functions

There’s one more step up in complexity: as well as varying the arguments to the function you might also vary the function itself:

f <- c("runif", "rnorm", "rpois")
param <- list(
  list(min = -1, max = 1), 
  list(sd = 5), 
  list(lambda = 10)
)

To handle this case, you can use invoke_map():

invoke_map(f, param, n = 5) %>% str()

The first argument is a list of functions or a character vector of function names. The second argument is a list of lists giving the arguments that vary for each function. The subsequent arguments are passed on to every function.
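
As far as I know, invoke_map() is deprecated in purrr 1.0 and later, so the sketch below spells out the computation described above using base R’s Map() and do.call() instead. It illustrates the semantics only and is not purrr’s implementation; n = 5 is an assumed value for the shared argument.

```r
f <- c("runif", "rnorm", "rpois")
param <- list(
  list(min = -1, max = 1),
  list(sd = 5),
  list(lambda = 10)
)

# For each (function name, argument list) pair, call the function with
# its own arguments plus the shared argument n = 5
sims <- Map(
  function(fn, args) do.call(fn, c(args, list(n = 5))),
  f, param
)
str(sims)
```

Because f is a character vector, Map() names the results after the functions, so sims$runif, sims$rnorm and sims$rpois each hold five draws.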

And again, you can use tribble() to make creating these matching pairs a little easier:

sim <- tribble(
  ~f,      ~params,
  "runif", list(min = -1, max = 1),
  "rnorm", list(sd = 5),
  "rpois", list(lambda = 10)
)
sim %>% 
  mutate(sim = invoke_map(f, params, n = 10))



21.9.1 Predicate functions

keep() and discard() keep the elements of the input where the predicate is TRUE or FALSE, respectively:

iris %>% 
  keep(is.factor) %>% 
  str()
#> 'data.frame':    150 obs. of  1 variable:
#>  $ Species: Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

iris %>% 
  discard(is.factor) %>% 
  str()
#> 'data.frame':    150 obs. of  4 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

some() and every() determine if the predicate is true for any or for all of the elements.

x <- list(1:5, letters, list(10))

x %>% 
  some(is_character)
#> [1] TRUE

x %>% 
  every(is_vector)
#> [1] TRUE

detect() finds the first element where the predicate is true; detect_index() returns its position.

x <- sample(10)
x
#>  [1]  8  7  5  6  9  2 10  1  3  4

x %>% 
  detect(~ . > 5)
#> [1] 8

x %>% 
  detect_index(~ . > 5)
#> [1] 1

head_while() and tail_while() take elements from the start or end of a vector while a predicate is true:

x %>% 
  head_while(~ . > 5)
#> [1] 8 7

x %>% 
  tail_while(~ . > 5)
#> integer(0)

21.9.2 Reduce and accumulate

Sometimes you have a complex list that you want to reduce to a simple list by repeatedly applying a function that reduces a pair to a singleton. This is useful if you want to apply a two-table dplyr verb to multiple tables. For example, you might have a list of data frames, and you want to reduce to a single data frame by joining the elements together:

dfs <- list(
  age = tibble(name = "John", age = 30),
  sex = tibble(name = c("John", "Mary"), sex = c("M", "F")),
  trt = tibble(name = "Mary", treatment = "A")
)

dfs %>% reduce(full_join)
#> Joining, by = "name"
#> Joining, by = "name"
#> # A tibble: 2 × 4
#>   name    age sex   treatment
#>   <chr> <dbl> <chr> <chr>    
#> 1 John     30 M     <NA>     
#> 2 Mary     NA F     A

Or maybe you have a list of vectors, and want to find the intersection:

vs <- list(
  c(1, 3, 5, 6, 10),
  c(1, 2, 3, 7, 8, 10),
  c(1, 2, 3, 4, 8, 9, 10)
)

vs %>% reduce(intersect)
#> [1]  1  3 10

The reduce function takes a “binary” function (i.e. a function with two primary inputs), and applies it repeatedly to a list until there is only a single element left.
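
The idea can be sketched as a plain for loop. simple_reduce() below is a hypothetical helper written for illustration; it is not purrr’s implementation and, unlike purrr::reduce(), it does not handle an empty input.

```r
# Hypothetical helper: fold a list down to one value with a binary function.
# Assumes `xs` has at least one element.
simple_reduce <- function(xs, f) {
  out <- xs[[1]]
  for (x in xs[-1]) {
    out <- f(out, x)   # combine the running result with the next element
  }
  out
}

vs <- list(
  c(1, 3, 5, 6, 10),
  c(1, 2, 3, 7, 8, 10),
  c(1, 2, 3, 4, 8, 9, 10)
)

simple_reduce(vs, intersect)
#> [1]  1  3 10
```

The running result starts as the first element and is combined with each remaining element in turn, which is why the function passed in must accept two inputs.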


Accumulate is similar but it keeps all the interim results. You could use it to implement a cumulative sum:

x <- sample(10)
x
#>  [1]  6  9  8  5  2  4  7  1 10  3
x %>% accumulate(`+`)
#>  [1]  6 15 23 28 30 34 41 42 52 55

21.9.3 Exercises

Implement your own version of every() using a for loop. Compare it with purrr::every(). What does purrr’s version do that your version doesn’t?

Create an enhanced col_sum() that applies a summary function to every numeric column in a data frame.

A possible base R equivalent of col_sum() is:

col_sum3 <- function(df, f) {
  is_num <- sapply(df, is.numeric)
  df_num <- df[, is_num]

  sapply(df_num, f)
}

But it has a number of bugs as illustrated with the following inputs:

df <- tibble(
  x = 1:3, 
  y = 3:1,
  z = c("a", "b", "c")
)
# OK
col_sum3(df, mean)
# Has problems: don't always return numeric vector
col_sum3(df[1:2], mean)
col_sum3(df[1], mean)
col_sum3(df[0], mean)

What causes the bugs?