Iterating over the lines of a data.frame with purrr
I sometimes have a function which takes some parameters and returns a data.frame as a result. Then I have a data.frame in which each row is one set of parameters. So I want to apply the function to each row of the parameter-data.frame and rbind the resulting data.frames.
There are several ways to do it. Let’s have a look:
The function …
So let’s build a simple function we can use. This function takes three arguments and returns a data.frame. The number of rows of the data.frame depends on the last parameter.
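The listing for this function is not shown here, so the following is only a minimal sketch of a function with that shape. The name my_function comes from the text; the argument names a, b and n and the columns it computes are assumptions chosen for illustration.

```r
# Hypothetical example function (argument names are assumptions).
# It returns a data.frame with n rows, so the last argument
# controls the length of the result.
my_function <- function(a, b, n) {
  steps <- seq_len(n)
  data.frame(
    step  = steps,
    value = a + b * steps
  )
}

# A single call with one set of parameters
my_function(a = 1, b = 10, n = 3)
```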
… and its parameters
So now we have several tuples of parameters. Each tuple is a row of our parameter-data.frame.
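As an illustration, such a parameter-data.frame could look like the sketch below. The name parameters is taken from the text; the column names a, b and n and all values are assumptions, made up so that the columns line up with the arguments of a three-argument function.

```r
# Hypothetical parameter-data.frame: one row per call, one column per
# argument. Column names and values are assumptions for illustration.
parameters <- data.frame(
  a = c(1, 2, 3),
  b = c(10, 20, 30),
  n = c(2, 3, 1)
)

parameters
```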
So now we want to apply our function three times, once for each row of the data.frame parameters.
Iterating with …
There are several ways to iterate.
… a for-loop
The most common way in programming is a for-loop:
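A minimal sketch of such a loop, assuming a hypothetical my_function with arguments a, b and n and a parameter-data.frame with matching columns (both are repeated here only so the snippet runs on its own):

```r
# Assumed example function and parameters (hypothetical names and values)
my_function <- function(a, b, n) {
  data.frame(step = seq_len(n), value = a + b * seq_len(n))
}
parameters <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), n = c(2, 3, 1))

# The result has to be initialized before the loop ...
result_loop <- data.frame()

# ... and is then grown by one chunk per row of parameters
for (i in seq_len(nrow(parameters))) {
  chunk <- my_function(parameters$a[i], parameters$b[i], parameters$n[i])
  result_loop <- rbind(result_loop, chunk)
}

result_loop
```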
That’s very ugly: You have to initialize the result-data.frame, and it’s slow because growing a data.frame with rbind inside a loop copies it on every iteration. Whenever you want to use a for-loop in R, step back and think about using something else.
… lapply()
Instead of for-loops you should use apply or one of its relatives such as lapply. But lapply works on lists. data.frames are lists, but column-wise ones.
So we need to split the data.frame parameters into a list row-wise using split. Then we can apply my_function to each element. Then we use do.call(rbind, x) to merge the results into one data.frame.
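The three steps (split, apply, rbind) can be sketched roughly as follows. The function body, the argument names a, b and n and the parameter values are assumptions, repeated here only so the snippet runs on its own.

```r
# Assumed example function and parameters (hypothetical names and values)
my_function <- function(a, b, n) {
  data.frame(step = seq_len(n), value = a + b * seq_len(n))
}
parameters <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), n = c(2, 3, 1))

# 1. Split row-wise: a list of one-row data.frames
rows <- split(parameters, seq_len(nrow(parameters)))

# 2. Apply the function to every one-row data.frame
results <- lapply(rows, function(p) my_function(p$a, p$b, p$n))

# 3. Bind the list of results into a single data.frame
result_lapply <- do.call(rbind, results)

# Optional: drop the row names that rbind derives from the list names
rownames(result_lapply) <- NULL

result_lapply
```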
That’s a lot more R-like. But the winner is:
… pmap_dfr() from the purrr package
The most elegant way I know of is purrr’s pmap_dfr. pmap_dfr matches the column names of the data.frame to the parameter names of the function. So you can put the columns of the parameter-data.frame in any order.
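A short sketch of what this could look like, again with an assumed my_function(a, b, n) and made-up parameter values. It needs the purrr package, and pmap_dfr additionally relies on dplyr being installed for the row-binding step.

```r
library(purrr)

# Assumed example function and parameters (hypothetical names and values)
my_function <- function(a, b, n) {
  data.frame(step = seq_len(n), value = a + b * seq_len(n))
}
parameters <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), n = c(2, 3, 1))

# One call per row; columns are passed as named arguments
result_pmap <- pmap_dfr(parameters, my_function)

# Same parameters with the columns in a different order:
# matching happens by name, so the result should not change
shuffled <- parameters[, c("n", "a", "b")]
result_shuffled <- pmap_dfr(shuffled, my_function)

result_pmap
```

In purrr 1.0.0 and later, pmap_dfr is marked as superseded; pmap() followed by list_rbind() is the suggested replacement, so that variant may be the safer choice in new code.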
Originally published at http://github.com.