Parallel computation in R
2026-06-03
Víctor Granda-García
Ecosystem Modelling Facility - CREAF
Ecosystem Modelling Facility
Ecosystem Modelling Facility - CREAF
Introduction
This document will explain the basic concepts of parallel computing in R, with code examples to illustrate them. The topics covered include:
- What is parallel computing?
- When can we use it?
- A little introduction to loops and maps in R (
for,lapply,map…) - Ways to use parallelization in your code (
parallel,furrr,mirai…) - How to check if is worth the hassle
What is parallel computing?
Figure 1: Modern CPU and cores
First of all we need to understand a little about CPU (Central Processing Unit) and cores. Modern computers (Figure 1) have multiple CPUs, and those can have one or multiple cores. Each core is responsible of running individual processes.
Think of a simple algebraic operation, adding two numbers (1 + 1). In a
nutshell, that operation is translated to machine code and a process is started
in one core to perform the operation (Figure 2).
Figure 2: One core is performing the ‘1 +1’ operation. This leaves the other cores available to concurrently start other procesess
So CPU cores can be used to run the the same process with different data in parallel to speed up long tasks. In theory, this can make things $1/n_{cores}$ faster, but in reality, other factors must be added (time consumed transferring data to each process, time consumed gathering results from different processes and join them, time consumed spawning new processes…) so the time gain highly depends on the type of operations, data types used…
Note
In fact, sometimes workflows are slower when parallelized, so we always need to check if we are really saving time and effort. See below for more information on this.
R is a single process software
R is designed to run in only one CPU process. This is due to the time when S
(R predecessor) and R were developed, where CPUs had mostly one core and
multitasking and parallel computation were still not widely available or
technologies were still undeveloped.
Given the R limitations explained before, parallel computing in R is not
available out-of-the-box. We will need to use extra packages/libraries and we
can only use it in specific cases and data types.
When can we use it?
You have been using parallel computing in R without knowing it. A lot of R
base functions are calls to methods written in languages that support
multitasking (C++, Rust…). For example, matrix operations and other linear
algebra functions (common operations when calculating regression coefficients
when using lm and other model functions) are calls to C++ methods that are
parallelized and use the CPU cores available in your system (Listing 1)
1observations <- matrix(rnorm(1e6 * 100), 1e6, 100)
2predictions <- rnorm(100)
3system.time(outcome <- drop(observations %*% predictions) + rnorm(1e6))
Listing 1: Time consumed by matrix operations. We can see that the user time is bigger than the elapsed time. This means that the task (matrix product) was parallelized consuming more CPU time (user), but less real time (elapsed)
user system elapsed
0.337 0.009 0.347
Note
Some other
Rpackages have implemented the methods we’ll explain in the following sections and offer arguments to the user to choose if and how parallelization must be done. For example,bootpackage offer parallelization options when bootstrapping model coefficients.
Working with embarrassingly parallel problems
Embarrassingly parallel problems are those where we can easily separate the problem into several parallel tasks 1. This kind of problems are very usual in scientific and statistics fields. Think of the example data analysis showed below (Figure 3):
Figure 3: Classic data analysis workflow
In this process, we need to (I) ingest the data, (II) processing it for cleaning/transforming/…, (III) modelling the data and (IV) finally visualize/store the results. Now imagine we have to repeat the same process for hundred or thousands of data files (i.e., remote sensing images, genomic analyses, historical and projections climatic analyses…). Instead of processing each task one after another (in a sequential way) we can divide the input (names of the files to read) in chunks and send each chunk to CPU processes that run in parallel, saving time and effort (Figure 4).
Figure 4: Same data analysis workflow as before but running in parallel, each process in a different CPU core
This kind of embarrasingly parallel tasks are the ones that beneficiate most of parallelization.
A little introduction to loops and maps in R (for, lapply, map…)
Loops
We talk before about embarrassingly parallel problems, repetitive tasks that
have little or not connection between each other more than the origin of the
inputs and can be easily separated into parallel tasks.
These tasks are usually the ones we think about when we talk about for loops.
One example can be bootstrapping model coefficients (Listing 2). In this
example, we are interested in the relationship between sepal length and Iris
species (example extracted from the doParallel package vignette):
1# libraries
2library(dplyr)
3
4# data needed
5n_repetitions <- 1e4
6res_coefs <- list()
7iris_data <- iris |>
8 dplyr::filter(Species != "setosa")
9# we measure the time for illustration purposes
10system.time({
11 for (index in 1:n_repetitions) {
12 sample_individuals <- sample(85, 85, replace = TRUE)
13 model_res <- glm(
14 iris_data[sample_individuals, "Species"] ~ iris_data[sample_individuals, "Petal.Length"],
15 family = binomial
16 )
17 res_coefs[[index]] <- coefficients(model_res)
18 }
19})
Listing 2: Boostrapping model coefficients in iris dataset with a for loop
user system elapsed
22.739 0.026 22.854
We can see the user time (CPU time) is roughly the same as the elapsed time
(real time), as we should expect from a sequential for loop.
lapply
The same problem can be solved with lapply, but we need to encapsulate the
logic of the for loop in a function (Listing 3):
1# create the function to process data from one state
2coef_function <- function(repetition) {
3 sample_individuals <- sample(85, 85, replace = TRUE)
4 model_res <- glm(
5 iris_data[sample_individuals, "Species"] ~ iris_data[sample_individuals, "Petal.Length"],
6 family = binomial
7 )
8 return(coefficients(model_res))
9}
10# number of repetitions
11n_repetitions <- 1e4
12# data
13iris_data <- iris |>
14 dplyr::filter(Species != "setosa")
15
16# and now the lapply (we monitorize the time again for illustration purposes)
17system.time(
18 res_coefs <- lapply(1:n_repetitions, coef_function)
19)
Listing 3: Boostrapping model coefficients in iris dataset with lapply
user system elapsed
13.690 0.007 13.731
As we see, the time is the same as with the for loop, something we would
expect.
map
If using tidyverse packages, instead of lapply
we will use map function in the purrr package (Listing 4):
1# libraries
2library(purrr)
3
4coef_function <- function(repetition) {
5 sample_individuals <- sample(85, 85, replace = TRUE)
6 model_res <- glm(
7 iris_data[sample_individuals, "Species"] ~ iris_data[sample_individuals, "Petal.Length"],
8 family = binomial
9 )
10 return(coefficients(model_res))
11}
12# number of repetitions
13n_repetitions <- 1e4
14# data
15iris_data <- iris |>
16 dplyr::filter(Species != "setosa")
17
18# and now the map (we monitorize the time again for illustration purposes)
19system.time({
20 res_coefs <- purrr::map(1:n_repetitions, .f = coef_function)
21})
Listing 4: Boostrapping model coefficients in iris dataset with map
user system elapsed
12.273 0.002 12.301
Again times are similar to the other workflows.
Ways to use parallelization in your code (parallel, furrr…)
If we can use loops, lapply or map, then we can parallelize without any
problem. In this section we will see the different options we can do it.
Preparations
Before we start, we need to know how many cores are available in our system.
This can be done with parallel::detectCores(). In the system this document
has been created the available cores are:
1library(parallel)
2parallel::detectCores()
Listing 5: Numer of cores available
[1] 24
Tip
In the following examples we will be using 4 cores, but if your system has less than that, please change it to a valid number.
foreach and doParallel
In a very similar way to a for loop we can use the foreach and doParallel
packages to build a loop that will run the bootstrap models in paralell
(Listing 6):
1# libraries
2library(parallel)
3library(foreach)
4library(doParallel)
5
6# data needed
7n_repetitions <- 1e4
8res_coefs <- list()
9iris_data <- iris |>
10 dplyr::filter(Species != "setosa")
11
12# set the number of cores to use, in this example 4
13doParallel::registerDoParallel(cores = 4)
14
15# foreach loop (for illustration purposes, we check the time used)
16system.time(
17 {res_coefs <- foreach::foreach(index = 1:n_repetitions) %dopar% {
18 sample_individuals <- sample(85, 85, replace = TRUE)
19 model_res <- glm(
20 iris_data[sample_individuals, "Species"] ~ iris_data[sample_individuals, "Petal.Length"],
21 family = binomial
22 )
23 coefficients(model_res)
24 }}
25)
Listing 6: Boostrapping model coefficients in iris dataset in parallel with a foreach
user system elapsed
13.369 0.341 4.077
As we can see, time has reduced almost four times when compared with processing the files sequentially. We are really close to the ideal $1/4$ reduction in time we should expect from using 4 cores, but not quite, as starting the extra R processes, sending the data and retrieving the results takes some time. With bigger datasets we can see the elapsed time increasing because of this communication overload.
mclapply
If we prefer the lapply syntax, we can use mclapply to run the same
expression concurrently. mclapply belongs to the parallel pacakge and works
exactly the same as lapply (Listing 7):
1# create the function to process data from one state
2coef_function <- function(repetition) {
3 sample_individuals <- sample(85, 85, replace = TRUE)
4 model_res <- glm(
5 iris_data[sample_individuals, "Species"] ~ iris_data[sample_individuals, "Petal.Length"],
6 family = binomial
7 )
8 return(coefficients(model_res))
9}
10# number of repetitions
11n_repetitions <- 1e4
12# data
13iris_data <- iris |>
14 dplyr::filter(Species != "setosa")
15
16# and now the lapply (we monitorize the time again for illustration purposes)
17system.time({
18 res_coefs <- mclapply(1:n_repetitions, coef_function, mc.cores = 4)
19})
Listing 7: Boostrapping model coefficients in iris dataset in parallel with a mclapply
user system elapsed
0.006 0.051 3.196
We see again the time reduction in time with mclapply.
future_map
furrr package offers parallelized versions of purrr::map family of
functions. We can use it to run the map example above in parallel
(Listing 8):
1# libraries
2library(future)
3library(furrr)
4
5coef_function <- function(repetition) {
6 sample_individuals <- sample(85, 85, replace = TRUE)
7 model_res <- glm(
8 iris_data[sample_individuals, "Species"] ~ iris_data[sample_individuals, "Petal.Length"],
9 family = binomial
10 )
11 return(coefficients(model_res))
12}
13# number of repetitions
14n_repetitions <- 1e4
15# data
16iris_data <- iris |>
17 dplyr::filter(Species != "setosa")
18
19# setting core options
20future::plan(future.callr::callr, workers = 4)
21
22# and now the map (we monitorize the time again for illustration purposes)
23system.time({
24 state_models <- furrr::future_map(1:n_repetitions, .f = coef_function)
25})
Listing 8: Boostrapping model coefficients in iris dataset in parallel with a furrr
user system elapsed
14.138 0.630 3.843
mirai and purrr
mirai R packages offers an asynchronous and parallel framework to execute code
in different processes. Is already integrated in purrr with the in_parallel
function. We can just
reuse the purrr example, but using in_parallel (Listing 9):
1# libraries
2library(purrr)
3library(mirai)
4
5# setting the cores (daemons in mirai)
6mirai::daemons(4)
7
8coef_function <- function(repetition) {
9 sample_individuals <- sample(85, 85, replace = TRUE)
10 model_res <- glm(
11 iris_data[sample_individuals, "Species"] ~ iris_data[sample_individuals, "Petal.Length"],
12 family = binomial
13 )
14 return(coefficients(model_res))
15}
16# number of repetitions
17n_repetitions <- 1e4
18# data
19iris_data <- iris |>
20 dplyr::filter(Species != "setosa")
21
22# and now the parallel map
23# (we monitorize the time again for illustration purposes)
24system.time({
25 res_coefs <- purrr::map(
26 1:n_repetitions,
27 .f = in_parallel(
28 \(rep) {coef_function(rep)},
29 coef_function = coef_function,
30 iris_data = iris_data
31 )
32 )
33})
Listing 9: Boostrapping model coefficients in iris dataset with in_parallel
user system elapsed
0.818 0.581 4.370
Tip
Calculating bootstrapped coefficients is a toy example for illustration purposes. There are R packages that can bootstrap in more efficient ways, including parallelization already included, like the
bootpackage.
How to check if parallelization is worthy the hassle
We have been using system.time to check the time our code takes to run. This
is one way of check if the benefit of parallelization outweighs the
inconvenience of setting it up. Other ways include using other benchmarking
libraries, like bench or rbenchmark. Their documentation explain everything
you need to know about timing code.
In any case, as a rule of thumb, parallelization is not worthy in the following scenarios:
-
We are parallelizing fast operations: In this scenario, when the operations we want to paralellize are fast (math operations, indexing or assigning) the overhead of sending data to cores, and retrieving and joining the results usually is greater than the time of performing the operation.
-
We are parallelizing processes that involve sending a lot of data to each spawned parallel process. Think for example in working with a global raster at 1km resolution. If we want to parallelize a process involving calculation with this kind of objects then the overhead of sending the data could be bigger than the process itself.
-
Related to the previous, when performing memory intensive processes. Take into account that each spawned parallel process is going to need to use memory as in the sequential process. Parallelizing in the same computing multiplies the memory needed for almost the number of cores used, so is easy to run out of memory if we are not careful.
-
Also known as perfectly parallel, delightfully parallel or pleasingly parallel problems, but those names don’t have that ring on it ↩︎