A new, experimental feature of glmmTMB
is the ability to
parallelize the optimization process. This vignette shows an example and
timing of a simple model fit with and without parallelizing across
threads.
If your OS supports OpenMP parallelization and R was installed using
OpenMP, glmmTMB
will automatically pick up the OpenMP flags
from R’s Makevars
and compile the C++ model with OpenMP
support. If the flag is not available, then the model will be compiled
with serial optimization only.
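As a quick check (not part of the original vignette), you can print the OpenMP compiler flag that R picks up from Makevars; SHLIB_OPENMP_CXXFLAGS is the variable documented in Writing R Extensions, and empty output suggests OpenMP is unavailable, so the model would be compiled serially:
# print the OpenMP C++ flag from R's Makevars (e.g. "-fopenmp");
# empty output means the toolchain has no OpenMP support
system("R CMD config SHLIB_OPENMP_CXXFLAGS")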
Load packages:
library(glmmTMB)
set.seed(1)
nt <- min(parallel::detectCores(), 5)
Simulate a dataset with large N:
N <- 3e5
xdata <- rnorm(N, 1, 2)
ydata <- 0.3 + 0.4*xdata + rnorm(N, 0, 0.25)
First, we fit the model serially. We can pass the number of parallel processes we want via the parallel argument of glmmTMBControl:
system.time(
    model1 <- glmmTMB(formula = ydata ~ 1 + xdata,
                      control = glmmTMBControl(parallel = 1))
)
## Warning in glmmTMB(formula = ydata ~ 1 + xdata, control =
## glmmTMBControl(parallel = 1)): use of the 'data' argument is recommended
## user system elapsed
## 2.588 0.319 2.909
Now, we fit the same model using five threads, or as many as are available (3 in this case):
system.time(
    model2 <- glmmTMB(formula = ydata ~ 1 + xdata,
                      control = glmmTMBControl(parallel = nt))
)
## Warning in glmmTMB(formula = ydata ~ 1 + xdata, control =
## glmmTMBControl(parallel = nt)): use of the 'data' argument is recommended
## user system elapsed
## 2.428 0.312 2.741
The speed-up becomes more visible in models with a much larger number of observations, or in models with random effects.
Here’s an example where we have an IID Gaussian random effect. We first simulate the data with 200 groups (our random effect):
xdata <- rnorm(N, 1, 2)
groups <- 200
data_use <- data.frame(obs = 1:N)
data_use <- within(data_use,
{
    group_var <- rep(seq(groups), times = nrow(data_use) / groups)
    group_intercept <- rnorm(groups, 0, 0.1)[group_var]
    xdata <- xdata
    ydata <- 0.3 + group_intercept + 0.5*xdata + rnorm(N, 0, 0.25)
})
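As a quick sanity check on the simulated design (not part of the original vignette), we can confirm that the data contain 200 distinct groups of equal size, N/groups = 1500 observations each:
# number of distinct levels of the grouping variable (should be 200)
length(unique(data_use$group_var))
# group sizes; every group should contain N/groups = 1500 observations
unique(table(data_use$group_var))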
We fit the random effect model, first with a single thread:
(t_serial <- system.time(
    model3 <- glmmTMB(formula = ydata ~ 1 + xdata + (1 | group_var),
                      data = data_use,
                      control = glmmTMBControl(parallel = 1))
))
## user system elapsed
## 15.481 1.356 16.844
Now we fit the same model, but using 3 threads. The speed-up is more noticeable with this model.
(t_parallel <- system.time(
    update(model3, control = glmmTMBControl(parallel = nt))
))
## user system elapsed
## 15.100 1.299 16.408
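Since both timings were stored, the overall speed-up can be computed directly (this comparison is a small addition, not shown in the original output):
# ratio of serial to parallel elapsed time; values above 1 indicate a speed-up
unname(t_serial["elapsed"] / t_parallel["elapsed"])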
From Writing R Extensions:
Apple builds of clang on macOS currently have no OpenMP support, but CRAN binary packages are built with a clang-based toolchain which supports OpenMP. https://www.openmp.org/resources/openmp-compilers-tools/ gives some idea of what compilers support what versions.
The performance of OpenMP varies substantially between platforms. The Windows implementation has substantial overheads, so is only beneficial if quite substantial tasks are run in parallel. Also, on Windows new threads are started with the default FPU control word, so computations done on OpenMP threads will not make use of extended-precision arithmetic which is the default for the main process.

## System information
This report was built using 3 parallel threads (on a machine with a total of 3 cores).
print(sessionInfo(), locale=FALSE)
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] png_0.1-8 ggplot2_3.4.4 lattice_0.21-8 reshape2_1.4.4
## [5] coda_0.19-4 mvabund_4.2.1 TMB_1.9.6 MASS_7.3-60
## [9] glmmTMB_1.1.8-9000 knitr_1.44
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 dplyr_1.1.3
## [3] farver_2.1.1 R.utils_2.12.2
## [5] fastmap_1.1.1 TH.data_1.1-2
## [7] digest_0.6.33 estimability_1.4.1
## [9] lifecycle_1.0.3 survival_3.5-5
## [11] statmod_1.5.0 processx_3.8.2
## [13] magrittr_2.0.3 compiler_4.3.1
## [15] rlang_1.1.1 sass_0.4.7
## [17] tools_4.3.1 utf8_1.2.3
## [19] yaml_2.3.7 prettyunits_1.2.0
## [21] labeling_0.4.3 pkgbuild_1.4.2
## [23] plyr_1.8.9 multcomp_1.4-25
## [25] R.cache_0.16.0 withr_2.5.1
## [27] numDeriv_2016.8-1.1 desc_1.4.2
## [29] R.oo_1.25.0 stats4_4.3.1
## [31] fansi_1.0.5 pkgdown.extras_0.0.0-9003
## [33] xtable_1.8-4 colorspace_2.1-0
## [35] emmeans_1.8.8 scales_1.2.1
## [37] bbmle_1.0.25 cli_3.6.1
## [39] mvtnorm_1.2-3 rmarkdown_2.25
## [41] crayon_1.5.2 generics_0.1.3
## [43] bdsmatrix_1.3-6 minqa_1.2.6
## [45] cachem_1.0.8 stringr_1.5.0
## [47] splines_4.3.1 parallel_4.3.1
## [49] vctrs_0.6.4 boot_1.3-28.1
## [51] Matrix_1.6-1.1 sandwich_3.0-2
## [53] jsonlite_1.8.7 callr_3.7.3
## [55] tweedie_2.3.5 jquerylib_0.1.4
## [57] glue_1.6.2 nloptr_2.0.3
## [59] pkgdown_2.0.7 codetools_0.2-19
## [61] ps_1.7.5 stringi_1.7.12
## [63] gtable_0.3.4 lme4_1.1-34
## [65] munsell_0.5.0 tibble_3.2.1
## [67] pillar_1.9.0 htmltools_0.5.6.1
## [69] R6_2.5.1 rprojroot_2.0.3
## [71] evaluate_0.22 R.methodsS3_1.8.2
## [73] memoise_2.0.1 bslib_0.5.1
## [75] Rcpp_1.0.11 R.rsp_0.45.0
## [77] nlme_3.1-162 mgcv_1.8-42
## [79] xfun_0.40 fs_1.6.3
## [81] zoo_1.8-12 pkgconfig_2.0.3