This vignette covers common problems that occur while using
glmmTMB
. The contents will expand with experience.
If your problem is not covered below, there’s a chance it has been
solved in the development version; try updating to the latest version of
glmmTMB
on GitHub.
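If you want to try the development version, a minimal installation sketch (assuming the remotes package is available, and the usual glmmTMB repository layout in which the package sits in a glmmTMB subdirectory):

## install the development version from GitHub (requires compilation tools)
## install.packages("remotes")   ## if remotes is not already installed
remotes::install_github("glmmTMB/glmmTMB/glmmTMB")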
This warning (Model convergence problem; non-positive-definite Hessian matrix) states that at glmmTMB’s maximum-likelihood estimate, the curvature of the negative log-likelihood surface is inconsistent with glmmTMB really having found the best fit (minimum): instead, the surface is downward-curving, or flat, in some direction(s).
It will usually be accompanied by NA values for the standard errors, log-likelihood, AIC, BIC, and deviance. When you run summary() on the resulting model, you’ll get the warning In sqrt(diag(vcov)) : NaNs produced.
These problems most often arise when the model is overparameterized for the available data, e.g. when an entire model component (such as zero-inflation) is unnecessary, so that the likelihood is flat with respect to some parameters.
How do we diagnose the problem?
Consider this example:
zinbm0 = glmmTMB(count~spp + (1|site), zi=~spp, Salamanders, family=nbinom2)
First, see if any of the estimated coefficients are extreme. If you’re using a non-identity link function (e.g. log, logit), then parameter values with \(|\beta|>10\) are suspect (for a logit link, this implies probabilities very close to 0 or 1; for a log link, this implies mean counts that are close to 0 or extremely large).
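A quick programmatic scan for such values (a minimal sketch; the diagnose() function used later in this vignette performs similar checks automatically):

## flag fixed-effect coefficients with |beta| > 10, across all model components
allcoef <- unlist(fixef(zinbm0))
allcoef[abs(allcoef) > 10]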
Inspecting the fixed-effect estimates for this model:
fixef(zinbm0)
##
## Conditional model:
## (Intercept) sppPR sppDM sppEC-A sppEC-L sppDES-L
## -0.5377 -0.6531 0.3303 -0.2001 0.6615 0.7993
## sppDF
## 0.3714
##
## Zero-inflation model:
## (Intercept) sppPR sppDM sppEC-A sppEC-L sppDES-L
## -17.825 17.852 -5.202 17.486 15.431 14.708
## sppDF
## 15.485
The zero-inflation intercept parameter is extreme (\(\approx -17\)): since the parameters are estimated on the logit scale, we can back-transform with plogis(-17) to see that the zero-inflation probability for the baseline level is about \(4 \times 10^{-8}\). Many of the other ZI parameters are very large, compensating for the intercept: the estimated zero-inflation probabilities for all species are
ff <- fixef(zinbm0)$zi
round(plogis(c(sppGP=unname(ff[1]),ff[-1]+ff[1])),3)
## sppGP sppPR sppDM sppEC-A sppEC-L sppDES-L sppDF
## 0.000 0.507 0.000 0.416 0.084 0.042 0.088
Since the baseline probability is already effectively zero, making the intercept parameter larger or smaller has very little effect on the fit: the likelihood surface is flat in that direction, which leads to the non-positive-definite warning.
Now that we suspect the problem is in the zero-inflation component, we can try to come up with ways of simplifying the model: for example, we could use a model that compared the first species (“GP”) to the rest:
Salamanders <- transform(Salamanders, GP=as.numeric(spp=="GP"))
zinbm0_A = update(zinbm0, ziformula=~GP)
This fits without a warning, although the GP zero-inflation parameter is still extreme:
fixef(zinbm0_A)[["zi"]]
## (Intercept) GP
## -2.890515 -15.414542
Another possibility would be to fit the variation among species in the zero-inflation parameter as a random effect, rather than a fixed effect: this is slightly more parsimonious. This again fits without an error, although both the average level of zero-inflation and the among-species variation are estimated as very small:
zinbm0_B = update(zinbm0, ziformula=~(1|spp))
fixef(zinbm0_B)[["zi"]]
## (Intercept)
## -16.54352
VarCorr(zinbm0_B)
##
## Conditional model:
## Groups Name Std.Dev.
## site (Intercept) 1.3894
##
## Zero-inflation model:
## Groups Name Std.Dev.
## spp (Intercept) 0.0078215
The original analysis considered variation in zero-inflation by site status (mined or not mined) rather than by species; this simpler model tries to estimate only two zero-inflation parameters (an intercept plus the mined/unmined difference) rather than 7 (one per species).
zinbm1 = glmmTMB(count~spp + (1|site), zi=~mined, Salamanders, family=nbinom2)
fixef(zinbm1)[["zi"]]
## (Intercept) minedno
## 0.3787986 -17.5118413
This again fits without a warning, but we see that the zero-inflation
is effectively zero in the unmined (“minedno”) condition
(plogis(0.38-17.5)
is approximately \(4 \times 10^{-8}\)). We can estimate the
confidence interval, but it takes some extra work: the default Wald
standard errors and confidence intervals are useless in this case.
## at present we need to specify the parameter by number; for
## extreme cases need to specify the parameter range
## (not sure why the upper bound needs to be so high ... ?)
cc = confint(zinbm1,method="uniroot",parm=9, parm.range=c(-20,20))
print(cc)
## 2.5 % 97.5 % Estimate
## zi~minedno NA -2.083725 -17.51184
The lower CI is not defined; the upper CI is -2.08, i.e. we can state that the zero-inflation probability is less than plogis(-2.08) = 0.11.
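The same value can be computed from the stored interval instead of typing the number by hand (a small sketch reusing the cc object from above):

## back-transform the upper confidence limit to the probability scale
plogis(cc["zi~minedno", "97.5 %"])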
More broadly, general inspection of the data (e.g., plotting the response against potential covariates) should help to diagnose overly complex models.
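For example, a simple tabulation of zeros by group can reveal categories in which zeros are absent or nearly universal (a minimal sketch using the Salamanders data):

## count zero vs. non-zero observations for each species
with(Salamanders, table(spp, count == 0))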
In some cases, scaling predictor variables may help. For example, in this example from @phisanti, the results of glm and glmmTMB applied to a scaled version of the data set agree, while glmmTMB applied to the raw data set gave a non-positive-definite Hessian warning. (In more recent versions of glmmTMB, which work harder to compute an accurate Hessian, this particular example may no longer trigger the warning.)
## data taken from gamlss.data::plasma, originally
## http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/plasma.html
gt_load("vignette_data/plasma.rda")
## [1] TRUE
m4.1 <- glm(calories ~ fat*fiber, family = Gamma(link = "log"), data = plasma)
m4.2 <- glmmTMB(calories ~ fat*fiber, family = Gamma(link = "log"), data = plasma)
ps <- transform(plasma,fat=scale(fat,center=FALSE),fiber=scale(fiber,center=FALSE))
m4.3 <- update(m4.2, data=ps)
## scaling factor for back-transforming standard deviations
ss <- c(1,
        fatsc <- 1/attr(ps$fat,"scaled:scale"),
        fibsc <- 1/attr(ps$fiber,"scaled:scale"),
        fatsc*fibsc)
## combine SEs, suppressing the warning from the unscaled model
s_vals <- cbind(glm=sqrt(diag(vcov(m4.1))),
                glmmTMB_unsc=suppressWarnings(sqrt(diag(vcov(m4.2)$cond))),
                glmmTMB_sc=sqrt(diag(vcov(m4.3)$cond))*ss)
print(s_vals,digits=3)
## glm glmmTMB_unsc glmmTMB_sc
## (Intercept) 5.11e-02 5.21e-02 5.21e-02
## fat 6.69e-04 6.77e-04 6.77e-04
## fiber 3.70e-03 3.79e-03 3.79e-03
## fat:fiber 4.48e-05 4.54e-05 4.54e-05
Here is another example (from Samantha Sherman):
L <- gt_load("vignette_data/sherman.rda")
The first model gives the warning: “non-integer counts in an nbinom1 model” (indicating that we probably should use a different response distribution, or round the values if that seems appropriate).
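A quick check of how many response values are actually non-integer, before deciding whether rounding is appropriate (a sketch; it assumes the data frame is called dd and the response is taeCPUE, as in the model output below):

## how many response values are non-integer?
sum(dd$taeCPUE != round(dd$taeCPUE))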
summary(mod1)
## Family: nbinom1 ( log )
## Formula: taeCPUE ~ Avg.Relief + (1 | site.name/reef.name)
## Zero inflation: ~1
## Data: dd
##
## AIC BIC logLik deviance df.resid
## NA NA NA NA 3401
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## reef.name:site.name (Intercept) 0.168 0.4098
## site.name (Intercept) 3.544 1.8825
## Number of obs: 3407, groups: reef.name:site.name, 70; site.name, 39
##
## Dispersion parameter for nbinom1 family (): 1.08e-09
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.61496 0.42395 -10.89 < 2e-16 ***
## Avg.Relief 0.16745 0.05896 2.84 0.00451 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Zero-inflation model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -19.60 16.81 -1.166 0.244
We can immediately see that the dispersion is very small and that the zero-inflation parameter is strongly negative.
Running diagnostics on the model, these are the only problems reported.
d1 <- diagnose(mod1)
## Unusually large coefficients (|x|>10):
##
## zi~(Intercept) d~(Intercept)
## -19.59956 -20.64922
##
## Large negative coefficients in zi (log-odds of zero-inflation),
## dispersion, or random effects (log-standard deviations) suggest
## unnecessary components (converging to zero on the constrained scale);
## large negative and/or positive components in binomial or Poisson
## conditional parameters suggest (quasi-)complete separation. Large
## values of nbinom2 dispersion suggest that you should use a Poisson
## model instead.
##
##
## Unusually large Z-statistics (|x|>5):
##
## (Intercept) d~(Intercept)
## -10.88569 -142.51767
##
## Large Z-statistics (estimate/std err) suggest a *possible* failure of
## the Wald approximation - often also associated with parameters that are
## at or near the edge of their range (e.g. random-effects standard
## deviations approaching 0). (Alternately, they may simply represent
## very well-estimated parameters; intercepts of non-centered models may
## fall in this category.) While the Wald p-values and standard errors
## listed in summary() may be unreliable, profile confidence intervals
## (see ?confint.glmmTMB) and likelihood ratio test p-values derived by
## comparing models (e.g. ?drop1) are probably still OK. (Note that the
## LRT is conservative when the null value is on the boundary, e.g. a
## variance or zero-inflation value of 0 (Self and Liang 1987; Stram and
## Lee 1994; Goldman and Whelan 2000); in simple cases these p-values are
## approximately twice as large as they should be.)
##
##
## Non-positive definite (NPD) Hessian
##
## The Hessian matrix represents the curvature of the log-likelihood
## surface at the maximum likelihood estimate (MLE) of the parameters (its
## inverse is the estimate of the parameter covariance matrix). A
## non-positive-definite Hessian means that the likelihood surface is
## approximately flat (or upward-curving) at the MLE, which means the
## model is overfitted or poorly posed in some way. NPD Hessians are often
## associated with extreme parameter estimates.
##
##
## recomputing Hessian via Richardson extrapolation. If this is too slow, consider setting check_hessian = FALSE
##
## The next set of diagnostics attempts to determine which elements of the
## Hessian are causing the non-positive-definiteness. Components with
## very small eigenvalues represent 'flat' directions, i.e., combinations
## of parameters for which the data may contain very little information.
## So-called 'bad elements' represent the dominant components (absolute
## values >0.01) of the eigenvectors corresponding to the 'flat'
## directions
##
##
## maximum Hessian eigenvalue = 323
## Hessian eigenvalue 6 = 2.57e-07 (relative val = 7.95e-10)
## bad elements: zi~(Intercept)
Let’s try dropping the zero-inflation term:
mod2 <- update(mod1, ziformula=~0)
We also get a “false convergence (8)” warning; see below.
The summary() and diagnose() functions reveal only the large, negative dispersion parameter:
summary(mod2)
## Family: nbinom1 ( log )
## Formula: taeCPUE ~ Avg.Relief + (1 | site.name/reef.name)
## Data: dd
##
## AIC BIC logLik deviance df.resid
## NA NA NA NA 3402
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## reef.name:site.name (Intercept) 0.168 0.4099
## site.name (Intercept) 3.544 1.8826
## Number of obs: 3407, groups: reef.name:site.name, 70; site.name, 39
##
## Dispersion parameter for nbinom1 family (): 2.96e-08
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.61507 0.43803 -10.536 < 2e-16 ***
## Avg.Relief 0.16745 0.05898 2.839 0.00452 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Diagnose:
diagnose(mod2)
## Unusually large coefficients (|x|>10):
##
## d~(Intercept)
## -17.33394
##
## Large negative coefficients in zi (log-odds of zero-inflation),
## dispersion, or random effects (log-standard deviations) suggest
## unnecessary components (converging to zero on the constrained scale);
## large negative and/or positive components in binomial or Poisson
## conditional parameters suggest (quasi-)complete separation. Large
## values of nbinom2 dispersion suggest that you should use a Poisson
## model instead.
##
##
## Unusually large Z-statistics (|x|>5):
##
## (Intercept)
## -10.53597
##
## Large Z-statistics (estimate/std err) suggest a *possible* failure of
## the Wald approximation - often also associated with parameters that are
## at or near the edge of their range (e.g. random-effects standard
## deviations approaching 0). (Alternately, they may simply represent
## very well-estimated parameters; intercepts of non-centered models may
## fall in this category.) While the Wald p-values and standard errors
## listed in summary() may be unreliable, profile confidence intervals
## (see ?confint.glmmTMB) and likelihood ratio test p-values derived by
## comparing models (e.g. ?drop1) are probably still OK. (Note that the
## LRT is conservative when the null value is on the boundary, e.g. a
## variance or zero-inflation value of 0 (Self and Liang 1987; Stram and
## Lee 1994; Goldman and Whelan 2000); in simple cases these p-values are
## approximately twice as large as they should be.)
##
##
## Non-positive definite (NPD) Hessian
##
## The Hessian matrix represents the curvature of the log-likelihood
## surface at the maximum likelihood estimate (MLE) of the parameters (its
## inverse is the estimate of the parameter covariance matrix). A
## non-positive-definite Hessian means that the likelihood surface is
## approximately flat (or upward-curving) at the MLE, which means the
## model is overfitted or poorly posed in some way. NPD Hessians are often
## associated with extreme parameter estimates.
##
##
## parameters with non-finite standard deviations:
## d~(Intercept)
##
##
##
## recomputing Hessian via Richardson extrapolation. If this is too slow, consider setting check_hessian = FALSE
##
## The next set of diagnostics attempts to determine which elements of the
## Hessian are causing the non-positive-definiteness. Components with
## very small eigenvalues represent 'flat' directions, i.e., combinations
## of parameters for which the data may contain very little information.
## So-called 'bad elements' represent the dominant components (absolute
## values >0.01) of the eigenvectors corresponding to the 'flat'
## directions
##
##
## maximum Hessian eigenvalue = 312
## Hessian eigenvalue 5 = 0.00131 (relative val = 4.19e-06)
## bad elements: d~(Intercept)
The “false convergence” warning comes from the nlminb()
optimizer, and is difficult
to interpret and resolve. The documentation says that the cause of
this warning is that:
the gradient ∇f(x) may be computed incorrectly, the other stopping tolerances may be too tight, or either f or ∇f may be discontinuous near the current iterate x.
The only practical options we have for satisfying ourselves that a
false convergence warning is really a false positive are the standard
brute-force solutions of (1) making sure the gradients are small and the
Hessian is positive definite (these are already checked internally); (2)
trying different starting conditions, including re-starting at the
current optimum; and (3) trying a different optimizer. We’ll try option
3 and refit with the BFGS option from optim():
mod2_optim <- update(mod2,
                     control=glmmTMBControl(optimizer=optim,
                                            optArgs=list(method="BFGS")))
## Warning in glmmTMB(formula = taeCPUE ~ Avg.Relief + (1 | site.name/reef.name),
## : non-integer counts in a nbinom1 model
BFGS doesn’t give us any warnings. Comparing the parameter estimates:
(parcomp <- cbind(nlminb=unlist(fixef(mod2)),
                  optim=unlist(fixef(mod2_optim))))
## nlminb optim
## cond.(Intercept) -4.6150738 -4.6152081
## cond.Avg.Relief 0.1674507 0.1674755
## disp.(Intercept) -17.3339441 -11.4800820
The conditional-model parameters are practically identical. The dispersion parameters look quite different (-17.3 vs. -11.5), but if we back-transform from the log scale (via exp()) we can see that they are both extremely small (\(2.96\times 10^{-8}\) vs. \(1.03\times 10^{-5}\)).
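This back-transformation can be done directly from the comparison table (reusing the parcomp object from above):

## back-transform the log-scale dispersion estimates
exp(parcomp["disp.(Intercept)", ])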
Simplify the model by switching from NB1 to Poisson:
mod3 <- update(mod2, family=poisson)
summary(mod3)
## Family: poisson ( log )
## Formula: taeCPUE ~ Avg.Relief + (1 | site.name/reef.name)
## Data: dd
##
## AIC BIC logLik deviance df.resid
## 1562.5 1587.0 -777.2 1554.5 3403
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## reef.name:site.name (Intercept) 0.168 0.4099
## site.name (Intercept) 3.544 1.8826
## Number of obs: 3407, groups: reef.name:site.name, 70; site.name, 39
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.61507 0.43804 -10.536 < 2e-16 ***
## Avg.Relief 0.16745 0.05898 2.839 0.00452 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
diagnose(mod3)
## Unusually large Z-statistics (|x|>5):
##
## (Intercept)
## -10.53575
##
## Large Z-statistics (estimate/std err) suggest a *possible* failure of
## the Wald approximation - often also associated with parameters that are
## at or near the edge of their range (e.g. random-effects standard
## deviations approaching 0). (Alternately, they may simply represent
## very well-estimated parameters; intercepts of non-centered models may
## fall in this category.) While the Wald p-values and standard errors
## listed in summary() may be unreliable, profile confidence intervals
## (see ?confint.glmmTMB) and likelihood ratio test p-values derived by
## comparing models (e.g. ?drop1) are probably still OK. (Note that the
## LRT is conservative when the null value is on the boundary, e.g. a
## variance or zero-inflation value of 0 (Self and Liang 1987; Stram and
## Lee 1994; Goldman and Whelan 2000); in simple cases these p-values are
## approximately twice as large as they should be.)
You can also check directly whether the Hessian (curvature) of the
model is OK by examining the pdHess
(“positive-definite
Hessian”) component of the sdr
(“standard deviation
report”) component of the model:
mod3$sdr$pdHess
## [1] TRUE
In general, models with non-positive-definite Hessian matrices should be excluded from further consideration.
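A convenience screen along these lines (a minimal sketch, relying on the $sdr$pdHess component illustrated above):

## which of several fitted models have a positive-definite Hessian?
sapply(list(mod1 = mod1, mod2 = mod2, mod3 = mod3),
       function(m) m$sdr$pdHess)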
m1 <- glmmTMB(count~spp + mined + (1|site), zi=~spp + mined, Salamanders, family=genpois)
diagnose(m1)
## Unusually large coefficients (|x|>10):
##
## zi~sppPR zi~sppDF zi~minedno
## -35.20726 -24.18081 -20.16418
##
## Large negative coefficients in zi (log-odds of zero-inflation),
## dispersion, or random effects (log-standard deviations) suggest
## unnecessary components (converging to zero on the constrained scale);
## large negative and/or positive components in binomial or Poisson
## conditional parameters suggest (quasi-)complete separation. Large
## values of nbinom2 dispersion suggest that you should use a Poisson
## model instead.
In this example, the fixed-effect covariance matrix is NaN. This may have to do with the generalized Poisson (genpois) distribution, which is known to have convergence problems; luckily, the negative binomial (nbinom1 and nbinom2) and Conway-Maxwell Poisson (compois) distributions are good alternatives.
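Refitting with an alternative distribution is a one-line change via update() (a sketch; fitting compois can be slow):

## refit the genpois model with alternative count distributions
m1_nb2 <- update(m1, family = nbinom2)
m1_cmp <- update(m1, family = compois)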
In general, models with convergence problems should be excluded from further consideration.
In some cases, extreme eigenvalues may be caused by predictor variables that are on very different scales: try rescaling and centering continuous predictors in the model.
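A sketch of centering and scaling, reusing the plasma data from above (scale() both centers and scales by default; drop() strips the matrix attributes that scale() attaches):

plasma_sc <- transform(plasma,
                       fat   = drop(scale(fat)),
                       fiber = drop(scale(fiber)))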
Warning in nlminb(start = par, objective = fn, gradient = gr) : NA/NaN function evaluation
This warning occurs when the optimizer visits a region of parameter space that is invalid. It is not a problem as long as the optimizer has left that region of parameter space upon convergence, which is indicated by an absence of the model convergence warnings described above.
The following warnings indicate possibly-transient numerical problems with the fit, and can be treated in the same way (i.e. ignored if there are no errors or convergence warnings about the final fitted model).
Cholmod warning ‘matrix not positive definite’
In older versions of R (< 3.6.0):
Warning in f(par, order = order, …) : value out of range in ‘lgamma’
This warning:
false convergence: the gradient ∇f(x) may be computed incorrectly, the other stopping tolerances may be too tight, or either f or ∇f may be discontinuous near the current iterate x
comes from the nlminb optimizer used by default in glmmTMB. It’s usually hard to diagnose the source of this warning (this Stack Overflow answer explains a bit more about what it means). Reasonable methods for making sure your model is OK are to restart the fit from the current optimum, or to try a different optimizer, e.g.
control=glmmTMBControl(optimizer=optim, optArgs=list(method="BFGS"))
and see if the results are sufficiently similar to the original fit.
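For a fitted model fm (a hypothetical name standing in for your own model), the comparison might look like:

fm_bfgs <- update(fm, control=glmmTMBControl(optimizer=optim,
                                             optArgs=list(method="BFGS")))
## compare fixed-effect estimates between optimizers
cbind(nlminb = unlist(fixef(fm)), optim = unlist(fixef(fm_bfgs)))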
dat1 = expand.grid(y=-1:1, rep=1:10)
m1 = glmmTMB(y~1, dat1, family=nbinom2)
## Error in eval(family$initialize): negative values not allowed for the negative binomial family
(Note: in current versions of glmmTMB this example fails with the initialization error shown above, rather than a gradient-evaluation error.) The error occurs because the negative binomial distribution is inappropriate for data with negative values.
If you see this error, check that the response variable meets the assumptions of the specified distribution.
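A couple of quick checks along these lines (a minimal sketch using the dat1 example above):

## negative values are invalid for count distributions such as nbinom2
any(dat1$y < 0)
## non-integer values will trigger "non-integer counts" warnings
any(dat1$y != round(dat1$y))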
Error in nlminb(start = par, objective = fn, gradient = gr) : gradient function must return a numeric vector of length x
Error in optimHess(par.fixed, obj$fn, obj$gr) : gradient in optim evaluated to length x
Try rescaling predictor variables. Try a simpler model and build up. (If you have a simple reproducible example of these errors, please post it to the issues list.)