vignettes/a02_fitting_models.Rmd
a02_fitting_models.Rmd
# replace this with your own data frame
d = data.frame("Year"= 2002:2014,
"Takes" = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0),
"expansionRate" = c(24, 22, 14, 32, 28, 25, 30, 7, 26, 21, 22, 23, 27),
"Sets" = c(391, 340, 330, 660, 470, 500, 330, 287, 756, 673, 532, 351, 486))
We’ll start by fitting a model with constant bycatch rate,
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="poisson",
time_varying = FALSE)
If divergent transition warnings or other issues indicating lack of convergence are a problem, we can try changing some of the control arguments, e.g.
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="poisson",
time_varying = FALSE, control=list(adapt_delta=0.99,max_treedepth=20))
We can also increase the iterations and number of chains from the defaults (1000 and 3),
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="poisson",
time_varying = FALSE, iter=3000, chains=4)
plot_fitted(fit, xlab="Year", ylab = "Bycatch (observed vessels)")
We can include points also,
plot_fitted(fit, xlab="Year", ylab = "Estimated bycatch for observed vessels", include_points = TRUE)
The loo
package in R provides a nice interface for
extracting leave one out information criterion (LOOIC) from
stanfit
objects. Like AIC, lower is better. Values of LOOIC
can be used to compare models with the same response but different
structure (covariates or not, time-varying bycatch or not, etc).
Additional information on LOOIC can be found at mc-stan.org,
Vehtari
et al. 2017, or the vignette for the loo
package.
loo::loo(fit$fitted_model)$estimates
## Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
## Estimate SE
## elpd_loo -11.186424 5.948376
## p_loo 2.559857 2.146439
## looic 22.372849 11.896752
Using our example dataset above, we can also switch from the Poisson model to Negative Binomial.
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="nbinom2",
time_varying = FALSE)
The degree of overdispersion here is stored in the variable
nb2_phi
, which we can get with
phi = rstan::extract(fit$fitted_model)$nb2_phi
To accomodate excess zeros, we’ve included hurdle models that separately model the probability of zeros, and use a zero-truncated model for the positive component. We currently have 2 versions of the zero-inflated model included – a Poisson, and Negative Binomial. These can be fit using
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="poisson-hurdle",
time_varying = FALSE)
and
fit = fit_bycatch(Takes ~ 1, data=d, time="Year", effort="Sets", family="nbinom2-hurdle",time_varying = FALSE)
As above, we can construct the models to be zero-inflated or not. Three different families are currently included (“normal”, “gamma”, “lognormal”). These can be fit as follows,
fit_normal = fit_bycatch(Takes ~ 1,
data=d,
time="Year",
effort="Sets",
family="normal",
time_varying = FALSE)
fit_lognormal = fit_bycatch(Takes ~ 1,
data=d,
time="Year",
effort="Sets",
family="lognormal",
time_varying = FALSE)
fit_gamma = fit_bycatch(Takes ~ 1,
data=d,
time="Year",
effort="Sets",
family="gamma",
time_varying = FALSE)
And similarly to the discrete models, we can fit zero-inflated models by changing the models to the hurdle equivalents
fit_normal = fit_bycatch(Takes ~ 1,
data=d,
time="Year",
effort="Sets",
family="normal-hurdle",
time_varying = FALSE)
fit_lognormal = fit_bycatch(Takes ~ 1,
data=d,
time="Year",
effort="Sets",
family="lognormal-hurdle",
time_varying = FALSE)
fit_gamma = fit_bycatch(Takes ~ 1,
data=d,
time="Year",
effort="Sets",
family="gamma-hurdle",
time_varying = FALSE)
Following Martin et al. 2015 we can include fixed or continuous covariates.
For example, we could include a julian day, and a break point (representing a regulatory change for example) in the data. We could model the first variable as a continuous predictor and the second as a factor.
Using the formula interface makes it easy to include covariates (in general it’s a good idea to standardize these)
d$Day = (d$Day - mean(d$Day)) / sd(d$Day)
fit = fit_bycatch(Takes ~ Day + Reg, data=d, time="Year", effort="Sets", family="poisson",
time_varying = FALSE)
We can get the 3 covariate effects out with the following call:
betas = rstan::extract(fit$fitted_model)$beta
Note that ‘betas’ has 3 columns. These correspond to (1) the intercept, (2) continuous predictor, and (3) factor variable above. If we didn’t include the covariates, we’d still estimate beta[1] as the intercept.
To incorporate potential autocorrelation, we can fit a model with time-varying random effects. This is equivalent to a dynamic linear model with time varying intercept in a Poisson GLM.
fit = fit_bycatch(Takes ~ 1, data=d, time="Year",
effort="Sets", family="poisson",
time_varying = TRUE)
plot_fitted(fitted_model=fit, xlab="Year",
ylab = "Fleet-level bycatch", include_points = TRUE)
# replace this with your own data frame
d = data.frame("Year"= 2002:2014,
"Takes" = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
"expansionRate" = c(24, 22, 14, 32, 28, 25, 30, 7, 26, 21, 22, 23, 27),
"Sets" = c(391, 340, 330, 660, 470, 500, 330, 287, 756, 673, 532, 351, 486))
fit = fit_bycatch(Takes ~ 1, data=d, time="Year",
effort="Sets", family="poisson",
time_varying = FALSE)
plot_fitted(fitted_model=fit, xlab="Year",
ylab = "Fleet-level bycatch",
include_points = TRUE)