Advertisement Β· 728 Γ— 90
#
Hashtag
#mgcv
Advertisement Β· 728 Γ— 90

πŸŽ‰ It's livestream day πŸŽ‰

Join me at 1700 (CET; 1600 UCT) today for two hours of GAM goodness 🀀

πŸ“½οΈ Youtube: youtube.com/live/A9U8e1K...
Hit the Notify me πŸ”” to get a reminder when I go live

#RStats #mgcv #statistics #GAMs #DataScience πŸ§ͺ

39 21 2 1
Preview
What's new in the world of Generalized Additive Models YouTube video by Bottom of the Heap

🚨 GAMs have moved onβ€”so it’s time for an update.

On March 3, 2026 (17:00–19:00 CET) I’ll be livestreaming an updated introduction to Generalized Additive Models in R

πŸ“Ί YouTube livestream link: youtube.com/live/A9U8e1K...

#RStats #mgcv #GAMs #gratia #statistics πŸ§ͺ

107 47 4 1
Post image

Registrations are now closed for the upcoming course on GAMs in R with @gsimpson.bsky.social

Thanks everyone and see you in December!

www.physalia-courses.org/courses-work...

#GAMs #Rstats #gratia #mgcv

1 0 0 0
Results of model fitting to the average daily fat content data from @Henderson1990-bd. a) observed average daily fat content (points) and estimated lactation curves from Wood's [-@Wood1967-re] model, a Tweedie GLM, and a Tweedie GAM (lines) with associated 95% confidence (Wood's model) or 95% credible intervals (GLM and GAM). Response residuals for Wood's model (b), Tweedie GLM (c), and Tweedie GAM (d), plus scatter plot smoothers (lines) and 95% credible intervals (shaded ribbons).

The fitted lactation curves are like an inverted U, with an extended longer tail to the right (later in lactation). The GAM curve fits the data well, but the fitted curves from Wood's model and the GLM equivalent do not provide good fits to the data, and over predict the amount of fat produced at the peak of lactation, and only grossly capture the decline in fat production later in lactation. The remaining panels show the raw response residuals for the three models, drawing attention to the poor fit; for Wood's model and the GLM there is significant pattern in the residuals, while for the GAM no residual pattern is observed.

Results of model fitting to the average daily fat content data from @Henderson1990-bd. a) observed average daily fat content (points) and estimated lactation curves from Wood's [-@Wood1967-re] model, a Tweedie GLM, and a Tweedie GAM (lines) with associated 95% confidence (Wood's model) or 95% credible intervals (GLM and GAM). Response residuals for Wood's model (b), Tweedie GLM (c), and Tweedie GAM (d), plus scatter plot smoothers (lines) and 95% credible intervals (shaded ribbons). The fitted lactation curves are like an inverted U, with an extended longer tail to the right (later in lactation). The GAM curve fits the data well, but the fitted curves from Wood's model and the GLM equivalent do not provide good fits to the data, and over predict the amount of fat produced at the peak of lactation, and only grossly capture the decline in fat production later in lactation. The remaining panels show the raw response residuals for the three models, drawing attention to the poor fit; for Wood's model and the GLM there is significant pattern in the residuals, while for the GAM no residual pattern is observed.

Quantities of interest derived from Wood's model, a Tweedie GLM, and a Tweedie GAM fitted to the lactation data example: a) the estimated week of peak average daily fat content, b) the estimated average daily fat content at the peak, and c) the rate of change (first derivative) of the lactation curve estimated at a point that is midway between the peak fat content and the end of lacation. The points are the estimated values and the lines are a 95% uncertainty interval. The uncertainty interval is based on the 0.025 and 0.975 percentiles of the bootstrap distribution of model coefficient estimates (Wood's model) or of the posterior distribution (GLM and GAM).

Each panel shows three point estimates and an uncertainty range. The three points are the estimates from a GAM, a GLM, and Wood's lactation model. The first panel shows the estimated timing of the peak of lactation, with the GAM capturing the fact that the peak in the data occurs much later in lactation (~ week 11) while the other two models confidently estimate that the peak is in week ~8-9. The GAM estimate has a much wider credible interval, which does include the estimates of Wood's model & the GLM at the extreme end. This reflects the uncertainty in the estimation of the peak timing arising from the data having a wide flat peak.

The other panels show the estimates of fat content at the peak, which are broadly similar at ~ 0.7 kg fat per day. The final panel showing the persistency estimate shows the GAM estimate diverging from those of the GLM and Wood's model. Again, the latter two models are overly confident in their estimation of this biologically relevant parameter, despite the fited lactation curve not really following the lactation data.

Quantities of interest derived from Wood's model, a Tweedie GLM, and a Tweedie GAM fitted to the lactation data example: a) the estimated week of peak average daily fat content, b) the estimated average daily fat content at the peak, and c) the rate of change (first derivative) of the lactation curve estimated at a point that is midway between the peak fat content and the end of lacation. The points are the estimated values and the lines are a 95% uncertainty interval. The uncertainty interval is based on the 0.025 and 0.975 percentiles of the bootstrap distribution of model coefficient estimates (Wood's model) or of the posterior distribution (GLM and GAM). Each panel shows three point estimates and an uncertainty range. The three points are the estimates from a GAM, a GLM, and Wood's lactation model. The first panel shows the estimated timing of the peak of lactation, with the GAM capturing the fact that the peak in the data occurs much later in lactation (~ week 11) while the other two models confidently estimate that the peak is in week ~8-9. The GAM estimate has a much wider credible interval, which does include the estimates of Wood's model & the GLM at the extreme end. This reflects the uncertainty in the estimation of the peak timing arising from the data having a wide flat peak. The other panels show the estimates of fat content at the peak, which are broadly similar at ~ 0.7 kg fat per day. The final panel showing the persistency estimate shows the GAM estimate diverging from those of the GLM and Wood's model. Again, the latter two models are overly confident in their estimation of this biologically relevant parameter, despite the fited lactation curve not really following the lactation data.

a) Estimated daily growth rate on November 15^th^, 2021 and 95% Bayesian credible interval for the 18 pigs in the pig growth example. b) Posterior distribution of daily growth rate on November 15^th^, 2021, for three pigs (numbers 2, 13, and 17), for whom weight observations ceased before November 1^st^, 2021. In b), the shaded region is the posterior distribution, the point, and thick and thin bars are the posterior median, and 66% and 95% posterior intervals respectively.

With the fitted growth curves, we can estimate for any day what the growth rate of each pig was. In this figure I'm showing the estimated growth rate of each pig in the example on November 21st. This growth rate is the first derivative of the fitted growth curve (smooth function). I used posterior sampling to produce the posterior distribution of the growth rate for each pig. These are summarised as a point estimate (median) and ccredible interval in the first panel with most pigs growin at ~1-1.5 kg per day by November 21st, with uncertainties on the order of +/- 0.5 kg per day.

The second panel shows the entire posterior distribution of the estimated growth rate for three pigs (2, 13, and 17) for whom there were no weight estimates after November 1st. Here, the model is drawing power from the other pigs to help extrapolate the growth curves for these three pigs, but pig-specific details remain, with the posterior distribution for pig 17 being much more diffuse (wider) than for either pigs 2 or 13, reflecting greater uncertainty for the former animal.

a) Estimated daily growth rate on November 15^th^, 2021 and 95% Bayesian credible interval for the 18 pigs in the pig growth example. b) Posterior distribution of daily growth rate on November 15^th^, 2021, for three pigs (numbers 2, 13, and 17), for whom weight observations ceased before November 1^st^, 2021. In b), the shaded region is the posterior distribution, the point, and thick and thin bars are the posterior median, and 66% and 95% posterior intervals respectively. With the fitted growth curves, we can estimate for any day what the growth rate of each pig was. In this figure I'm showing the estimated growth rate of each pig in the example on November 21st. This growth rate is the first derivative of the fitted growth curve (smooth function). I used posterior sampling to produce the posterior distribution of the growth rate for each pig. These are summarised as a point estimate (median) and ccredible interval in the first panel with most pigs growin at ~1-1.5 kg per day by November 21st, with uncertainties on the order of +/- 0.5 kg per day. The second panel shows the entire posterior distribution of the estimated growth rate for three pigs (2, 13, and 17) for whom there were no weight estimates after November 1st. Here, the model is drawing power from the other pigs to help extrapolate the growth curves for these three pigs, but pig-specific details remain, with the posterior distribution for pig 17 being much more diffuse (wider) than for either pigs 2 or 13, reflecting greater uncertainty for the former animal.

Just updated my manuscript on using #GAMs in #AnimalScience, now on arXiv: doi.org/10.48550/arX...
πŸ„πŸ–πŸͺΆ

Extended examples now show how GAMs go beyond prediction, helping estimate biologically meaningful traits from data.

Code: github.com/gavinsimpson...

πŸ§ͺ #RStats #mgcv #Statistics #OpenScience

56 8 0 0

πŸš€ gratia 0.11.1 is out!
Compatibility with ggplot2 4.0.0 + usability improvements + bug fixes.

#Rstats #mgcv #gam #statistics

28 2 1 1
Visualization of example model, showing predictions and confidence envelopes by day of year and fish life stage.

Visualization of example model, showing predictions and confidence envelopes by day of year and fish life stage.

New #Rstats package for working with #mgcv! `VizSeasonalGams` streamlines the process of exploring model predictions across categorical predictors and comparing the effects of modelling choices. The package automates the process of producing `facet_grid`- and `facet_wrap`-like figures from models.

7 0 1 0
Video

If nothing else, this highlights the utility and (dare I say) flexibility of the #mgcv #rstats package!!

gif from @gsimpson.bsky.social intro to GAMs slide deck (fromthebottomoftheheap.net/slides/gam-i...)

19 3 1 0
Preview
Release Version 0.10.0 released and on CRAN Β· gavinsimpson/gratia Earlier this evening I wrapped up the source tarball for version 0.10.0 of gratia and submitted it to CRAN. Following automated check's, this new version of gratia is now available from CRAN πŸ₯³ This...

🚨 Earlier this evening I wrapped up the source tarball for version 0.10.0 of gratia πŸ“¦ and submitted it to CRAN. Following automated check's, this new version of gratia is now available from CRAN πŸ₯³ #RStats #mgcv #GAMs

github.com/gavinsimpson...

27 6 1 0

The penalty on a smooth is already doing a lot of model selection, but it can’t work on functions in the null space of the penalty. A solution to this is an extra penalty on the null space and voila, model selection through the smoothing parameters; see select argument to gam() in #mgcv #RStats

2 0 0 0
The image shows the result of a call to the conditional_values() function where the GAM contains a factor by smooth. In this figure, predicted values from the model conditional on values of x2 are shown for two levels of the factor fac. As we're conditioning on two variables, we include the group means for the 2 levels of fac, plus the smooth for each category. This model also includes a smooth effect of x0. As we don't condition on this covariate, it will be set to a typical representative value (value of observation closest to the median of x0).

The code to produce both images is below. This image is created by the last line of the code. 

# packages
library("mgcv")
library("gratia")

# categorical conditions are also handled
df <- data_sim("eg4", seed = 2)
m2 <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = df, method = "REML")
cv <- conditional_values(
  m2,
  condition = list("fac", x2 = "fivenum")
)
# plot - we see a discrete x axis with five estimates per level of fac
cv |> draw()

# in this example we condition on `x2` and `fac %in% c(2,3)`
cv <- conditional_values(
  m2,
  condition = list("x2", fac = 2:3)
)
# plot - smooths of `x2` for `fac == 2` and `fac == 3`
cv |> draw()

The image shows the result of a call to the conditional_values() function where the GAM contains a factor by smooth. In this figure, predicted values from the model conditional on values of x2 are shown for two levels of the factor fac. As we're conditioning on two variables, we include the group means for the 2 levels of fac, plus the smooth for each category. This model also includes a smooth effect of x0. As we don't condition on this covariate, it will be set to a typical representative value (value of observation closest to the median of x0). The code to produce both images is below. This image is created by the last line of the code. # packages library("mgcv") library("gratia") # categorical conditions are also handled df <- data_sim("eg4", seed = 2) m2 <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = df, method = "REML") cv <- conditional_values( m2, condition = list("fac", x2 = "fivenum") ) # plot - we see a discrete x axis with five estimates per level of fac cv |> draw() # in this example we condition on `x2` and `fac %in% c(2,3)` cv <- conditional_values( m2, condition = list("x2", fac = 2:3) ) # plot - smooths of `x2` for `fac == 2` and `fac == 3` cv |> draw()

The image shows the result of a call to the conditional_values() function where the GAM contains a factor by smooth. In this figure, predicted values from the model conditional on values of the factor fac are shown for different values of the covariate x2. As we're conditioning on two variables, but focusing on fac, we summarise x2 by default using its Tukey's five number summary, hence there are 5 estimates per category (level) of fac. This model also includes a smooth effect of x0. As we don't condition on this covariate, it will be set to a typical representative value (value of observation closest to the median of x0).

The code to produce both images is below. This particular image is created by the first call to the draw(). 

# packages
library("mgcv")
library("gratia")

# categorical conditions are also handled
df <- data_sim("eg4", seed = 2)
m2 <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = df, method = "REML")
cv <- conditional_values(
  m2,
  condition = list("fac", x2 = "fivenum")
)
# plot - we see a discrete x axis with five estimates per level of fac
cv |> draw()

# in this example we condition on `x2` and `fac %in% c(2,3)`
cv <- conditional_values(
  m2,
  condition = list("x2", fac = 2:3)
)
# plot - smooths of `x2` for `fac == 2` and `fac == 3`
cv |> draw()

The image shows the result of a call to the conditional_values() function where the GAM contains a factor by smooth. In this figure, predicted values from the model conditional on values of the factor fac are shown for different values of the covariate x2. As we're conditioning on two variables, but focusing on fac, we summarise x2 by default using its Tukey's five number summary, hence there are 5 estimates per category (level) of fac. This model also includes a smooth effect of x0. As we don't condition on this covariate, it will be set to a typical representative value (value of observation closest to the median of x0). The code to produce both images is below. This particular image is created by the first call to the draw(). # packages library("mgcv") library("gratia") # categorical conditions are also handled df <- data_sim("eg4", seed = 2) m2 <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = df, method = "REML") cv <- conditional_values( m2, condition = list("fac", x2 = "fivenum") ) # plot - we see a discrete x axis with five estimates per level of fac cv |> draw() # in this example we condition on `x2` and `fac %in% c(2,3)` cv <- conditional_values( m2, condition = list("x2", fac = 2:3) ) # plot - smooths of `x2` for `fac == 2` and `fac == 3` cv |> draw()

🚨 New function 🚨

@vincentab.bsky.social's marginaleffects πŸ“¦ is great for generating quantities of interest from your model. I've long wanted something similar to plot_predictions() for my gratia πŸ“¦

Introducing conditional_values() πŸŽ‰

gavinsimpson.github.io/gratia/refer... #RStats #GAMs #mgcv 1/6

83 13 4 1

#Poisson regression with #mgcv and #glmmTMB in #rstats just rocks

10 2 0 0
Preview
Temporal autocorrelation in GAMs and the mvgam package | GAMbler Temporal autocorrelation is a dominant feature of time series. We often want to use Generalized Additive Models (GAMs) to fit smoothing splines to time series data, but incorporating autocorrelation i...

Want to model long-term nonlinear trends and temporal autocorrelation in time series data? I've got tips and solutions on how to do this with Generalized Additive Models (GAMs): ecogambler.netlify.app/blog/autocor... #rstats #mgcv #timeseries πŸ§ͺ

27 8 1 1
Post image

A recent post on StackOverflow about difficulties setting up by-terms in #mgcv distributed lag models (stackoverflow.com/questions/75...) reminded me of a case study I wrote up last year. Have since added it to my very small blog here: ecogambler.netlify.app/blog/distrib... #rstats

5 2 2 0
Post image

Reminder that I'll be running a second edition of my Ecological forecasting in R workshop in May this year. Learn to use #mgcv, #mvgam and #brms to analyse complex ecological time series and produce meaningful insights #rstats 🌍 www.physalia-courses.org/courses-work...

24 12 1 0

Is there a way to specify the interaction to two MRF smooths in #mgcv? This errors b/c it seems that smooth.construct.mrf.... can't handle a length-2 "nb" argument:

ti(a, b, bs = c("mrf","mrf"), xt = list(nb = list(nb_a, nb_b))

#rstats @ericJpedersen @millerdl @ucfagls

0 0 0 0
Post image

Familiar names in the new member of my personal reference library, the 800 page https://t.co/ezqHd32YnH Distribution maps produced from the @luomus insect database (init. Matti Virtala) with the #mgcv #rstats library by @euxoa Base map help by @jlehtoma

0 0 1 0