π It's livestream day π
Join me at 1700 (CET; 1600 UCT) today for two hours of GAM goodness π€€
π½οΈ Youtube: youtube.com/live/A9U8e1K...
Hit the Notify me π to get a reminder when I go live
#RStats #mgcv #statistics #GAMs #DataScience π§ͺ
π¨ GAMs have moved onβso itβs time for an update.
On March 3, 2026 (17:00β19:00 CET) Iβll be livestreaming an updated introduction to Generalized Additive Models in R
πΊ YouTube livestream link: youtube.com/live/A9U8e1K...
#RStats #mgcv #GAMs #gratia #statistics π§ͺ
Registrations are now closed for the upcoming course on GAMs in R with @gsimpson.bsky.social
Thanks everyone and see you in December!
www.physalia-courses.org/courses-work...
#GAMs #Rstats #gratia #mgcv
Results of model fitting to the average daily fat content data from @Henderson1990-bd. a) observed average daily fat content (points) and estimated lactation curves from Wood's [-@Wood1967-re] model, a Tweedie GLM, and a Tweedie GAM (lines) with associated 95% confidence (Wood's model) or 95% credible intervals (GLM and GAM). Response residuals for Wood's model (b), Tweedie GLM (c), and Tweedie GAM (d), plus scatter plot smoothers (lines) and 95% credible intervals (shaded ribbons). The fitted lactation curves are like an inverted U, with an extended longer tail to the right (later in lactation). The GAM curve fits the data well, but the fitted curves from Wood's model and the GLM equivalent do not provide good fits to the data, and over predict the amount of fat produced at the peak of lactation, and only grossly capture the decline in fat production later in lactation. The remaining panels show the raw response residuals for the three models, drawing attention to the poor fit; for Wood's model and the GLM there is significant pattern in the residuals, while for the GAM no residual pattern is observed.
Quantities of interest derived from Wood's model, a Tweedie GLM, and a Tweedie GAM fitted to the lactation data example: a) the estimated week of peak average daily fat content, b) the estimated average daily fat content at the peak, and c) the rate of change (first derivative) of the lactation curve estimated at a point that is midway between the peak fat content and the end of lacation. The points are the estimated values and the lines are a 95% uncertainty interval. The uncertainty interval is based on the 0.025 and 0.975 percentiles of the bootstrap distribution of model coefficient estimates (Wood's model) or of the posterior distribution (GLM and GAM). Each panel shows three point estimates and an uncertainty range. The three points are the estimates from a GAM, a GLM, and Wood's lactation model. The first panel shows the estimated timing of the peak of lactation, with the GAM capturing the fact that the peak in the data occurs much later in lactation (~ week 11) while the other two models confidently estimate that the peak is in week ~8-9. The GAM estimate has a much wider credible interval, which does include the estimates of Wood's model & the GLM at the extreme end. This reflects the uncertainty in the estimation of the peak timing arising from the data having a wide flat peak. The other panels show the estimates of fat content at the peak, which are broadly similar at ~ 0.7 kg fat per day. The final panel showing the persistency estimate shows the GAM estimate diverging from those of the GLM and Wood's model. Again, the latter two models are overly confident in their estimation of this biologically relevant parameter, despite the fited lactation curve not really following the lactation data.
a) Estimated daily growth rate on November 15^th^, 2021 and 95% Bayesian credible interval for the 18 pigs in the pig growth example. b) Posterior distribution of daily growth rate on November 15^th^, 2021, for three pigs (numbers 2, 13, and 17), for whom weight observations ceased before November 1^st^, 2021. In b), the shaded region is the posterior distribution, the point, and thick and thin bars are the posterior median, and 66% and 95% posterior intervals respectively. With the fitted growth curves, we can estimate for any day what the growth rate of each pig was. In this figure I'm showing the estimated growth rate of each pig in the example on November 21st. This growth rate is the first derivative of the fitted growth curve (smooth function). I used posterior sampling to produce the posterior distribution of the growth rate for each pig. These are summarised as a point estimate (median) and ccredible interval in the first panel with most pigs growin at ~1-1.5 kg per day by November 21st, with uncertainties on the order of +/- 0.5 kg per day. The second panel shows the entire posterior distribution of the estimated growth rate for three pigs (2, 13, and 17) for whom there were no weight estimates after November 1st. Here, the model is drawing power from the other pigs to help extrapolate the growth curves for these three pigs, but pig-specific details remain, with the posterior distribution for pig 17 being much more diffuse (wider) than for either pigs 2 or 13, reflecting greater uncertainty for the former animal.
Just updated my manuscript on using #GAMs in #AnimalScience, now on arXiv: doi.org/10.48550/arX...
πππͺΆ
Extended examples now show how GAMs go beyond prediction, helping estimate biologically meaningful traits from data.
Code: github.com/gavinsimpson...
π§ͺ #RStats #mgcv #Statistics #OpenScience
π gratia 0.11.1 is out!
Compatibility with ggplot2 4.0.0 + usability improvements + bug fixes.
#Rstats #mgcv #gam #statistics
Visualization of example model, showing predictions and confidence envelopes by day of year and fish life stage.
New #Rstats package for working with #mgcv! `VizSeasonalGams` streamlines the process of exploring model predictions across categorical predictors and comparing the effects of modelling choices. The package automates the process of producing `facet_grid`- and `facet_wrap`-like figures from models.
If nothing else, this highlights the utility and (dare I say) flexibility of the #mgcv #rstats package!!
gif from @gsimpson.bsky.social intro to GAMs slide deck (fromthebottomoftheheap.net/slides/gam-i...)
π¨ Earlier this evening I wrapped up the source tarball for version 0.10.0 of gratia π¦ and submitted it to CRAN. Following automated check's, this new version of gratia is now available from CRAN π₯³ #RStats #mgcv #GAMs
github.com/gavinsimpson...
The penalty on a smooth is already doing a lot of model selection, but it canβt work on functions in the null space of the penalty. A solution to this is an extra penalty on the null space and voila, model selection through the smoothing parameters; see select argument to gam() in #mgcv #RStats
The image shows the result of a call to the conditional_values() function where the GAM contains a factor by smooth. In this figure, predicted values from the model conditional on values of x2 are shown for two levels of the factor fac. As we're conditioning on two variables, we include the group means for the 2 levels of fac, plus the smooth for each category. This model also includes a smooth effect of x0. As we don't condition on this covariate, it will be set to a typical representative value (value of observation closest to the median of x0). The code to produce both images is below. This image is created by the last line of the code. # packages library("mgcv") library("gratia") # categorical conditions are also handled df <- data_sim("eg4", seed = 2) m2 <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = df, method = "REML") cv <- conditional_values( m2, condition = list("fac", x2 = "fivenum") ) # plot - we see a discrete x axis with five estimates per level of fac cv |> draw() # in this example we condition on `x2` and `fac %in% c(2,3)` cv <- conditional_values( m2, condition = list("x2", fac = 2:3) ) # plot - smooths of `x2` for `fac == 2` and `fac == 3` cv |> draw()
The image shows the result of a call to the conditional_values() function where the GAM contains a factor by smooth. In this figure, predicted values from the model conditional on values of the factor fac are shown for different values of the covariate x2. As we're conditioning on two variables, but focusing on fac, we summarise x2 by default using its Tukey's five number summary, hence there are 5 estimates per category (level) of fac. This model also includes a smooth effect of x0. As we don't condition on this covariate, it will be set to a typical representative value (value of observation closest to the median of x0). The code to produce both images is below. This particular image is created by the first call to the draw(). # packages library("mgcv") library("gratia") # categorical conditions are also handled df <- data_sim("eg4", seed = 2) m2 <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = df, method = "REML") cv <- conditional_values( m2, condition = list("fac", x2 = "fivenum") ) # plot - we see a discrete x axis with five estimates per level of fac cv |> draw() # in this example we condition on `x2` and `fac %in% c(2,3)` cv <- conditional_values( m2, condition = list("x2", fac = 2:3) ) # plot - smooths of `x2` for `fac == 2` and `fac == 3` cv |> draw()
π¨ New function π¨
@vincentab.bsky.social's marginaleffects π¦ is great for generating quantities of interest from your model. I've long wanted something similar to plot_predictions() for my gratia π¦
Introducing conditional_values() π
gavinsimpson.github.io/gratia/refer... #RStats #GAMs #mgcv 1/6
Want to model long-term nonlinear trends and temporal autocorrelation in time series data? I've got tips and solutions on how to do this with Generalized Additive Models (GAMs): ecogambler.netlify.app/blog/autocor... #rstats #mgcv #timeseries π§ͺ
A recent post on StackOverflow about difficulties setting up by-terms in #mgcv distributed lag models (stackoverflow.com/questions/75...) reminded me of a case study I wrote up last year. Have since added it to my very small blog here: ecogambler.netlify.app/blog/distrib... #rstats
Reminder that I'll be running a second edition of my Ecological forecasting in R workshop in May this year. Learn to use #mgcv, #mvgam and #brms to analyse complex ecological time series and produce meaningful insights #rstats π www.physalia-courses.org/courses-work...
Is there a way to specify the interaction to two MRF smooths in #mgcv? This errors b/c it seems that smooth.construct.mrf.... can't handle a length-2 "nb" argument:
ti(a, b, bs = c("mrf","mrf"), xt = list(nb = list(nb_a, nb_b))
#rstats @ericJpedersen @millerdl @ucfagls
Familiar names in the new member of my personal reference library, the 800 page https://t.co/ezqHd32YnH Distribution maps produced from the @luomus insect database (init. Matti Virtala) with the #mgcv #rstats library by @euxoa Base map help by @jlehtoma