I think it's because of (not in spite of) the fact that their basic goal (helping people in their local communities find the information they need) has never changed
Posts by Eric Pedersen
(possibly hot) take: public libraries are the most consistently innovative and progressive institutions in the western world. Definitely beating out universities or tech companies
Welcome! I figured I needed to make some notes on them anyway, and it might as well be public.
For #2: I'm pretty sure that it won't fully solve spatial confounding issues (in part because, as with all confounding, there is no fundamental solution to "I didn't measure everything of relevance")
Small correction to this and the last post in the thread; I meant to write that `A` is the matrix of neighbourhoods, and `D` is the matrix of blocks (this is part of what I mean by it being confusing to set up...). 9a/n
This might be of interest to @gsimpson.bsky.social , @noamross.net , @bbolker.bsky.social @stephenjwild.bsky.social @fonyairvine.bsky.social @njklappstein.bsky.social
A couple questions I still have:
1. How sensitive is model fit to the exact block/NB structure used?
2. Does this also help mitigate e.g. spatial confounding, rather than just residual dependence?
3. Under what conditions do the approximations used to derive NCV break down?
- End
15/15
In short: NCV lets you model dependency structure without having to give a specific model for dependency (beyond setting neighbourhood size to be large enough). It does seem to work to reduce overfitting, but can be tricky to set up. 14/n
Notice that the NCV curves for day of year and hour of day are much smoother than for the REML model, but the shore-distance function is roughly equally wiggly. This is all done without having to directly model the spatiotemporal dependency structure in the GAM via covariances or smoothers 13/n
Partial effect plots of smooth GAM curves (output from the gratia package) for two different GAM models. There are four plots: top-left is an effect plot for day of year (s(day)), top-right is for hour of day, bottom-left is for shore-distance, and bottom-right is for individual (s(tag)). Black curves represent functions estimated via REML, and blue lines represent curves estimated via NCV with the given blocking/neighbourhood structure.
I fit a GAM to estimate how mean velocity varied as a function of individual ("tag"), hour, day, and distance from shore. This plot compares the smooths from a model fit via REML (base) vs. NCV (using `compare_smooths()` from @gsimpson.bsky.social's gratia package). 12/n
Notice that a given obs only occurs in one block, but can occur in multiple NBs. Also note the breaks around observation 500 and 1200: these correspond to different individuals (dependency is only expected to occur w/in individuals). 11/n
Figure showing a matrix representing group membership, with rows indicating blocks and columns indicating observations. Labeled as a 12 x 1835 dimension matrix.
Here's an example of the blocking and NB structure for some code I'm working on. Data is spatiotemporal movement of fish. Blocks: 12h periods for each individual. NBs: blocks+/- 6 hours. Yellow: which obs. are in each block, purple = which obs. are also in the NB for that block 10/n
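For anyone who wants to play with this kind of blocking scheme before touching real data: here's a toy sketch of the structure described above (12h blocks per individual, NBs widening each block by +/- 6h, never crossing individuals). mgcv itself is R, but the indexing logic is language-agnostic, so this is written in Python with made-up times and IDs:

```python
import numpy as np

# Toy data: observation times (hours) and individual IDs (made up).
hours = np.array([0, 3, 7, 13, 18, 1, 5, 14, 20, 26])
fish  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

blocks, nbs = [], []
for ind in np.unique(fish):
    idx = np.where(fish == ind)[0]              # obs for this individual
    for w in np.unique(hours[idx] // 12):       # occupied 12h windows
        lo, hi = 12 * w, 12 * (w + 1)
        # Block: obs in the 12h window; NB: the same window +/- 6h.
        blocks.append(idx[(hours[idx] >= lo) & (hours[idx] < hi)])
        nbs.append(idx[(hours[idx] >= lo - 6) & (hours[idx] < hi + 6)])
```

Each obs lands in exactly one block but can sit in several NBs, and no NB ever spans two individuals, matching the structure in the figure.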
And
D = sparseMatrix(j = nei$d, p = c(0, nei$md), dim = c(B, N))
is a matrix of the NBs for each block. Both A and D have dimension BxN, where B is the number of blocks and N is the # of obs; the matrices serve as indicator functions stating which obs are in which block/NB. 9/n
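For anyone who thinks better by running code: the `sparseMatrix(j = ..., p = c(0, ...))` construction in these posts is just the standard compressed-sparse-row layout, so the same thing can be sketched in any CSR library. A toy version in Python with scipy (the `a`/`ma` names follow the convention in the thread; data is invented, and indices are 0-based here rather than R's 1-based):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy example: N = 6 observations, B = 3 blocks of two obs each.
# `a` lists the obs in each block, concatenated block by block;
# `ma` gives the end-point of each block within `a`.
N, B = 6, 3
a = np.array([0, 1, 2, 3, 4, 5])   # obs indices, block by block
ma = np.array([2, 4, 6])           # block 1 = a[0:2], block 2 = a[2:4], ...

# Same construction as sparseMatrix(j = a, p = c(0, ma), dim = c(B, N)):
# rows index blocks, columns index obs, 1 = "this obs is in this block".
A = csr_matrix((np.ones(len(a)), a, np.r_[0, ma]), shape=(B, N))
print(A.toarray())
```

Reading a row of `A` gives you the obs in that block, which is exactly what the yellow cells show in the figure a few posts up.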
This took me the most time to figure out, but I realized that this is very close to how the Matrix package specifies sparse matrices. If we start from a neighbourhood structure as a list called `nei`, then
A = sparseMatrix(j = nei$a, p = c(0, nei$ma), dim = c(B, N))
is a matrix of blocks 8/n
However, it can be tricky to set up the block and NB structure in mgcv (see `?mgcv::NCV`). In short, you need to give it vectors `a` and `d` that specify the obs belonging to each block/NB, and vectors `ma`/`md` that give the index of the end-point of each block/NB within `a` and `d` 7/n
This is powerful because you don't need to know the exact dependency structure, just the minimum NB size needed to make obs roughly exchangeable. The trick used by method = "NCV" in mgcv is making this method reasonably fast to compute (see arxiv.org/abs/2404.16490 for the deep magic) 6/n
Other example structures:
1. if we have multiple obs from diff indivs, blocks could be single obs, with the individuals being NBs.
2. an annual time series with an autocorrelation time of 5 years, where each obs. could be a block, with the NB being all the observations within +/- 5 years 5/n
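The second example structure above (each obs its own block, NB = everything within +/- 5 years) is simple enough to build by hand. A toy sketch in Python, using the `a`/`ma`/`d`/`md` naming convention from this thread but with 0-based indices (mgcv itself expects 1-based R vectors):

```python
import numpy as np

# An annual series of N obs; each obs is its own block, and the NB for
# obs t is every obs within +/- `width` years of it.
N, width = 20, 5

# Blocks: each obs is its own block.
a = np.arange(N)
ma = np.arange(1, N + 1)          # end-point of each block within `a`

# Neighbourhoods: all obs within +/- `width` of each block's obs.
d, md = [], []
for t in range(N):
    nb = np.arange(max(0, t - width), min(N, t + width + 1))
    d.extend(nb)
    md.append(len(d))             # running end-point within `d`
d, md = np.array(d), np.array(md)

print(d[:md[0]])                  # NB of obs 0: years 0 through 5
```

Note how the NBs shrink at the series edges: interior obs have 11 neighbours, while the first and last have only 6.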
Blocks + NBs model the assumed residual dependency structure in the data. That is: given observed covars X, outcomes (y) in block b should be statistically independent of all the values of y outside of the NB around block b. LOO is an example of NCV where each obs is its own block and its own NB 4/n
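To make the criterion concrete: here's a deliberately naive sketch of the NCV idea (not mgcv's fast algorithm) using a simple least-squares line fit as a stand-in model. For each block, refit with the whole NB dropped, then score predictions on the block; with each obs as its own block and NB, this reduces to ordinary leave-one-out CV. Python used for illustration; `ncv_score` and the toy data are my own invented names:

```python
import numpy as np

def ncv_score(x, y, blocks, nbs, degree=1):
    """Naive NCV: sum of squared prediction errors over blocks,
    with each block's whole NB excluded from the fit."""
    score = 0.0
    for b, nb in zip(blocks, nbs):
        keep = np.setdiff1d(np.arange(len(y)), nb)   # drop the NB
        coef = np.polyfit(x[keep], y[keep], degree)  # refit without it
        score += np.sum((y[b] - np.polyval(coef, x[b])) ** 2)
    return score

# Toy near-linear data.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
y = 2 * x + rng.normal(0, 0.1, 30)

# LOO as a special case: each obs is its own block and its own NB.
loo_blocks = [np.array([i]) for i in range(30)]
loo = ncv_score(x, y, loo_blocks, loo_blocks)
```

Widening the NBs while keeping single-obs blocks gives the "drop the neighbourhood, predict the centre" behaviour that guards against dependent neighbours propping up an overfit model.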
I like this terminology (compared to the notation in Simon's paper) as it's easy to remember that a block is smaller than a neighbourhood (and that each observation might lie in many neighbourhoods). Generally, you would assume all observations in a block are also in the NB for that block 3/n
Neighbourhood Cross Validation (or NCV) is a model-fitting criterion designed to reduce overfitting in dependent data sets by evaluating how well a model predicts sets of observations ("blocks") when some relevant neighbourhood ("NB") of observations around each block is excluded 2/n
I've been working with NCV this last week; it's a powerful tool for modelling data with dependencies that are poorly/inefficiently modelled via stochastic process priors like cov functions or low-rank smoothers. However, it's a complex topic, so I thought I'd summarize the key ideas in a thread 1/n
I just realized today, though, that the specification of neighbourhoods and blocks is just a compressed sparse matrix representation. That's made it easier for me to think of which observations belong to which neighbourhood/block, and should help build functions for creating NCV neighbourhoods
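The equivalence is easy to check in any sparse-matrix library: converting a dense block-membership indicator matrix to compressed sparse row form yields exactly the concatenated-indices and end-point vectors described in the thread. A toy check in Python with scipy (the matrix here is invented; mgcv's vectors are 1-based where CSR is 0-based, and CSR's `indptr` carries the extra leading 0):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy 3-block x 6-obs indicator matrix: rows = blocks, cols = obs.
dense = np.array([[1, 1, 0, 0, 0, 0],
                  [0, 0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 1, 1]])

M = csr_matrix(dense)
print(M.indices)   # concatenated obs indices per block ("a"-style vector)
print(M.indptr)    # block end-points with a leading 0 (c(0, ma) in R)
```

So any tool that emits CSR structure can be pressed into service for building NCV neighbourhood specifications.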
I've been playing around with NCV for a paper we're working on modelling high frequency movement data. It works really well so far in my testing, but building the neighbourhoods took me a while to figure out
The Garden of Earthly Delights (detail), by Hieronymus Bosch, 1480-1505, 📸 by @alexbrandon
"that's correc: this isn't Not Milk. I don't know why you're confused"
Simple: just call it "Not not milk"
[Comic transcript]
1. [Nurse talking to camera] This is usually our busiest time of year at the centre
2. [A sign outside a medical institution reads:] Centre for Adults Who Still Can't Get Their Heads Round Daylight Savings
3. NURSE: We have a spike in admissions when BST starts
4. [Confused patient being laid down by nurses] PATIENT: Is it forward or back? NURSE: Lie down love
5. We often see the same patients every year PATIENT: So it's like… time travel? DOCTOR: No
6. Every year we try to explain BST to them in a fun and memorable way [The patients are sat around a fun show, with people dressed in bunny suits jumping around a giant clock] BUNNY: And now I *spring forward*...
7. [Bunny springs forward one hour on clock]
8. PATIENT WATCHING SHOW: I don't get it
9. The hardest patients are the Summer Time refusers
10. [Patient sat strapped to a chair as a doctor talks to him through a screen, through a tannoy] PATIENT: But why is it better? DOCTOR [talking through microphone]: Because it complicates things.
11. PATIENT: How is that good? DOCTOR: It makes you more tired.
12. DOCTOR [to assistant]: Fetch the cattle prod [Ends]
I think the clocks in UK go back tonight. Or forward. Not sure. Is it forward
Someone (maybe @bbolker.bsky.social ?) once remarked at a conference when Gavin, Dave Miller, and I came up to a discussion that "the GAM mafia was here now", so we figured we'd lean into it. :)
Hi all. I am very excited that after 6 years I finally got my phylogenetic comparative methods book and exercises online. Feel free to use and share. The book is here: nhcooper123.github.io/pcm-primer/. Note that it is not finished; we had to abandon it before the sunk costs fallacy broke us
I'm excited to see this out! It's also a nice coincidence, as @gsimpson.bsky.social and I just got phylogenetic smoothers working (for BM and OU processes) in our MRFtools package. I'll definitely be pointing people to your book for background.
gam-mafia.github.io/MRFtools/art...