Posts by Eric Pedersen

I think it's because of (not in spite of) the fact that their basic goal (helping people in their local communities find the information they need) has never changed

2 days ago 6 0 0 0

(possibly hot) take: public libraries are the most consistently innovative and progressive institutions in the western world. Definitely beating out universities or tech companies

2 days ago 9 2 1 0

Welcome! I figured I needed to make some notes on them anyway, and it might as well be public.

For #2: I'm pretty sure that it won't fully solve spatial confounding issues (in part because, as with all confounding, there is no fundamental solution to "I didn't measure everything of relevance")

1 week ago 1 0 0 0

Small correction to this and the last post in the thread; I meant to write that `A` is the matrix of neighbourhoods, and `D` is the matrix of blocks (this is part of what I mean by it being confusing to set up...). 9a/n

1 week ago 1 0 0 0

This might be of interest to @gsimpson.bsky.social, @noamross.net, @bbolker.bsky.social, @stephenjwild.bsky.social, @fonyairvine.bsky.social, @njklappstein.bsky.social

1 week ago 2 0 1 0

A couple questions I still have:

1. How sensitive is model fit to the exact block/NB structure used?
2. Does this also help mitigate e.g. spatial confounding, rather than just residual dependence?
3. Under what conditions do the approximations used to derive NCV break down?

- End
15/15

1 week ago 6 0 1 0

In short: NCV lets you model dependency structure without having to give a specific model for dependency (beyond setting neighbourhood size to be large enough). It does seem to work to reduce overfitting, but can be tricky to set up. 14/n

1 week ago 3 0 1 0

Notice that the NCV curves for day of year and hour of day are much smoother than those from the REML model, but the shore-distance curve is roughly equally wiggly in both. This is all done without having to directly model the spatiotemporal dependency structure in the GAM via covariances or smoothers. 13/n

1 week ago 3 0 1 0
Partial effect plots of smooth GAM curves (output from the gratia package) for two different GAM models. There are four plots: top-left is an effect plot for day of year (s(day)), top-right is for hour of day, bottom-left is for shore-distance, and bottom-right is for individual (s(tag)).  Black curves represent functions estimated via REML, and blue lines represent curves estimated via NCV with the given blocking/neighbourhood structure.

I've fit a GAM to estimate how mean velocity varied as a function of individual ("tag"), hour, day, and distance from shore. This plot compares the smooths from a model estimated via REML (base) vs. NCV (using `compare_smooths()` from @gsimpson.bsky.social's gratia package). 12/n

1 week ago 6 0 1 1

Notice that a given obs only occurs in one block, but can occur in multiple NBs. Also note the breaks around observation 500 and 1200: these correspond to different individuals (dependency is only expected to occur w/in individuals). 11/n

1 week ago 3 0 1 0
Figure showing a matrix representing group membership, with rows indicating blocks and columns indicating observations. Labeled as a 12 x 1835 dimension matrix.

Here's an example of the blocking and NB structure for some code I'm working on. The data are spatiotemporal movement data for fish. Blocks: 12 h periods for each individual. NBs: each block ± 6 hours. Yellow = which obs. are in each block; purple = which obs. are also in the NB for that block. 10/n
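A minimal R sketch of how a structure like this could be built, assuming two plain vectors, `tag` (individual) and `time` (hours since the start). The helper `make_fish_nei()` and the toy data are hypothetical, not the code behind the figure. Following the correction in 9a/n, it puts neighbourhoods in `a`/`ma` and blocks in `d`/`md`; the element names should be checked against `?mgcv::NCV` for your installed mgcv version.

```r
# Hypothetical helper (not from mgcv or the thread's code).
# Blocks: 12 h windows within each individual.
# NBs: each block padded by +/- 6 h, within the same individual only.
make_fish_nei <- function(tag, time, block_len = 12, pad = 6) {
  # one block per individual x 12 h window, ordered by individual then window
  block_id <- interaction(tag, floor(time / block_len),
                          drop = TRUE, lex.order = TRUE)
  blocks <- split(seq_along(tag), block_id)
  nbs <- lapply(blocks, function(idx) {
    # same individual, within `pad` hours of the block's first/last obs.
    which(tag == tag[idx[1]] &
          time >= min(time[idx]) - pad &
          time <= max(time[idx]) + pad)
  })
  list(d  = unlist(blocks, use.names = FALSE),      # obs. in each block
       md = cumsum(unname(lengths(blocks))),        # block end points
       a  = unlist(nbs, use.names = FALSE),         # obs. in each NB
       ma = cumsum(unname(lengths(nbs))))           # NB end points
}

# Toy data: two individuals, hourly observations over 24 h
tag  <- rep(c("A", "B"), each = 24)
time <- rep(0:23, times = 2)
nei  <- make_fish_nei(tag, time)
nei$md  # 12 24 36 48: four 12-obs blocks
nei$ma  # 18 36 54 72: each NB adds 6 h on the side within the individual
```

Because NBs never cross individuals, the NB for the last block of individual A stops at A's last observation, which mirrors the breaks between individuals described in 11/n.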

1 week ago 3 0 1 0

And

D = sparseMatrix(j = nei$d, p = c(0, nei$md), dims = c(B, N))

is a matrix of the NBs for each block. Both A and D have dimension B×N, where B is the number of blocks and N is the # of obs., and the matrices serve as indicator functions stating which obs. are in which block/NB. 9/n
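A small sketch of the construction in 8/n and 9/n on a made-up 6-observation, 3-block `nei` list. It follows the correction in 9a/n (`A`, from `a`/`ma`, holds NBs; `D`, from `d`/`md`, holds blocks), spells the size argument `dims` as documented for `Matrix::sparseMatrix()`, and checks two properties from the thread: each obs. sits in exactly one block, but can sit in several NBs.

```r
library(Matrix)

# Toy structure (hypothetical): N = 6 observations, B = 3 blocks
N <- 6; B <- 3
nei <- list(
  d = 1:6,              md = c(2L, 4L, 6L),   # blocks {1,2}, {3,4}, {5,6}
  a = c(1:3, 2:5, 4:6), ma = c(3L, 7L, 10L)   # NBs {1,2,3}, {2,3,4,5}, {4,5,6}
)

# p gives row pointers, so rows = blocks/NBs and columns = observations
A <- sparseMatrix(j = nei$a, p = c(0, nei$ma), dims = c(B, N))  # NBs
D <- sparseMatrix(j = nei$d, p = c(0, nei$md), dims = c(B, N))  # blocks

dim(A)                     # 3 x 6, i.e. B x N
colSums(as.matrix(D))      # every obs. is in exactly one block
colSums(as.matrix(A))      # some obs. are in more than one NB
```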

1 week ago 3 0 2 0

This took me the most time to figure out, but I realized that this is very close to how the Matrix package specifies sparse matrices. If we start from a neighbourhood structure as a list called `nei`, then

A = sparseMatrix(j = nei$a, p = c(0, nei$ma), dims = c(B, N))

is a matrix of blocks 8/n

1 week ago 2 0 1 0

However, it can be tricky to set up the block and NB structure in mgcv (see `?mgcv::NCV`). In short, you need to give it vectors `a` and `d` that specify the obs. belonging to each block/NB, and vectors `ma`/`md` that give the index of the end point of each block/NB in vectors `a` and `d`. 7/n
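One way to produce those flat vectors, sketched in R under the assumption that you already have a list with one vector of observation indices per block (or per NB). `flatten_groups()` and `get_group()` are hypothetical helpers, not mgcv functions, and the a/ma = NBs, d/md = blocks assignment follows the correction in 9a/n.

```r
# Hypothetical helper: list of index vectors -> flat indices + end points
flatten_groups <- function(groups) {
  list(idx = unlist(groups, use.names = FALSE),
       end = cumsum(lengths(groups, use.names = FALSE)))
}

# Hypothetical inverse: recover group b from the flat representation
get_group <- function(idx, end, b) {
  start <- if (b == 1) 1L else end[b - 1] + 1L
  idx[start:end[b]]
}

blocks <- list(1:2, 3:4, 5:6)
nbs    <- list(1:3, 2:5, 4:6)
fb <- flatten_groups(blocks)
fn <- flatten_groups(nbs)
nei <- list(d = fb$idx, md = fb$end, a = fn$idx, ma = fn$end)

nei$ma                           # 3 7 10
get_group(nei$a, nei$ma, 2)      # 2 3 4 5: the second NB
```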

1 week ago 2 0 1 0
On Neighbourhood Cross Validation Many varieties of cross validation would be statistically appealing for the estimation of smoothing and other penalized regression hyperparameters, were it not for the high cost of evaluating such cri...

This is powerful because you don't need to know the exact dependency structure, just the minimum NB size needed to make obs. roughly exchangeable. The trick used by method = "NCV" in mgcv is making this method reasonably fast to compute (see arxiv.org/abs/2404.16490 for the deep magic) 6/n
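To make concrete what mgcv is speeding up, here is a deliberately slow, brute-force version of the NCV idea: refit once per block with its NB dropped, then score the predictions for that block. This is not mgcv's algorithm. It swaps in base R's `smooth.spline()` (tuned via `spar`), squared-error loss, and simulated AR(1) data, all assumptions chosen for illustration.

```r
set.seed(1)
n <- 100
x <- seq(0, 1, length.out = n)
# Simulated data with AR(1) errors, so nearby residuals are correlated
e <- as.numeric(arima.sim(model = list(ar = 0.8), n = n, sd = 0.3))
y <- sin(2 * pi * x) + e

# Brute-force NCV score: for each block, fit without its NB, predict the block
ncv_score <- function(spar, x, y, blocks, nbs) {
  total <- 0
  for (b in seq_along(blocks)) {
    keep <- setdiff(seq_along(y), nbs[[b]])
    fit  <- smooth.spline(x[keep], y[keep], spar = spar)
    pred <- predict(fit, x[blocks[[b]]])$y
    total <- total + sum((y[blocks[[b]]] - pred)^2)
  }
  total
}

blocks  <- as.list(seq_len(n))               # every obs. is its own block
nbs_loo <- blocks                            # LOO: the NB is just the block
nbs_ncv <- lapply(seq_len(n), function(i)    # NCV: block +/- 5 obs.
  max(1, i - 5):min(n, i + 5))

spars <- seq(0.2, 1.2, by = 0.1)
loo <- sapply(spars, ncv_score, x = x, y = y, blocks = blocks, nbs = nbs_loo)
ncv <- sapply(spars, ncv_score, x = x, y = y, blocks = blocks, nbs = nbs_ncv)
c(loo = spars[which.min(loo)], ncv = spars[which.min(ncv)])
```

Each score needs one refit per block, which is exactly the cost the arXiv paper's approximation avoids. Comparing the two selected `spar` values is one way to poke at the thread's point that LOO tends to under-smooth dependent data, though any single simulation can go either way.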

1 week ago 3 0 1 0

Other example structures:
1. if we have multiple obs from diff indivs, blocks could be single obs, with the individuals being NBs.
2. an annual time series with an autocorrelation time of 5 years, where each obs. could be a block, with the NB being all the observations within +/- 5 years 5/n
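Both example structures above can be written as per-obs. lists of block and NB indices. A minimal R sketch, where `year` and `indiv` are hypothetical toy data:

```r
# Example 2: annual series, each obs. its own block,
# NB = every obs. within +/- 5 years of it
year <- 2000:2019
blocks2 <- as.list(seq_along(year))
nbs2 <- lapply(seq_along(year), function(i) which(abs(year - year[i]) <= 5))

# Example 1: single-obs. blocks, NB = all obs. from the same individual
indiv <- c("x", "x", "x", "y", "y")
blocks1 <- as.list(seq_along(indiv))
nbs1 <- lapply(seq_along(indiv), function(i) which(indiv == indiv[i]))

nbs2[[1]]   # 1 2 3 4 5 6: the obs. for 2000-2005
nbs1[[4]]   # 4 5: both obs. from individual "y"
```

These lists can then be flattened into the index and end-point vectors mgcv expects.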

1 week ago 3 0 1 0

Blocks + NBs model the assumed residual dependency structure in the data. That is: given observed covars X, outcomes (y) in block b should be statistically independent of all values of y outside the NB around block b. LOO is an example of NCV where each obs. is its own block and its own NB. 4/n
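In the flat-vector form, LOO and the smallest step beyond it look like this. A minimal R sketch with a hypothetical N = 6; the a/ma = NBs, d/md = blocks naming follows the correction in 9a/n.

```r
N <- 6
# LOO in flat form: block b = {b} and NB b = {b}
loo_nei <- list(d = seq_len(N), md = seq_len(N),
                a = seq_len(N), ma = seq_len(N))

# Same single-obs. blocks, but each NB also drops the
# neighbouring obs. on either side
nb_list <- lapply(seq_len(N), function(i) max(1, i - 1):min(N, i + 1))
ncv_nei <- list(d = seq_len(N), md = seq_len(N),
                a = unlist(nb_list), ma = cumsum(lengths(nb_list)))

ncv_nei$ma   # 2 5 8 11 14 16
```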

1 week ago 2 0 1 0

I like this terminology (compared to the notation in Simon's paper) as it's easy to remember that a block is smaller than a neighbourhood (and that each observation might lie in many neighbourhoods). Generally, you would assume all observations in a block are also in the NB for that block 3/n
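The assumption that each block is contained in its own NB is easy to check on the flat vectors. A sketch of a hypothetical checker (not an mgcv function), again assuming a/ma = NBs and d/md = blocks as in 9a/n:

```r
# Hypothetical checker: is each block inside its own NB?
block_in_nb <- function(nei) {
  # start index of each group, given the vector of end points
  starts <- function(end) c(1L, head(end, -1) + 1L)
  d_start <- starts(nei$md)
  a_start <- starts(nei$ma)
  vapply(seq_along(nei$md), function(b) {
    blk <- nei$d[d_start[b]:nei$md[b]]
    nb  <- nei$a[a_start[b]:nei$ma[b]]
    all(blk %in% nb)
  }, logical(1))
}

nei_ok <- list(d = 1:6, md = c(2L, 4L, 6L),
               a = c(1:3, 2:5, 4:6), ma = c(3L, 7L, 10L))
block_in_nb(nei_ok)   # TRUE TRUE TRUE
```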

1 week ago 2 0 1 0

Neighbourhood Cross Validation (NCV) is a model-fitting criterion designed to reduce overfitting in dependent data sets. It evaluates how well a model predicts sets of observations ("blocks") when some relevant neighbourhood ("NB") of observations around each block is excluded from the fit. 2/n
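Written out, the criterion looks roughly like this (a hedged paraphrase in the thread's block/NB terms; the notation is mine, not the paper's). With blocks $\mathcal{B}_b \subseteq \mathcal{N}_b$ for $b = 1, \dots, B$, a per-observation loss $D$ (e.g. a deviance or negative log-likelihood contribution), and $\hat{\mu}_i^{[-\mathcal{N}_b]}$ the prediction for observation $i$ from the model fit with every observation in $\mathcal{N}_b$ left out, the smoothing parameters $\boldsymbol{\lambda}$ are chosen to minimize

$$
\mathcal{V}(\boldsymbol{\lambda}) = \sum_{b=1}^{B} \sum_{i \in \mathcal{B}_b} D\!\left(y_i,\ \hat{\mu}_i^{[-\mathcal{N}_b]}(\boldsymbol{\lambda})\right)
$$

LOO is the special case $\mathcal{B}_b = \mathcal{N}_b = \{b\}$.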

1 week ago 3 1 1 0

I've been working with NCV this last week; it's a powerful tool for modelling data with dependencies that are poorly/inefficiently modelled via stochastic process priors like cov functions or low-rank smoothers. However, it's a complex topic, so I thought I'd summarize the key ideas in a thread 1/n

1 week ago 23 4 2 1

I just realized today, though, that the specification of neighbourhoods and blocks is just a compressed sparse matrix representation. That's made it easier for me to think about which observations belong to which neighbourhood/block, and should help me build functions for creating NCV neighbourhoods
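The compressed-sparse-row view can be checked directly: build a row-compressed pattern matrix from the index and end-point vectors, and the matrix's own slots give the vectors back. A minimal sketch with hypothetical toy vectors; it round-trips exactly only when indices within each group are sorted and unique, since the matrix stores them that way.

```r
library(Matrix)

# Hypothetical NB vectors for N = 6 obs. and 3 groups
a  <- c(1:3, 2:5, 4:6)
ma <- c(3L, 7L, 10L)

# Row-compressed pattern matrix: rows = groups, columns = observations
M <- sparseMatrix(j = a, p = c(0L, ma), dims = c(3L, 6L), repr = "R")

# The compressed slots are the original vectors: column indices are
# 0-based, and the row pointers (minus the leading 0) are the end points
a_back  <- M@j + 1L
ma_back <- M@p[-1]
identical(a_back, a) && identical(ma_back, ma)   # TRUE
```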

1 week ago 2 0 0 0

I've been playing around with NCV for a paper we're working on modelling high frequency movement data. It works really well so far under my testing, but building the neighborhoods took me a while to figure out

1 week ago 1 0 1 0

The Garden of Earthly Delights (detail), by Hieronymus Bosch, 1480-1505, 📸 by @alexbrandon

2 weeks ago 16156 3489 256 193
2 weeks ago 2757 257 9 10

"that's correct: this isn't Not Milk. I don't know why you're confused"

3 weeks ago 3 0 0 0

Simple : just call it "Not not milk"

3 weeks ago 1 0 1 0
1 [Nurse talking to camera]
This is usually our busiest time of year at the centre 

2 [Show a sign outside a medical institution reading:]

Centre for Adults Who Still Can’t Get Their Heads Round Daylight Savings

3 
NURSE:
We have a spike in admissions when BST starts

4 
CONFUSED PATIENT BEING LAID DOWN BY NURSES:
Is it forward or back?

NURSE:
Lie down love

5 
We often see the same patients every year

PATIENT:
So it’s like… time travel?

DOCTOR:
No

6 Every year we try to explain BST to them in a fun and memorable way

[The patients are sat around a fun show, with people dressed in bunny suits jumping around a giant clock] 

BUNNY: And now I *spring forward*...

7
[Bunny springs forward one hour on clock]

8 
PATIENT WATCHING SHOW:
I don’t get it

9 
The hardest patients are the Summer Time refusers

10 PATIENT [sat strapped to a chair as a doctor talks to him through a screen, through a tannoy]

PATIENT: But why is it better?

DOCTOR [talking through microphone]: 

Because it complicates things.

11 
PATIENT: 
How is that good?

DOCTOR: 
It makes you more tired.

12 

DOCTOR [to assistant]:
Fetch the cattle prod

[Ends]

I think the clocks in UK go back tonight. Or forward. Not sure. Is it forward

3 weeks ago 1293 406 47 43

Someone (maybe @bbolker.bsky.social ?) once remarked at a conference when Gavin, Dave Miller, and I came up to a discussion that "the GAM mafia was here now", so we figured we'd lean into it. :)

3 weeks ago 6 0 0 0
Phylogenetic Comparative Methods

Hi all. I am very excited that after 6 years I finally got my phylogenetic comparative methods book and online exercises online. Feel free to use and share. The book is here: nhcooper123.github.io/pcm-primer/. Note that it is not finished, we had to abandon it before the sunk costs fallacy broke us

3 weeks ago 286 180 9 3
Getting started with the MRFtools package

I'm excited to see this out! It's also a nice coincidence, as @gsimpson.bsky.social and I just got phylogenetic smoothers working (for BM and OU processes) in our MRFtools package. I'll definitely be pointing people to your book for background.

gam-mafia.github.io/MRFtools/art...

3 weeks ago 11 3 1 1