
Posts by Spencer J Fox

Inequalities in infectious disease dynamics symposium | LSHTM Join us for the Social Inequalities in Infectious Disease Dynamics Symposium at the London School of Hygiene & Tropical Medicine (LSHTM). The day features a full programme of presentations and invited

Great speakers at this symposium, 22 April 2026 at LSHTM. Join online or in person (email if attending in person) to hear great talks at the cutting edge of inequalities in infectious disease dynamics. www.lshtm.ac.uk/newsevents/e...

1 week ago

I need this to be real, so I can get around NAU's rules against spending funds on coffee machines/accessories

2 weeks ago

Based on my understanding of AI, I don't think having it "think" longer will address any of the issues I faced building the app, so I think there will need to be some major breakthroughs to get there (e.g. moving beyond LLMs or augmenting them with new capabilities)

3 weeks ago

I absolutely can see how AI will make coding, and even my research, more productive, but I don't see it replacing individuals for important tasks in the near term, and any use of it for research purposes should be heavily audited before trusting the results

3 weeks ago

Maybe I'll come back to vibecoding the app again, but I'm not sure. In the end I produced something that looked like it worked but couldn't even perform simple calculations accurately.

3 weeks ago

This week I ended the experiment and finally opened up Excel again... It was a breath of fresh air: I knew that the numbers I entered would save properly, I could fully audit calculations, and I was ultimately able to get my budgets organized

3 weeks ago

I thought I was making progress: the AIs were constantly finding and fixing new bugs! But then I tried to use it again and immediately found new errors related to salary calculations and a number of other things hidden within the app.

3 weeks ago

All of this made me lose total confidence in the app; I wasn't going to spend a bunch of time inputting all of my real information unless I thought it would actually work. So I spent about a month working with the AIs to simplify the app and debug everything.

3 weeks ago

The software was way outside of my understanding (Rust, SQL databases, etc.), and I had no way to dig into what was going on under the hood. I would enter numbers, save, reload, and find that the numbers had changed.

3 weeks ago

Then about a month ago (and after I confirmed with both AIs that the app was ready for primetime) I started really using it. Immediately I ran into issues: the app had lied to me about database structures, saving features, loading features, budget calculations, basically everything.

3 weeks ago

On the good side, the app was built pretty quickly, and it was really cool to see the interface and play around with the features. It was incredible to produce an app just for myself, and I even started having hopes of getting it good enough to post online for others in academia to use

3 weeks ago

I found myself spending this vibe-coding time "coding" and also answering emails, which wasn't a terrible use of time, but I would rather have been writing or thinking.

3 weeks ago

First, the experience is probably about as bad as it can get. Your agent works long enough that you have to try to work on other things while it's "thinking," but not long enough for you to get anything meaningful done in that time

3 weeks ago

I probably spent about two months working on the app on and off, adding and removing features, testing that it was working, etc. Overall it was probably about 40 hours of work across that time period, with both Codex and Claude editing the app and critiquing each other's work

3 weeks ago

This is something I already handle with a fairly simple Excel spreadsheet that I was hoping to move beyond, as I thought an app would be more convenient, flexible, and ultimately accurate.

3 weeks ago

Starting from scratch, I worked with the AI to develop what really should not be hard to build: an app that allows me to enter information about various grants, personnel costs, and expenses, and then tracks the spending so that I know how much money I have and where it is.

3 weeks ago

To really test the AI, I worked with it to vibe code a grant planning app, since budgeting grant spend-down and personnel salary costs within university environments is basically the bane of my existence (numbers and grants here are fake)

3 weeks ago

Double-checking for small and often really dumb errors can take longer than coding things up myself, so I actually found I was using the AI tools only for limited coding tasks (e.g. I code up the full pipeline and have the AI write code in specific areas to do specific tasks)

3 weeks ago

As you get further towards full agency, double-checking becomes really important. When they are developing their own models, I needed to look through the code closely to identify both common and edge-case errors that would have caused issues if I had used them for real or research purposes

3 weeks ago

For example, I have pipelines for producing infectious disease forecasts weekly. I can confidently use these tools to modify the code to use the data in different ways, with slight model variants, and even some simple but totally new model variants that I have them develop themselves

3 weeks ago

In my testing I found that these tools work really well when I have existing code that needs to be modified, e.g. aesthetic changes to tools, slight model modifications, new data inputs, data-cleaning changes, etc. Even refactoring a whole code repository goes pretty smoothly

3 weeks ago

I have been playing with Codex and Claude extensively for about six months now, and while they are clearly impressive, I don't see them replacing software engineers in the near future, and I don't understand how people can trust their work without knowing how to code

3 weeks ago

I'm here for the vibes! Would love to chat more :)

3 weeks ago

Yea, I'll take a closer look at the basket index! Ensembling baselines is an interesting idea, but what would the point be: capturing different reference opinions or improving performance?

3 weeks ago
A basket of baselines Been thinking about the overlap between baseline models (Manuel Stapper is working on this), the method of analogs (stuff @nickreich and co are going on about), and model output evaluation (i.e. @kat...

I really like this idea of multiple baselines with different contexts. Some relationship to the basket-of-baselines idea I was chatting about a while ago:

community.epinowcast.org/t/a-basket-o...

3 weeks ago

Congratulations to Fox Lab PhD student @ehsansuez.bsky.social for his hard work on this project!

3 weeks ago

Our recommendation:

Transparency, and use multiple baseline models:

"...include an optimal flatline model as a stringent benchmark, a robust flatline model, and a seasonal baseline to test whether information from the current season improves predictions."

3 weeks ago
Mind the Baseline: The Hidden Impact of Reference Model Selection on Forecast Assessment Baseline models are essential reference points for evaluating forecasting methods, yet their selection often receives insufficient attention. We present a systematic framework for baseline model selec...

Why this matters:

Forecast hubs rely heavily on baselines for:
• benchmarking models
• ranking performance

👉 If the baseline shifts, the leaderboard can change too. Our work is complementary and supports the conclusions from previous work: www.medrxiv.org/content/10.1...

3 weeks ago

Interestingly, this flatline model outperformed seasonal models even with highly seasonal data

3 weeks ago

Our main finding: performance differed substantially across variations, AND a flatline model that uses the ten most recent transformed observations for training outcompeted all others across all diseases

3 weeks ago
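A flatline baseline of this flavor can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual implementation: the `log1p` transform, the resampled-differences uncertainty scheme, and the function name `flatline_baseline` are all assumptions for the sake of the example.

```python
import numpy as np

def flatline_baseline(series, window=10, horizons=4, n_samples=1000, seed=0):
    """Flatline baseline sketch: the point forecast is the last observation;
    uncertainty comes from resampling recent one-step differences on a
    transformed scale (log1p here, an assumed choice)."""
    rng = np.random.default_rng(seed)
    y = np.log1p(np.asarray(series, dtype=float))
    recent = y[-window:]        # train on the `window` most recent transformed obs
    diffs = np.diff(recent)     # recent one-step changes on the transformed scale
    # Random-walk forecast: accumulate resampled differences over each horizon
    steps = rng.choice(diffs, size=(n_samples, horizons), replace=True)
    paths = y[-1] + np.cumsum(steps, axis=1)
    return np.expm1(paths)      # back-transform to the natural scale

# Toy usage: weekly counts; the median 1-step-ahead forecast stays near the
# last observed value, which is what makes a flatline model "flat"
obs = [120, 130, 125, 140, 150, 145, 155, 160, 158, 165, 170]
samples = flatline_baseline(obs, horizons=2)
```

Because the forecast distribution is anchored at the last observation, this kind of model ignores seasonality entirely, which is what makes the finding above (flatline beating seasonal models on seasonal data) surprising.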