Just a note that with all this "LLMs accurately identifying users from text", some free-text responses in allegedly de-identified survey data could in principle identify research respondents, if not today then soon.
Posts by Eva Vivalt
My secondary school shared a building with the university, so one semester I found myself teaching in the building where I'd gone to school.
... Ooh, I guess I'll also need to do this with a script just in case, I didn't check every word or anything.
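A minimal sketch of what such a check could look like, assuming the restored posts sit next to a pre-incident backup (the paths, function names, and the 0.5 shrink threshold are all hypothetical, not from the original post): since AI summaries are much shorter than the posts they replaced, flagging any file whose current word count fell well below its backed-up version would catch most overwrites.

```python
# Hypothetical sketch: flag posts whose current text is much shorter than a
# known-good backup, a sign they may have been replaced by AI summaries.
# Directory names and the 0.5 ratio are assumptions for illustration.
from pathlib import Path


def find_shrunk(originals: dict[str, str], current: dict[str, str],
                ratio: float = 0.5) -> list[str]:
    """Return names of posts whose current word count fell below
    `ratio` times the original word count (missing files count as empty)."""
    flagged = []
    for name, old_text in originals.items():
        new_text = current.get(name, "")
        old_words = len(old_text.split())
        new_words = len(new_text.split())
        if old_words > 0 and new_words < ratio * old_words:
            flagged.append(name)
    return flagged


def load_dir(path: str) -> dict[str, str]:
    """Read every .md file in a directory into a name -> text dict."""
    return {p.name: p.read_text() for p in Path(path).glob("*.md")}


if __name__ == "__main__" and Path("backup/posts").is_dir():
    # e.g. compare a pre-incident backup against the live posts
    for name in find_shrunk(load_dir("backup/posts"), load_dir("site/posts")):
        print("possibly summarized:", name)
```

A word-count ratio is crude but cheap; a stricter version could diff each file against the backup and flag any change at all.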
In drafting the new site, Claude also, of its own volition, changed the acceptance/etc. status of some of my papers. Like, um, no, please don't? But that one didn't go live, thankfully.
The best meal of my life was a chickpea salad from a shop after completing the Milford Track. I can't replicate the salad, because the taste was not in the salad but in the hike.
I had deleted the repo and started a new one while figuring out how I wanted the page to be. It was the same site, just hiding all my messy changes to get it to that point, since GitHub Pages is public. So I traded one potential embarrassment for another.... 2/2
In principle I could have re-checked it, I just hadn't thought it would be necessary since I had checked it before and the ask seemed trivial. 1/2
I caught it on the non-live version and asked it to fix that before it was pushed. I checked, and it was fixed, but then later we rebuilt the repo, which probably overwrote the fix. That's my best guess, but this was a while back. 2/2
Not entirely sure - this was a while ago. I did a substantial site redesign, and as part of it, it converted my blog to a "writing" page. Surprisingly, it generated summaries of each blog post to use in lieu of the posts themselves, rather than importing them directly. 1/2
Gah, apparently for the past two weeks all my past blog posts got overwritten by AI-generated summaries of them. Thanks for nothing, Claude.
Two thoughtful essays about the impact of LLMs on graduate education:
ergosphere.blog/posts/the-ma...
economics.mit.edu/sites/defaul...
From personal experience I think the self-control problem mentioned in the first essay is very real.
SCORE, a collaboration of 865 researchers, is now released as three papers in Nature, six preprints, and a lot of data (cos.io/score/). SCORE examined the replicability of findings from the social-behavioral sciences and tested whether human and automated methods could predict replicability.
Also finding it hard to keep up with new research? I built something to fix this.
SciLove — swipe through recent papers in your field. The feed learns from your saves. It also matches you with researchers who save your work (opt out if you prefer).
www.scilove.app
3,000+ journals, updated daily
Going to SXSW to talk about human and AI forecasting!
If anyone wants to meet up, LMK.
Maybe someone will tell me this already existed, but to be specific: I was looking for something that would *do* things on your computer for you. Not just voice to text. Voice to text won't help you find and open a file or do other things for you. To a small group, this could be everything.
Think about it: suppose you want to open a file on your computer but can't see where to click. This seems like it would solve your problem - just ask Claude to do it.
Tell folks you know. 2/2
I can't find a post about it to repost, but Claude Code is getting a "voice" mode.
They've presumably made it for coders, but I'd just been vibe coding something much worse to fulfill this kind of function; it could be a game-changer for people who are nearly blind. 1/2
Since people are still applying to this, I'll leave it open but start to review applications (so early is better, but if you haven't applied yet you still can).
Amazing! This is huge news and so great for JHU.
I'm hiring pre-docs interested in applied microeconomics, especially with AI. Check out the link and apply!
Deadline for the first round of review is tomorrow, Feb. 24. evavivalt.com/2026/02/pre-...
Adding a link to the tool, so you can try it out yourself: earlyreview.ai
Anyway, feedback is always welcome, and get in touch if you have a particular use case. 7/7
An obvious paper someone could write would be to score a large number of plans and see if ambiguity / lack of clarity / missing information was associated with more apparent p-hacking. 6/
Where were plans most likely to fail?
Here is a summary: evavivalt.com/2026/02/a-no... 5/
At the same time, it is EASILY possible to be stricter. In an earlier version, many more registered reports failed. 4/
Some of the pre-analysis plans posted on the registry could have more information on their public webpage in required fields; this screening was only done on the pre-analysis plan attachments. 3/
This is in contrast to the randomly-selected registered reports I tested. Under the same version, almost all registered reports passed. It's a good sanity check: you expect registered reports to be more detailed. 2/
I tested out EarlyReview on a random set of pre-analysis plans posted to the AEA RCT Registry.
All received comments. Hardly any passed all the checks. 1/
I'm looking for a few pre-analysis plans / registered reports to highlight as examples on EarlyReview (with permission).
If interested, DM me. I can provide free credits!