1908: The Lancet, one of the most respected scientific journals, calls for an age limit of 18 on reading in bed amid a moral panic over children becoming "addicted" to novels, which were "designed to keep kids hooked" and to destroy their attention and mental health
Posts by P. Razavi
🥳 🙌 🥳 🙌 🥳 🙌
How do resource fears (realistic threat) vs. value clashes (symbolic threat) drive war & peace?
We built a virtual society of 25 autonomous agents using the Park et al. (2023) framework to find out, using a "minimal groups" paradigm (Group A vs. B).
But first, we looked under the hood. 🧠
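The minimal-groups setup described above can be sketched in a few lines: agents are randomly assigned to arbitrary, meaningless groups ("A" vs. "B"), the classic design for studying intergroup bias without pre-existing identities. This is an illustrative toy, not the paper's actual simulation code; the agent names and group-assignment scheme are assumptions.

```python
# Toy sketch of a minimal-groups assignment for a 25-agent society.
# Names and the coin-flip assignment are illustrative placeholders.
import random

random.seed(42)  # fixed seed so the split is reproducible
agents = [f"agent_{i:02d}" for i in range(25)]

# Each agent gets an arbitrary group label with no prior meaning.
assignments = {name: random.choice(["A", "B"]) for name in agents}
group_a = [a for a, g in assignments.items() if g == "A"]
group_b = [a for a, g in assignments.items() if g == "B"]
```

From here, realistic-threat conditions would manipulate shared resources between the groups, while symbolic-threat conditions would manipulate stated values.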
🚨 New working paper!
How well do people predict the results of studies?
@sdellavi.bsky.social and I leverage data from the first 100 studies to have been posted on the SSPP, containing 1,482 key questions, on which over 50,000 forecasts were placed. Some surprising results below.... 🧵👇
Want to know what training data has been memorized by models like GPT-4?
We propose information-guided probes, a method to uncover memorization evidence in *completely black-box* models,
without requiring access to
🙅‍♀️ Model weights
🙅‍♀️ Training data
🙅‍♀️ Token probabilities 🧵 (1/5)
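To make the black-box constraint concrete, here is a minimal prefix-completion probe, a common baseline for memorization evidence. This is NOT the paper's information-guided method; `query_model` is a hypothetical text-only API wrapper, simulated here with one hard-coded string so the sketch runs standalone.

```python
# Minimal black-box memorization probe (baseline idea, not the paper's
# method): feed the model the first half of a candidate training string
# and measure how much of the held-out suffix it reproduces verbatim.

def query_model(prompt: str) -> str:
    # Placeholder for a real text-completion endpoint. We simulate a
    # model that has memorized exactly one string.
    memorized = "The quick brown fox jumps over the lazy dog"
    if memorized.startswith(prompt):
        return memorized[len(prompt):]
    return ""

def overlap_score(candidate: str, split: float = 0.5) -> float:
    """Fraction of the held-out suffix the model reproduces verbatim."""
    cut = int(len(candidate) * split)
    prefix, suffix = candidate[:cut], candidate[cut:]
    completion = query_model(prefix)
    matched = 0
    for a, b in zip(suffix, completion):
        if a != b:
            break
        matched += 1
    return matched / len(suffix) if suffix else 0.0
```

Note that this baseline only needs generated text: no weights, no training data, no token probabilities.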
I’ve been referring people (esp social science/psych PhD students) to this blog post for years. The headline and opening paragraph are all you really need.
Had a great time presenting our work on LLM-based item difficulty estimation at #NCME.
If you’re in Denver and would like to discuss measurement research or just catch up in the next couple of days, let me know 😊
Fantastic, thoughtful work! 👏👏
If you're interested in learning more and plan to attend the #NCME conference in Denver next week, we’d love to see you at our coordinated paper session, “Approaches to Optimizing a Personalized Learning System,” on Friday, April 25, from 11:30 AM to 1:00 PM. (🧵9/9)
arxiv.org/abs/2504.08804
We are excited about the potential of these methods to support more efficient item development in education. In the preprint, we provide a seven-step workflow for testing professionals who want to implement a similar item difficulty estimation approach with their own item pools. (🧵8/9)
The feature-based approach presumably benefits from the language model’s extraction of multiple cognitive and linguistic dimensions that an ensemble tree-based algorithm then “learns” to weight in ways that maximize prediction accuracy. (🧵7/9)
The modest performance of direct LLM estimates in some instances, and the more robust performance of feature-based methods, hints that LLMs can add value, but that this value is maximized when the model is “nudged” or structured via psychometric frameworks. (🧵6/9)
The results are promising, especially for the feature-based approach which performed considerably better than the dummy regressor benchmarks and the direct estimation approach. (🧵5/9)
In the second approach, we use the LLM to extract cognitive and linguistic features from each item. We then train tree-based machine learning models (i.e., random forest and gradient boosting machines) to estimate item difficulty based on the features. (🧵4/9)
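The feature-based pipeline in (🧵4/9) can be sketched end-to-end with synthetic data. The feature names below are illustrative assumptions, not the paper's actual feature set, and the random forest stands in for the ensemble tree-based step.

```python
# Sketch of the feature-based approach: LLM-extracted item features
# (columns are invented for illustration) feed a random forest that
# predicts item difficulty. Data here are simulated.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_items = 200

# Pretend these per-item columns came from LLM feature extraction.
features = np.column_stack([
    rng.integers(1, 6, n_items),    # e.g., reasoning steps required
    rng.uniform(0, 1, n_items),     # e.g., vocabulary rarity score
    rng.integers(5, 40, n_items),   # e.g., stem length in words
])
# Synthetic "true" difficulty as a noisy function of the features.
difficulty = (0.4 * features[:, 0] + 1.5 * features[:, 1]
              + rng.normal(0, 0.2, n_items))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features[:150], difficulty[:150])   # train split
preds = model.predict(features[150:])         # held-out items
```

Swapping in `GradientBoostingRegressor` gives the gradient boosting variant with the same fit/predict interface.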
In the first approach, we use a direct estimation method, prompting the LLM to assign a single difficulty rating to each item based on qualitatively informed criteria. (🧵3/9)
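The direct-estimation idea in (🧵3/9) amounts to a single rating prompt per item. The prompt wording and `call_llm` below are illustrative placeholders, not the paper's actual prompt or API; the stub returns a fixed answer so the sketch runs standalone.

```python
# Sketch of direct LLM difficulty estimation: one prompt, one integer
# rating per item. `call_llm` is a hypothetical stand-in for a real
# chat-completion call.

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; always answers "3" here.
    return "3"

def estimate_difficulty(item_text: str) -> int:
    prompt = (
        "Rate the difficulty of this K-5 assessment item on a 1-5 "
        "scale, considering cognitive demand and language complexity. "
        "Answer with a single integer.\n\nItem: " + item_text
    )
    return int(call_llm(prompt).strip())
```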
Field-testing assessment items to estimate difficulty can be both costly and time-consuming. In this research, we evaluate two LLM-based approaches to predict item difficulty for K-5 mathematics and reading assessments based on item content. (🧵2/9)
I'm excited to share our latest work: "Estimating Item Difficulty Using Large Language Models and Tree-Based Machine Learning Algorithms." (🧵 1/9)
arxiv.org/abs/2504.08804
Wooden Shoe Tulip Festival
📍Portland, Oregon 🇺🇸
A tricky thing about modern society is that no one has any idea when they don’t die.
Like, the number of lives saved by controlling air pollution in America is probably over 200,000 per year, but the number of people who think their life was saved by controlling air pollution is zero.
Did you know: our researchers have developed a suite of resources for A-Level students and teachers? "History & Philosophy of Science in 20 Objects" draws on an incredible array of items from our own collection ft. prompts, questions, videos and more! sway.cloud.microsoft/cEekCFBF5CGF... #histsci
@mohammadatari.bsky.social @mdehghani.bsky.social can you help?
Congrats 👏🏽🎉. Very well-deserved! 😊
rough (like uff in buff)
cough (like off in scoff)
drought (like ow in cow)
though (like o in no)
thought (like aw in saw)
through (like oo in woo)
Enough.
Hello to all my friends at SPSP seeing this message in a hallway or lobby as you stare at your phone with what you hope is enough noticeable intensity to avoid having to interact with anyone
1/3
Tutorial on exploring ecological momentary assessment data is online at AMPPS, with:
- Accessible ways to visualize data for better understanding
- Models to get some first insights
- Further reading boxes for more advanced topics
- Reproducible pipeline you can run over your own data
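As a flavor of the "first insights" step above, here is a toy first-look at EMA data: repeated mood ratings per participant, summarized per person and per person-day. The data and column names are simulated for illustration; this is not the tutorial's pipeline.

```python
# Toy EMA summary: person-level and person-day means, a common first
# step before visualization or multilevel modeling. Data are simulated.
from collections import defaultdict
from statistics import mean

# (participant_id, day, beep_number, mood_rating)
records = [
    ("p1", 1, 1, 3), ("p1", 1, 2, 4), ("p1", 2, 1, 5),
    ("p2", 1, 1, 2), ("p2", 1, 2, 2), ("p2", 2, 1, 3),
]

# Person-level means: overall tendency per participant.
by_person = defaultdict(list)
for pid, day, beep, mood in records:
    by_person[pid].append(mood)
person_means = {pid: mean(vals) for pid, vals in by_person.items()}

# Person-day means: the unit you would plot for simple time trends.
by_person_day = defaultdict(list)
for pid, day, beep, mood in records:
    by_person_day[(pid, day)].append(mood)
day_means = {key: mean(vals) for key, vals in by_person_day.items()}
```

Real EMA analyses would move from here to within- vs. between-person decomposition, which the tutorial's further-reading boxes cover.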
Some of us have been meeting up at SPSP for the last few years. This year marks our fifth gathering. Email one of us if you want to join! Location TBD.
@mdehghani.bsky.social @drsanaz.bsky.social @simine.com @dorsaamir.bsky.social
My husband just inadvertently inspired one of the simplest, most relatable XY problem¹ demos I've seen.
He asked if I could buy unscented TP 🧻 next time I grocery shop.
Knowing he had been getting a cold, I probed: when does the scent become a problem?
[1/3]
¹ en.m.wikipedia.org/wiki/XY_prob...