Posts by Mohit Iyyer

Well this is sure to be a blockbuster AI article... @jennarussell.bsky.social et al are kicking ass and taking names in journalism, both individuals and organizations.

"AI use in American newspapers is widespread, uneven, and rarely disclosed"
arxiv.org/abs/2510.18774

5 months ago 22 8 3 0

AI is already at work in American newsrooms.

We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea.

Here's what we learned about how AI is influencing local and national journalism:

5 months ago 56 29 5 2

Tired of AI slop? Our work on "Frankentexts" shows how LLMs can stitch together random fragments of human writing into coherent, relevant responses to arbitrary prompts.

Frankentexts are weirdly creative, and they also pose problems for AI detectors: are they AI? human? More 👇

10 months ago 16 3 0 0

Llama 4's massive context window is impressive! However, the best Llama model for long-context understanding over books is still Llama 3.1 405B. Llama 4 Scout is especially bad at our NoCha benchmark, performing below random chance.

1 year ago 25 6 0 1

Thinking about paying $20k/month for a "PhD-level AI agent"? You might want to wait until their web browsing skills are on par with those of human PhD students 😛 Check out our new BEARCUBS benchmark, which shows web agents struggle to perform simple multimodal browsing tasks!

1 year ago 6 1 0 0

New synthetic benchmark for multilingual long-context LLMs! Surprisingly, English and Chinese are not the top-performing languages (it's Polish!). We also observe a widening gap between high and low-resource languages as context size increases. Check out the paper for more 👇

1 year ago 4 1 0 0

How can we generate synthetic data for a task that requires global reasoning over a long context (e.g., verifying claims about a book)? LLMs aren't good at *solving* such tasks, let alone generating data for them. Check out our paper for a compression-based solution!

1 year ago 17 4 0 0

Lots of recent work focuses on *automatic* detection of LLM-generated text. But how well do *humans* fare? TLDR: ppl who frequently use ChatGPT for writing tasks are elite at spotting AI text! See our paper for more (and congrats to @jennarussell.bsky.social on her first paper!!)

1 year ago 4 0 0 0