CivicChats A platform for exploring, debating, and thinking through upcoming ballot measures on your local ballot or across the U.S.

It's Election Day 🗳️ UChicago's @chenhaotan.bsky.social and team built CivicChats to help voters engage more thoughtfully with the issues on their ballot.

Try the tech: civicchats.org

Read more: datascience.uchicago.edu/insights/cou...

1 month ago 1 1 0 0
AI-assisted Reviewing is Necessary and Should be Open Peer review is facing a death spiral. AI production tools are speeding it up. AI-assisted reviewing is necessary and should be open.

Open AI-assisted review tools could help researchers get better feedback, improve paper quality, and make the review process more scalable.

But final publication decisions should remain under human oversight, via @chenhaotan.bsky.social
openaireview.github.io/blog.html

1 month ago 2 1 0 0

The blog actually discusses different types of reviews. If AI reviewing helps authors produce better science, I do not see why one needs to be so hostile toward AI. It actually helps authors slow down and produce better-quality articles.

1 month ago 0 0 1 1

It is open, and it can and will be improved! Feedback like this is highly appreciated.

The issue is here: github.com/ChicagoHAI/O...

1 month ago 3 0 0 0
OpenAIReview: AI-Powered Academic Paper Reviewer

Try it, read the blog, or contribute:
🌐 openaireview.github.io
📝 openaireview.github.io/blog.html
💻 github.com/ChicagoHAI/O...

1 month ago 1 0 0 0

We also don't have good evaluations for AI-generated reviews yet. We're working on it and welcome collaborators. Feedback welcome, especially from conference organizers and journal editors who want to think seriously about the future of peer review.

1 month ago 1 0 1 0

There are two types of reviewing. Reviewing for quality (improving the work), which is what Refine and OpenAIReview do, is very different from gatekeeping (accept/reject), which is what Stanford Agentic Reviewer targets. We think automating gatekeeping requires much more care.

1 month ago 1 0 1 0

Our progressive approach finds issues at 87% of locations flagged by Refine, for the price of a coffee per paper. @joehsu.bsky.social added a Claude skill making it essentially free for Claude subscribers.

1 month ago 1 0 1 0

3/7 The only intervention that can stabilize the system is improving review precision, the ability to distinguish good papers from weak ones. AI production tools lower submission costs; only AI review tools can raise the signal. That asymmetry is why we built OpenAIReview.

1 month ago 0 0 1 0

2/7 The review death spiral: more submissions → overloaded reviewers → noisier reviews → more random acceptance → even more submissions. Bergstrom & Gross already warned about this. AI production tools make it worse by lowering submission costs and pushing the system toward collapse faster.

1 month ago 1 0 1 0

Peer review is facing a death spiral, and AI production tools are speeding it up. AI-assisted reviewing is necessary and should be open. We built OpenAIReview: open AI reviewing for everyone, for the cost of a coffee.

openaireview.github.io/blog.html 🧵

1 month ago 24 8 1 4

Local ballot measures are now on CivicChats! Local elections happen year-round, and 10+ states have measures coming up in the next few months. Check your ballot and think through what you'll be voting on → civicchats.org

1 month ago 2 1 0 0
CivicChats - Building AI to support voting behavior CivicChats is a platform for exploring, debating, and thinking through upcoming ballot measures.

We have been developing automatic evaluation based on checklists. We are also planning to run a study at the same time. Learn more at the end of this blog: cichicago.substack.com/p/civicchats...

2 months ago 0 0 0 0

Check out our effort in thinking about how AI can help with democratic processes!

2 months ago 1 1 1 0

Can anyone help review an ACL submission on parameter-efficient fine-tuning today?

Sorry that the timeline is very tight.

2 months ago 0 1 0 0

📖 ≠ 🧪 The Story is Not the Science.
Code is submitted but rarely executed during peer review, an issue likely to worsen with research agents. 🧑‍🔬
We introduce MechEvalAgent, an execution-grounded evaluation of narrative + execution. Verify the science, not just the story.
1/n

2 months ago 8 4 2 0

Mark Yatskar will be speaking this Friday!

You can tune in either on

Zoom: uchicago.zoom.us/j/9897879984...
Youtube: www.youtube.com/@AIScientifi...

2 months ago 0 0 0 0

Hannes Stark will be speaking this Friday on BoltzGen!

You can tune in either on

Zoom: uchicago.zoom.us/j/9897879984...
Youtube: www.youtube.com/@AIScientifi...

2 months ago 0 0 0 0

Happening in three hours!

2 months ago 0 0 0 0

Microsoft Research NYC is hiring a researcher in the space of AI and society!

2 months ago 62 40 2 2

@profbuehlermit.bsky.social from MIT will be speaking this Friday!

You can tune in either on

Zoom: uchicago.zoom.us/j/9897879984...
Youtube: www.youtube.com/@AIScientifi...

2 months ago 5 1 0 1

Happening in two hours!

2 months ago 1 0 0 0

Peter Clark from @ai2.bsky.social will be speaking on Friday!

You can tune in either on

Zoom: uchicago.zoom.us/j/9897879984...
Youtube: www.youtube.com/@AIScientifi...

3 months ago 3 0 0 1
We study how radiologists use AI to diagnose pulmonary embolism (PE), tracking over 100,000
scans interpreted by nearly 400 radiologists during the staggered rollout of an FDA-approved
diagnostic platform. When AI flags PE, radiologists agree 84% of the time; when AI predicts no PE,
they agree 97%. Disagreement evolves substantially: radiologists initially reject AI-positive PEs in
30% of cases, dropping to 12% by year two. Despite a 16% increase in scan volume, diagnostic speed
remains stable while per-radiologist monthly volumes nearly double, with no change in patient
mortality, suggesting AI improves workflow without compromising outcomes. We document
significant heterogeneity in AI collaboration: some radiologists reject AI-flagged PEs half the time
while others accept nearly always; female radiologists are 6 percentage points less likely to override AI
than male radiologists. Moderate AI engagement is associated with the highest agreement, whereas
both low and high engagement show more disagreement. Follow-up imaging reveals that when
radiologists override AI to diagnose PE, 54% of subsequent scans show both agreeing on no PE
within 30 days.

Posted a very early stage draft with rock star collaborators.

Key question: when we actually roll out AI tools, how do people use them? Do they just defer completely? Does it improve productivity and ability?

We look at this in the medical setting of pulmonary embolism:
paulgp.com/papers/Radio...

3 months ago 89 18 4 2

I've often joked that as faculty I program in a high-level language called "graduate student". Having tried out Claude Code this morning, I (i) feel extremely at home, (ii) am realizing that research-by-graduate-student is perhaps the original vibe-coding. 1/2

3 months ago 87 11 7 3

I've seen this message and similar echoes for other writing, and I want to strongly push back on this narrative. It's not that you shouldn't use ChatGPT but that you shouldn't *use ChatGPT to write it for you*. ChatGPT, and AI in general, is not a monolith. How you use it matters.

3 months ago 8 1 1 0

Very much enjoyed this talk by @yisongyue.bsky.social! The measurement challenge deserves a lot more attention from the AI community!

3 months ago 2 0 0 0

Happening in two hours!

3 months ago 2 0 0 1
Title + abstract of the preprint

Excited to share a new preprint with @nkgarg.bsky.social, presenting usage statistics and observational findings from Paper Skygest in the first six months of deployment! 🎉📜

arxiv.org/abs/2601.04253

3 months ago 169 50 4 5