That’s a wrap on our first HUJI NLP Hackathon!
Congrats to the winning team!
@noy-sternlicht.bsky.social @nirmazor.bsky.social
They explored gender bias in AI-generated movie scripts using the Bechdel Test — and yep, you can guess the results...
Posts by HUJI NLP
Care about LLM evaluation? 🤖 🤔
We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs
On different prompts, domains, tokens, models...
Join our community effort to expand it with YOUR model predictions & become a co-author!
Can RAG performance get *worse* with more relevant documents?📄
We put the number of retrieved documents in RAG to the test!
💥Preprint💥: arxiv.org/abs/2503.04388
1/3
There's a lot of talk about regulating AI, but do regulators know the technology well enough?
In our new paper, we survey major reg efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it?
arxiv.org/pdf/2501.15693
- “I heard there’s a new paper about Theory of Mind in LLMs!”
- “I know! There’s like hundreds of them!”
…
Could someone be driving in the wrong direction?
Check out our new opinion paper. w/ @nitalon.bsky.social, @joebarnby.bsky.social and Omri Abend.
New preprint! ✨
Interested in LLM-as-a-Judge?
Want to get the best judge for ranking your system?
Our new work is just for you:
"JuStRank: Benchmarking LLM Judges for System Ranking"
🕺💃
arxiv.org/abs/2412.09569
1/n First time in the sky ✈️
I had a great time presenting my work at @emnlpmeeting.bsky.social’s Workshop on Narrative Understanding and reconnecting with friends and colleagues in Miami! 🌴
How do religious trajectories evolve in Holocaust testimony narratives?