HUJI NLP (@nlphuji) Bsky

That’s a wrap on our first HUJI NLP Hackathon!
Congrats to the winning team!
@noy-sternlicht.bsky.social @nirmazor.bsky.social

They explored gender bias in AI-generated movie scripts using the Bechdel Test — and yep, you can guess the results...

11 months ago 6 3 0 0

Care about LLM evaluation? 🤖 🤔

We bring you 🕊️ DOVE, a massive (250M!) collection of LLM outputs
on different prompts, domains, tokens, models...

Join our community effort to expand it with YOUR model predictions & become a co-author!

1 year ago 11 3 1 2

Can RAG performance get *worse* with more relevant documents? 📄
We put the number of retrieved documents in RAG to the test!
💥Preprint💥: arxiv.org/abs/2503.04388
1/3

1 year ago 3 3 2 0

There's a lot of talk about regulating AI, but do regulators know the technology well enough?
In our new paper, we survey major regulatory efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it?
arxiv.org/pdf/2501.15693

1 year ago 0 2 1 2

- “I heard there’s a new paper about Theory of Mind in LLMs!”
- “I know! There’s like hundreds of them!”
…
Could someone be driving in the wrong direction?

Check out our new opinion paper. w/ @nitalon.bsky.social, @joebarnby.bsky.social and Omri Abend.

1 year ago 8 2 0 0

JuStRank: Benchmarking LLM Judges for System Ranking
Given the rapid progress of generative AI, there is a pressing need to systematically compare and choose between the numerous models and configurations available. The scale and versatility of such eva...

New preprint! ✨
Interested in LLM-as-a-Judge?
Want to get the best judge for ranking your system?
Our new work is just for you:
"JuStRank: Benchmarking LLM Judges for System Ranking"
🕺💃
arxiv.org/abs/2412.09569

1 year ago 9 5 1 1

1/n First time in the sky ✈️

I had a great time presenting my work at @emnlpmeeting.bsky.social’s Workshop on Narrative Understanding and reconnecting with friends and colleagues in Miami! 🌴

How do religious trajectories evolve in Holocaust testimony narratives?

1 year ago 6 2 1 1
