Sara Rosenthal (@seirasto) Bsky

RAGAPHENE: A RAG Annotation Platform with Human Enhancements and Edits
Retrieval Augmented Generation (RAG) is an important aspect of conversing with Large Language Models (LLMs) when factually correct information is important. LLMs may provide answers that appear correc...

📣📣Presenting our platform used to build MTRAG!!

RAGAPHENE: A RAG Annotation Platform with Human ENhancements and Edits

arXiv: arxiv.org/abs/2508.19272
MTRAG GitHub: github.com/IBM/mt-rag-b...
Join our MTRAGEval Task: ibm.github.io/mt-rag-bench...

7 months ago

MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
Retrieval-augmented generation (RAG) has recently become a very popular task for Large Language Models (LLMs). Evaluating them on multi-turn RAG conversations, where the system is asked to generate a ...

🚀Excited to announce our MTRAGEval task at SemEval 2026!

arXiv: arxiv.org/abs/2501.03468
GitHub: github.com/IBM/mt-rag-b... (please 🌟!)
MTRAGEval: ibm.github.io/mt-rag-bench...

8 months ago

InspectorRAGet: An Introspection Platform for RAG Evaluation
Large Language Models (LLM) have become a popular approach for implementing Retrieval Augmented Generation (RAG) systems, and a significant amount of effort has been spent on building good models and ...

Working on RAG? Come check out our InspectorRAGet demo, presented by Siva Sankalp Patel on Friday, May 2, 11:00-12:30, at Demo Session 8 in Hall 3! Looking forward to attending ACL in a few months! #NAACL2025 @naaclmeeting.bsky.social

paper: arxiv.org/abs/2404.17347
github: github.com/IBM/Inspecto...

11 months ago

Excited about this collab! Come check out FeeL and help advance multilingual generation in your language! huggingface.co/spaces/feel-...

1 year ago

How well can your RAG agent carry out a conversation? IBM’s new benchmark evaluates LLMs on interactive question-answering tasks using

🌟Want to know more about our MTRAG benchmark? Check out the IBM blog highlighting our work! research.ibm.com/blog/convers...

1 year ago

Retrievers (Elser shown here) struggle with later turns and non-standalone questions:

1 year ago

SOTA LLMs struggle with later turns and unanswerable questions:

1 year ago

Sample Conversation:

1 year ago

MTRAG is a challenging benchmark for SOTA LLMs and a great way to evaluate across multiple domains for Retrieval and Generation! MTRAG contains 110 conversations averaging 7.7 turns each across four domains for a total of 842 tasks. We also explore synthetic data and LLM-as-a-judge.

1 year ago

GitHub - IBM/mt-rag-benchmark: Multi-Turn RAG Benchmark. Contribute to IBM/mt-rag-benchmark development by creating an account on GitHub.

🌟 New Benchmark! 🌟

Do you work on RAG? Are you interested in Multi-Turn conversations? Very excited to share the new MTRAG benchmark we have released!

Data: github.com/ibm/mt-rag-b...
Paper: arxiv.org/abs/2501.03468

1 year ago

Anyone else feel like Google Scholar is missing citations lately? I have a recent paper that has 8 citations on Semantic Scholar and only 3 on Google Scholar… and I have two papers that are cited in one paper but only one has the citation 🤔

1 year ago

Please just message me on Slack

1 year ago

Please add me. Thanks!

1 year ago

I made a starter pack of people in New York (City) working on ML/AI. Please distribute and feel free to self-nominate!

go.bsky.app/BoEtagz

1 year ago

GitHub - IBM/InspectorRAGet: The repository contains generative AI analytics platform application code.

If you work on RAG, check out InspectorRAGet, an awesome tool for RAG evaluation. Available on HuggingFace! We provide the interface, you provide the experiments and metrics. Want to know more? Just reach out!
github.com/IBM/Inspecto...
huggingface.co/spaces/kpfad...
arxiv.org/abs/2404.17347

1 year ago

Starter pack for IBM Research! Follow awesome IBM researchers! IBMers, let me know and I will add you! go.bsky.app/2SXcRmA

1 year ago

GitHub - primeqa/clapnq: Contribute to primeqa/clapnq development by creating an account on GitHub.

Working on RAG? Check out our ClapNQ benchmark (accepted to TACL) to test the full RAG pipeline!

github.com/primeqa/clapnq
arxiv.org/abs/2404.02103

1 year ago

Please add me!

1 year ago

This is great! Please add me as well!

1 year ago
