📣📣Presenting our platform used to build MTRAG!!
RAGAPHENE: A RAG Annotation Platform with Human ENhancements and Edits
arXiv: arxiv.org/abs/2508.19272
MTRAG GitHub: github.com/IBM/mt-rag-b...
Join our MTRAGEval Task: ibm.github.io/mt-rag-bench...
Posts by Sara Rosenthal
🚀Excited to announce our MTRAGEval task at SemEval 2026!
arXiv: arxiv.org/abs/2501.03468
GitHub: github.com/IBM/mt-rag-b... (please 🌟!)
MTRAGEval: ibm.github.io/mt-rag-bench...
Working on RAG? Come check out our InspectorRAGet DEMO, presented by Siva Sankalp Patel on May 2 (Friday), 11-12:30 at Demo Session 8 in Hall 3! Looking forward to attending ACL in a few months! #NAACL2025 @naaclmeeting.bsky.social
paper: arxiv.org/abs/2404.17347
github: github.com/IBM/Inspecto...
Excited about this collab! Come check out FeeL and help advance multilingual generation in your language! huggingface.co/spaces/feel-...
🌟Want to know more about our MTRAG benchmark? Check out the IBM blog highlighting our work! research.ibm.com/blog/convers...
Retrievers (Elser shown here) struggle with later turns and non-standalone questions:
SOTA LLMs struggle with later turns and unanswerable questions:
Sample Conversation:
MTRAG is a challenging benchmark for SOTA LLMs and a great way to evaluate across multiple domains for Retrieval and Generation! MTRAG contains 110 conversations averaging 7.7 turns each across four domains for a total of 842 tasks. We also explore synthetic data and LLM-as-a-judge.
🌟 New Benchmark! 🌟
Do you work on RAG? Are you interested in Multi-Turn conversations? Very excited to share the new MTRAG benchmark we have released!
Data: github.com/ibm/mt-rag-b...
Paper: arxiv.org/abs/2501.03468
Anyone else feel like Google Scholar is missing citations lately? I have a recent paper that has 8 citations on Semantic Scholar and only 3 on Google Scholar… and I have two papers that are both cited in the same paper, but only one of them shows the citation 🤔
Please just message me on Slack
Please add me. Thanks!
I made a starter pack of people in New York (City) working on ML/AI. Please distribute and feel free to self-nominate!
go.bsky.app/BoEtagz
If you work on RAG, check out InspectorRAGet - an awesome tool for RAG evaluation. Available on HuggingFace! We provide the interface, you provide the experiments and metrics. Want to know more? Just reach out!
github.com/IBM/Inspecto...
huggingface.co/spaces/kpfad...
arxiv.org/abs/2404.17347
Starter pack for IBM Research! Follow awesome IBM researchers! IBMers, let me know and I will add you! go.bsky.app/2SXcRmA
Working on RAG? Check out our ClapNQ benchmark (accepted to TACL) to test the full RAG pipeline!
github.com/primeqa/clapnq
arxiv.org/abs/2404.02103
Please add me!
This is great! Please add me as well!