Delighted to share "Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in LLMs", accepted to Findings of #EMNLP2025!
With a novel dataset of changed medical knowledge, we discover the alarming presence of obsolete advice in eight popular LLMs.
Paper: arxiv.org/abs/2509.04304 #NLP
Posts by Juraj Vladika
[Image: line diagram showing the RAG performance of different base LLM models]
Also happy to share that "On the Influence of Context Size and Model Choice in RAG Systems" was accepted to Findings of #NAACL2025!
We test how the RAG performance on QA tasks changes (and plateaus) with increasing context size across different LLMs and retrievers.
Paper: arxiv.org/abs/2502.14759
[Image: architecture of the step-by-step fact verification system]
Thrilled to share that "Step-by-Step Fact Verification for Medical Claims with Explainable Reasoning" was accepted to #NAACL2025!
This system iteratively collects new knowledge via generated Q&A pairs, making the verification process more robust and explainable.
Paper: arxiv.org/abs/2502.14765 #NLP
More than 8500 submissions to ACL 2025 (ARR February 2025 cycle)! That is an increase of 3000 submissions compared to ACL 2024. It will be a fun reviewing period.
@aclmeeting.bsky.social #ACL2025 #ACL2025nlp #NLP
Most exciting update to encoder-only models in a long time! Love to use them for classification tasks where LLMs are overkill. #ModernBERT
Organizing hackaTUM 2024 was an incredible experience!
Around 1000 participants with 3 days full of intense coding, new experiences, exciting sponsor challenges and workshops, fun side activities, tasty food, creative final solutions, and overall awesome fun!
Join us next year! hack.tum.de
The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance.
Meet OLMo 2, the best fully open language model to date, including a family of 7B and 13B models trained up to 5T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B. As always, we released our data, code, recipes, and more!
I have started taking screenshots of interesting posts instead, but that gets hard to track after a while.
Thank you for the list! I would appreciate being added.
Using that "other" NLP comes in handy when trying to convince your reviewers to increase their scores :))
Congratulations! I will definitely read it.