Delighted to share "Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in LLMs", accepted to Findings of #EMNLP2025!
With a novel dataset of changed medical knowledge, we discover the alarming presence of obsolete advice in eight popular LLMs.
Paper: arxiv.org/abs/2509.04304 #NLP
Posts by Juraj Vladika
[Image: line diagram showing the RAG performance of different base LLM models]
Also happy to share that "On the Influence of Context Size and Model Choice in RAG Systems" was accepted to Findings of #NAACL2025!
We test how the RAG performance on QA tasks changes (and plateaus) with increasing context size across different LLMs and retrievers.
Paper: arxiv.org/abs/2502.14759
[Image: architecture of the step-by-step fact verification system]
Thrilled to share that "Step-by-Step Fact Verification for Medical Claims with Explainable Reasoning" was accepted to #NAACL2025!
This system iteratively collects new knowledge via generated Q&A pairs, making the verification process more robust and explainable.
Paper: arxiv.org/abs/2502.14765 #NLP
More than 8500 submissions to ACL 2025 (ARR February 2025 cycle)! That is an increase of 3000 submissions compared to ACL 2024. It will be a fun reviewing period.
@aclmeeting.bsky.social #ACL2025 #ACL2025nlp #NLP
Most exciting update to encoder-only models in a long time! Love to use them for classification tasks where LLMs are overkill. #ModernBERT
Organizing hackaTUM 2024 was an incredible experience!
Around 1000 participants with 3 days full of intense coding, new experiences, exciting sponsor challenges and workshops, fun side activities, tasty food, creative final solutions, and overall awesome fun!
Join us next year! hack.tum.de
The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance.
Meet OLMo 2, the best fully open language model to date, including a family of 7B and 13B models trained up to 5T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B. As always, we released our data, code, recipes, and more!
I have started taking screenshots of interesting posts instead, but that gets hard to track after a while.
Thank you for the list! I would appreciate being added.
Using that "other" NLP comes in handy when trying to convince your reviewers to increase their scores :))
Congratulations! I will definitely read it.