Posts by UKP Lab

REVAS — AI-Powered Peer Review Feedback for Academics
REVAS analyzes the weakness section of your peer review, scoring each paragraph on actionability, helpfulness, grounding, and verifiability.

🗓️ The ARR March review deadline is approaching: April 20 AoE.
Finishing up your review? Run it through REVAS, a peer review assistant that makes your suggestions more actionable, flags unsupported claims, and grounds your feedback in the paper.
👉 revas.mbzuai.ac.ae

4 days ago

See you at #EACL2026 in Rabat 🕌!

#UKPLab #NLProc #ResponsibleAI #Quantization #MLSafety #Fairness #TrustworthyAI #ModelCompression #LLMSafety #EthicalAI #NLP #AIResearch @cs-tudarmstadt.bsky.social @proloewe.bsky.social

3 weeks ago

Federico Marcuzzi (INSAIT - Institute for Computer Science, Artificial Intelligence and Technology), Xuefei Ning (@tsinghuauniversity.bsky.social), @royschwartznlp.bsky.social (@hebrewuniversity.bsky.social), and @igurevych.bsky.social (UKP Lab, @tuda.bsky.social and @athenecenter.bsky.social).

3 weeks ago
How Quantization Shapes Bias in Large Language Models
This work presents a comprehensive evaluation of how quantization affects model bias, with particular attention to its impact on individual demographic subgroups. We focus on weight and activation qua...

📄 Paper: arxiv.org/abs/2508.18088

💻 Code and data: github.com/insait-insti...

🔗 Project: insait-institute.github.io/quantization...

3 weeks ago

🎯 𝗪𝗵𝘆 𝘁𝗵𝗶𝘀 𝗺𝗮𝘁𝘁𝗲𝗿𝘀
Quantization isn’t simply “safer” or “riskier”. It changes what harm looks like.

If your system impacts vulnerable groups, you need to test these trade-offs before deployment.

3 weeks ago

[...]
• 💚 Toxicity and sentiment: quantized models often produce less toxic text and more neutral sentiment.
• 🔄 Consistency across models: these shifts are surprisingly consistent across architectures and levels of reasoning ability.

3 weeks ago

🚀 𝗪𝗵𝗮𝘁 𝘄𝗲 𝗱𝗼
We evaluate social bias under quantization at scale: weight-only vs. weight+activation, across architectures, demographic groups, and reasoning tasks.
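To make the weight-only setting concrete, here is a minimal sketch of symmetric round-to-nearest 4-bit quantization, assuming per-group scales; weight+activation quantization applies the same scale-round-clamp step to layer inputs at runtime. This illustrates the general technique, not the paper's exact pipeline:

```python
# Minimal sketch of symmetric round-to-nearest int4 quantization (the
# weight-only setting). Group size and storage format are illustrative
# assumptions, not the paper's configuration.
import torch

def quantize_int4(w: torch.Tensor, group_size: int = 128):
    """Quantize a weight tensor to int4 with one scale per group."""
    flat = w.reshape(-1, group_size)
    # int4 covers [-8, 7]; map the largest magnitude in each group to 7.
    scale = flat.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0
    q = torch.clamp(torch.round(flat / scale), -8, 7)
    return q.to(torch.int8), scale  # int4 values stored in int8 containers

def dequantize_int4(q: torch.Tensor, scale: torch.Tensor, shape) -> torch.Tensor:
    return (q.float() * scale).reshape(shape)

w = torch.randn(4096, 4096)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s, w.shape)
print("mean |error|:", (w - w_hat).abs().mean().item())  # the rounding noise
```

The rounding error printed at the end is exactly the perturbation whose downstream social effects (stereotypes, fairness, toxicity, sentiment) the study measures.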

🔎 𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀
• ⚖️ Stereotypes and fairness: quantization can increase stereotype alignment and worsen fairness outcomes.
[...]

3 weeks ago
Diagram illustrating a social bias evaluation pipeline for large language models. On the left, an LLM is transformed via a quantizer into a 4-bit quantized LLM. Both models are then evaluated within a “Social Bias Evaluation Framework” that includes categories such as stereotype (SS, RB, WB, BBQ), toxicity (BLD, DTT), sentiment (BLD), and fairness (DE, DEG, DTF). Arrows lead to bar charts on the right labeled “Social Bias Level,” comparing bias levels across models.

🔍 𝗨𝘀𝗶𝗻𝗴 𝗾𝘂𝗮𝗻𝘁𝗶𝘇𝗲𝗱 𝗟𝗟𝗠𝘀 𝘁𝗼 𝗯𝗼𝗼𝘀𝘁 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆?
𝗪𝗮𝘁𝗰𝗵 𝘁𝗵𝗲 𝘀𝗼𝗰𝗶𝗮𝗹 𝘁𝗿𝗮𝗱𝗲-𝗼𝗳𝗳𝘀, 𝘁𝗼𝗼. ⚠️
Quantization makes LLMs cheaper and more deployable.
But what does it do to fairness, toxicity, and bias?

💡 The research gap: the field lacks a systematic picture of how quantization affects social behavior across models and tasks.

3 weeks ago

#NLP #NLProc #UKPLab #LLMs #AIResearch #AcademicLife @tuda.bsky.social @cs-tudarmstadt.bsky.social @athenecenter.bsky.social

3 weeks ago
Group photo of eight conference participants standing in front of an EACL 2026 backdrop in Rabat, Morocco. They wear name badges and casual to semi-formal attire, posing indoors against a banner that reads “EACL 2026 Rabat, Morocco, March 24–29, 2026” with sponsor logos and decorative patterns.

#EACL2026 in Rabat is in full swing ✨
From papers to hallway discussions - it’s great to see our team actively contributing to this year’s #EACL conference.

Say hello to this group if you’re on site 👇

3 weeks ago

💬 We thank Prof. Kementchedjhieva for the insightful talk and the discussion with UKP members on multimodal modeling and the future of vision-language systems.

#UKPLab #MultimodalAI #VisionLanguageModels #NLP #GuestTalk #NLProc #MBZUAI @tuda.bsky.social @cs-tudarmstadt.bsky.social

3 weeks ago

🔍 Prof. Kementchedjhieva also discussed alternative approaches to improve vision-to-language alignment while maintaining strong language capabilities.

3 weeks ago

⚙️ The talk examined both the benefits and trade-offs of this design choice, including improved multimodal integration but also increased reliance on the language model backbone and potential language-only forgetting.

3 weeks ago

🖼️💬 A central focus was the shift from cross-attention mechanisms to unified self-attention over image and text tokens, allowing visual and linguistic representations to co-evolve across model layers.
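For readers who want the mechanics, a minimal sketch of the unified-attention idea (shapes and layer choice are illustrative assumptions, not any particular VLM):

```python
# Unified self-attention over image and text tokens: instead of a separate
# cross-attention module, both modalities share one sequence and one
# attention operation. Purely illustrative shapes, not a specific VLM.
import torch
import torch.nn as nn

d_model = 512
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)

img_tokens = torch.randn(1, 256, d_model)  # e.g., 16x16 grid of ViT patch embeddings
txt_tokens = torch.randn(1, 32, d_model)   # text token embeddings

# One joint sequence: every token attends to every other, so visual and
# linguistic representations can co-evolve across stacked layers.
joint = torch.cat([img_tokens, txt_tokens], dim=1)  # (1, 288, d_model)
out = layer(joint)
print(out.shape)  # torch.Size([1, 288, 512])
```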

3 weeks ago

🧠 In her talk “𝘞𝘩𝘦𝘯 𝘝𝘪𝘴𝘪𝘰𝘯 𝘔𝘦𝘦𝘵𝘴 𝘓𝘢𝘯𝘨𝘶𝘢𝘨𝘦: 𝘈𝘭𝘪𝘨𝘯𝘮𝘦𝘯𝘵 𝘢𝘯𝘥 𝘐𝘯𝘵𝘦𝘳𝘧𝘦𝘳𝘦𝘯𝘤𝘦 𝘪𝘯 𝘝𝘓𝘔𝘴”, she explored architectural developments in modern vision-language models (VLMs) and their implications for multimodal learning.

3 weeks ago
Promotional graphic for a guest talk featuring Prof. Yova Kementchedjhieva. A portrait of a smiling woman with curly hair, arms crossed, is framed in a red circle against a background of keyboard keys spelling “USE.” Text reads “Prof. Yova Kementchedjhieva” and “MBZUAI Guest Talk.” Logos of the Ubiquitous Knowledge Processing Lab and Technische Universität Darmstadt appear, along with social media icons and the handle @UKPLab.

🎓 𝗚𝘂𝗲𝘀𝘁 𝗧𝗮𝗹𝗸 𝗮𝘁 𝘁𝗵𝗲 𝗨𝗞𝗣 𝗟𝗮𝗯
👋 We were pleased to welcome Yova Kementchedjhieva, Assistant Professor at MBZUAI (Mohamed bin Zayed University of Artificial Intelligence), for a guest talk at the UKP Lab in Darmstadt.

3 weeks ago

Follow Mihai’s journey:
LinkedIn: www.linkedin.com/in/mihai-edu...
Bluesky: bsky.app/profile/miha...

#AIforMentalHealth #JugendForscht #AIInnovation #MentalHealthTech #NextGenAI #AIforGood #YoungTalent #UKPLab @cs-tudarmstadt.bsky.social

4 weeks ago

🏆 We’re proud to share that his project won 𝟭𝘀𝘁 𝗣𝗿𝗶𝘇𝗲 𝗮𝘁 𝘁𝗵𝗲 𝗥𝗲𝗴𝗶𝗼𝗻𝗮𝗹 𝗣𝗵𝗮𝘀𝗲, qualifying him for the national phase!

Congratulations, Mihai, on this well-deserved success and on addressing such an important and timely challenge with empathy and innovation. 👏

4 weeks ago

For this year’s “Jugend forscht” competition, Mihai was mentored by Dr.-Ing. Hiba Arnaout from the UKP Lab / Technische Universität Darmstadt as he further developed and refined his project.

4 weeks ago
Young student in a suit stands at a science fair booth, holding a small device with a face-like display and grass on top. Behind him, posters labeled “Regionalsieger” present diagrams and results of the project. On the table in front are hardware components, printed schematics, and a laptop displaying the device interface. Other participants and exhibition stands are visible in the background.

🚀 From school project to award-winning AI for mental health

Meet Mihai Ghețu (@mihai-eduard-ghetu.bsky.social) - a remarkably talented high school student and the creator of Mental Mint: An Empathic AI-Powered Psychosocial Companion Robot for Victims of Emotional Shocks and Bullying.

4 weeks ago
Simplified overview of our aligned probing setup, where we join the behavioral and internal evaluation of LMs' toxicity.

LMs that "know more" about toxicity are less toxic!
Our #TACL 📄 connects behavior and internals:
💠 LMs amplify toxicity beyond humans
💠 Information about toxicity peaks in lower layers
💠 Bypassing these layers increases toxicity
More details👇 #NLProc #interpretability (1/🧵)
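As a rough illustration of the probing side (model, toy data, and pooling below are our assumptions, not the paper's aligned-probing setup):

```python
# Fit a linear probe on each layer's hidden states to ask how much toxicity
# information that layer encodes. Toy data and mean-pooling are illustrative;
# the paper's "aligned probing" setup is more involved.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # gpt2 has no pad token by default
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

texts = ["you are wonderful", "you are an idiot",
         "have a great day", "nobody likes you"]
labels = [0, 1, 0, 1]  # 1 = toxic (toy labels)

with torch.no_grad():
    enc = tok(texts, return_tensors="pt", padding=True)
    hidden = model(**enc).hidden_states  # (n_layers + 1) tensors of (B, T, D)

for layer_idx, h in enumerate(hidden):
    feats = h.mean(dim=1).numpy()  # mean-pool tokens -> one vector per text
    acc = LogisticRegression(max_iter=1000).fit(feats, labels).score(feats, labels)
    print(f"layer {layer_idx}: probe accuracy {acc:.2f}")
# With real data and held-out splits, the paper finds this information
# peaks in the lower layers.
```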

2 months ago

Work by @nilsdy.bsky.social & @igurevych.bsky.social (@ukplab.bsky.social, @tuda.bsky.social and @athenecenter.bsky.social)

See you at #EACL2026 in Rabat 🕌!

#UKPLab #LLMs #PeerReview #AIforScience #TrustworthyAI #NLP #Evaluation @cs-tudarmstadt.bsky.social

4 weeks ago
Automatic Reviewers Fail to Detect Faulty Reasoning in Research Papers: A New Counterfactual Evaluation Framework
Large Language Models (LLMs) have great potential to accelerate and support scholarly peer review and are increasingly used as fully automatic review generators (ARGs). However, potential biases and s...

🔗 Project: ukplab.github.io/tacl2026-cou...
📄 Paper: arxiv.org/abs/2508.21422
👨‍💻 Code: github.com/UKPLab/arxiv...

4 weeks ago

𝗪𝗵𝗮𝘁 𝗵𝗲𝗹𝗽𝘀:
✅ Human–LLM collaboration shows the strongest potential
✅ Repeated evaluation of review-specific skills is essential
✅ Controlled benchmarks are needed to assess reasoning, not just fluency

4 weeks ago

𝗪𝗵𝗮𝘁 𝘄𝗲 𝗳𝗶𝗻𝗱
📊 Automatic reviewers rely heavily on surface-level signals
⚠️ They often miss mismatches between claims and actual results

𝗪𝗵𝘆 𝗶𝘁 𝗺𝗮𝘁𝘁𝗲𝗿𝘀
As LLMs are increasingly integrated into peer review workflows at major AI conferences, these limitations directly affect research quality and evaluation fairness.

4 weeks ago
Schematic diagram of analyzing scientific papers. On the left, a document contains color-coded sections representing claim, conclusion, result, and method. In the center, these categories are listed, with dashed lines linking them to corresponding parts of the document. A red cross marks an incorrect linkage between a result and a section in a second document shown on the right, where one passage is highlighted. At the bottom, small robot figures are shown reviewing documents, indicating automated evaluation or peer review processes.

𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗰 𝗿𝗲𝘃𝗶𝗲𝘄𝗲𝗿𝘀 𝗰𝗮𝗻 𝗺𝗶𝘀𝘀 𝗳𝘂𝗻𝗱𝗮𝗺𝗲𝗻𝘁𝗮𝗹 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 𝗲𝗿𝗿𝗼𝗿𝘀.
👀 LLM-generated reviews may look convincing — but how reliable are they in practice?

In our recent TACL paper, we introduce a 𝗰𝗼𝗻𝘁𝗿𝗼𝗹𝗹𝗲𝗱 𝗰𝗼𝘂𝗻𝘁𝗲𝗿𝗳𝗮𝗰𝘁𝘂𝗮𝗹 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 𝗳𝗿𝗮𝗺𝗲𝘄𝗼𝗿𝗸 to systematically test automatic reviewers.
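In spirit (our reading, with a stubbed reviewer; the actual framework and perturbations are in the paper), a counterfactual test looks like:

```python
# Gist of a counterfactual test for automatic review generators (ARGs):
# inject a controlled claim/result mismatch into a paper and check whether
# the generated review flags it. generate_review is a hypothetical stub
# standing in for an LLM call; this is our reading, not the paper's code.

PAPER = ("Method X outperforms the baseline by 4.2 points. "
         "We conclude that X sets a new state of the art.")

def make_counterfactual(paper: str) -> str:
    # Flip the result while keeping the conclusion: the conclusion
    # no longer follows, which a careful reviewer should notice.
    return paper.replace("outperforms", "underperforms")

def generate_review(paper: str) -> str:
    # Hypothetical ARG stand-in; real systems would prompt an LLM here.
    return "The paper is clearly written and addresses a timely topic."

def flags_mismatch(review: str) -> bool:
    cues = ("inconsistent", "contradict", "does not support", "mismatch")
    return any(c in review.lower() for c in cues)

for name, text in [("original", PAPER),
                   ("counterfactual", make_counterfactual(PAPER))]:
    print(name, "flagged:", flags_mismatch(generate_review(text)))
# A reliable reviewer flags only the counterfactual variant; a surface-level
# one, like the stub above, flags neither.
```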

4 weeks ago

See you this week in Rabat 🕌! #EACL2026

#UKPLab #CulturalNLP #ResponsibleAI #NLProc #NLP4MentalHealth #NLPsych #NLP #MentalHealth

1 month ago

Follow the authors @ccliu.bsky.social, Hiba Arnaout, Nils Kovacic, and @igurevych.bsky.social from the UKP Lab / @tuda.bsky.social and @hessianai.bsky.social, as well as Dana Atzil-Slonim from the Psychology Department, Bar-Ilan University.

1 month ago
Tailored Emotional LLM-Supporter: Enhancing Cultural Sensitivity
Large language models (LLMs) show promise in offering emotional support and generating empathetic responses for individuals in distress, but their ability to deliver culturally sensitive support remai...

📄 Paper: arxiv.org/abs/2508.07902

💻 Code and data: github.com/UKPLab/eacl2...

🔗 Project: github.com/UKPLab/arxiv...

1 month ago

📊 𝗪𝗵𝗮𝘁 𝘄𝗲 𝗳𝗶𝗻𝗱
With supervision, LLMs can produce support that is 𝗲𝗺𝗽𝗮𝘁𝗵𝗲𝘁𝗶𝗰 and 𝗰𝘂𝗹𝘁𝘂𝗿𝗮𝗹𝗹𝘆 𝗮𝘄𝗮𝗿𝗲.

✅ 𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀
• Guided LLMs can outperform anonymous online peer responses in culturally sensitive support.
• Cultural role-play alone isn’t enough.
• Data + supervision deliver measurable gains.

1 month ago