#NAACL2024

▪️ Seven papers authored or co-authored by UKP staff have been accepted for publication at this year's #EACL2024.

▪️ Five papers have been accepted for publication at this year's #NAACL2024, the Conference of the North American Chapter of the ACL.
(3/🧵)


Looking for potential emergency reviewers for submissions in Interpretability and Model Analysis for #NAACL2024!

Topics include: LLM Hallucination, Evaluation, Privacy Policy Analysis, and Unlearning. Please reach out if you have the bandwidth to help! 🫡 #NLProc


#NAACL2024


And if you are currently at #NAACL2024 & would like to learn more about the dataset, consider checking out Pia Pachinger's presentation at the "Workshop on Online Abuse and Harms" on "Span-Based Austrian German & English Offensive Language Detection" tomorrow 🙂🙌

Post image

Joint work of Nico Daheim,¹ Nouha Dziri,² Mrinmaya Sachan,³ Iryna Gurevych¹ and Edoardo Ponti⁴.
________________
¹ @ukplab.bsky.social, @cs-tudarmstadt.bsky.social, hessian.AI
² Allen Institute for AI (AI2)
³ ETH Zürich
⁴ The University of Edinburgh

See you in Mexico City 🇲🇽 at #NAACL2024! (9/9)

Preview: »Elastic Weight Removal for Faithful and Abstractive Dialogue Generation«
Ideally, dialogue systems should generate responses that are faithful to the knowledge contained in relevant documents. However, many models generate hallucinated responses instead that contradict...

⚖️ Trade-off between faithfulness and abstractiveness
📈 Results on further tasks, such as FaithDial
🧑‍⚖️ Human Evaluation

For further results, check our paper and code!
📄 Paper: arxiv.org/abs/2303.17574
💻 Code: github.com/UKPLab/naacl...

(8/🧵) #NAACL2024

Post image

Adding the abstractiveness expert can improve the baseline in terms of both faithfulness and abstractiveness (gray region on the chart)

(7/🧵) #NAACL2024

Post image

Subtracting the hallucination mitigation expert, which was trained on hallucinated examples, effectively removes the influence of those examples from training

This drastically reduces hallucinations 📉
And it does so better than other methods 🏆

(6/🧵) #NAACL2024

Post image

We improve on scalar weighting!
How? By weighting each model according to its Fisher Information Matrix ⚖️

It provides a parameter-specific scaling that can better isolate parameters responsible for hallucinations and abstractiveness.
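
A rough sketch of the idea, using a crude diagonal Fisher estimate (mean squared gradients); the helpers `batches`, `loss_fn`, and the scaling `lam` are assumptions for illustration, not the released EWR implementation:

```python
import torch

def diagonal_fisher(model, batches, loss_fn):
    """Rough diagonal Fisher estimate: average squared gradients over a list of batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()  # loss_fn is a hypothetical helper
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(batches), 1) for n, f in fisher.items()}

def fisher_weighted_subtract(base_state, expert_vector, fisher, lam=1.0, eps=1e-8):
    """Subtract an expert's task vector with a parameter-wise Fisher scaling
    (normalized per parameter tensor) instead of one global scalar."""
    return {n: base_state[n]
               - lam * (fisher[n] / (fisher[n].mean() + eps)) * expert_vector[n]
            for n in base_state}
```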

(5/🧵) #NAACL2024

Post image

We train 2 experts (sketched below):
💠 a hallucination mitigation expert to discourage hallucinations
💠 an abstractiveness expert to encourage naturalness
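
A minimal sketch of this step, assuming two contrasting training splits and any standard fine-tuning loop; every name below is hypothetical:

```python
import copy

def train_expert(base_model, data_split, fine_tune):
    """Fine-tune a copy of the shared base model on one data split.
    `fine_tune` stands in for any standard training loop (hypothetical helper)."""
    expert = copy.deepcopy(base_model)
    fine_tune(expert, data_split)
    return expert

# Illustrative usage on two contrasting splits of the dialogue data:
# hallucination_expert   = train_expert(base_model, hallucinated_responses, fine_tune)
# abstractiveness_expert = train_expert(base_model, abstractive_responses, fine_tune)
```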

(4/🧵) #NAACL2024


Adding them to the model weights promotes the behavior of the fine-tuned model ✅
Subtracting them discourages that behavior ❌
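
To make the add/subtract step concrete, here is a minimal sketch; the function name and the scaling factor alpha are illustrative conventions, not the paper's code:

```python
def apply_task_vector(base_state, task_vector, alpha=1.0):
    """Return new weights: base parameter + alpha * task-vector parameter.

    base_state and task_vector are dicts mapping parameter names to tensors
    (e.g. PyTorch state_dicts); alpha > 0 promotes the fine-tuned behavior,
    alpha < 0 discourages it.
    """
    return {name: base_state[name] + alpha * task_vector[name]
            for name in base_state}
```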

(3/🧵) #NAACL2024


Our method builds on Task Arithmetic 🏗️

A task vector is the difference between the model weights after and before fine-tuning on a task, and it captures how that task changes the model's behavior.
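
As a rough illustration of this definition (not the authors' released implementation), a task vector can be computed as a per-parameter difference between two checkpoints of the same architecture:

```python
def task_vector(base_state, finetuned_state):
    """Task vector = fine-tuned weights minus base weights, per parameter.

    Both arguments are dicts of parameter name -> tensor (e.g. state_dicts
    of the same architecture before and after fine-tuning).
    """
    return {name: finetuned_state[name] - base_state[name]
            for name in base_state}
```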

(2/🧵) #NAACL2024

Post image

Dialog models often hallucinate 😵‍💫
➕ Knowledge grounding can help
➖ but the responses become less natural

Can we reduce hallucinations AND keep naturalness?
Yes 🚀 With Elastic Weight Removal (EWR)!

Learn more about our #NAACL2024 paper 🧵 (1/9)

📃 arxiv.org/abs/2303.17574

Post image

Joint work of Chen Cecilia Liu, Jonas Pfeiffer (Google DeepMind), Ivan Vulić (Language Technology Lab, University of Cambridge) and Iryna Gurevych (@ukplab.bsky.social)

We look forward to seeing you in 🇲🇽!

(8/8) #NAACL2024 #NLProc

Post image

Do scheduled unfreezing algorithms only work with task adapters? 
Of course not, they work with LoRAs too!

Learn more in our paper:
📄 arxiv.org/abs/2301.05487 

(7/🧵) #NAACL2024 #NLProc


💡Our takeaway: deciding which parameters to train is important, even with adapters!
Give it a try in other OOD settings and let us know about your results 🙏 

(6/🧵) #NAACL2024 #NLProc


What if we select the training order of the adapters based on their Fisher Information?

We introduce FUN (Fisher-UNfreezing): at each step, we unfreeze the adapter with the largest Fisher Information 🥶🔥
The results are comparable to GU & LPFT!
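
A hedged sketch of FUN's selection step, assuming per-adapter diagonal Fisher estimates are already available; the data structures are illustrative, not the paper's code:

```python
def fun_unfreeze_step(adapters, fisher_per_adapter, frozen):
    """Unfreeze the still-frozen adapter with the largest total Fisher Information.

    adapters:           dict of adapter name -> nn.Module
    fisher_per_adapter: dict of adapter name -> tensor of diagonal Fisher values
    frozen:             set of adapter names that are currently frozen
    """
    candidates = [name for name in adapters if name in frozen]
    if not candidates:
        return None
    pick = max(candidates, key=lambda name: fisher_per_adapter[name].sum().item())
    for p in adapters[pick].parameters():
        p.requires_grad = True
    frozen.discard(pick)
    return pick
```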

(5/🧵) #NAACL2024 #NLProc


Scheduled unfreezing methods influence the learning dynamics in a unique way.

Why does this work so well?

👀 Look at Fisher Information! It changes during the critical learning period of training.
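
For intuition, one rough way to watch this (not the measurement protocol from the paper) is to log a squared-gradient approximation of the per-layer Fisher trace after each backward pass:

```python
import torch

@torch.no_grad()
def fisher_trace_per_layer(model):
    """Approximate per-layer Fisher 'mass' from the current squared gradients.

    Call this right after loss.backward() at each step to get a curve of
    Fisher Information over training, e.g. to inspect the critical period.
    """
    trace = {}
    for name, param in model.named_parameters():
        if param.grad is not None:
            layer = name.split(".")[0]  # crude grouping by top-level module name
            trace[layer] = trace.get(layer, 0.0) + (param.grad ** 2).sum().item()
    return trace
```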

(4/🧵) #NAACL2024 #NLProc

Post image

Using GU and LPFT with task adapters provides better zero-shot cross-lingual transfer results than standard training and bridges the gap to full-parameter tuning 🌍

(3/🧵) #NAACL2024 #NLProc

Post image

How can we prevent catastrophic forgetting and improve OOD generalization?

We test the cross-lingual transfer abilities of »scheduled unfreezing« methods like Gradual Unfreezing (GU) and LPFT
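
For context, Gradual Unfreezing thaws the network top-down on a schedule; a minimal sketch follows, where the layer grouping and the step-based schedule are assumptions for illustration, not the paper's exact setup:

```python
def gradual_unfreeze(layer_groups, step, steps_per_stage):
    """Gradual Unfreezing sketch: unfreeze one more layer group (top first)
    every `steps_per_stage` optimization steps; everything else stays frozen."""
    num_unfrozen = min(len(layer_groups), step // steps_per_stage + 1)
    for i, group in enumerate(reversed(layer_groups)):  # top of the network first
        trainable = i < num_unfrozen
        for p in group.parameters():
            p.requires_grad = trainable
```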

(2/🧵) #NAACL2024 #NLProc

Post image

Cross-lingual zero-shot transfer is a natural test for out-of-distribution generalization, isn’t it? 😉

Our #NAACL2024 paper proposes »Scheduled unfreezing« ❄️🔥 for cross-lingual adapter training! And it works for LoRA ✨ (1/🧵)

📄 arxiv.org/abs/2301.05487

#NLProc

Post image

You can find our contribution here:
📄 Paper: arxiv.org/abs/2309.08591

Joint work of Chen Cecilia Liu, Fajri Koto, Timothy Baldwin and Iryna Gurevych of MBZUAI and @ukplab.bsky.social / @tuda.bsky.social.

See you later at #NAACL2024 in Mexico City 🇲🇽! (7/7)


No doubt we need reasoning abilities—but we also need Cultural Awareness!

➡️ Let’s build a truly Inclusive and Culturally Competent #NLProc for PEOPLE.

(6/🧵) #CulturalCompetence #InclusiveNLP #NAACL2024

Post image

Translation adds a layer of complexity.

When proverbs hop from one language to another, mLLMs reveal a »culture gap«
Translating conversations that contain proverbs is not just switching the language; it means crossing cultural boundaries, which isn't always smooth sailing ⛵

(5/🧵) #NAACL2024 #NLProc

Post image

🚨 It's 2024 and the issue of negation is still not fixed!
While open-source mLLMs score high on so many reasoning tasks, they still lack this basic reasoning ability 🙀. The bigger the model, the worse the results…

(4/🧵) #NAACL2024 #NLProc

Post image

⚠️ Memorization ≠ Understanding!

mLLMs "know" many proverbs, but knowing isn't understanding, especially in conversations 🙅
Understanding abilities improve with scale and possibly instruction tuning.

(3/🧵) #NAACL2024 #NLProc

Post image

We dive into the performance of mLLMs on proverbs and sayings, those tricky bits of language that carry culture-specific, context-dependent meanings.
With our new dataset MAPS 🗺, we show that different cultures naturally care about different topics. Then, we found this 👇

(2/🧵) #NAACL2024 #NLProc

Post image

🧑‍🎓💬 »I want to work on multicultural NLP and pragmatic reasoning.«
🧑‍🏫💬 »Fortune favours the brave!«
Ever wondered what 🧑‍🏫 meant?

At #NAACL2024, we investigate multicultural proverbs and sayings with multilingual LLMs! – more in this 🧵 (1/7)

📄 arxiv.org/abs/2309.08591

Post image

And consider following the authors Sheng Lu, Hendrik Schuff, and Iryna Gurevych (@ukplab.bsky.social) if you are interested in more information or an exchange of ideas. (9/9) #NAACL2024

See you in Mexico City 🇲🇽!

Preview: »How are Prompts Different in Terms of Sensitivity?«
In-context learning (ICL) has become one of the most popular learning paradigms. While there is a growing body of literature focusing on prompt engineering, there is a lack of systematic analysis...

We provide open access to our code and results:

📄 Paper: arxiv.org/abs/2311.07230
💻 Code: github.com/UKPLab/naacl...

(8/🧵) #NAACL2024 #NLProc
