8/ Huge thanks to @marinecarpuat.bsky.social, Rachel, and @zhoutianyi.bsky.social for their guidance, and a special shoutout to the amazing UMD CLIP team!
Check out our paper and code below!
Paper: arxiv.org/abs/2505.24671
Dataset: github.com/dayeonki/cul...
7/ What's next for Multi-Agent Debate?
Some exciting future directions:
1️⃣ Assigning specific roles to represent diverse cultural perspectives
2️⃣ Discovering optimal strategies for multi-LLM collaboration
3️⃣ Designing better adjudication methods to resolve disagreements fairly
6/ But do these gains hold across cultures?
We measure cultural parity across diverse groups, and find that Multi-Agent Debate not only boosts average accuracy but also leads to more equitable cultural alignment.
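One simple way to operationalize the parity idea: compare per-culture accuracies and look at the gap between the best- and worst-served groups. This is an illustrative sketch, not necessarily the paper's exact metric, and the group names are made up.

```python
def cultural_parity_gap(per_group_accuracy: dict) -> float:
    """Gap between the best- and worst-served cultural groups.

    Smaller gap = more equitable alignment. `per_group_accuracy` maps a
    group name to its accuracy (hypothetical input format).
    """
    scores = list(per_group_accuracy.values())
    return max(scores) - min(scores)


# Hypothetical numbers: debate can narrow the gap while raising the average.
single_llm = {"US": 0.82, "KR": 0.60, "IN": 0.64}
with_debate = {"US": 0.84, "KR": 0.74, "IN": 0.76}
```

A metric like this complements average accuracy: two systems with the same mean can differ sharply in how evenly that accuracy is spread across cultures.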
5/ How do model decisions evolve through debate?
We track three phases of LLM behavior:
- Initial decision correctness
- Final decision correctness
- Judge's decision correctness
✨ Multi-Agent Debate is most valuable when models initially disagree!
4/ Distinct LLMs are complementary!
We find that:
- Multi-Agent Debate lets smaller LLMs (7B) match the performance of much larger ones (27B)
- Best combo? Gemma-2 9B + EXAONE-3 7B
3/ Before bringing in two #LLMs, we first maximize single-LLM performance through:
1️⃣ Cultural Contextualization: adding relevant rules-of-thumb for the target culture
2️⃣ Self-Reflection: evaluating and improving its own outputs
These serve as strong baselines before we introduce collaboration.
2/ Why involve multiple #LLMs?
Different LLMs bring complementary perspectives and reasoning paths, thanks to variations in:
- Training data
- Alignment processes
- Language and cultural coverage
We explore one common form of collaboration: debate.
1/ Are two #LLMs better than one for equitable cultural alignment?
We introduce a Multi-Agent Debate framework, where two LLM agents debate the cultural adaptability of a given scenario.
#ACL2025 🧵
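The framework in the thread above can be sketched as a simple loop: two agents take turns arguing over the scenario, then a judge model adjudicates. This is a minimal sketch of that idea, not the released implementation; `generate(model, prompt)` is a hypothetical stand-in for any chat-completion call.

```python
def multi_agent_debate(scenario, agent_a, agent_b, judge, generate, rounds=2):
    """Two LLM agents debate a scenario for a few rounds; a judge decides.

    `generate(model, prompt)` is a hypothetical helper wrapping an LLM API.
    Returns (verdict, transcript).
    """
    transcript = []
    for _ in range(rounds):
        for name, model in (("Agent A", agent_a), ("Agent B", agent_b)):
            prompt = (
                f"Scenario: {scenario}\n"
                "Debate so far:\n" + "\n".join(transcript) + "\n"
                f"As {name}, state your answer and rebut the other agent."
            )
            transcript.append(f"{name}: {generate(model, prompt)}")
    # The judge sees the whole debate and issues the final decision.
    verdict = generate(
        judge,
        f"Scenario: {scenario}\nDebate:\n" + "\n".join(transcript)
        + "\nAs the judge, give the final answer.",
    )
    return verdict, transcript
```

Pairing two distinct models as `agent_a` and `agent_b` (rather than two copies of one model) is what lets their complementary training data and alignment show up in the debate.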
Trying to collect all the MT people here. I probably missed many. Ping me!
bsky.app/starter-pack...
8/ ❤️ Huge thanks to @marinecarpuat.bsky.social, Kevin Duh, and the amazing UMD CLIP team for all the feedback and inspiration throughout this work!
We'd love for you to check it out!
Paper: arxiv.org/abs/2504.11582
Dataset: github.com/dayeonki/askqe
7/ Can AskQE handle naturally occurring translation errors too?
Yes! It shows:
- Stronger correlation with human judgments
- Better decision-making accuracy than standard QE metrics
6/ What kinds of questions does AskQE generate?
Most commonly:
- Extent: How many COVID-19 cases were reported today? (24.6%)
- Concept: What is another name for paracetamol? (23.6%)
5/ We test AskQE on ContraTICO and find:
- It effectively distinguishes minor from critical translation errors
- It aligns closely with established quality estimation (QE) metrics
4/ We introduce ContraTICO, a dataset of 8 contrastive MT error types in the COVID-19 domain
Minor errors: spelling, word order, synonym, intensifier, expansion (no impact)
Critical errors: expansion (impact), omission, alteration
3/ AskQE has two main components:
- Question Generation (QG): conditioned on the source + its entailed facts
- Question Answering (QA): based on the source and backtranslated MT
If the answers don't match... there's likely an error ⚠️
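The QG + QA check above boils down to: ask the same questions of the source and of the backtranslated MT, and flag any question whose answers diverge. Here is a rough sketch of that logic (my reading of the thread, not the released code); `gen_questions` and `answer` stand in for hypothetical LLM-backed helpers.

```python
def askqe_check(source, backtranslated_mt, gen_questions, answer):
    """Flag questions whose answers diverge between source and backtranslation.

    `gen_questions(text)` returns a list of questions about `text`;
    `answer(question, context)` returns an answer string. Both are
    hypothetical LLM-backed helpers. Returns (question, src_answer,
    mt_answer) triples for mismatches.
    """
    mismatches = []
    for q in gen_questions(source):
        a_src = answer(q, source).strip().lower()
        a_mt = answer(q, backtranslated_mt).strip().lower()
        if a_src != a_mt:  # divergent answers suggest a likely MT error
            mismatches.append((q, a_src, a_mt))
    return mismatches
```

Returning the divergent question-answer pairs (rather than just a score) is what makes the feedback actionable for a monolingual user: they can see *which* fact the translation got wrong.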
2/ But why question answering?
1️⃣ Provides functional explanations of MT quality
2️⃣ Users can weigh the evidence based on their own judgment
3️⃣ Aligns well with real-world cross-lingual communication strategies
1/ How can a monolingual English speaker 🇺🇸 decide if an automatic French translation 🇫🇷 is good enough to be shared?
Introducing AskQE, an #LLM-based Question Generation + Answering framework that detects critical MT errors and provides actionable feedback
#ACL2025
How does the public conceptualize AI? Rather than self-reported measures, we use metaphors to understand the nuance and complexity of peopleโs mental models. In our #FAccT2025 paper, we analyzed 12,000 metaphors collected over 12 months to track shifts in public perceptions.
Multilinguality is happening at #NAACL2025
@crystinaz.bsky.social
@oxxoskeets.bsky.social
@dayeonki.bsky.social @onadegibert.bsky.social
Starting my journey on Bluesky with a topic that I care deeply about: AI tools can support creators in various ways, but disclosing AI use may risk devaluing creative work.
Check out our abstract here: angelhwang.github.io/doc/ic2s2_AI...
Inspired by our past work: arxiv.org/abs/2411.13032
8/ Huge thanks to my advisor @marinecarpuat.bsky.social and the amazing UMD CLIP folks for all the insightful discussions!
Please check out our paper accepted to NAACL 2025!
Paper: arxiv.org/abs/2502.16682
Code: github.com/dayeonki/rew...
7/ Taken together, we show that simpler texts are more translatable, and more broadly, #LLM-assisted input rewriting is a promising direction for improving translations!
As LLM-based writing assistants grow, we encourage future work on interactive, rewriting-based approaches to MT
6/ Do humans actually prefer translations of simplified inputs?
Yes! They rated these to be:
- More contextually appropriate
- Easier to read
- More comprehensible
compared to translations of original inputs!
5/ What does input rewriting actually change?
Here are 3 key findings:
1️⃣ Better translatability trades off against meaning preservation
2️⃣ Simplification boosts both input & output readability
3️⃣ Input rewriting > Output post-editing
4/ Can we have more selective strategies?
Yes! By selecting rewrites based on translatability scores at inference time, we outperform all other methods
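The selective strategy above amounts to: for each input, score the original and its candidate rewrites with a translatability estimator, and keep the best one. A minimal sketch, assuming `score` is any reference-free QE / translatability scorer where higher is better (an assumption on my part, not the paper's exact scorer):

```python
def select_most_translatable(original, rewrites, score):
    """Keep the original input or one of its rewrites, whichever scores highest.

    `score(text)` is a hypothetical translatability estimator (higher = more
    translatable). Including the original guards against rewrites that hurt.
    """
    return max([original, *rewrites], key=score)
```

Because the original input is always among the candidates, this selection can only help (under the scorer), which is one way to avoid the meaning-preservation losses that blanket simplification can cause.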
3/ Which rewriting strategy works best?
Simpler texts are easier to translate!
But... simplification isn't always a win for MT quality
2/ How should inputs be rewritten for machine translation?
We explore 21 methods with different levels of MT-awareness:
- MT-Agnostic: no knowledge of the task
- Task-Aware: aware of the end task (MT)
- Translatability-Aware: guided by quality estimation scores
🚨 New Paper 🚨
1/ We often assume that well-written text is easier to translate
But can #LLMs automatically rewrite inputs to improve machine translation?
Here's what we found 🧵
🚨 NEW WORKSHOP ALERT 🚨
We're thrilled to announce the first-ever Tokenization Workshop (TokShop) at #ICML2025 @icmlconf.bsky.social!
Submissions are open for work on tokenization across all areas of machine learning.
Submission deadline: May 30, 2025
tokenization-workshop.github.io
Thrilled our global data ecosystem audit was accepted to #ICLR2025!
Empirically, it shows:
1️⃣ Soaring synthetic text data: ~10M tokens (pre-2018) to 100B+ (2024).
2️⃣ YouTube is now 70%+ of speech/video data but could block third-party collection.
3️⃣ <0.2% of data from Africa/South America.
1/