AIM's 2nd round of TTK hiring - building up to 30 - is up!
📅 Deadline 12/22/25
🔬 Accessibility & Learning, plus Sustainability & Social Justice
🧑‍🏫 Associate/Full Prof*
🔗 umd.wd1.myworkdayjobs.com/en-US/UMCP/j...
*Assistant-level candidates: apply to departments, mentioning AIM in a cover letter
Posts by Lingjun Zhao
A diagram illustrating pointwise scoring with a large language model (LLM). At the top is a text box containing instructions: 'You will see the text of a political advertisement about a candidate. Rate it on a scale ranging from 1 to 9, where 1 indicates a positive view of the candidate and 9 indicates a negative view of the candidate.' Below this is a green text box containing an example ad text: 'Joe Biden is going to eat your grandchildren for dinner.' An arrow points down from this text to an illustration of a computer with 'LLM' displayed on its monitor. Finally, an arrow points from the computer down to the number '9' in large teal text, representing the LLM's scoring output. This diagram demonstrates how an LLM directly assigns a numerical score to text based on given criteria.
LLMs are often used for text annotation, especially in social science. In some cases, this involves placing text items on a scale: e.g., 1 for liberal and 9 for conservative
There are a few ways to accomplish this task. Which work best? Our new EMNLP paper has some answers🧵
arxiv.org/pdf/2507.00828
Glad to hear ❤️
📄 Paper: arxiv.org/abs/2505.19299
💻 Code: github.com/lingjunzhao/PE…
🙏 Huge thanks to my advisor @haldaume3.bsky.social and everyone who shared insights!
🚨 New #EMNLP2025 (main) paper!
LLMs often produce inconsistent explanations (62–86%), hurting faithfulness and trust in explainable AI.
We introduce PEX consistency, a measure for explanation consistency,
and show that optimizing it via DPO improves faithfulness by up to 9.7%.
What should Machine Translation research look like in the age of multilingual LLMs?
Here’s one answer from researchers across NLP/MT, Translation Studies, and HCI.
"An Interdisciplinary Approach to Human-Centered Machine Translation"
arxiv.org/abs/2506.13468
QANTA Logo: Question Answering is not a Trivial Activity [Humans and computers competing on a buzzer]
Do you like trivia? Can you spot when AI is feeding you BS? Or can you make AIs turn themselves inside out? Then on June 14 at College Park (or June 21 online), we have a competition for you.
Super thankful for my wonderful collaborators: @pcascanteb.bsky.social @haldaume3.bsky.social Mingyang Xie, Kwonjoon Lee
We introduce a super simple yet effective strategy to improve video-language alignment (+18%): add hallucination correction in your training objective👌
Excited to share our accepted paper at ACL: Can Hallucination Correction Improve Video-language Alignment?
Link: arxiv.org/abs/2502.15079
For this ACL ARR review cycle, I've heard complaints about the workload: some reviewers have 16 papers. Even though I only need to write 1 rebuttal and respond to 4, it still feels substantial. For those managing more (thank you!), it can be difficult to thoroughly engage with every rebuttal.
Page 1 of diff.
Page 2 of diff.
Page 3 of diff.
There is a new version of the Research Plan for NIST's AI Safety Institute Consortium (AISIC), in response to executive orders. I did a diff.
Out: safety, responsibility, sociotechnical, fairness, working w fed agencies, authenticating content, watermarking, RN of CBRN, autonomous replication, ctrl of physical systems
This is my first time serving as an AC for a big conference.
Just read this great work by Goyal et al. arxiv.org/abs/2411.11437
I'm optimizing for high coverage and low redundancy; assigning reviewers based on relevant topics or affinity scores alone feels off. Seniority and diversity matter!
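One way to make that intuition concrete is a toy greedy assignment: pick reviewers by affinity, reward topics not yet covered, penalize overlap with reviewers already chosen, and require at least one senior reviewer per paper. This is only a sketch of the idea in the post; all names, topics, affinity scores, and the scoring formula are made up for illustration.

```python
def assign_reviewers(paper_topics, reviewers, k=3):
    """Greedy pick of k reviewers for one paper.

    reviewers: list of (name, topics, affinity, is_senior) tuples.
    Scores each candidate by affinity + newly covered topics - redundant topics.
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_score = None, float("-inf")
        for name, topics, affinity, senior in reviewers:
            if name in (c[0] for c in chosen):
                continue  # already assigned
            new = (paper_topics & topics) - covered       # coverage gain
            redundancy = len(paper_topics & topics & covered)  # overlap penalty
            # Last slot must go to a senior reviewer if none chosen yet.
            if len(chosen) == k - 1 and not any(c[3] for c in chosen) and not senior:
                continue
            score = affinity + len(new) - redundancy
            if score > best_score:
                best, best_score = (name, topics, affinity, senior), score
        if best:
            chosen.append(best)
            covered |= paper_topics & best[1]
    return [c[0] for c in chosen]


# Made-up example: three reviewers, a paper spanning MT, evaluation, and HCI.
reviewers = [
    ("A", {"mt", "eval"}, 0.9, False),
    ("B", {"mt"}, 0.8, False),
    ("C", {"hci"}, 0.5, True),
]
print(assign_reviewers({"mt", "eval", "hci"}, reviewers))  # → ['A', 'C', 'B']
```

Note how B, despite a higher affinity than C, is picked last: B's topics are already covered, so redundancy outweighs affinity — exactly the "affinity alone feels off" point.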