The ARR March review deadline is approaching: April 20 AoE.
Finishing up your review? Run it through REVAS, a peer review assistant that makes your suggestions more actionable, flags unsupported claims, and grounds your feedback in the paper.
revas.mbzuai.ac.ae
Posts by Dirk Hovy
#MemoryModay #NLProc 'A Case for Soft Loss Functions' by Uma et al. (2020) shows the efficacy of training with soft labels derived from crowd annotations across AI tasks, outperforming top-tier methods.
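The soft-label idea can be sketched in a few lines. This does not reproduce the specific loss functions from Uma et al.; it is a minimal sketch assuming the simplest variant: cross-entropy against a soft target built by normalizing crowd-label counts. The item, counts and predictions are made up for illustration.

```python
import math

def soft_labels(counts):
    """Turn per-class annotation counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def soft_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a (soft) target distribution and model probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

# Hypothetical item: 5 crowd workers split 3/2 between two classes.
target = soft_labels([3, 2])   # soft label keeps the disagreement
hard = [1.0, 0.0]              # majority-vote one-hot label discards it
pred = [0.7, 0.3]              # hypothetical model output

print(soft_cross_entropy(target, pred))
print(soft_cross_entropy(hard, pred))
```

With the soft target, the loss is lowest when the model reproduces the annotators' split rather than being fully confident in the majority class.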
To accommodate ACL decisions, we are further extending the commitment deadline for pre-reviewed ARR submissions to April 7!
The paper acceptance notifications will be out by the 6th of April, AoE. The PCs are working hard throughout the holiday season to finalize the decisions.
Apologies for the delay!
The deadline for submission to the Political Networks conference is this Friday. It takes place Aug 4-7 in Manchester. sites.google.com/view/confpol...
#TBT #NLProc 'What the [MASK]? Making Sense of Language-Specific BERT Models' by @deboranozza.bsky.social, Bianchi & @dirkhovy.bsky.social (2020) explores language-specific vs. universal BERT models.
I realized how much DMing is like being a professor/chairing a committee. You:
- make a brilliant plan for 2+ hours of fun
- prep lots of material
- immediately get derailed by questions/arguments/etc.
- keep it together to make the most of the time together
- end up not using most of the material
- Optional: question your life choices but show up to do it again the next week anyway
#MemoryModay #NLProc 'Hey Siri. Ok Google. Alexa: A topic modeling of user reviews for smart speakers' by Nguyen & @dirkhovy.bsky.social uses topic models to uncover user preferences in smart speaker reviews. Domain knowledge is still needed for market analysis.
A slide showing that the posterior is proportional to the likelihood times the prior
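The slide's point, p(θ | x) ∝ p(x | θ) p(θ), can be made concrete with a grid approximation. This is a minimal sketch assuming a coin-flip model, a flat prior and 101 grid points; none of these details come from the slide itself.

```python
from math import comb

def grid_posterior(heads, flips, grid_size=101):
    """Grid approximation of a coin's bias: posterior ∝ likelihood × prior."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 for _ in thetas]  # flat prior (an assumption)
    likelihood = [comb(flips, heads) * t**heads * (1 - t) ** (flips - heads) for t in thetas]
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # normalizing constant, summed over the grid
    return thetas, [u / evidence for u in unnormalized]

# Hypothetical data: 7 heads in 10 flips.
thetas, post = grid_posterior(heads=7, flips=10)
best = max(range(len(post)), key=post.__getitem__)
print("posterior mode:", thetas[best])
```

The proportionality is why the normalizing constant can be computed last: only the product of likelihood and prior matters for the shape of the posterior.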
I wrote a blog post on my experience using AI for slide generation
Basic idea: write your lecture notes first, then prompt the LLM to produce corresponding slides in reveal.js (h/t @chenhaotan.bsky.social). I'm picky about my slides but was happy with the results!
alexanderhoyle.com/posts/ai-sli...
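The notes-first workflow can be sketched as a prompt builder. The blog post's actual prompt is not shown here, so the helper name, wording and slide limit below are hypothetical; only the reveal.js nesting (a `reveal` div containing a `slides` div with one `<section>` per slide) reflects the library's standard markup. No LLM call is made.

```python
# Hypothetical helper: the post does not share its actual prompt.
REVEAL_SKELETON = """<div class="reveal">
  <div class="slides">
    <!-- one <section> per slide -->
  </div>
</div>"""

def build_slide_prompt(lecture_notes: str, max_slides: int = 15) -> str:
    """Assemble an LLM prompt asking for reveal.js slides that follow the notes."""
    return (
        "Turn the lecture notes below into reveal.js slides.\n"
        f"Use at most {max_slides} slides, one <section> element per slide, "
        "placed inside this structure:\n"
        f"{REVEAL_SKELETON}\n"
        "Only use content that appears in the notes.\n\n"
        "LECTURE NOTES:\n"
        f"{lecture_notes.strip()}\n"
    )

prompt = build_slide_prompt("Bayes' rule: the posterior is proportional to likelihood times prior.")
print(prompt)
```

Writing the notes first keeps the content decisions with the lecturer; the model is only asked to lay them out.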
#TBT #NLProc 'Identifying Linguistic Areas for Geolocation' by Fornaciari & @dirkhovy.bsky.social explores using social media writing for geolocation via Point-to-City (P2C).
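The P2C procedure itself is not reproduced here. As a simplified stand-in (an assumption, not P2C), this sketch shows the basic idea of turning raw coordinates into discrete city labels: each geotagged point goes to the nearest city by great-circle (haversine) distance. The city list and its approximate coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical label set: approximate city centroids as (lat, lon).
CITIES = {
    "Milan": (45.4642, 9.1900),
    "Rome": (41.9028, 12.4964),
    "Naples": (40.8518, 14.2681),
}

def nearest_city(lat, lon, cities=CITIES):
    """Map a geotagged post to the closest city label."""
    return min(cities, key=lambda c: haversine_km(lat, lon, *cities[c]))

print(nearest_city(45.07, 7.69))  # a point near Turin
```

Discretizing coordinates this way turns geolocation into a classification problem over city labels, which is the framing that area-based methods build on.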
Wish I could be at @eaclmeeting.bsky.social, but the lab is well represented. If you are there, come and say hi!
#MemoryModay #NLProc 'Dense Node Representation for Geolocation' by Fornaciari & @dirkhovy.bsky.social reveals efficient geolocation methods using node2vec & doc2vec models. Larger networks, fewer parameters.
#TBT #NLProc 'Geolocation with Attention-Based Multitask Learning Models' by Tommaso Fornaciari & @dirkhovy.bsky.social (2019) reveals how online political discussions can become one-sided. Breaking out of our bubbles! #SocialMedia
Chapter 8: @dirkhovy.bsky.social, M Gerondeau & J Globisz on text data and natural language processing.
A very useful chapter on why text is such a rich source for CSS, and how NLP can help with exploration, prediction, and generation, if used thoughtfully and with clear research goals.
Just read this great piece - paulgp.com/2026/03/16/r... by @paulgp.com and it got me thinking.
It feels like there is a lot of moral(?) ambiguity and ambivalence around the use of LLMs for academics.
So far, I've avoided having LLMs do basically any of my research writing ...
#MemoryModay #NLProc 'Make Natural Language Processing About People Again' by @dirkhovy.bsky.social (2018) uncovers how AI models portray different religions and emotions. #AIEthics
Joel Tetreault (not on here) also has a great talk on the topic, with lots of interesting anecdotes
#MemoryModay #NLProc 'Comparing Bayesian Models of Annotation' by Paun et al. dives into corpus annotation, evaluating six models' predictive performance and accuracy. Essential for modeling annotator behavior and item difficulty.
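To illustrate what an annotation model adds over majority vote, here is a simplified one-coin variant of Dawid-Skene fitted with EM. It is a non-Bayesian point-estimate relative of the models such comparisons cover, not one of the six models from the paper. Assumptions: each annotator has a single accuracy, errors are spread uniformly over the other classes, there are at least two classes, and the class prior is flat. The toy data are hypothetical.

```python
from collections import Counter

def one_coin_em(annotations, n_classes, n_iter=20):
    """
    Simplified one-coin Dawid-Skene EM (requires n_classes >= 2).
    annotations: list of (item, annotator, label) triples.
    Returns (posterior over classes per item, accuracy per annotator).
    """
    items = sorted({i for i, _, _ in annotations})
    workers = sorted({w for _, w, _ in annotations})
    # Initialise item posteriors with a soft majority vote.
    post = {}
    for i in items:
        votes = Counter(l for j, _, l in annotations if j == i)
        total = sum(votes.values())
        post[i] = [votes[k] / total for k in range(n_classes)]
    acc = {}
    for _ in range(n_iter):
        # M-step: expected accuracy of each annotator under the current posteriors.
        for w in workers:
            labels = [(i, l) for i, ww, l in annotations if ww == w]
            a = sum(post[i][l] for i, l in labels) / len(labels)
            acc[w] = min(max(a, 1e-3), 1 - 1e-3)  # keep away from 0 and 1
        # E-step: recompute item posteriors from annotator accuracies (flat prior).
        for i in items:
            scores = [1.0] * n_classes
            for j, w, l in annotations:
                if j != i:
                    continue
                for k in range(n_classes):
                    scores[k] *= acc[w] if k == l else (1 - acc[w]) / (n_classes - 1)
            z = sum(scores)
            post[i] = [s / z for s in scores]
    return post, acc

# Hypothetical data: annotators "a" and "b" are reliable, "c" always flips the label.
data = [
    (0, "a", 0), (0, "b", 0), (0, "c", 1),
    (1, "a", 1), (1, "b", 1), (1, "c", 0),
    (2, "a", 0), (2, "b", 0), (2, "c", 1),
    (3, "a", 1), (3, "b", 1), (3, "c", 0),
]
post, acc = one_coin_em(data, n_classes=2)
print({w: round(a, 3) for w, a in acc.items()})
```

Unlike majority vote, the fitted model separates reliable from unreliable annotators, and the resulting per-item posteriors are more confident than the raw 2-to-1 vote split.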
📢 Call for Abstracts!
Towards a Safer Web for Women (co-located with #WebSci26)
Braunschweig 🇩🇪 | 26 May 2026
Theme: Preventive approaches to women's online safety
Deadline: 27 March 2026
forms.gle/tYheEgSwGecf...
tsww26.github.io
#TBT #NLProc 'Predicting News Headline Popularity' by Lamprinidis, Hardt & @dirkhovy.bsky.social (2018) shows that neural networks perform similarly to logistic regression in prediction.
One of my favorite studies of the last few years! Great read (albeit with a side of worrying implications for surveys)
One of my favorite interdisciplinary projects (with @questoph.bsky.social). Plus: colorful maps!
#TBT #NLProc 'Capturing Regional Variation with Distributed Place Representations and Geographic Retrofitting' by @dirkhovy.bsky.social and Christoph Purschke (2018) highlights how social class and background impact technology performance. #TechInclusion
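The title names geographic retrofitting. A minimal sketch of the standard retrofitting update of Faruqui et al. (2015) is below, applied to a hypothetical graph of neighboring places: each place vector is pulled toward its neighbors while staying anchored to its original embedding. The graph, the uniform weights `alpha` and `beta`, and the place names are assumptions, not the paper's setup.

```python
def retrofit(vectors, neighbors, alpha=1.0, beta=1.0, n_iter=10):
    """
    Retrofitting update over a neighbor graph (uniform weights, an assumption).
    vectors: dict place -> list[float], the original embeddings (kept as anchors).
    neighbors: dict place -> list of neighboring places.
    """
    new = {p: list(v) for p, v in vectors.items()}
    for _ in range(n_iter):
        for p, orig in vectors.items():
            nbrs = [n for n in neighbors.get(p, []) if n in new]
            if not nbrs:
                continue  # places without neighbors keep their original vector
            denom = alpha + beta * len(nbrs)
            new[p] = [
                (alpha * orig[d] + beta * sum(new[n][d] for n in nbrs)) / denom
                for d in range(len(orig))
            ]
    return new

# Hypothetical 2-D place vectors: A and B are geographic neighbors, C is isolated.
vectors = {"A": [0.0, 0.0], "B": [2.0, 2.0], "C": [10.0, 10.0]}
neighbors = {"A": ["B"], "B": ["A"]}
new = retrofit(vectors, neighbors)
print(new)
```

The design trades off two pulls: `alpha` keeps each place close to what the text data says about it, while `beta` smooths it toward adjacent places.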
4/7 We argue these aren't separate bugs. They're four facets of the same problem:
🔴 Probabilistic: can't match requested distributions
🟠 Semantic: confidence ≠ correctness
🔵 Distributional: output diversity collapse
🟢 Metacognitive: can't assess its own competence
1/7 🧵 The GPT-4 technical report featured detailed calibration curves.
Since then, not a single major model release has reported calibration. The field quietly stopped measuring whether models know what they don't know.
Our new position paper argues this is a mistake. Here's why.
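One common way to summarize a calibration curve in a single number is the Expected Calibration Error (ECE). The thread does not say which metric should be reported, so this choice is an assumption, as are the 10 equal-width bins and the toy data. The sketch compares average confidence with accuracy inside each bin and weights the gap by bin size.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """
    Expected Calibration Error with equal-width confidence bins.
    confidences: predicted probability of the chosen answer, in [0, 1].
    correct: 1 if that answer was right, else 0.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(avg_conf - accuracy)
    return ece

# Hypothetical overconfident model: says 0.9 on ten answers, gets six right.
print(expected_calibration_error([0.9] * 10, [1] * 6 + [0] * 4))
```

A well-calibrated model scores near 0. The per-bin pairs of average confidence and accuracy are exactly the points that a calibration curve plots.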
We were thrilled to host @mtutek.bsky.social at our lab last week.
His talk "From Internals to Integrity: How Insights into Transformer LMs Improve Safety, Interpretability, and Explanation Faithfulness" led to great discussions!
#Transformers #AISafety #ExplainableAI #MLResearch #NLProc
Call for Virtual Registration Subsidies for #EACL26
⚠️ Not for paper registrants
Apply by Feb 27, 2026 (AoE)
📩 Decisions by Mar 2, 2026
2026.eacl.org/calls/virtua...
Don't register before hearing back if you apply!
Table titled "Taxonomy for evaluation of AI in mental health applications," organized into columns for quality criteria (validity and reliability) and real-world use (implementation and maintenance). Rows distinguish support types: assessment, intervention, and information synthesis. Each cell lists detailed evaluation questions, such as construct and criterion validity, consistency across populations and time, feasibility, effectiveness, usability, acceptability, safety, and unintended consequences, providing a structured framework for assessing AI systems in mental health contexts.
🧩 Beyond Benchmarks: How to Evaluate Mental Health AI Responsibly
AI for mental health is a high-stakes area: its evaluation needs to meet the highest expectations.
The new preprint "Responsible Evaluation of AI for Mental Health", written by an interdisciplinary team spanning AI [...]
Honored to give my first keynote at #IRCDL2026 on February 19th.
I'll talk about how LLMs have shifted from productivity tools to everyday sources of info & personal guidance and what that means for risk, trust, bias, and alignment.
ircdl2026.unimore.it