🚨 Deadline Extended to Feb 5 (AoE)!
CFP still OPEN for the #AFAA2026 Workshop at @iclr-conf.bsky.social, on fairness across alignment & agentic AI systems.
Full & tiny papers welcome • Interdisciplinary work encouraged!
🔗 afciworkshop.org
#ICLR2026 #AFAA2026
🚨 CFP OPEN! We're launching the #AFAA2026 Workshop at @iclr-conf.bsky.social on 𝗳𝗮𝗶𝗿𝗻𝗲𝘀𝘀 𝗮𝗰𝗿𝗼𝘀𝘀 𝗮𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 𝗮𝗻𝗱 𝗮𝗴𝗲𝗻𝘁𝗶𝗰 𝗔𝗜 𝘀𝘆𝘀𝘁𝗲𝗺𝘀.
Submit your latest ideas (full or tiny papers!)
Interdisciplinary work especially welcome :D
📅 Deadline: Jan 31 (AoE) | 🔗 www.afciworkshop.org
#AFAA2026 #ICLR2026
Four case studies on the gap between the reality of model use and the models' sandbox evaluations in audits... Definitely need to take a deeper dive; great presentation by Emily Black!
Evaluating models the way they would actually be deployed vs. evaluating them only in controlled, unrealistic settings!
Allowing companies to run isolated audits can lead to d-hacking!! More robust testing is needed...
Legal frameworks tend to govern allocative decisions (yes/no outcomes), which fit well with traditional ML systems... but not with GenAI systems
Zollo et al: Towards Effective Discrimination Testing for Generative AI
#FAccT2025
Nuance of stereotype errors is so important to understand their true harms... Insightful presentation by @angelinawang.bsky.social
Women tend to report stereotype-reinforcing errors as more harmful while men tend to report stereotype-violating errors as more harmful...
Some items are more associated with men vs women (not surprising), but not all of them are equally harmful!!
Cognitive beliefs, attitudes and behaviours... Three ways to measure harms ('pragmatic harms')
Are all errors equally harmful? No! Stereotype-reinforcing errors vs stereotype-violating errors
Our understanding of stereotypes sometimes isn't indicative of reality... they can appear in both directions, or might simply exist without causing harm
Wang et al: Measuring Machine Learning Harms from Stereotypes Requires Understanding Who Is Harmed by Which Errors in What Ways
#FAccT2025
Clear narrative and a great presentation by Cecilia Panigutti
Risk-measuring studies - Bringing it back to risk measurement, but this time with a clearly defined objective instead of risk-uncovering as before... Not just whether a risk exists, but 'how severe' is it?
Interface-design studies - Focus on UI design elements which impact user interaction
Reverse-engineering studies - Narrower-scope, in-depth studies of how algorithms work... Methodological precision is the key!
Risk-uncovering studies - Typically start from anecdotal evidence and help surface new risks
A review organized not by data collection technique, but by DSA risk management framework categories
Narrative review of algorithmic auditing studies, practical recommendations for best practices, and mapping to DSA obligations...
Panigutti et al: How to investigate algorithmic-driven risks in online platforms and search engines? A narrative review through the lens of the EU Digital Services Act
#FAccT2025
Such a broad topic... Excellent presentation by @feliciajing.bsky.social
Historical methods, working alongside many other ways of auditing these models, can help us take advantage of the broader scope of historical evaluations...
AI audits have moved from bottom-up external evaluations to new-age 'auditing companies'. While this has increased speed and scale, it has significantly narrowed the scope of auditing.
Why the history of AI assessments? A study through the lens of historical methods can help us understand neglected areas of auditing.
Sandoval and Jing: Historical Methods for AI Evaluations, Assessments, and Audits
#FAccT2025
Important recommendations on standardization of report creation and storage to allow better meta-analysis in the future... Eye opening presentation by @mkgerchick.bsky.social
Applicants impacted by these tools, whose demographic data is missing, are completely removed from these audits!
Serious issues with the data usage... weirdest for me: 'simulated test data'!