The Tech Team at ACLU is hiring! We are looking for a Data Scientist with expertise in NLP and AI ethics to work on using language tech to support ACLU's mission. Come help us tackle questions about how AI systems can be carefully applied to support the public interest. www.aclu.org/careers/appl...
Our lab, within the Berkeley EECS department, is hiring a postdoc!
More info and quick application form: forms.gle/4CcESe1TGFoo...
Apply by May 1!
Please reshare :)
New paper: "In Your Own Words"! We:
- develop a framework to identify themes in free-text survey data
- show its benefits on a new dataset of how people self-describe their race, gender, and sexual orientation
- release this data for research!
See @jennyshwang.bsky.social's thread below :)
We have a new piece in Nature Health led by @dmshanmugam.bsky.social, @sidhikabalachandar.bsky.social, and a wonderful team of coauthors on how to move towards a world in which race is not used in clinical algorithms!
Congratulations to @gsagostini.bsky.social, whose recent Nature Comms paper releasing a fine-grained migration dataset (www.nature.com/articles/s41...) just won a student paper award at the American Association of Geographers Annual Meeting!
Our paper, "What's in My Human Feedback", received an oral presentation at ICLR!
Our method automatically+interpretably identifies preferences in human feedback data; we use this to improve personalization + safety.
Reach out if you have data/use cases to apply this to!
arxiv.org/pdf/2510.26202
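For context, a minimal sketch of one way interpretable preference analysis can work (my illustration, with hypothetical attribute names and a toy featurizer; see the paper for the actual method): score each response on human-readable attributes, then fit a logistic model on attribute differences, so large weights reveal what raters prefer.

```python
# Sketch of a Bradley-Terry-style preference model over interpretable
# attributes (an illustration of the general idea, not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression

attributes = ["polite", "concise", "uses_lists", "hedges", "refuses"]

def featurize(response: str) -> np.ndarray:
    # Hypothetical featurizer: in practice this could be an LLM judge or
    # classifier; here, a toy substring check returning 0/1 per attribute.
    return np.array([attr in response for attr in attributes], dtype=float)

def fit_preference_model(pairs):
    # pairs: list of (chosen_response, rejected_response) strings.
    X = np.array([featurize(a) - featurize(b) for a, b in pairs])
    y = np.ones(len(pairs))  # chosen-minus-rejected is the positive class
    # Add mirrored pairs so the intercept-free model sees both classes.
    X = np.vstack([X, -X]); y = np.concatenate([y, np.zeros(len(pairs))])
    clf = LogisticRegression(fit_intercept=False).fit(X, y)
    return dict(zip(attributes, clf.coef_[0]))  # weight per attribute
```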
New research offers insight into how Americans move, all the way down to the neighborhood level.
A new dataset, MIGRATE, maps annual moves with 4,600 times more detail than standard public data, revealing patterns hidden in county-level reporting:
https://bit.ly/49XSD6w
Paper: nature.com/articles/s41467-025-68019-2
Cornell Chronicle article: news.cornell.edu/stories/2026...
Data access: migrate.tech.cornell.edu
Joint work led by the geospatial wizard @gsagostini.bsky.social and with great coauthors Rachel Young, Maria Fitzpatrick, and @nkgarg.bsky.social!
Now out in Nature Communications - we have released a migration dataset that is
- 4000x more granular than existing public data
- highly correlated with Census data
- being used by >100 academic, govt, and non-profit teams all over the world
See @gsagostini.bsky.social's thread!
This work is led by the wonderful James Diao, with a great team of coauthors: @rajmovva.bsky.social, Lingwei Cheng, @kkado.bsky.social, Aashna Shah, @neil-r-powe.bsky.social, Kadija Ferryman, and Raj Manrai!
See the paper - jamanetwork.com/journals/jam... - for more details, including sensitivity analyses, replication of prior gold-standard surveys, etc!
Full survey questions, data, and code: github.com/epierson9/ra...
Finding #4: Respondents were 4x more likely to be uncomfortable if clinicians used race without asking...yet <10% reported ever being told their race was used.
This suggests the way we communicate about the use of race may not foster trust, and raises concerns in light of calls for transparency.
Finding #3: Respondents were more comfortable with the use of race than with widely-proposed alternatives like zipcode or income.
Said one respondent: "What does my paycheck have to do with a genetic mutation?"
Don't assume switching to these factors will automatically improve trust.
Finding #2: However, a substantial minority of respondents were not comfortable with use of race, and Black + Hispanic respondents were less comfortable than white and Asian respondents.
This raises complex ethical and algorithmic questions about how to weigh these clashing preferences.
Finding #1: Most respondents were comfortable with the use of race in at least some circumstances.
This highlights a gap between calls to eliminate uses of race in medicine and public opinion.
We have a new paper in JAMA Internal Medicine!
Patient race is widely used in medical algorithms...but it's unclear how patients feel about this.
We conduct the first nationally representative YouGov survey to find out, producing four findings with practical clinical implications. 1/
Thanks to Kara Manke at Berkeley News for this profile of our lab's recent work on fairer decision-making in healthcare and policing! news.berkeley.edu/2026/01/20/a...
Thanks - super-interesting, and actually very relevant to some other work we're doing as well. Will pass along!
We're excited about applications of our test to other datasets that have 1) perceptions of race, gender, etc. and 2) multiple observations of the same person.
This work is led by the wonderful Nora Gera, in a great start to her PhD!
Full paper: www.science.org/doi/epdf/10....
See the paper for many robustness checks and discussion of nuances! Our finding persists when using alternate outcomes, statistical models, subsets of the data, and controls satisfying the criteria above.
5/
A benefit of our test is that it doesn't require us to control for all factors legitimately influencing searches. We only have to control for things that influence both searches and perceived race, vary for the same person across stops, and don't themselves suggest bias.
4/
9% of drivers stopped multiple times have their race perceived inconsistently across stops - most are perceived as both white + Hispanic.
When perceived as Hispanic, the same driver is likelier to be searched/arrested. This gap is substantial (24% of overall search rate).
3/
Tests for racial bias often compare how two people of different races are treated.
But two people typically differ in many ways besides race.
So instead of comparing two different people, we study the *same person over time*, as perceptions of their race change.
2/
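A minimal sketch of this kind of within-person test (my illustration, with hypothetical column names like driver_id, searched, and perceived_hispanic; see the paper for the exact specification): driver fixed effects hold the person constant, so the search-rate gap is identified only from drivers whose perceived race varies across their stops.

```python
# Sketch of a within-person bias test, assuming a stop-level table with
# hypothetical columns: driver_id, searched (0/1), perceived_hispanic (0/1),
# stop_hour, year. Not the authors' exact model.
import pandas as pd
import statsmodels.formula.api as smf

stops = pd.read_csv("stops.csv")  # hypothetical file: one row per stop

# Only drivers stopped more than once contribute to a within-person estimate.
stops = stops[stops.groupby("driver_id")["driver_id"].transform("size") > 1]

# Driver fixed effects (C(driver_id)) absorb everything stable about the
# person; remaining controls need only influence both searches and perceived
# race, vary across a driver's stops, and not themselves be channels of bias.
model = smf.ols(
    "searched ~ perceived_hispanic + C(stop_hour) + C(year) + C(driver_id)",
    data=stops,
).fit(cov_type="cluster", cov_kwds={"groups": stops["driver_id"]})

print(model.params["perceived_hispanic"])  # within-driver search-rate gap
```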
We have a new paper in Science Advances proposing a simple test for bias:
Is the same person treated differently when their race is perceived differently?
Specifically, we study: is the same driver likelier to be searched by police when they are perceived as Hispanic rather than white?
1/
New #NeurIPS2025 paper: how should we evaluate machine learning models without a large, labeled dataset? We introduce Semi-Supervised Model Evaluation (SSME), which uses labeled and unlabeled data to estimate performance! We find SSME is far more accurate than standard methods.
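To make the idea concrete, here is a minimal sketch of semi-supervised evaluation under simplifying assumptions (a two-component Gaussian mixture over scores on the logit scale; this is my illustration, not the paper's exact estimator): the unlabeled scores pin down the shape of the mixture, so the small labeled set only needs to anchor which component is which.

```python
# Sketch of semi-supervised accuracy estimation for a binary classifier:
# initialize per-class score distributions from a small labeled set, refine
# them with EM on unlabeled scores, then read accuracy off the fitted mixture.
# Assumes scores lie strictly in (0, 1). Not the authors' exact SSME method.
import numpy as np
from scipy.stats import norm

def estimate_accuracy(scores_lab, y_lab, scores_unlab, n_iters=50, thresh=0.5):
    logit = lambda p: np.log(p / (1 - p))  # Gaussians fit better on logits
    z_lab, z_unlab = logit(scores_lab), logit(scores_unlab)

    # Initialize per-class means/SDs and the class prior from labeled data.
    mu = np.array([z_lab[y_lab == k].mean() for k in (0, 1)])
    sd = np.array([z_lab[y_lab == k].std() + 1e-3 for k in (0, 1)])
    pi = y_lab.mean()

    for _ in range(n_iters):
        # E-step: posterior that each unlabeled score came from class 1.
        p1 = pi * norm.pdf(z_unlab, mu[1], sd[1])
        p0 = (1 - pi) * norm.pdf(z_unlab, mu[0], sd[0])
        r = p1 / (p1 + p0)
        # M-step: refit the mixture on labeled + soft-labeled unlabeled data.
        w1 = np.concatenate([y_lab, r]); w0 = 1 - w1
        z = np.concatenate([z_lab, z_unlab])
        mu = np.array([np.average(z, weights=w0), np.average(z, weights=w1)])
        sd = np.sqrt([np.average((z - mu[0]) ** 2, weights=w0),
                      np.average((z - mu[1]) ** 2, weights=w1)]) + 1e-3
        pi = w1.mean()

    # Accuracy estimate: P(thresholded prediction matches the soft label).
    pred1 = np.concatenate([scores_lab, scores_unlab]) >= thresh
    soft1 = np.concatenate([y_lab.astype(float), r])
    return np.mean(pred1 * soft1 + (~pred1) * (1 - soft1))
```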
selfishly i wish we could keep divya in our lab forever but i guess it would be a disservice to the rest of the world 😅 she’s been such a wonderful mentor to me—i’ve learned a lot from how thoughtful, creative, and knowledgeable she is about everything. she’s also super funny and amazing at baking 🤭
Meeting Divya 5 years ago was one of the biggest strokes of luck in my faculty career - she is a brilliant scientist who has been foundational to so many of our lab's projects, and any institution would be lucky to hire her.
Apply here - aprecruit.berkeley.edu/JPF05028 - by 11/15, but review of applications is ongoing, so sooner is better! (The application deadline currently says 9/15 but will be extended.)
Broad project areas include:
1) language modeling methods for scientific discovery (building on our recent work - arxiv.org/abs/2502.04382)
2) using language models to support equity (ai.nejm.org/doi/full/10....)
both in collaboration with health+social scientists.
2/3
🚨 New postdoc position in our lab at Berkeley EECS! 🚨
(please reshare)
We seek applicants with experience in language modeling who are excited about high-impact applications in the health and social sciences!
More info in thread
1/3