
Posts by Nir Grinberg

Can social media detect economic shocks before official data does?
A new PNAS Nexus study led by @nirg.bsky.social and Samuel Fraiberger shows that AI models tracking job-loss disclosures on social media can predict U.S. unemployment insurance claims up to two weeks early.

3 months ago

Credit is also due to @davidlazer.bsky.social for prompting Sam & me to think about this problem 7(!) years ago ;)

12/fin


Kudos to my wonderful co-authors Do Lee linkedin.com/in/do-lee and
@manueltonneau.bsky.social (both on the job market!), Boris Sobol il.linkedin.com/in/boris-sobol and Sam Fraiberger samuelfraiberger.com.

11/N


Yet platform data-access policies increasingly block this potential. Whether platforms or regulators will enable change in the coming years is a core policy question.

10/N


There is clear public value here, potentially extending to other countries, especially where official statistical systems are under-developed.

9/N


Why this matters:

Beyond forecasting, this approach can provide early warnings, surface local labor market stress hidden by national averages, and help flag measurement issues in real time.

8/N


Key finding 3:

This also works at the state and city (!) level, including "holdout cities" where official UI numbers are sparse or irregularly updated.

As expected, accuracy scales with platform penetration and unemployment shocks.

7/N


Key finding 2:

Our approach consistently outperforms industry consensus forecasts and can improve predictions of US UI claims up to two weeks ahead of official releases.

That’s two weeks of additional lead time for policymakers.

6/N


Key finding 1:

Capturing linguistic diversity matters.

Training LLMs with active learning lets us detect many more ways people talk about job loss, producing a far more representative sample of unemployed users than existing approaches.
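For readers unfamiliar with the technique: uncertainty sampling is the standard active-learning loop, in which the model itself picks which unlabeled posts get sent to annotators. Here is a generic sketch on synthetic data, not the paper's actual pipeline; the embeddings, pool size, and annotation batch size are all made up for illustration.

```python
# Generic active-learning loop (uncertainty sampling): repeatedly label the
# examples the current model is least sure about, so rare phrasings of job
# loss enter the training set faster than random sampling would allow.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 8))          # stand-in for text embeddings
y_pool = (X_pool[:, 0] > 0).astype(int)     # stand-in "job loss" labels

# Seed set: take extreme examples on feature 0 so both classes are present.
order = np.argsort(X_pool[:, 0])
labeled = list(order[:5]) + list(order[-5:])
unlabeled = [i for i in range(500) if i not in set(labeled)]

for _ in range(5):                          # 5 rounds of annotation
    clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    proba = clf.predict_proba(X_pool[unlabeled])[:, 1]
    # Pick the 20 pool items closest to the decision boundary (p ≈ 0.5).
    uncertainty = np.abs(proba - 0.5)
    picks = [unlabeled[i] for i in np.argsort(uncertainty)[:20]]
    labeled += picks                        # "send to annotators"
    unlabeled = [i for i in unlabeled if i not in set(picks)]
```

In the paper's setting the classifier would be an encoder LLM and the pool would be real posts; the loop structure is the same.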

5/N

Preview: Multilingual Detection of Personal Employment Status on Twitter. Manuel Tonneau, Dhaval Adjodah, Joao Palotti, Nir Grinberg, Samuel Fraiberger. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

We combine JoblessBERT (an encoder LLM developed in previous work, aclanthology.org/2022.acl-lon..., which detects ~3× more employment-related content without sacrificing precision) with post-stratification using inferred demographics to correct for platform bias.
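For the post-stratification step, the arithmetic is simple reweighting: each demographic stratum's contribution is rescaled from its share of the platform sample to its share of the population. A toy sketch with made-up strata and numbers, not the paper's actual demographics or weights:

```python
# Post-stratification: reweight platform users so each demographic stratum
# counts in proportion to its share of the real population, not the sample.

# Hypothetical strata: share of platform sample vs. share of population.
sample_share = {"men_18_29": 0.40, "women_18_29": 0.20,
                "men_30_plus": 0.25, "women_30_plus": 0.15}
population_share = {"men_18_29": 0.15, "women_18_29": 0.15,
                    "men_30_plus": 0.35, "women_30_plus": 0.35}

# Weight for each stratum = population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}

# Hypothetical per-stratum rate of job-loss disclosures in the sample.
jobless_rate = {"men_18_29": 0.08, "women_18_29": 0.07,
                "men_30_plus": 0.04, "women_30_plus": 0.05}

# Raw (platform-biased) estimate vs. post-stratified estimate.
raw = sum(sample_share[g] * jobless_rate[g] for g in sample_share)
adjusted = sum(population_share[g] * jobless_rate[g] for g in population_share)
```

Because the (hypothetical) over-represented young strata disclose job loss more often, the adjusted estimate comes out lower than the raw one; the same logic corrects the bias in either direction.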

4/N


So we ask a hard question that economic actors and policymakers rightly worry about:

Can skewed social media data be turned into trustworthy indicators of unemployment?

Can we produce robust predictions across geography ✅, time ✅, demography ✅, and forecasting horizon ✅ ?

3/N


Why this matters:

In March 2020, weekly unemployment insurance claims jumped from 278K to nearly 6 million in two weeks.

As official data lagged, policymakers were flying blind about where the shock was hitting and who was being affected.

2/N

Preview: Can social media reliably estimate unemployment? Abstract: Digital trace data hold tremendous potential for measuring policy-relevant outcomes in real-time, yet its reliability is often questioned.

New paper out in @pnasnexus.org:

We show how skewed social media data can still be used to reliably estimate unemployment, not just nationally but down to the city level. 📈

doi.org/10.1093/pnas...

1/N


Introducing “DomainDemo: a dataset of domain-sharing activities among different demographic groups on Twitter.”

Today, we release five derived metrics for over 129,000 domains, quantifying characteristics such as geographical reach and audience partisanship.

1/3

1 year ago
Close to Human-Level Agreement: Tracing Journeys of Violent Speech in Incel Posts with GPT-4-Enhanced Annotations

Figure 1: Linear Regression between time and share of violent posts.

Figure 2: Linear Regression between time and category of directedness.

Incels (involuntary celibates) are increasingly using violent language, particularly non-directed violent language, in the largest incel forum, find @danielmatter.bsky.social @miriamschirmer.bsky.social @nirg.bsky.social @jurgenpfeffer.bsky.social arxiv.org/abs/2401.02001

2 years ago

Awesome! We’d love to hear what you and your students think about it.


We are also grateful for comments received on earlier versions of this work from Diyi Liu, Eran Amsalem, @patyrossini.bsky.social, Alon Zoizner, and @orentsur.bsky.social, and for funding from the European Research Council (ERC), the Israel Science Foundation (ISF), and BGU's Data Science Center.


Big shout-out to the people whose work enabled this research, including @sdmccabe.com @jongreen.bsky.social @davidlazer.bsky.social Magdalena Wojcieszak @jatucker.bsky.social Subhayan Mukerjee @ylelkes.bsky.social @kthorson.bsky.social @chriswells.bsky.social (pls tag others if missing).

6/

Sociodemographic characteristics among different political exposure types. Sample averages are marked in a gray dashed line. Ninety-five percent bootstrapped CIs are shown (mostly occluded due to their small size). CI = confidence interval.

Finally, looking at the demographic composition of consumption "types", we find that the media-oriented clusters (excluding superconsumers) skew older, with more women and more registered Democrats.

5/


Even when putting aside the more extreme "media superconsumers", the two media-oriented clusters (which are ~20% of the population) get half or more of their political content *directly* from media organizations and journalists, without any mediation from peers.

4/

The composition of political exposure across clusters. The share of politics curated by different actor types (y-axis) across clusters (x-axis). Darker-colored bars represent direct exposure to media organizations, journalists, politicians, OLs, and social peers. Lighter-colored bars represent indirect exposure to media organizations, journalists, politicians, or opinion leaders through social peers. OL = opinion leader.

Americans also vary in the breakdown of actors that populate their feeds, but interestingly, the bulk of the population gets half or more of their political exposure from *traditional sources*—media organizations, journalists, and politicians.

3/

Prototypical types of individual political exposure. Each point in panel (A) represents the political exposure of a single panel member, reduced to two dimensions using the UMAP algorithm, and colored by the cluster assignment obtained from HDBSCAN. Panel (B) shows the median number of political tweets available to individuals per day (left bars), and their percentage out of all tweets available to them on Twitter (right bars). Cluster labels and their share in the population are specified on the x-axis. Colors are consistent between the two figure panels. Ninety-five percent bootstrapped CIs are omitted from the figure due to their small magnitude, which are upper bounded by twenty-seven exposures to tweets and 0.28 percent, respectively. OL = opinion leader; CI = confidence interval; UMAP = Uniform Manifold Approximation and Projection.

People's political feeds mostly map onto 8 distinct types that vary in the amount of politics they get, both in absolute #'s and as a % of the feed as a whole. Still, for nearly 90% of the population, about 1 in 12 posts from their network is political. Quite an engaged public!
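The typology comes from reducing each user's exposure profile to 2-D and clustering it (UMAP + HDBSCAN in the paper). Here is a rough stand-in sketch using scikit-learn's PCA and KMeans on synthetic exposure profiles, purely to illustrate the shape of the pipeline; the profiles, cluster count, and noise level are invented.

```python
# Cluster per-user political-exposure profiles into prototypical "types".
# Stand-in sketch: PCA + KMeans instead of the paper's UMAP + HDBSCAN.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Synthetic exposure profiles: rows = users, columns = share of political
# content from each actor type (media, journalists, politicians, OLs, peers).
centers = rng.dirichlet(np.ones(5), size=8)      # 8 hypothetical "types"
assignments = rng.integers(0, 8, size=1000)
profiles = centers[assignments] + rng.normal(scale=0.02, size=(1000, 5))

embedding = PCA(n_components=2).fit_transform(profiles)  # 2-D map for plotting
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(embedding)
```

The real analysis clusters actual exposure shares and lets HDBSCAN choose the number of clusters from the data rather than fixing it at 8.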

2/


🚨New paper🚨 out in the International Journal of Press/Politics w/ Assaf Shamir and @jenny-oser.bsky.social 🎉

Here's what we learned from studying the composition of political content available to 600k+ registered U.S. voters on Twitter during the 2020 election.

doi.org/10.1177/1940...
🧵👇
