
Posts by Pranav Goel

For Researchers: The National Internet Observatory aims to help researchers understand how people behave online and how platforms structure what people see. This will be accomplished through creating a large panel of ...

More datasets (e.g. mobile app usage and social media data) are coming soon! (Track here: nationalinternetobservatory.org/researchers....)

See you all in LA on May 26 to discuss new modes of data collection and how we continue to conduct meaningful CSS research together! (7/7)

1 week ago

Many datasets from the NIO are already open to access applications from researchers! These include Google and Bing search data, data on time spent on various webpages by users, and conversations that web users are having with ChatGPT and Gemini. (6/7)

1 week ago

This tutorial will present NIO's informed data donation process, participant demographics and behavioral traces across desktop and mobile devices, pathways for data access, and examples of analyses and new cross-disciplinary and cross-platform CSS research enabled by this new source of data! (5/7)

1 week ago

Content viewing (what content people are exposed to online, i.e., exposure behavior) is THE predominant form of online activity! But it is massively underexplored relative to production behavior (i.e., what the relatively few active social media users post online). (4/7)

1 week ago

This tutorial introduces the National Internet Observatory (NIO), an alternative data collection framework and infrastructure designed to help researchers study online behavior, with a particular focus on content viewing: (3/7)

1 week ago

More details in the thread below and on the website: national-internet-observatory.github.io/beyondapi_ic...

Sign up here: forms.gle/DQk8PFGXHhy2...

(2/7)

1 week ago
An image summarizing the tutorial's information. At the top is the National Internet Observatory logo. Below it is the title of the tutorial, "Beyond APIs: Collecting Online Activity Data for Research using the National Internet Observatory," followed by a small subheading, "A Tutorial at ICWSM 2026." Below this are text details: the location, tutorial time, and conference dates. This screenshot is taken directly from the tutorial's website, which is linked in the 2nd post in the thread.

How do you obtain online activity data for research in the post-API age? And what if there's already a unique collection of online activity data that you could get access to for your research? 👀

Come to our tutorial (organized w/ @davidlazer.bsky.social) at @icwsm.bsky.social 2026 in LA! (1/7)

1 week ago
Beyond APIs: Collecting Web Data for Research using the National Internet Observatory

Going to NetSci in Boston in June? Interested in access to data from the National Internet Observatory, including our RFPs on AI chatbots, browsing behavior, search, and more to come over the next few months? Sign up for our workshop at NetSci:
national-internet-observatory.github.io/beyondapi_ne...

2 months ago

We will also have interactive activities and hands-on sessions with real network datasets that demonstrate NIO's capabilities for enabling novel cross-disciplinary and cross-platform research across web and social network environments!

2 months ago

We will discuss NIO's informed data donation process, participant demographics and behavioral traces, secure computing infrastructure and pathways for data access, and examples of analyses and innovative research with this new source of data for the network science community.

2 months ago

This is a vital source of data and a crucial methodology for collecting data for academic research in the post-API age!

Track the existing datasets with open requests for research proposals (RFPs) at nationalinternetobservatory.org/researchers....

2 months ago

an alternative data-collection framework and infrastructure to help researchers study online behavior, with a particular focus on content viewing, the predominant (and very understudied) form of online activity.

2 months ago
The National Internet Observatory: The National Internet Observatory aims to help researchers understand how people behave online and how platforms structure what people see. This will be accomplished through creating a large panel of ...

This satellite will introduce participants at this interdisciplinary conference to the National Internet Observatory (NIO) (nationalinternetobservatory.org),

2 months ago
An image summarizing the satellite's information. At the top is the National Internet Observatory logo. Below it is the title of the satellite, "Beyond APIs: Collecting Online Activity Data for Research using the National Internet Observatory," followed by a small subheading, "A Satellite at NetSci 2026." Below this are text details: the location, satellite time, and conference dates. This screenshot is taken directly from the satellite's website, which is linked in the post.

Very excited to announce a new satellite coming to NetSci 2026 @netsciconf.bsky.social, co-organized with Scott Cambo and @davidlazer.bsky.social!

More details in this thread and on the website (national-internet-observatory.github.io/beyondapi_ne...)

Sign up here: forms.gle/sgjVPMSNWYeY...

2 months ago
Gmail might be harvesting your emails to train AI: here's how to opt out

This is pretty bad.

Opt out now and opt out thoroughly: www.howtogeek.com/gmail-might-...

4 months ago
From Mexico to Ireland, Fury Mounts Over a Global A.I. Frenzy

AI data centers are straining already fragile power and water infrastructures in communities around the world, leading to blackouts and water shortages. "Data centers are where environmental and social issues meet," says Rosi Leonard, an environmentalist with @foeireland.bsky.social.

5 months ago
A diagram illustrating pointwise scoring with a large language model (LLM). At the top is a text box containing instructions: 'You will see the text of a political advertisement about a candidate. Rate it on a scale ranging from 1 to 9, where 1 indicates a positive view of the candidate and 9 indicates a negative view of the candidate.' Below this is a green text box containing an example ad text: 'Joe Biden is going to eat your grandchildren for dinner.' An arrow points down from this text to an illustration of a computer with 'LLM' displayed on its monitor. Finally, an arrow points from the computer down to the number '9' in large teal text, representing the LLM's scoring output. This diagram demonstrates how an LLM directly assigns a numerical score to text based on given criteria.

LLMs are often used for text annotation, especially in social science. In some cases, this involves placing text items on a scale: e.g., 1 for liberal and 9 for conservative.

There are a few ways to accomplish this task. Which work best? Our new EMNLP paper has some answers 🧵
arxiv.org/pdf/2507.00828

5 months ago
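For readers new to pointwise scoring (the approach in the diagram, where the model rates one item at a time), here is a minimal sketch of the prompt-and-parse step. The LLM call itself is left out, and the helper names `build_prompt` and `parse_score`, the "Advertisement:/Rating:" framing, and the rule of taking the first in-range integer are all hypothetical choices for illustration, not the prompts or method from the paper.

```python
import re
from typing import Optional

# Instruction text quoted from the diagram; the "Advertisement:" and
# "Rating:" framing around it is a hypothetical choice.
INSTRUCTIONS = (
    "You will see the text of a political advertisement about a candidate. "
    "Rate it on a scale ranging from 1 to 9, where 1 indicates a positive view "
    "of the candidate and 9 indicates a negative view of the candidate."
)


def build_prompt(ad_text: str) -> str:
    """Build a pointwise prompt: one item, one score requested."""
    return f"{INSTRUCTIONS}\n\nAdvertisement: {ad_text}\nRating:"


def parse_score(reply: str, low: int = 1, high: int = 9) -> Optional[int]:
    """Return the first integer in the reply that lies on the scale, else None."""
    for match in re.finditer(r"\d+", reply):
        value = int(match.group())
        if low <= value <= high:
            return value
    return None


prompt = build_prompt("Joe Biden is going to eat your grandchildren for dinner.")
# The model call is omitted; a reply like the one in the diagram parses as:
print(parse_score("9"))  # 9
```

Returning `None` for out-of-range or missing numbers keeps unparseable replies visible instead of silently coercing them onto the scale.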
DomainDemo: a dataset of domain-sharing activities among different demographic groups on Twitter (Scientific Data)

ICYMI, our DomainDemo dataset, which describes how different demographic groups share domains on Twitter, is now available to download!

📄 Data descriptor: doi.org/10.1038/s415...
📈 Interactive app to explore the data: domaindemo.info
💽 Dataset: doi.org/10.5281/zeno...

9 months ago

For more, check out the paper!
nature.com/articles/s41562-025-02223-4
arxiv.org/abs/2308.06459

10 months ago

For journalists and especially headline writers: even if a discrete piece of information is true, you've got to think carefully about whether the way you're presenting it is useful for promoting narratives that aren't.

10 months ago

Big picture: misleading claims are both *more prevalent* and *harder to moderate* than implied in current misinformation research. It's not as simple as fact-checking false claims or downranking/blocking unreliable domains. The extent to which information (mis)informs depends on how it is used!

10 months ago

If you want to advance misleading narratives (such as COVID-19 vaccine skepticism), supporting information from reliable sources is more useful than similar information from unreliable sources, if you have it.

10 months ago

This calls for a reconsideration of what misinformation is, how widespread it is, and the extent to which it can be moderated. Our core claim is that users are *using* information to promote their identities and advance their interests, not merely consuming information for its truth value.

10 months ago

We find that mainstream stories with high scores on this measure are significantly more likely to contain narratives present in misinformation content. This suggests that reliable information, which has a much wider audience, can be repurposed by users promoting potentially misleading narratives.

10 months ago

We do this by looking at co-sharing behavior on Twitter/X. We first identify users who frequently share information from unreliable sources, and then examine the information from reliable sources that those same users also share at disproportionate rates.

10 months ago
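The co-sharing idea described above (flag users who often share unreliable sources, then look for reliable content those users share at disproportionate rates) can be illustrated with a toy sketch. This is not the measure from the paper: the (user, domain, URL) input format, the hypothetical helpers `flag_users` and `co_share_lift`, the 0.2 cutoff standing in for "frequently," and the simple lift ratio are all illustrative assumptions.

```python
from collections import defaultdict


def flag_users(shares, unreliable_domains, min_fraction=0.2):
    """Flag users whose fraction of unreliable-domain shares is >= min_fraction.

    `shares` is a list of hypothetical (user, domain, url) tuples; the
    0.2 cutoff is an arbitrary illustrative threshold.
    """
    totals = defaultdict(int)
    unreliable = defaultdict(int)
    for user, domain, _url in shares:
        totals[user] += 1
        if domain in unreliable_domains:
            unreliable[user] += 1
    return {u for u, n in totals.items() if unreliable[u] / n >= min_fraction}


def co_share_lift(shares, unreliable_domains, min_fraction=0.2):
    """Score each reliable URL by how over-represented flagged users are among its sharers.

    1.0 means flagged users appear among the URL's sharers at their baseline
    rate in the whole population; values above 1.0 mean disproportionate sharing.
    """
    flagged = flag_users(shares, unreliable_domains, min_fraction)
    all_users = {user for user, _domain, _url in shares}
    base_rate = len(flagged) / len(all_users) if all_users else 0.0

    # Collect the set of users who shared each reliable URL.
    sharers = defaultdict(set)
    for user, domain, url in shares:
        if domain not in unreliable_domains:
            sharers[url].add(user)

    scores = {}
    for url, users in sharers.items():
        flagged_fraction = len(users & flagged) / len(users)
        scores[url] = flagged_fraction / base_rate if base_rate else 0.0
    return scores


# Toy data: users a and b also share an unreliable domain; c and d do not.
shares = [
    ("a", "unreliable.example", "u1"),
    ("a", "reliable.example", "r1"),
    ("b", "unreliable.example", "u2"),
    ("b", "reliable.example", "r1"),
    ("c", "reliable.example", "r1"),
    ("c", "reliable.example", "r2"),
    ("d", "reliable.example", "r2"),
]
scores = co_share_lift(shares, {"unreliable.example"})
print(scores)  # r1 scores about 1.33 (over-represented); r2 scores 0.0
```

A real analysis would need far more care (baseline choice, minimum sharer counts, uncertainty), but the toy version shows why a story from a reliable outlet can still score high: who shares it matters, not just where it was published.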

Our paper uses this dynamic (users strategically repurposing true information from reliable sources to advance misleading narratives) to move beyond conceptualizing misinformation in terms of source reliability and measuring it simply by counting shares of, or exposure to, unreliable sources.

10 months ago
Washington Post article: screenshot of the headline "Vaccinated people now make up a majority of covid deaths"

Take, for example, this headline from the Washington Post. The source is reliable and the information is, strictly speaking, true. But the people most excited to share this story wanted to advance a misleading claim: that the COVID-19 vaccine was ineffective at best.

10 months ago

But users who want to advance misleading claims likely *prefer* to use reliable sources when they can. They know others see reliable sources as more credible!

10 months ago

When thinking about online misinformation, we'd really like to identify/measure misleading claims; unreliable sources are only a convenient proxy.

10 months ago
Using co-sharing to identify use of mainstream news for promoting potentially misleading narratives (Nature Human Behaviour): Goel et al. examine why some factually correct news articles are often shared by users who also shared fake news articles on social media.

In our new paper (w/ @jongreen.bsky.social, @davidlazer.bsky.social, & Philip Resnik), now up in Nature Human Behaviour (nature.com/articles/s41562-025-02223-4), we argue that this tension really speaks to a broader misconceptualization of what misinformation is and how it works.

10 months ago