Our new paper is out today in @pnasnexus.org with colleagues at Yale (@matthewshu.com, Danny Karell, @keitarookura.bsky.social)
We wanted to understand how using AI-generated summaries to learn about history influenced attitudes compared to existing resources like Wikipedia. 1/4
Posts by WikiResearch
Finally blogged about my paper (led by @zarine.net) that seeks to explain why Croatian Wikipedia spent a decade captured by a cabal of political extremists and became a site for Holocaust revisionism, while other similar Wikipedia languages seemed to have fared much better. mako.cc/copyrighteou...
Very happy to share that our new paper, "Interoperability as Equity: Collaborative Cultural Heritage Knowledge Graphs as a Tool to Shape Inclusive Ontologies" is out! We are discussing linked open data, ontologies, Wikidata and interoperability with CIDOC CRM
doi.org/10.5334/johd...
A paper screenshot: Refractive datasets as a sensemaking methodology in closed data ecosystems Anna Beers, Viviane Ito, Agustin Orozco, Patrick Gildersleve, Pablo Aragón, and Francesca Tripodi Abstract As digital platforms restrict their APIs, researchers face diminishing options for studying social phenomena in digital environments. During what has been called the post-API era, researchers have found themselves looking for reliable data sources in an unreliable and frequently changing platform data ecosystem. In this context, we propose analyzing refractive datasets as a methodology for researchers to understand the dynamics of closed data platforms. Refractive datasets come from platforms with relatively more open data policies, and their analysis sheds light on platforms with more restrictive data policies. Like a prism, refractive datasets reflect but also transform data-based phenomena unfolding on closed platforms. Using refractive datasets from Wikipedia and Google Trends, we present three studies to demonstrate our methodology. We first show how refractive data from Wikipedia's multiple language editions can be used to understand a fractured global platform ecosystem in a case study of hydroxychloroquine, a purported COVID-19 medicine. Second, we use Google Trends to show how similar refractive analyses can be used to understand information lost to platform deletion, in a profile of an online panic over the drug brand Galaxy Gas. Finally, we show how Wikipedia data can be used as a grounding point for a refractive analysis of how new generative algorithms reproduce and distort data across the social web. We discuss how refractive datasets can be a way for researchers to “sensemake” in increasingly opaque big data environments, enabling interpretivist analyses which aim to generate new hypotheses rather than verify existing claims.
Happy 25th birthday to Wikipedia! 🥳
A fitting moment to share
1. Their great site to mark the occasion: wikipedia25.org
2. A paper in Big Data & Society, published over the winter break, where we develop Wikipedia as a “Refractive Dataset”, led by @beeeeeers.bsky.social: doi.org/10.1177/2053...
ICYMI: Finally blogged about an old paper led by @kayleachampion.bsky.social that developed a new method (forensic qualitative analysis) to understand the nature and value of @torproject.org users' contributions to @wikipedia.org. mako.cc/copyrighteou...
"The Block Log: 20 Years of Content Moderation on Wikipedia" rhododendrites.com/pdfs/The%20B...
ICYMI: Finally blogged about an "old" paper led by @groceryheist.cc that uses data from a @wikipedia.org system to show how the introduction of a biased AI flagging system can still lead to more fairness because the humans without the system are even more biased. mako.cc/copyrighteou...
Congratulations to @tiagolubiana.bsky.social for shepherding our paper to publication & to all my wonderful co-authors. "Wiki Loves iNaturalist: How Wikimedians Integrate iNaturalist Content on Wikipedia, Wikidata, and Wikimedia Commons" can be read here doi.org/10.3897/biss...
I’m chuffed to share that I’ve been awarded this grant with @ftripodi.bsky.social and Brett Zehner 🥳
We’ll be studying how AI systems may reproduce or reinforce biases in Wikipedia, whether by extracting knowledge from the platform or by contributing content back to it. Excited to get started!
A few updates on our Grokipedia analysis: we expanded our sample to the 20,000 most edited articles on Wikipedia. Linguistic & stylistic differences are the same as reported before (generally, Grokipedia articles are longer, more difficult to read, and less referenced).
@wikiresearch.bsky.social
abstract of the paper "What did Elon change? A comprehensive analysis of Grokipedia" Elon Musk released Grokipedia on 27 October 2025 to provide an alternative to Wikipedia, the crowdsourced online encyclopedia. In this paper, we provide the first comprehensive analysis of Grokipedia and compare it to a dump of Wikipedia, with a focus on article similarity and citation practices. Although Grokipedia articles are much longer than their corresponding English Wikipedia articles, we find that much of Grokipedia's content (including both articles with and without Creative Commons licenses) is highly derivative of Wikipedia. Nevertheless, citation practices between the sites differ greatly, with Grokipedia citing many more sources deemed "generally unreliable" or "blacklisted" by the English Wikipedia community and low quality by external scholars, including dozens of citations to sites like Stormfront and Infowars. We then analyze article subsets: one about elected officials, one about controversial topics, and one random subset for which we derive article quality and topic. We find that the elected official and controversial article subsets showed less similarity between their Wikipedia version and Grokipedia version than other pages. The random subset illustrates that Grokipedia focused on rewriting the highest quality articles on Wikipedia, with a bias towards biographies, politics, society, and history. Finally, we publicly release our nearly-full scrape of Grokipedia, as well as embeddings of the entire Grokipedia corpus.
back again to share a new preprint from me and @mantzarlis.com! “What did Elon Change? A comprehensive analysis of Grokipedia” arxiv.org/abs/2511.09685
I had seen many spot analyses of individual grokipedia pages, but I was curious: how was grokipedia made? what did Elon change from wikipedia?
Key points in new Cornell Tech research:
56% of Grokipedia entries carry the Wikipedia CC license, suggesting wholesale ingestion
Grokipedia’s top 100 sources include fewer news outlets and more UGC (e.g. LinkedIn scraping)
Grokipedia has fewer citations overall, making it harder to check sources
Wikidata Map in 2025
Another year, another map, and another Birthday for Wikidata. Last generated in 2024 by @tarrow and @outdooracorn, this year I have put the work in just ahead of the 13th Wikidata birthday to have a look at what's changed in terms of items with coordinates this past year on…
#Grokipedia set out to “fix” #Wikipedia.
Turns out it mostly rewrites it, longer, slicker, less sourced.
Fluent, but fragile. @wikiresearch.bsky.social
"Investigating extreme cases in Wikipedia talk pages: Some insights on user behaviours"
uplopen.com/chapters/e…
e.g. "the most prolific users, the longest threads (in terms of total duration, number of posts or number of distinct users involved) and the longest monologues"
Seredinski, A., Litchock-Morellato, F., Lange, A. et al. Using a Wikipedia edit-a-thon as a cross-curricular STEM representation assignment. Discov Educ 4, 368 (2025). doi.org/10.1007/s442... #OpenAccess
"Demographic disparity in Wikipedia coverage: a global perspective" (top 12 languages) epjdatascience.springeropen.com/articles/1…
- Women slightly overrepresented (not underrepresented) among living article subjects since ~2015, but still have shorter articles
- Developing countries overrepresented
"Investigating How LLMs Impact Participation in [Wikipedia]" (interviewing 16 editors) https://arxiv.org/abs/2509.07819v1
ChatGPT etc. "enhance contribution quality for experienced editors" & "lower entry barriers for newcomers", but newbies struggle to align LLM outputs with Wikipedia policies
The Graphical User Interface of WikiTextGraph
New paper alert: WikiTextGraph – an open-source Python package for extracting the text and building multilingual Wikipedia link networks.
With: @gustavoschwartz.bsky.social , Juan Luis Suárez
Paper: openresearchsoftware.metajnl.com/articles/10....
@wikiresearch.bsky.social #wikipedia #software
With the school year approaching, a number of scholars and I have assembled a Critical Wikimedia Research Bibliography. If you are teaching a course or doing research, we think you might find some good resources here. meta.wikimedia.org/wiki/Critica...
I am pleased to announce the launch of the Manifesto for Wikimedia Research manifesto.wiki. As my co-authored Big Data & Society commentary explains, the manifesto is dedicated to a humanist and critical tradition of taking Wikipedia's importance seriously. journals.sagepub.com/doi/10.1177/...
Presenter (Patrick Gildersleve) in front of a screen summarising the WikiReddit Dataset project. The slide describes it as "Every Wikipedia mention and link on Reddit, 2020-2023", includes some example usage, describes the scale of the dataset, and offers suggested use cases.
Had a great time meeting everyone and seeing all the interesting work at @icwsm.bsky.social. I presented our study on the WikiReddit dataset, exploring Wikipedia’s role in fact-checking, discussion, and cross-platform attention on the web. Thank you to the organisers!
📄: ojs.aaai.org/index.php/IC...
UW published this really nice article about my work on governance challenges and lifecycles faced by peer-produced online communities—the work supported by my NSF CAREER grant. Check it out if you want to know what I've been thinking about and working on!
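"Desambiguación en Wikipedia: exploración de los mecanismos de control de autoridades en la enciclopedia colaborativa" (Disambiguation in Wikipedia: exploring the authority control mechanisms of the collaborative encyclopedia), by @florenciac.bsky.social and @tsaorin.bsky.social in #revistainfonomy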
doi.org/10.3145/info...
#Controldeautoridades #Vocabularioscontrolados #Wikipedia
Been a hectic semester for me but made it through 😊 a few updates
Had a blast as a GSI for @dbamman.bsky.social NLP class. Was a wonderful experience 💃
Won the Wikimedia Foundation Research Award of the Year for our CHI paper (doi.org/10.1145/3613...) with @schasins.bsky.social and John Canny
findings: (1) Wikipedia is most frequently cited by news and science websites for informational purposes, while commercial websites reference it less often. (2) The majority of Wikipedia links appear within the main content rather than in boilerplate [3/5 of https://arxiv.org/abs/2505.15837v1]
Whipped up a #WikiWorkshop 2025 recap blog post here: rhododendrites.com/posts/WikiWo... @wikiresearch.bsky.social Some really interesting tools, methods, and studies over the last couple days!
Well this is good timing. @wikiworkshop.bsky.social starts today and my paper that I presented in previous years has just been published this morning. doi.org/10.1177/1461.... We describe how hatnotes on policy pages are incredibly important techniques for ascribing different forms of authority.
A recent ADL report claimed to find broad, systemic evidence of antisemitism on Wikipedia, prompting two dozen members of Congress to call into question the site's approach to moderating content related to Jews.
Some researchers cited by the ADL say their findings have been misconstrued.