Posts by Stefan Baack

Wikipedia volunteers spent years cataloging AI tells. Now there's a plugin to avoid them. The web's best guide to spotting AI writing has become a manual for hiding it.

Some #generativeAI developers love to destroy the foundations of the tech they build. #Wikipedia is one of the most valuable sources of genAI training data. Undermining it is not just attacking a great common resource. It's also completely self-destructive. arstechnica.com/ai/2026/01/n...

2 months ago 0 0 0 0
The Nonprofit Doing the AI Industry’s Dirty Work The web archive Common Crawl has been quietly funneling paywalled articles to AI companies—and lying to publishers about it.

A little-known nonprofit has been lying to news publishers while funneling millions of paywalled articles to tech companies for AI training. Read my investigation in The Atlantic. www.theatlantic.com/technology/2...

5 months ago 20 11 1 5

Check in if you're interested in my thoughts about what open source AI should aspire to be in relation to proprietary AI

6 months ago 3 2 0 0

"The update is yet another signal that payment processors...are currently the ultimate arbiter of what kind of content can be made easily available online, or not."

8 months ago 1 0 0 0

The key questions we always should ask when people talk about AI: What is being automated and why? @alexhanna.bsky.social @weizenbauminstitut.bsky.social

9 months ago 15 5 0 0

"AI is a labor disciplining device" @alexhanna.bsky.social

9 months ago 0 0 0 0

“The reporter is a man of critical value. No amount of money or effort spent in fitting the right men for this work could possibly be wasted, for the health of society depends upon the quality of the information it receives.” — Walter Lippmann [a century later, I’d swap “man” for “person” though]

10 months ago 3 1 0 0
(S+) Deepfake porn: The insidious business of faked sex videos. Thousands of women become victims of faked porn videos that show their faces. Those affected include underage girls, celebrities, and politicians. Unscrupulous businesspeople are behind it. The ...

New Release! Most AI deepfakes aren't political. 90% of deepfakes are non-consensual intimate imagery. 99% of victims are women. Max Hoppenstedt, @rechercheur.bsky.social, @romanhoefner.bsky.social, and I uncover a deepfake community and the business behind undress apps www.spiegel.de/netzwelt/web...

1 year ago 32 22 2 1

"brainstorming and iteration is...a crucial everyday part of game development...and is not a problem to be solved...I have had many discussions with other game developers who interact with AI engineers and savants who believe our industry pipelines need 'fixing' by them and them alone"

1 year ago 0 2 0 1
The Union (CDU/CSU) wants to abolish the Freedom of Information Act: A frontal attack on transparency and democracy - FragDenStaat. The portal for freedom of information for citizens, initiatives, and associations. Submit a freedom of information (IFG) request for government documents that matter to you and your work! Find out about In...

The Union wants to abolish the Freedom of Information Act.
@arnesemsrott.bsky.social: "Public oversight and transparency are apparently a thorn in the Union's side. It wants to govern unimpeded. The public's rights apparently get in the way of that."
Press release: fragdenstaat.de/newsletter/a...

1 year ago 379 149 10 7

«By moving fast and breaking things, DOGE forces a collapse of the system where unanswered questions are met with technological solutions. Shifting the conversation to the technical is a way of locking policymakers and the public out of decisions and shifting that power to the code they write.»

1 year ago 38 10 0 2
You Can’t Post Your Way Out of Fascism Authoritarians and tech CEOs now share the same goal: to keep us locked in an eternal doomscroll instead of organizing against them, Janus Rose writes.

You can’t post your way out of fascism

Authoritarians and tech CEOs now share the same goal: to keep us locked in an eternal doomscroll instead of organizing against them

🔗 www.404media.co/you-cant-pos...

1 year ago 6192 2631 117 394
A bird's-eye view of a former Auschwitz II-Birkenau camp showing a wide dirt pathway flanked by parallel rows of barbed-wire fences. Groups of visitors walk along the path, surrounded by the remnants of brick structures and barracks, now reduced to foundations. Green grass contrasts with the somber history of the site, as the path leads toward a guard tower in the distance.

Auschwitz was at the end of a long process. It did not start from gas chambers.

This hatred was gradually developed by humans. From ideas, words, stereotypes & prejudice through legal exclusion, dehumanization & escalating violence... to systematic and industrial murder.

Auschwitz took time.

1 year ago 53083 22542 1058 1723

“AI is fake and sucks” vs “AI is real and dangerous” is a Twitter argument. In reality I think the debate also has a lot of “AI is real but not for how you’re using it,” to “AI is fake and that is dangerous,” to “things are happening to real people because of AI hype and that should stop.”

1 year ago 204 33 3 2

My reading for this week, delivered to me by the great @aschrock.bsky.social themself! Thank you, looking forward to reading :-)

1 year ago 4 1 1 0
Labelers training AI say they're overworked, underpaid and exploited by big American tech companies Digital workers in Kenya had to sift through horrific online content to train AI, but say they were underpaid, overworked, and got inadequate mental health support. So they're fighting back.

Labelers training AI say they're overworked, underpaid and exploited by big American tech companies

1 year ago 12 5 1 1

This report gives hope!

More and more new, ambitious media outlets have been founded in Germany and Europe. Outlets whose goal is to provide the public with high-quality information.

@netzwerkrecherche.org surveyed 174 media outlets in 31 countries for the "Journalism Value Report" and can show:

1 year ago 38 17 1 1
How ChatGPT (Mis)represents Publisher Content ChatGPT search — which is positioned as a competitor to search engines like Google and Bing — launched with a press release from OpenAI touting claims that the company had “collaborated extensively wi...

I have a new piece out with @aisvarya17.bsky.social in @columjournreview.bsky.social in which we test how OpenAI's new search feature surfaces and attributes news content. Our findings were not promising for news publishers (1/9) www.cjr.org/tow_center/h...

1 year ago 176 85 8 24

“Without facts, you can’t have truth, and without truth, you can’t have trust.” - Maria Ressa, 2021 Nobel Peace Prize laureate

1 year ago 2 2 0 0

The Onion should buy Elsevier next

1 year ago 5377 1581 57 82

It ended well though. He got the job, and still has it. We met recently 😅

2 years ago 1 0 0 0

I still remember when a friend asked for advice about getting a job I intended to apply for

2 years ago 2 0 1 0

Long term, there should be less reliance on sources like Common Crawl and a bigger emphasis on training generative AI on datasets created and curated by people in equitable and transparent ways (10/10)

2 years ago 2 0 0 0

A key issue is that filtered Common Crawl versions are not updated after their original publication to take feedback and criticism into account. Therefore, we need dedicated intermediaries tasked with filtering Common Crawl in transparent and accountable ways that are continuously updated (9/10)

2 years ago 1 0 1 0

AI builders should put more effort into filtering Common Crawl and establish industry standards and best practices for end-user products to reduce potential harms when using Common Crawl or similar sources for training data (8/10)

2 years ago 2 0 1 0

Both Common Crawl and AI builders can help make generative AI less harmful. Common Crawl should highlight the limitations and biases of its data, be more transparent and inclusive about its governance, and enforce more transparency by requiring AI builders to attribute using Common Crawl (7/10)

2 years ago 2 0 1 0

Due to Common Crawl’s deliberate lack of curation, AI builders need to filter it with care, but such care is often lacking. Popular filtered versions like C4 are especially problematic as the filtering techniques used to create them are simplistic and leave lots of harmful content untouched (6/10)

2 years ago 2 0 1 0
Most Top News Sites Block AI Bots. Right-Wing Media Welcomes Them Nearly 90 percent of top news outlets like 'The New York Times' now block AI data collection bots from OpenAI and others. Leading right-wing outlets like NewsMax and Breitbart mostly permit them.

In addition, relevant domains like Facebook and the New York Times block Common Crawl from crawling most (or all) of their pages. These blocks are increasing, creating new biases in the crawled data www.wired.com/story/most-n... (5/10)

2 years ago 2 0 1 0

Common Crawl's archive is massive, but far from being a “copy of the internet.” Its crawls are automated to prioritize pages on domains that are frequently linked to, making digitally marginalized communities less likely to be included. Moreover, most captured content is English (4/10)

2 years ago 2 0 1 0

Using Common Crawl's data does not easily align with trustworthy and responsible AI development because Common Crawl deliberately does not curate its data. It doesn't remove hate speech, for example, because it wants its data to be useful for researchers studying hate speech (3/10)

2 years ago 4 0 1 0