
Posts by Michael Khoo


74% (!!) of claims about AI's ability to help with climate change are unproven.

Nearly 1/3 *don't cite any evidence at all*. #GreenSky

1 month ago 10 2 1 0
The Great Green AI Hoax Machine
If AI companies want to claim they are helping solve the climate crisis, they should be required to show their math, writes Friends of the Earth's Michael Khoo.

🚨New from @techpolicypress.bsky.social: The Great Green AI Hoax Machine

@michaelkhoo.bsky.social looks at how Silicon Valley’s claims that AI will slash emissions & “fix the climate” stack up against AI’s exploding energy demand.

Spoiler alert: the math doesn’t add up.

buff.ly/MlsPH0L

1 month ago 3 1 0 0

Frame: the left is hurting itself by not engaging with the god machines

Negation: but this technology doesn't do the things you claim

kirby: the people writing these stories get funding to support narratives of power and inevitability, and that's what we should be scrutinizing

2 months ago 298 80 5 5
From: jeffrey E. [jeevacation@gmail.com]
Sent: 6/24/2018 1:59:24 PM
To: Thorbjen Jagland
Re:

churkin was great. he understood trump after our conversations. it is not complex. he must be seen to get something
its that simple.

On Sun, Jun 24, 2018 at 3:28 PM, Thorbjon Jagland <

I'll meet Lavrovs assistant on Monday and will suggest

Thank you fo a lovely evening. I'll com to un high level week

› wrote:
Sun, 24 Jun 2018 at 15:18, jeffrey E. < jecvacation@gmail.com> wrote:

I think you might suggest to putin, that lavrov, can get insight on talking to me. vitaly churkin used to but he died.?!


Now we know. Epstein emails may explain why Trump never pressures Russia.

They reveal deep links between Epstein, former Norwegian PM Jagland, & Moscow, pointing to a blackmail operation that likely “compromises” Trump.

1 month ago 55 37 6 1

Big Tech never met a climate claim it couldn’t greenwash.

While AI-driven electricity demand throws the fossil fuel industry a lifeline, we’re told AI can help solve the climate crisis.

The reality? 74% of claims about AI climate benefits are unproven.
bit.ly/AIGreenwash

2 months ago 4 4 1 0
AI data centre surge would put UK’s climate change targets at risk
MPs call for ‘national conversation’ on potential drawbacks of sharp rise in data centres, which would use more power than the UK uses at its peak

⚡️ The data centres needed to power the govt's AI revolution would use more electricity than the UK consumes at peak.

According to Ofgem, around 140 sites are seeking grid connections totalling 50GW (vs 45GW today), a surge MPs warn could put Britain’s climate targets at risk.
buff.ly/Q2N6pbC

1 month ago 3 4 1 0

The new AI Climate Hoax, brought to you by the same silicon valley people who said social media would bring people together. 74% of these greenwashing claims are unproven.

Very proud to be part of this project led by @ketanjoshi.co with @foeus.bsky.social @caadcoalition.bsky.social

2 months ago 18 5 0 0
Claims that AI can help fix climate dismissed as greenwashing
Industry using ‘diversionary’ tactics, says analyst, as energy-hungry complex functions such as video generation and deep research proliferate

💡Claims that AI can help fix climate have been dismissed as greenwashing in a new report by @ketanjoshi.co @stand.earth @beyondfossilfuels.bsky.social @foeus.bsky.social

via @theguardian.com www.theguardian.com/technology/2...

2 months ago 17 14 0 0

This is fantastic, and it's something that we've been saying for years.

When companies say they are using "AI for climate change", they are referring to much smaller models used for doing things like climate modeling.

Massive generative AI models are NOT USEFUL for mitigating climate change.

2 months ago 141 50 6 1

AI data centers are throwing the fossil fuel industry a lifeline.

"But AI will solve climate change, right?" Wrong.

@stand.earth's new report authored by independent researcher @ketanjoshi.co clearly lays out Big Tech's AI climate hoax. See for yourself here: stand.earth/resources/ai...

2 months ago 39 14 1 0

There's a parliamentary COUP taking place in Brazil right now, aided by Meta – who is shadow banning left wing Instagram profiles, including president Lula's.

4 months ago 78 34 1 3

I absolutely agree it's easily absorbed. But it was nearly their max and good to see the law is going to be used. I measure its effectiveness in Vance's pearl-clutching.

4 months ago 3 0 1 0

The 2nd best part of this decision is that it shows the political/media pundits are full of sh*t in predicting the EU would fall to JD Vance and Peter Thiel's penis waving. Also, can we pls stop debunking the fascists' "censorship" framing?
This is simply justice served and basic accountability.

4 months ago 2 1 1 0
CLINB: A Climate Intelligence Benchmark for Foundational Models
Michelle Chen Huebscher¹, Katharine Mach², Aleksandar Stanić¹, Markus Leippold¹,³, Ben Gaiarin¹, Zeke Hausfather⁴, Elisa Rawat, Erich Fischer⁵, Massimiliano Ciaramita¹, Joeri Rogelj⁶, Christian Buck¹, Lierni Sestorain Saralegui¹ and Reto Knutti⁵
¹Google DeepMind, ²University of Miami, ³University of Zurich, ⁴Stripe, ⁵ETH Zurich, ⁶Imperial College London
Evaluating how Large Language Models (LLMs) handle complex, specialized knowledge remains a
critical challenge. We address this through the lens of climate change by introducing CLINB, a benchmark that assesses models on open-ended, grounded, multimodal question answering tasks with clear
requirements for knowledge quality and evidential support. CLINB relies on a dataset of real users’
questions and evaluation rubrics curated by leading climate scientists. We implement and validate a
model-based evaluation process and evaluate several frontier models. Our findings reveal a critical
dichotomy. Frontier models demonstrate remarkable knowledge synthesis capabilities, often exhibiting PhD-level understanding and presentation quality. They outperform “hybrid” answers curated
by domain experts assisted by weaker models. However, this performance is countered by failures
in grounding. The quality of evidence varies, with substantial hallucination rates for references and
images. We argue that bridging this gap between knowledge synthesis and verifiable attribution is
essential for the deployment of AI in scientific workflows and that reliable, interpretable benchmarks
like CLINB are needed to progress towards building trustworthy AI systems.



[Figure 3 panels: "Total Reference URLs Generated", "Reference URL Status", "Total Image URLs Generated", "Image URL Status". Axes: Count of URLs; Proportion. Status legend: OK, INACCESSIBLE_CONTENT, INVALID_URL, ERROR. Series: claude-opus-4-1, claude-sonnet-4, gpt-5, hybrid, gemini-2.5-pro, gemini-2.5-flash, o3.]
Figure 3 | Number of reference (top) and image (bottom) URLs and their status.
Ablations. We perform several ablation studies with the autorater (Table 4). Notably, removing
the question-specific rubrics from the prompt changes the results only in the bottom half, with the
Hybrid answers overtaken by Gemini 2.5 Flash and Claude Sonnet 4. This suggests that the additional
resolution provided by the rubrics applies primarily to the kind of responses used to develop the
rubrics. Or, in other words, that rubrics are far from complete. Hence, it is important that rubrics
adapt to new data as better models become availab


A New Expert-Grounded Benchmark for Scientific AI. We introduce CLINB, a benchmark for model-based evaluation of frontier models on complex, multimodal scientific communication. Its core is a new dataset of real-world climate questions paired with data-driven, question-specific evaluation rubrics, curated and validated by leading climate scientists through a novel three-phase, human-in-the-loop process.
PhD-Level Synthesis vs. Attribution Failures. Frontier models demonstrate remarkable knowledge synthesis, often exhibiting a PhD-level understanding. However, this performance masks a critical inadequacy in grounding. We report substantial hallucination rates for references (10% to 25%) and even more failures for images (50% to 80% in certain settings), exposing a major gap between synthesis and verifiable attribution.
Insights into Human-AI Collaboration Dynamics. Autonomous frontier models surpass ‘hybrid’ answers (curated by experts using weaker AI assistance), revealing the assisting model’s capability, not human oversight, as the primary bottleneck. Counter-intuitively, highly motivated non-specialists (our ‘Advocates’) who deeply engage with AI tools can produce higher-quality answers than domain experts who engage less with AI during answer curation.
A Validated Methodology for Scalable Oversight. We validate a rigorous, rubric-based autorater. Ablation studies demonstrate that structured prompts and automated evidence-checking are essential for mitigating inherent LLM judge biases. This process is hampered by inaccessible sources (up to 50%). Furthermore, we identify evaluation challenges, including model familiarity bias in human raters and the limitations of rubrics to generalize across models.


Deeply absurd. This Google PDF published on a blog (arxiv, not peer reviewed) claims an LLM is "PhD level" but in most cases the MAJORITY of reference URLs were invalid or inaccessible.

A PhD sitting down and just fabricating >50% of sources = career ending

arxiv.org/abs/2511.11597

4 months ago 367 86 8 6

António Guterres: info integrity is vital at COP30:

“We cannot achieve climate action without information integrity. We must preserve both the information environment necessary for democratic decision-making and the global cooperation essential for addressing the climate crisis."

buff.ly/81zBNzH

5 months ago 19 11 2 1
Countries commit to tackling climate disinformation at COP30
It is the first time states have formally committed to information integrity and fighting back against climate disinformation.

Good that Canada joins coalition to fight against climate disinformation (so maybe don't weaken greenwashing law?) www.euronews.com/green/2025/1...

5 months ago 5571 1245 98 40
At COP, disinformation is the next crisis to tackle
If Canada steps up and joins co-signers like the U.K., France, and Spain, others will follow. Doing so would put the country’s resilience, strength, and democratic values on full display. This is how…

After years leading FEMA's external affairs through environmental disasters like Hurricane Helene, Justin Ángel Knighten warns that tackling disinformation must be a priority.

buff.ly/fgoWizG

🧵 1/2

5 months ago 5 3 1 0

🚨 Our new report, Deny, Deceive, Delay: Demystified, is out now. 🚨

The report explores how Big Carbon and Big Tech use disinformation to sabotage climate action and why, despite 89% of people worldwide demanding stronger action, progress gets derailed.

5 months ago 8 12 0 2

Macron and Lula warn of the dangers of climate disinformation ahead of COP30

"Climate disinformation today threatens our democracies, the Paris agenda and therefore our collective security"

5 months ago 5 4 0 1

Kinda impressive how OpenAI has managed to occupy a seemingly quantum state between "imperial project", "non-profit research foundation", and "criminal enterprise".

5 months ago 258 50 5 0

We will have Nuremberg trials over this, and their many other crimes.

5 months ago 1 0 0 0

Using a government jet to fly you to a girlfriend's country music gig/wrestling match and then covering it up and rage tweeting about it on X is sort of a microcosm of Trumpism. thebulwark.com/p/kash-patel-fbi-director-private-jet-problem-nashville

5 months ago 719 192 33 9
Elon Musk is boosting the British right - and this shows how

Vital piece of investigative reporting from Sky. They've uncovered the X algorithm which feeds users extremist right wing material from the moment they join the site. It is a far-right radicalisation engine, by design.

news.sky.com/story/the-x-...

5 months ago 6343 3553 236 449

this is a reminder that we dont have to settle for newsom in 2028

5 months ago 18544 4794 219 186

There are more of us than there are of them.

5 months ago 32626 5182 439 253

you love to see it

5 months ago 536 74 9 1

Good things are possible and we don’t have to settle.

5 months ago 46687 8083 248 217
World ‘very likely’ to exceed 1.5C climate goal in next decade: UN
Despite Paris Agreement pledges, countries 'have landed off target' on climate goals multiple times, the UN warns.

Okay, yes, humanity did not enact the single best possible outcome in response to the single worst problem we have ever faced as a species

In no way was it wrong to try, and in no way is it wrong to continue trying to jam a wrench in the greedy fossil fuel economy. Everything is still on the table

5 months ago 324 110 9 4

Maybe pundits should spend more time in densely packed, left leaning urban areas where the "real Americans" live

5 months ago 4336 699 29 17