Posts by Lukasz Kaiser

If you ever want to see a really interesting AI thinking trace, push it really hard on literature or poetry suggestions.

Here is Claude 4.6 Opus working through poetry in its reasoning when I asked it to find something that captures the feeling of AI while avoiding its usual favorites (e.g., Rilke).

1 month ago 48 4 4 0

Don't tell anyone, but such courses are the one place where I find AI browsers like Atlas from OpenAI very useful: it may take the course for you ;)

3 months ago 1 0 0 0

Benchmarks from historians show that AI transcription from handwriting is now better than human, and a very cheap model is as good as people.

There are now massive troves of documents that could be made available for research that would have been impossible or prohibitive to transcribe before.

4 months ago 93 10 3 3
rl-wrong-about-rewards.md GitHub Gist: instantly share code, notes, and snippets.

I complain a lot about RL lately, and here we go again.

The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment.

More at length here:

gist.github.com/yoavg/3eb3e7...
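
To make the claim concrete, here is a minimal sketch of what "reward computation inside the agent" could look like. This is only an illustration of the idea in the post, not the formulation from the linked gist: the names `Environment`, `Agent`, `negative_distance`, and `run_episode` are all hypothetical, and the toy 1-D world is an assumption. The environment returns observations only, and the agent scores them with its own reward function.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Environment:
    """Toy 1-D world (hypothetical). It emits observations only, never a reward."""
    position: int = 0

    def step(self, action: int) -> int:
        self.position += action
        return self.position  # observation only, no reward signal


def negative_distance(observation: int, goal: int) -> float:
    """One possible internal reward: closer to the goal is better."""
    return -abs(goal - observation)


@dataclass
class Agent:
    """Agent that owns its reward function (assumed design, for illustration)."""
    goal: int
    reward_fn: Callable[[int, int], float]
    history: List[float] = field(default_factory=list)

    def act(self, observation: int) -> int:
        # Trivial policy: move one step toward the goal.
        if observation < self.goal:
            return 1
        if observation > self.goal:
            return -1
        return 0

    def evaluate(self, observation: int) -> float:
        # The reward is computed here, inside the agent, from the observation.
        reward = self.reward_fn(observation, self.goal)
        self.history.append(reward)
        return reward


def run_episode(goal: int = 3, steps: int = 5) -> List[float]:
    env = Environment()
    agent = Agent(goal=goal, reward_fn=negative_distance)
    observation = env.position
    for _ in range(steps):
        observation = env.step(agent.act(observation))
        agent.evaluate(observation)
    return agent.history


print(run_episode())
```

In the more common setup, `step` would return the reward alongside the observation; here swapping `reward_fn` changes what the agent values without touching the environment at all.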

4 months ago 14 2 2 0
Geordie Williamson: Neural Networks for Mathematical Discovery (October 29, 2025). YouTube video by Simons Foundation

How can we use neural networks to bolster mathematical discovery? Geordie Williamson's @simonsfoundation.org Presidential Lecture is online, catch up now:
www.youtube.com/watch?v=Uxr_...

5 months ago 7 2 0 0

Fresh on the arXiv: @booleananalysis.bsky.social, Kewen Wu, and I present new classical algorithms for the Short Integer Solution problem (under infinity norm) that outperform the elegant Chen-Liu-Zhandry quantum algorithm, showing that there is no exponential quantum speed up anymore.

6 months ago 18 3 3 0

We are starting to see some nuanced discussions of what it means to work with advanced AI in its current state

In this case, GPT-5 Pro was able to do novel math, but only when guided by a math professor (though the paper also noted the speed of advance since GPT-4)

The reflection is worth reading.

7 months ago 91 14 3 1

A fully autonomous robot which, every morning, sets plates on the table, fetches ingredients in the kitchen, and prepares avocado toast.

"Move things and breakfast."

10 months ago 53 6 3 1

(In case you hadn't been following, the environmental impact of current AI models is now much lower: generating 100,000 words with AI uses less power than watching Netflix for 45 minutes on your TV.)

11 months ago 30 7 2 0

If you haven't done this with o3, you haven't really seen what these models can do.

11 months ago 30 1 1 0
Skeet from Ann Leckie reading: "Say it after me: Chat GPT is not a search engine. It does not scan the web for information, it just generates statistically likely sentences. You cannot use it a search engine, or as a substitute for searching.

Now. Please never use an LLM for information searches ever again."

This is one of the most-shared posts on Bluesky in the past day and it's just completely false. You might think ChatGPT is a *bad* search engine, or prefer another search engine. But it has had integrated web search since last year.

11 months ago 1889 201 88 260

"o3, You are a consultant hired by the Dark Lord, analyze the org chart of Mordor. How would you improve it for today's changing Middle Earth"

o3 does some actual satire, ending with: “One Org to rule them all, One Org to find them, One Org to bring them all, And in the darkness, align them.”

11 months ago 151 17 9 6
GPT Finally Jumped Out of the System OpenAI’s o4-mini-high Model Solves the MU Puzzle and Demonstrates Why

For years I've been throwing the same puzzle challenge at new GPT models. Every one has failed, until now.

matthodges.com/posts/2025-0...

1 year ago 45 7 4 1

Oh, I see!! Yes, the id is totally unnecessary in this place. Probably a leftover or compatibility for the other API where you don't repeat everything on each call. Sorry for the confusion!!

1 year ago 1 0 0 0
OpenAI Platform Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.

Which API exactly is this? Is it function calling in the OpenAI Responses API? Do you really need to send the whole history? The weather example doesn't seem to do it: platform.openai.com/docs/guides/function-calling?api-mode=responses

1 year ago 0 0 1 0

Now that's a good reason to ask why...

1 year ago 1 0 0 0

Isn't it because it's happening asynchronously over the network, on different machines, possibly for many tool invocations and chats in parallel, and the id makes sure you find the path to the right place?
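
As a rough sketch of that general idea (not of any particular API): when tool calls run concurrently and finish out of order, an id lets the caller match each result to its request. Everything below is hypothetical, including `run_tool`, `dispatch`, the `call_id` field name, and the simulated delays.

```python
import asyncio
from typing import Dict, List


async def run_tool(call_id: str, name: str, delay: float) -> dict:
    """Simulated remote tool call (hypothetical); the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return {"call_id": call_id, "output": f"result of {name}"}


async def dispatch(calls: List[dict]) -> Dict[str, str]:
    """Start all calls concurrently and file each result under its call_id."""
    tasks = [
        asyncio.create_task(run_tool(c["call_id"], c["name"], c["delay"]))
        for c in calls
    ]
    results: Dict[str, str] = {}
    # as_completed yields results in finishing order, which may differ from
    # request order; the id is what routes each one back to the right call.
    for finished in asyncio.as_completed(tasks):
        result = await finished
        results[result["call_id"]] = result["output"]
    return results


calls = [
    {"call_id": "call_a", "name": "weather", "delay": 0.03},
    {"call_id": "call_b", "name": "time", "delay": 0.01},
]
results = asyncio.run(dispatch(calls))
print(results["call_a"], "|", results["call_b"])
```

Without the id, the caller would have to rely on arrival order, which concurrency does not guarantee.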

1 year ago 0 0 1 0

Exciting news: @waymo.bsky.social is beginning public service on the Peninsula, starting with Palo Alto, Mountain View, and Los Altos! Initial service area below.

1 year ago 71 6 4 3
Opinion | There Is a Liberal Answer to Elon Musk (Gift Article) Right-wing populism thrives on scarcity. The answer is abundance. But a politics of abundance will work only if Democrats confront where their approach has failed.

www.nytimes.com/2025/03/09/o...

1 year ago 305 58 31 19
D&D Guild Hall Simulator

Try it or improve it: chatgpt.com/canvas/share...

1 year ago 11 1 0 0

This was fun: "o1, build a simulator of a D&D guild hall. Persistent characters come in, get quests, interact with each other, leave & return, make it procedurally generated"

I kept asking it to add other ideas (relationships, etc) 8 times, got no errors, just worked each time. Desire-based coding!

1 year ago 62 5 10 0

The NIH overhead cut doesn't just hurt universities.

It's deadly to the US economy.

The US is a world leader in tech due to the ecosystem that NIH and NSF propel. It drives innovation for tech transfer, creates a highly skilled sci/tech workforce, and fosters academic/industry cross-fertilization.

1 year ago 1344 511 30 20
Line graph time series of 2025's daily Arctic sea ice extent compared to decadal averages from the 1980s to the 2010s. The decadal averages are shown with different colored lines with purple for the 1980s, blue for the 1990s, green for the 2000s, and white for the 2010s. Thin white lines are also shown for each year from 2000 to 2024. 2025 is shown with a thick gold line. There is a long-term decreasing trend in ice extent for every day of the year shown on this graph between January and April by looking at the decadal average line positions.

Saturday ice update - #Arctic sea ice extent is currently the *lowest* on record (JAXA data)

• about 790,000 km² below the 2010s mean
• about 1,450,000 km² below the 2000s mean
• about 2,040,000 km² below the 1990s mean
• about 2,430,000 km² below the 1980s mean

Plots: zacklabe.com/arctic-sea-i...

1 year ago 241 158 11 15

Neither read nor wrote, no illegal access at all!!

1 year ago 1 0 0 0

OpenAI’s deep research is very good. Unlike Google’s version, which is mostly a good summarizer of many sources, OpenAI’s is more like engaging an opinionated (often almost PhD-level!) researcher who follows leads.

Look at how it hunts down a concept in the literature (& works around problems)

1 year ago 86 9 7 2

In all seriousness how batshit is it that a Chinese AI bot is censoring a book THAT HASN'T EVEN BEEN PUBLISHED YET. What dystopia are we all living in.

1 year ago 187 25 0 5

this post is trending in my feed but it does not make sense. i don't see any reasonable interpretation by which DeepSeek demonstrates that model scaling is not the best way to develop AI. their model is very large, and their training corpus is very large. they were just scaling more efficiently.

1 year ago 54 3 8 0

This post mostly argues about variants of training on test - maybe only a verifier, maybe only validation in test. None of that happened. The other point is more generally that hiding funding is a bad idea - and I personally agree very much, unsure why it happened as it's an especially bad idea here

1 year ago 0 0 0 0

It's certainly a weird one, but I only learned about it from the press, as I did about that dataset. I didn't realize OpenAI was involved until after they published their first paper. As I said, researchers may not agree (or even know) about many things, but that doesn't mean we train on test.

1 year ago 0 0 1 0

Also, as far as I can tell (I'm not a lawyer) there's nothing very non-standard in OpenAI work contracts. I have one and certainly have never agreed to lie or deceive. Not only that, but I actually find the culture internally very open to debate and criticism and very opposed to cheating of any kind

1 year ago 0 0 1 0