
Posts by Swair

I find it really tiring to wake up, open BlueSky, and find a lot of random people insulting me for posting someone's interesting and harmless AI project.

Of course: No I don’t think this replaces historians. Yes, a small AI model like this is limited. No, you are not actually talking to the past. Blech.

3 weeks ago 506 34 25 7

Many thanks!

1 month ago 1 0 0 0

Eventually I ended up going with a wine explanation that also came from a similar podcast. The idea is that you have a collection of wine bottles with different properties, e.g. dryness, fruitiness, age, and you need to line them up for display in some way. Didn't quite hit the spot though.

1 month ago 1 0 0 0
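For what it's worth, the wine picture maps straight onto PCA: each bottle is a point in property space, and the line to display them along is the first principal component. A tiny self-contained sketch (my own illustration with made-up numbers, not from the podcast):

```python
# Toy version of the "line up the wine bottles" story: each bottle is a
# point in (dryness, fruitiness, age) space; the first principal component
# is the single axis that best spreads them out for display.
import numpy as np

rng = np.random.default_rng(0)
bottles = rng.normal(size=(20, 3))  # 20 bottles x 3 properties
# make age correlated with dryness so one axis captures most variance
bottles[:, 2] = 2 * bottles[:, 0] + rng.normal(scale=0.1, size=20)

centered = bottles - bottles.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                    # direction of maximum variance

positions = centered @ pc1              # 1-D "shelf position" per bottle
print(np.argsort(positions))            # the display order
```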

Ah, basically I was trying to explain this to a friend purely verbally, and I remembered there was this quite neat explanation @stevenstrogatz.com

1 month ago 0 0 1 0
Steven Strogatz

Hi @stevenstrogatz.com, there was a podcast where you explained PCA or eigenvectors with either wine (lining up bottles of wine) or a ballet dancer. I'm almost certain it was you. Do you remember which one it was?

2 months ago 0 0 2 0
CSE 598-004 - Building Small Language Models

The second new class I'm teaching is a very experimental graduate-level seminar in CSE: "Building Small Language Models". I taught the grad-level NLP class last semester (so fun!) but students wanted more: which of these new ideas work, and which work for SLMs? jurgens.people.si.umich.edu/CSE598-004/

3 months ago 32 9 2 1
exe.dev - Persistent VMs via SSH Start VMs with persistent disks in seconds. The disk persists. You have sudo.

Tried both. Exe.dev is a slightly higher abstraction: they give you an agent, Shelley, that helps you write an app, host it, and make it public. It's quite neat. You can't easily use it just as a secure remote code execution engine, I think.

3 months ago 1 0 0 0
arXiv AI/ML Catch-Up Was your New Year's resolution to keep up with arXiv AI/ML preprints? Browse the past week's new uploads in 30 mins.

I, uh, made this. It was supposed to be a joke / concept-art thing that scrolls through the torrent of new AI/ML arXiv uploads too fast to read. But I think I iterated too much and made it almost usable.

3 months ago 78 13 7 3
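The firehose itself is easy to tap: the public arXiv API serves recent uploads as an Atom feed. A minimal sketch of pulling the latest listings (category and result count are just examples; this is not the site's actual code):

```python
# Fetch the newest cs.LG submissions from the public arXiv API and print
# title + link for each. Stdlib only.
import urllib.request
import xml.etree.ElementTree as ET

URL = ("https://export.arxiv.org/api/query"
       "?search_query=cat:cs.LG&sortBy=submittedDate&sortOrder=descending"
       "&max_results=25")

with urllib.request.urlopen(URL) as resp:
    feed = ET.fromstring(resp.read())

ns = {"atom": "http://www.w3.org/2005/Atom"}
for entry in feed.findall("atom:entry", ns):
    title = " ".join(entry.findtext("atom:title", "", ns).split())
    link = entry.findtext("atom:id", "", ns)
    print(f"{title}\n  {link}")
```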

Yes it’s pronounced just like Ganymede and Gemini.

3 months ago 0 0 0 0
OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI One of the things that most excited me about Anthropic’s new Skills mechanism back in October is how easy it looked for other platforms to implement. A skill is just …

OpenAI aren't talking about it yet, but it turns out they've adopted Anthropic's brilliant "skills" mechanism in a big way

Skills are now live in both ChatGPT and their Codex CLI tool. I wrote up some detailed notes on how they work so far here: https://simonwillison.net/2025/Dec/12/openai-skills/

4 months ago 14 4 1 1
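Part of why the mechanism ports so easily: a skill is just a folder with a SKILL.md whose frontmatter names and describes it, and a harness only needs to index those descriptions, reading the full body on demand. A hypothetical loader sketch assuming that layout (the frontmatter keys follow Anthropic's published format; the loader itself is my own illustration):

```python
# Index skills by scanning for */SKILL.md and parsing the simple
# "key: value" frontmatter between the leading "---" fences. The model
# would only read the full SKILL.md body once a skill looks relevant.
from pathlib import Path

def load_skill_index(root: str) -> list[dict]:
    skills = []
    for skill_md in Path(root).glob("*/SKILL.md"):
        text = skill_md.read_text()
        meta = {"path": str(skill_md)}
        if text.startswith("---"):
            frontmatter = text.split("---", 2)[1]
            for line in frontmatter.strip().splitlines():
                if ":" not in line:
                    continue
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
        skills.append(meta)
    return skills

for skill in load_skill_index("./skills"):
    print(skill.get("name"), "-", skill.get("description"))
```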

I'm dreading reading that article. Is it that damning?

4 months ago 1 0 1 0
RFD 576 - Using LLMs at Oxide

I have put together a (long overdue!) draft RFD on using LLMs at @oxide.computer, but I know that there is a ton more to be said on the topic; thoughts and experiences welcome!
rfd.shared.oxide.computer/rfd/0576

4 months ago 147 26 20 13

Which one??

4 months ago 2 0 1 0

Let's build hyper-personalized AI-powered software that avoids the attention hijacking anti-patterns that defined so much of the last decade of software design - here's our manifesto with principles on how we can do that - more thoughts on my blog: simonwillison.net/2025/Dec/5/r...

4 months ago 197 40 11 6

Though that's quite a different kind of model. I doubt a model with even 3B active parameters (let alone 560M like DeepSeek OCR) could be as capable as GPT-5 Pro. Even if they did go the mixture-of-models route, it's likely each model is a sparse MoE with quite a bit more than 3B active params.

4 months ago 1 0 0 0
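"Active parameters" here is just the per-token cost of a sparse MoE: the shared layers plus the top-k routed experts, versus everything stored. A back-of-envelope with purely illustrative numbers (not any particular model's config):

```python
# Per token, a sparse MoE pays for shared layers plus only the top-k
# routed experts, even though all experts are held in memory.
shared_params = 2e9    # attention, embeddings, shared FFN parts
expert_params = 0.5e9  # parameters per expert
num_experts   = 64
top_k         = 2

total_params  = shared_params + num_experts * expert_params  # stored
active_params = shared_params + top_k * expert_params        # used per token

print(f"total: {total_params / 1e9:.0f}B, active: {active_params / 1e9:.0f}B")
# -> total: 34B, active: 3B
```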

I agree with the latter, but I assumed GPT-5 is also a sparse MoE. Having a small dense model with this capability sounds infeasible.

4 months ago 0 0 1 0

Quite impressed by your ability to keep a conversation in good faith.

4 months ago 15 0 2 0

Are you implying that 5-pro isn’t a very sparse MoE?

4 months ago 0 0 1 0

Yeah, that's a problem for them. Google was profitable within 2-3 years of founding (IIRC). OpenAI is sitting on a large user base with only a small portion paying for it. I guess that's why the big hiring from Meta and the Instacart product head: to figure out a monetization strategy.

4 months ago 0 0 0 0

OpenAI's total investment is, what, about $60 billion? They spend $5 billion a year. Google's 2024 ad revenue was about $240 billion. OpenAI has 800 million weekly active users that they can find ways to monetize (ads? shopping?), so as large as these numbers seem, ad revenue makes the world go round, I guess.

4 months ago 0 0 1 0
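Taking the post's numbers at face value (as stated there, not independently verified), the arithmetic is quick:

```python
# Back-of-envelope using only the figures quoted in the post above.
investment = 60e9   # total raised, roughly
burn       = 5e9    # spend per year
wau        = 800e6  # weekly active users

print(f"runway at current burn: {investment / burn:.0f} years")        # 12 years
print(f"revenue per user to cover burn: ${burn / wau:.2f}/year")       # $6.25/year
```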

God knows. At least Google has the cash flow to keep going. So as long as the other players get private investment backing, on it goes. Or until some market segmentation or a significant plateau in model capability, whichever comes first.

4 months ago 0 0 2 0

Research + training new models

4 months ago 0 0 1 0

The inference business is profitable for them (i.e. the cost of serving models is less than the money they make through API fees). But they keep spending more and more on research, because if they don't, Google, OpenAI, xAI, or any number of Chinese competitors might start taking their API business.

4 months ago 0 0 2 0

Thrilled to release Gaperon, an open LLM suite for French, English and Coding 🧀

We trained 3 models - 1.5B, 8B, 24B - from scratch on 2-4T tokens of custom data

(TLDR: we cheat and get good scores)

@wissamantoun.bsky.social @rachelbawden.bsky.social @bensagot.bsky.social @zehavoc.bsky.social

5 months ago 34 18 1 4
What does Riemann Zeta have to do with Brownian Motion?
YouTube video by Almost Sure

New YouTube video uploaded on connections between Riemann zeta and Brownian motion!

What does Riemann Zeta have to do with Brownian Motion?

youtu.be/YTQKbgxbtiw

5 months ago 9 4 0 0
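One classical point of contact, though not necessarily the one the video covers: the supremum of a Brownian bridge follows the Kolmogorov distribution, a Jacobi theta series whose Mellin transform is where the Riemann zeta function enters. A quick simulation check of that CDF:

```python
# Compare the empirical CDF of sup|Brownian bridge| against the
# Kolmogorov distribution F(x) = 1 - 2 * sum_k (-1)^(k-1) exp(-2 k^2 x^2).
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 10_000, 1_000

# Brownian bridge on [0, 1]: B_t = W_t - t * W_1
dW = rng.normal(scale=np.sqrt(1 / n_steps), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
t = np.arange(1, n_steps + 1) / n_steps
bridge = W - t * W[:, -1:]

sup = np.abs(bridge).max(axis=1)

def kolmogorov_cdf(x: float, terms: int = 50) -> float:
    k = np.arange(1, terms + 1)
    return 1 - 2 * np.sum((-1) ** (k - 1) * np.exp(-2 * k**2 * x**2))

for x in (0.5, 1.0, 1.5):
    print(x, (sup <= x).mean(), kolmogorov_cdf(x))
```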

Moonshot AI's Kosong, the LLM abstraction layer powering Kimi CLI.

It unifies message structures, asynchronous tool orchestration, and pluggable chat providers so you can build agents with ease and avoid vendor lock-in.

GitHub: github.com/MoonshotAI/k...
Docs: moonshotai.github.io/kosong/

5 months ago 12 3 1 0
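Not Kosong's actual API, just a sketch of the general shape of the pattern the post describes: one message type and one provider protocol, so agent logic never touches a vendor SDK. All names here are hypothetical.

```python
# A vendor-agnostic chat layer: agents depend only on Message and
# ChatProvider, so swapping providers doesn't change agent code.
import asyncio
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Message:
    role: str      # "system" | "user" | "assistant" | "tool"
    content: str

class ChatProvider(Protocol):
    async def complete(self, messages: list[Message]) -> Message: ...

class EchoProvider:
    """Stand-in provider; a real one would call a vendor SDK here."""
    async def complete(self, messages: list[Message]) -> Message:
        return Message(role="assistant", content=f"echo: {messages[-1].content}")

async def run_turn(provider: ChatProvider, history: list[Message], user_text: str) -> Message:
    history.append(Message(role="user", content=user_text))
    reply = await provider.complete(history)  # the only provider-specific call
    history.append(reply)
    return reply

print(asyncio.run(run_turn(EchoProvider(), [], "hi")).content)  # echo: hi
```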

Some interesting stuff here on measuring writing quality and improving on qualitative tasks:
www.dbreunig.com/2025/07/31/h...

5 months ago 22 6 3 1
xAI's Grok 4: The tension of frontier performance with a side of Elon favoritism An o3 class model, the possibility of progress, chatbot beige, and the illusiveness of taste.

Not enough technical AI researchers are criticizing the string of system failures around Grok. I ended this post with my very transparent thoughts on the complete failure of Grok’s recent behavior ($). www.interconnects.ai/p/grok-4-an-...

9 months ago 33 4 1 1
Low-Rank Thinning The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of unifor...

Off to ICML next week?

Check out my student Annabelle’s paper in collaboration with @lestermackey.bsky.social and colleagues on low-rank thinning!

New theory, dataset compression, efficient attention and more:

arxiv.org/abs/2502.12063

9 months ago 11 5 0 1
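The paper's Kernel Halving is more sophisticated, but greedy kernel herding gives the flavor of thinning: pick points one at a time so the summary's kernel mean tracks the full dataset's. A small sketch (a simple baseline of my own, not the paper's algorithm):

```python
# Greedy kernel herding: repeatedly add the point that best closes the
# gap between the coreset's kernel mean and the full dataset's.
import numpy as np

def rbf_gram(X: np.ndarray, Y: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth**2))

def herd(X: np.ndarray, m: int, bandwidth: float = 1.0) -> np.ndarray:
    K = rbf_gram(X, X, bandwidth)
    mu = K.mean(axis=1)           # kernel mean of the full dataset at each point
    running = np.zeros(len(X))    # sum of k(chosen_i, .) over chosen points
    chosen = []
    for t in range(m):
        score = mu - running / (t + 1)
        idx = int(np.argmax(score))
        chosen.append(idx)
        running += K[idx]
    return X[chosen]

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))
print(herd(data, 10).round(2))    # 10 representative points
```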
Everything I'll forget about Evals An actually actionable guide on how to do them

olickel.com/everything-...

11 months ago 1 1 0 0