
Posts by Tobi

John Dewey wrote a whole book about this

1 day ago 4 0 0 0
Text Shot: New findings challenge the widespread belief that AI is an environmental villain. By analyzing U.S. economic data and AI usage across industries, researchers discovered that AI’s energy consumption—while significant locally—barely registers at national or global scales. Even more surprising, AI could help accelerate green technologies rather than hinder them.

AI’s climate impact is much smaller than many feared www.sciencedaily.com/releases/2025/… #AI #environment

2 months ago 39 4 0 1

🔁 If you are enjoying the feed, please like and share it with others for discoverability!

6 months ago 33 7 1 2

This is what I’ve been feeling as Twitter hypes up Claude Code like it’s the second coming of Jesus. Yes, it’s very good, but it struggles tremendously with high-complexity projects and consistently produces worse designs than I’d implement myself, and people aren’t honest about admitting this

2 months ago 0 1 0 0
The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.
The Rust code quality is reasonable, but is nowhere near the quality of what an expert Rust programmer might produce.

this "claude made a c compiler thing" feels pretty dishonest and marketing-hype whenever you scroll to the very bottom to find these lines and the token cost lol

i mean, sure its impressive (particularly that it was a fully offline environment), but seems like hype bait

2 months ago 108 4 9 2

I tend to prefer GLM 4.5 Air when I’m going for cheap and fast, but Gemini Flash is pretty solid

2 months ago 2 0 1 0

Also a good thing to keep in mind: afaik the current "subscription token economy" is heavily subsidized, and I am not sure Anthropic/OpenAI are necessarily charging the full amount (i.e. the same price they would charge at API rates)

2 months ago 3 2 1 0

This is why I genuinely believe that in the model game, there is no moat, and there will always be tremendous pressure for open source and cheaper models. The majority of the world economy doesn't want to pay $21 (!!!) per million tokens, even if they can afford to

2 months ago 7 0 1 1

I've been evaluating Claude Opus 4.6, GPT 5.2, and other models in a simulation environment I've built. The verdict: I ran out of money

2 months ago 23 2 2 0

If Trump hadn’t sued Trevor Noah for that joke at the Grammys, I probably never would’ve heard it. Talk about the Streisand Effect

2 months ago 6 2 1 0

ByteDance Seed's ConceptMoE: moving beyond uniform token-level processing to adaptive concept-level computation in LLMs!

Why waste equal compute on trivially predictable tokens when you can merge similar tokens into concepts while preserving fine-grained processing for complex content?
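The post doesn’t include ConceptMoE’s actual merging rule, so here’s a toy sketch of the general idea: greedily fold runs of adjacent, highly similar token embeddings into single "concept" vectors. The function names and the `threshold` parameter are made up for illustration, not from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def merge_tokens(embs, threshold=0.9):
    """Greedily merge adjacent, similar token embeddings into 'concepts'.

    Each incoming token either joins the previous concept (averaged in)
    when its cosine similarity clears the threshold, or starts a new one.
    """
    concepts = []
    for e in embs:
        if concepts and cosine(concepts[-1], e) >= threshold:
            # Fold this token into the previous concept by averaging.
            concepts[-1] = [(a + b) / 2 for a, b in zip(concepts[-1], e)]
        else:
            concepts.append(list(e))
    return concepts

# Three tokens: the first two are near-duplicates, the third is orthogonal.
embs = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
concepts = merge_tokens(embs)
# The first two tokens merge into one concept; the third stays separate.
```

The payoff in the real model would be that downstream layers run over `len(concepts)` positions instead of `len(embs)`, spending less compute on redundant spans.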

2 months ago 27 3 1 1

Thought this was referring to @lastpositivist.bsky.social and got really worried

2 months ago 2 0 0 0

What batch sizes and sequence lengths are you able to get up to on your DGX Spark? I was doing batch 32 at 2048 sequence length but saw much faster throughput when I dropped down to batch 16

2 months ago 0 0 1 0

What do you use to have Claude make slides?

2 months ago 0 0 0 0

Sidenote: All experiments were performed on the Nvidia DGX Spark machine, my new favorite toy.

2 months ago 2 0 0 0

These experiments clarify the importance of more holistic evaluation metrics during pretraining. As we move labeled data for reasoning, planning, and other higher order thinking into the pretraining stage, pretraining evaluation that's more aligned with downstream tasks becomes more important

2 months ago 1 0 1 0

Outside of HumanEval, the model pretrained on the SYNTH dataset outperforms the standard nanochat on every task. I report results for both the standard chat template (Harmony) as well as the Qwen3 Chat template used in the SYNTH dataset.

2 months ago 4 0 1 1
Markdown representation of the above table:

| Task | Default Template | Qwen3 Template | Synthetic + Fineweb (Default) | Synthetic + Fineweb (Qwen3) | Synthetic + Fineweb (Qwen3, no thinking) |
|------|------------------|----------------|-------------------------------|----------------------------|-------------------------------------------|
| ARC-Easy | 38.85% | 25.34% | 42.85% | 30.26% | 25.08% |
| ARC-Challenge | 29.95% | 22.78% | 32.94% | 25.85% | 23.04% |
| MMLU | 32.69% | 24.08% | 33.13% | 26.98% | 24.21% |
| GSM8K | 3.64% | 0.08% | 6.14% | 2.20% | 0.08% |
| HumanEval | 8.54% | 1.22% | 1.83% | 0.00% | 0.61% |
| SpellingBee | 98.05% | 0.00% | 98.05% | 35.94% | 0.39% |

The results seemed to speak for themselves, but I was skeptical. The SYNTH dataset is composed of reasoning-style chat messages; it seemed unlikely that general internet data would perform better on downstream chat-style tasks. So I decided to do midtraining, where the results flipped:

2 months ago 0 0 1 0

I used a 3:1 SYNTH:FineWeb data ratio when pretraining. I evaluated nanochat with and without SYNTH data at both 1B and 3B pretraining tokens to see if adding the data led to faster convergence. While SYNTH data led to lower val/bpb, the CORE metric was consistently lower.
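The post doesn’t show how the 3:1 mixture was built; a minimal sketch of a fixed-ratio interleave over two document streams could look like this (the `mix_stream` name and the toy document streams are mine, not from the experiment):

```python
from itertools import islice

def mix_stream(synth, fineweb, ratio=3):
    """Interleave two document streams at a fixed ratio:1 schedule.

    Yields `ratio` SYNTH documents, then 1 FineWeb document, repeating
    until either stream is exhausted.
    """
    synth_it, fw_it = iter(synth), iter(fineweb)
    try:
        while True:
            for _ in range(ratio):
                yield next(synth_it)
            yield next(fw_it)
    except StopIteration:
        return

docs = list(islice(mix_stream((f"synth_{i}" for i in range(9)),
                              (f"fw_{i}" for i in range(3))), 8))
# First 8 docs: synth_0..2, fw_0, synth_3..5, fw_1
```

A real pretraining pipeline would more likely sample stochastically per batch, but the expected token proportions come out the same 3:1.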

2 months ago 0 0 1 0
Pretraining results for training on SYNTH + FineWeb vs FineWeb alone. 

Four rows are displayed: dgx1-continued, dgx1-synth-continued, dgx1-synth, and dgx1.

Rows labeled *-synth* are trained with the 3:1 SYNTH:FineWeb mixture. Rows labeled *-continued* are trained for 3B tokens, 1B otherwise. CORE metric:

nanochat + SYNTH, 3B tokens: 0.147
nanochat, 3B tokens: 0.166
nanochat + SYNTH, 1B tokens: 0.123
nanochat, 1B tokens: 0.139

I did some experiments with nanochat and the SYNTH dataset from Pleias. I wanted to see how a model pretrained on a mixture of SYNTH and FineWeb would compare to the standard FineWeb mixture. Results: pretraining on FineWeb alone looks good (see below), but the story changes after midtraining:

2 months ago 9 1 1 0

Can anyone find that meme with the cartoon cats where all of their butts are AI company logos? I need it for a project

2 months ago 0 0 0 0

Been running comparisons of nanochat pretrained on FineWebEdu vs synthetic data, and it’s not looking great for the synthetic data runs thus far. We’ll see what it looks like after midtraining

2 months ago 1 0 0 0

At my day job, I’m building a lot of Agent stuff! In my free time, I’m experimenting with model architectures and pretraining recipes!

3 months ago 5 0 0 0

Excited to see the AI community has grown on here! Gonna try sharing more about my work here!

3 months ago 47 1 4 0

When Bill Clinton left office, we had a budget surplus. After 8 years of disastrous foreign interventions and tax cuts, we had a deficit and a recession. Now with Trump we’re repeating the same mistakes made under Bush

3 months ago 5 0 0 0

I heard this place is getting a dislike button?

5 months ago 2 0 0 0

If you assume that there is an infinite number of tech workers with six figure salaries who want to live in the Bay or NY then the NIMBYs are right and no amount of new housing will help. But there is not an infinite number of tech workers, and the ones here are being laid off.

8 months ago 5 0 0 0

People talk about artificial intelligence but don't understand what that technology actually is, what the term "AI" is referring to, or best use cases for all the different tools that exist.

Nor do they understand that they've already been using AI assistance for YEARS before LLMs became public.

8 months ago 1 1 0 0

All these things apply to a small percentage of AI research and companies, yet this website treats all AI models and research like it comes from xAI. You ran the actual open source and AI safety researchers off the app

8 months ago 1 0 2 0

I think it’s because they’re threatened by it honestly

8 months ago 6 0 4 3