John Dewey wrote a whole book about this
Posts by Tobi
Text Shot: New findings challenge the widespread belief that AI is an environmental villain. By analyzing U.S. economic data and AI usage across industries, researchers discovered that AI’s energy consumption—while significant locally—barely registers at national or global scales. Even more surprising, AI could help accelerate green technologies rather than hinder them.
AI’s climate impact is much smaller than many feared www.sciencedaily.com/releases/2025/… #AI #environment
🔁 If you are enjoying the feed, please like and share it with others for discoverability!
This is what I’ve been feeling as Twitter hypes up Claude Code like it’s the second coming of Jesus. Yes, it’s very good, but it struggles tremendously with high-complexity projects and consistently produces worse designs than I’d implement myself, and people aren’t honest about admitting this
The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled. The Rust code quality is reasonable, but is nowhere near the quality of what an expert Rust programmer might produce.
this "claude made a c compiler thing" feels pretty dishonest and marketing-hype whenever you scroll to the very bottom to find these lines and the token cost lol
i mean, sure its impressive (particularly that it was a fully offline environment), but seems like hype bait
I tend to prefer GLM 4.5 Air when I’m going for cheap and fast, but Gemini Flash is pretty solid
Also a good thing to keep in mind: afaik the current "subscription token economy" is heavily subsidized, and I’m not sure Anthropic/OpenAI are necessarily charging the full amount (i.e. the same price they would charge for API usage)
This is why I genuinely believe that in the model game, there is no moat, and there will always be tremendous pressure for open source and cheaper models. The majority of the world economy doesn't want to pay $21 (!!!) per million tokens, even if they can afford to
I've been evaluating Claude Opus 4.6, GPT 5.2, and other models in a simulation environment I've built. The verdict: I ran out of money
If Trump hadn’t sued Trevor Noah for that joke at the Grammy’s, I probably never would’ve heard it. Talk about Streisand Effect
ByteDance Seed's ConceptMoE: moving beyond uniform token-level processing to adaptive concept-level computation in LLMs!
Why waste equal compute on trivially predictable tokens, when you can merge similar tokens into concepts while preserving fine-grained processing for complex content?
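The merging idea can be illustrated with a toy sketch: greedily fold runs of adjacent token embeddings that are nearly identical (by cosine similarity) into a single averaged "concept" vector. This is a hypothetical illustration of the general idea only, not ConceptMoE's actual mechanism; the function name and threshold are made up.

```python
import numpy as np

def merge_adjacent_tokens(embs: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Greedily merge runs of adjacent token embeddings whose cosine
    similarity exceeds `threshold`, averaging each run into one concept
    vector. Toy sketch, not the paper's method."""
    concepts = [embs[0]]
    for vec in embs[1:]:
        prev = concepts[-1]
        cos = float(vec @ prev / (np.linalg.norm(vec) * np.linalg.norm(prev) + 1e-8))
        if cos > threshold:
            # Trivially similar to the running concept: fold it in.
            concepts[-1] = (prev + vec) / 2.0
        else:
            # Dissimilar content keeps its own fine-grained slot.
            concepts.append(vec)
    return np.stack(concepts)

# Three near-identical "easy" tokens collapse into one concept;
# the dissimilar final token stays separate.
tokens = np.array([[1.0, 0.0], [0.99, 0.01], [1.0, 0.02], [0.0, 1.0]])
concepts = merge_adjacent_tokens(tokens)
print(concepts.shape[0])  # → 2
```

A real system would of course learn the merge decision rather than hard-code a cosine threshold; the point is just that sequence length (and hence compute) shrinks where content is redundant.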
Thought this was referring to @lastpositivist.bsky.social and got really worried
What batch sizes and sequence lengths are you able to get up to on your dgx spark? I was doing batch size 32 at 2048 sequence length but saw much faster throughput when I dropped down to batch 16
What do you use to have Claude make slides?
Sidenote: All experiments were performed on the Nvidia DGX Spark machine, my new favorite toy.
These experiments clarify the importance of more holistic evaluation metrics during pretraining. As we move labeled data for reasoning, planning, and other higher order thinking into the pretraining stage, pretraining evaluation that's more aligned with downstream tasks becomes more important
Outside of HumanEval, the model pretrained on the SYNTH dataset outperforms the standard nanochat on every task. I report results for both the standard chat template (Harmony) and the Qwen3 chat template used in the SYNTH dataset.
Markdown representation of the above table:

| Task | Default Template | Qwen3 Template | Synthetic + FineWeb (Default) | Synthetic + FineWeb (Qwen3) | Synthetic + FineWeb (Qwen3, no thinking) |
|------|------------------|----------------|-------------------------------|-----------------------------|------------------------------------------|
| ARC-Easy | 38.85% | 25.34% | 42.85% | 30.26% | 25.08% |
| ARC-Challenge | 29.95% | 22.78% | 32.94% | 25.85% | 23.04% |
| MMLU | 32.69% | 24.08% | 33.13% | 26.98% | 24.21% |
| GSM8K | 3.64% | 0.08% | 6.14% | 2.20% | 0.08% |
| HumanEval | 8.54% | 1.22% | 1.83% | 0.00% | 0.61% |
| SpellingBee | 98.05% | 0.00% | 98.05% | 35.94% | 0.39% |
The results seemed to speak for themselves, but I was skeptical. The SYNTH dataset is composed of reasoning-style chat messages; it seemed unlikely that general internet data would perform better on downstream chat-style tasks. So I decided to do midtraining, where the results flipped:
I used a 3:1 SYNTH:FineWeb data ratio when pretraining. I evaluated nanochat with and without SYNTH data for both 1B pretraining tokens and 3B pretraining tokens to see if adding the data led to faster convergence. While SYNTH data led to lower val/bpb, the CORE metric was consistently lower.
Pretraining results for training on SYNTH + FineWeb vs FineWeb alone. Four rows are displayed: dgx1-continued, dgx1-synth-continued, dgx1-synth, and dgx1. Rows labeled *-synth* are trained with a 3:1 SYNTH:FineWeb mixture; rows labeled *-continued* are trained for 3B tokens, 1B otherwise. CORE metric:
- nanochat + SYNTH, 3B tokens: 0.147
- nanochat, 3B tokens: 0.166
- nanochat + SYNTH, 1B tokens: 0.123
- nanochat, 1B tokens: 0.139
I did some experiments with nanochat and the SYNTH dataset from Pleias. I wanted to compare how a model pretrained with a mixture of SYNTH and FineWeb would compare to the standard FineWeb mixture. Results: Pretraining on Fineweb alone looks good (see below), but the story changes after midtraining:
Can anyone find that meme with the cartoon cats where all of their butts are AI company logos? I need it for a project
Been running comparisons of nanochat pretrained on FineWebEdu vs synthetic data and it’s not looking great for the synthetic data runs thus far. We’ll see what it looks like after midtraining
At my day job, I’m building a lot of Agent stuff! On my free time, I’m experimenting with model architectures and pretraining recipes!
Excited to see the AI community has grown on here! Gonna try sharing more about my work here!
When Bill Clinton left office, we had a budget surplus. After 8 years of disastrous foreign interventions and tax cuts, we had a deficit and a recession. Now with Trump we’re repeating the same mistakes made under Bush
I heard this place is getting a dislike button?
If you assume that there is an infinite number of tech workers with six figure salaries who want to live in the Bay or NY then the NIMBYs are right and no amount of new housing will help. But there is not an infinite number of tech workers, and the ones here are being laid off.
People talk about artificial intelligence but don't understand what that technology actually is, what the term "AI" is referring to, or best use cases for all the different tools that exist.
Nor do they understand that they've already been using AI assistance for YEARS before LLMs became public.
All these things apply to a small percentage of AI research and companies, yet this website treats all AI models and research like it comes from xAI. You ran the actual open source and AI safety researchers off the app
I think it’s because they’re threatened by it honestly