You might want to keep Codex running in --yolo mode in a continuous loop, since your usage limit resets later today.
Posts by Sung Kim
My review of the new Korean drama We Are All Trying Here on Netflix: it’s about ‘the industry,’ and I really do not want to watch a drama about an industry I cannot relate to.
“The industry” here means the entertainment industry.
Disney+'s Perfect Crown, if you desire Subway.
Back in December, when AI coding agents started getting good, it felt like magic.
Now, the magic is gone, replaced by the same old process of building an app one step at a time, except you’re coding much less, or not at all.
When you watch a Korean drama, you suddenly get the urge to eat Subway, then you catch yourself - it’s Subway... Nah.
There is a short animation on Netflix. Never, never watch that! You will just cry your eyes out even thinking about it.
Also this...
Go for it. It's your tokens.
TRACER learns the decision boundary between "easy" and "hard" inputs directly from your LLM's own classification traces. It fits a fast, non-LLM surrogate on the easy partition, gates it with a calibrated acceptor, and defers only the uncertain inputs back to the LLM.
github.com/adrida/tracer
Use TRACER for classification tasks. Perhaps 90% of your classification inputs can be handled by CPU-inference classifiers (logistic regression, gradient-boosted trees, or a small neural net) instead of a GPU-inference LLM.
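The gating idea can be sketched in a few lines of plain Python. This is a toy illustration: `surrogate_predict`, `classify`, and the 0.8 threshold are all my own stand-ins, not TRACER's actual API.

```python
# Toy sketch of the easy/hard cascade idea (not TRACER's real interface):
# a cheap surrogate handles inputs it is confident about, and everything
# else is deferred to the expensive LLM.

def surrogate_predict(text):
    """Toy CPU-cheap classifier: returns (label, confidence)."""
    spammy = sum(w in text.lower() for w in ("free", "winner", "click"))
    confidence = min(1.0, 0.5 + 0.2 * spammy)
    return ("spam" if spammy else "ham"), confidence

def classify(text, llm_fallback, accept_threshold=0.8):
    """Gate on calibrated confidence; defer uncertain inputs to the LLM."""
    label, conf = surrogate_predict(text)
    if conf >= accept_threshold:
        return label, "surrogate"
    return llm_fallback(text), "llm"

# Stand-in for an expensive LLM call.
mock_llm = lambda text: "ham"

print(classify("free winner click here", mock_llm))  # confident -> surrogate answers
print(classify("lunch at noon?", mock_llm))          # uncertain -> deferred to LLM
```

The interesting engineering is in calibrating that acceptor threshold so the surrogate's accepted answers match the LLM's labels at a target agreement rate.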
A team at University College London wrote a paper on the leaked Claude Code source code.
Paper: arxiv.org/abs/2604.14228
Repo: github.com/VILA-Lab/Div...
Sigil
Sigil is a low-level container runtime written in Rust. It implements process supervision, Linux namespaces, filesystem isolation, and cgroups from scratch to demonstrate how containers actually work under the hood.
The goal of this project is to understand and build the core primitives behind tools like Docker and runc rather than treating them as black boxes.
Repo: github.com/kayleexx/sigil
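For a feel of the primitives involved, here is a minimal Python sketch of the namespace flags a runtime like this hands to unshare(2). The flag values come from linux/sched.h; the helper names are mine, not Sigil's, and the actual syscall needs root (CAP_SYS_ADMIN).

```python
import ctypes
import os

# Namespace flags from <linux/sched.h> (a subset, for illustration).
CLONE_NEWNS  = 0x00020000  # mount namespace
CLONE_NEWUTS = 0x04000000  # hostname / domain name
CLONE_NEWPID = 0x20000000  # PID namespace
CLONE_NEWNET = 0x40000000  # network namespace

def container_flags():
    """Combine the namespace flags a container runtime would pass to unshare(2)."""
    return CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNET

def enter_namespaces(flags):
    """Call unshare(2) via libc; unprivileged processes skip the syscall."""
    if os.geteuid() != 0:
        return False  # would fail with EPERM without CAP_SYS_ADMIN
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(flags) != 0:
        raise OSError(ctypes.get_errno(), "unshare failed")
    return True

print(hex(container_flags()))
```

A real runtime then combines this with pivot_root for filesystem isolation and cgroup files under /sys/fs/cgroup for resource limits.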
ByteDance introduces Flash Depth Attention (FDA): a hardware-efficient kernel that speeds depth attention up by >40,000×, making full-expressivity depth retrieval fast enough to train at scale.
The conventional Transformer backbone pipeline: residual connections -> sequence attention -> residual connections -> FFN.
The Flash Depth Attention (FDA) Transformer pipeline: depth attention -> sequence attention -> depth attention -> FFN.
Results: models actively use cross-layer retrieval, the attention-sink phenomenon disappears, and MoDA improves the OLMo2 baseline across the board.
📄 Paper: arxiv.org/abs/2603.15619
✍️ Blog: lh-zhu.github.io/The-Second-H...
💻 Code: github.com/hustvl/MoDA
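As I understand it (my reading, not the paper's kernel), depth attention means a token's query attends over the hidden states the same token produced at earlier depths, rather than relying only on the residual stream. A toy pure-Python sketch:

```python
import math

# Toy sketch of "depth attention" (an assumption about the mechanism, not
# the FDA kernel): at layer L, a token's query attends over the hidden
# states that the SAME token produced at depths 0..L-1, so the model can
# retrieve from any earlier layer directly.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def depth_attention(query, layer_states):
    """query: d-dim vector; layer_states: one d-dim hidden state per earlier depth."""
    d = len(query)
    scores = [dot(query, h) / math.sqrt(d) for h in layer_states]
    weights = softmax(scores)
    # Weighted sum over DEPTH, not over sequence positions.
    return [sum(w * h[i] for w, h in zip(weights, layer_states)) for i in range(d)]

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hidden states from depths 0..2
out = depth_attention([2.0, 0.0], states)
print(out)
```

The expensive part is that every layer adds another entry to attend over, which is presumably what the hardware-efficient kernel addresses.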
On my social media feed - Vercel, Vercel, and Vercel; and I don't even use Vercel. 😁
Is this at the post level, or a more global change?
How do I mute likes and reposts of my own posts?
Moonshot AI takes prefill/decode disaggregation beyond a single cluster: cross-datacenter + heterogeneous hardware, unlocking the potential for significantly lower cost per token.
This was previously blocked by KV cache transfer overhead. The key enabler is our hybrid model (Kimi Linear), which reduces KV cache size and makes cross-DC PD practical.
Result: this translates directly into lower token cost.
Paper: arxiv.org/abs/2604.150...
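For intuition on why KV cache size is the bottleneck, here is a back-of-the-envelope sketch with made-up numbers (the layer counts, head dimensions, and 1-in-4 hybrid ratio are mine, not Kimi Linear's):

```python
# Back-of-the-envelope sketch (my own numbers, not Moonshot's) of why a
# hybrid linear-attention model shrinks the KV cache that cross-DC
# prefill/decode disaggregation must ship over the wire.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, dtype_bytes=2):
    # 2x for the K and V tensors at every full-attention layer (fp16/bf16).
    return 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes

full = kv_cache_bytes(layers=48, kv_heads=8, head_dim=128, seq_len=128_000)
# Hypothetical hybrid: only 1 in 4 layers keeps full attention; the
# linear-attention layers carry constant-size state, ignored here.
hybrid = kv_cache_bytes(layers=12, kv_heads=8, head_dim=128, seq_len=128_000)

print(f"full:   {full / 2**30:.1f} GiB per long-context request")
print(f"hybrid: {hybrid / 2**30:.1f} GiB per long-context request")
```

Shrinking tens of GiB per request down to single digits is what makes moving the cache between datacenters tolerable.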
As I use Bluesky, I like knowing that Bluesky is providing my data for AI training to all AI Labs for FREE.
Why you should use Claude Code now!
You know, Anthropic has a history of delivering a superior model at launch that then starts degrading after a few weeks.
tailscale-rs
It is a work-in-progress Tailscale library written in Rust, with language bindings to C, Elixir, and Python.
github.com/tailscale/ta...
He trained a 12M parameter LLM on his own ML framework, using a Rust backend and CUDA kernels for flash attention, AdamW, and more.
The framework features:
- Custom CUDA kernels (Flash Attention, fused LayerNorm, fused GELU) for 3× higher throughput
- Automatic WebGPU fallback for non-NVIDIA devices
- TypeScript API with Rust compute backend
- One npm install to get started, prebuilt binaries for every platform
Demo: mni-ml.github.io/demos/transf...
Repo: github.com/mni-ml/frame...
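The Flash Attention kernel mentioned above rests on the online-softmax trick: keep a running max and running normalizer so the softmax never needs all scores in memory at once. A toy single-query sketch (illustrative only; the real kernel tiles Q/K/V in on-chip CUDA memory):

```python
import math

# Online-softmax sketch: stream over attention scores one chunk at a time,
# rescaling the running numerator/denominator whenever the max grows.

def streaming_softmax_sum(score_chunks, value_chunks):
    """Return sum_i softmax(scores)_i * values_i, processed chunk by chunk."""
    running_max = float("-inf")
    running_denom = 0.0
    running_num = 0.0
    for scores, values in zip(score_chunks, value_chunks):
        new_max = max(running_max, max(scores))
        scale = math.exp(running_max - new_max)  # 0.0 on the first chunk
        running_denom *= scale
        running_num *= scale
        for s, v in zip(scores, values):
            w = math.exp(s - new_max)
            running_denom += w
            running_num += w * v
        running_max = new_max
    return running_num / running_denom

scores = [1.0, 3.0, 2.0, 0.5]
values = [10.0, 20.0, 30.0, 40.0]
chunked = streaming_softmax_sum([scores[:2], scores[2:]], [values[:2], values[2:]])

# Reference: materialize the full softmax at once.
m = max(scores)
ws = [math.exp(s - m) for s in scores]
ref = sum(w * v for w, v in zip(ws, values)) / sum(ws)
print(abs(chunked - ref) < 1e-9)
```

Same answer as the materialized softmax, but O(chunk) memory instead of O(sequence), which is the whole point of the fused kernel.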
Oh well, we’re safe from robots taking our jobs for now.
Developers complain about Electron, but many still end up migrating their projects from Tauri to Electron.
Why the crypto market has no future.
Bitcoin’s problem is that, while the asset itself is fixed in supply, the market keeps creating synthetic exposure through derivatives, which weakens the practical meaning of scarcity.
This is why OG HODLers are selling their bitcoins.