
Posts by Sung Kim

You might want to keep Codex running in --yolo mode in a continuous loop today, since your usage limit will reset later today anyway.

1 hour ago 9 0 1 0

My review of the new Korean drama We Are All Trying Here on Netflix: it’s about ‘the industry,’ and I really do not want to watch a drama about an industry I cannot relate to.

“The industry” here means the entertainment industry.

15 hours ago 6 0 0 1

Disney+'s Perfect Crown, if you're craving Subway.

1 day ago 1 0 1 0

Back in December, when AI coding agents started getting good, it felt like magic.

Now, the magic is gone, replaced by the same old process of building an app one step at a time, except you’re coding much less, or not at all.

1 day ago 33 0 6 0

When you watch a Korean drama, you suddenly get the urge to eat Subway, then you catch yourself - it’s Subway... Nah.

1 day ago 5 1 1 0

There is a short animation on Netflix. Never, never watch that! You will just cry your eyes out even thinking about it.

1 day ago 2 0 0 0

Also this...

1 day ago 8 0 1 0

Go for it. It's your tokens.

1 day ago 1 0 0 0
Preview
GitHub - adrida/tracer: TRACER: replace 90%+ of your LLM classification calls with a traditional ML model. Formal parity guarantees. Self-improving.

TRACER learns the decision boundary between "easy" and "hard" inputs directly from your LLM's own classification traces. It fits a fast, non-LLM surrogate on the easy partition, gates it with a calibrated acceptor, and defers only the uncertain inputs back to the LLM.

github.com/adrida/tracer
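The surrogate-plus-acceptor idea can be sketched in a few lines. This is not TRACER's actual API, just a minimal stdlib-only illustration of the routing logic: a cheap classifier handles the easy partition, a confidence threshold acts as the calibrated acceptor, and anything uncertain is deferred to the LLM (stubbed out here).

```python
# Minimal sketch of confidence-gated deferral (hypothetical, not TRACER's API).

def surrogate(text):
    """Toy stand-in for a fast non-LLM classifier: returns (label, confidence)."""
    positive = {"great", "love", "excellent"}
    negative = {"awful", "hate", "terrible"}
    words = set(text.lower().split())
    pos, neg = len(words & positive), len(words & negative)
    if pos == neg:
        return "neutral", 0.5          # ambiguous input -> low confidence
    label = "positive" if pos > neg else "negative"
    return label, 0.9                  # confident on the easy partition

def llm_classify(text):
    """Stand-in for the expensive LLM call that handles the hard inputs."""
    return "positive"

def route(text, threshold=0.8):
    """Calibrated acceptor: trust the surrogate only above the threshold."""
    label, conf = surrogate(text)
    if conf >= threshold:
        return label, "surrogate"      # easy input: no LLM call made
    return llm_classify(text), "llm"   # uncertain input: defer to the LLM

print(route("I love this, excellent work"))  # handled by the surrogate
print(route("it is fine I guess"))           # deferred to the LLM
```

In the real system the threshold would come from calibration on the LLM's own classification traces, so the acceptance rate carries a parity guarantee rather than a hand-picked 0.8.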

1 day ago 15 1 0 1
Advertisement

Use TRACER when performing classification tasks. Perhaps 90% of your classification calls can be handled by a CPU-inference classifier (logistic regression, gradient-boosted trees, or a small neural net) instead of a GPU-inference LLM.

1 day ago 19 1 1 0
Post image

A team at University College London wrote a paper on the leaked Claude Code source code.

Paper: arxiv.org/abs/2604.14228
Repo: github.com/VILA-Lab/Div...

1 day ago 24 5 3 0
Preview
GitHub - Kayleexx/sigil: sigil is a low-level tool that runs a Linux process in a private and restricted environment similar to Docker.

The goal of this project is to understand and build the core primitives behind tools like Docker and runc rather than treating them as black boxes.

Repo: github.com/kayleexx/sigil

1 day ago 20 0 0 0
Post image

Sigil

Sigil is a low-level container runtime written in Rust. It implements process supervision, Linux namespaces, filesystem isolation, and cgroups from scratch to demonstrate how containers actually work under the hood.
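The namespaces a runtime like Sigil or runc manipulates are visible to any Linux process under /proc/self/ns. A small Linux-only Python sketch (not Sigil's code) that inspects them:

```python
import os

# Every Linux process belongs to one namespace of each kind (pid, mnt, uts,
# net, ...). Container runtimes create fresh ones with clone()/unshare()
# flags like CLONE_NEWPID so the child sees a private PID table, mount
# table, hostname, and network stack.
namespaces = sorted(os.listdir("/proc/self/ns"))
print(namespaces)

# Each entry is a symlink like 'pid:[4026531836]'. Two processes share a
# namespace exactly when these inode numbers match -- that is how tools
# check whether they are "inside" the same container.
print(os.readlink("/proc/self/ns/pid"))
```

Actually entering new namespaces (the part Sigil implements) requires privileges, which is why the read-only inspection above is the safe illustration.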

1 day ago 76 5 3 0
Post image

Results: models actively use cross-layer retrieval, the attention-sink phenomenon disappears, and MoDA improves the OLMo2 baseline across the board.

📄 Paper: arxiv.org/abs/2603.15619
✍️ Blog: lh-zhu.github.io/The-Second-H...
💻 Code: github.com/hustvl/MoDA

1 day ago 6 0 0 0
Post image

The conventional Transformer backbone pipeline: residual connections -> sequential attention -> residual connections -> FFN.

The Flash Depth Attention (FDA) Transformer pipeline: depth attention -> sequence attention -> depth attention -> FFN.
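As I understand the idea, depth attention lets a token's current state attend over its own hidden states from earlier layers (cross-layer retrieval), instead of receiving them only through the residual stream. A toy stdlib-only sketch, not the FDA kernel or the paper's exact formulation:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def depth_attention(query, layer_states):
    """Attend over this token's OWN hidden states from earlier layers.

    query:        current layer's hidden state for one token
    layer_states: the same token's hidden states at layers 0..L-1
    Mixes across DEPTH, unlike ordinary attention, which mixes across
    sequence positions.
    """
    weights = softmax([dot(query, h) for h in layer_states])
    dim = len(query)
    return [sum(w * h[i] for w, h in zip(weights, layer_states))
            for i in range(dim)]

# Toy: 3 earlier layers, hidden size 2; the query aligns with layer 1.
states = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
out = depth_attention([0.0, 1.0], states)
print(out)  # weighted toward layer 1's state [0.0, 2.0]
```

The FDA contribution is making this depth-wise softmax fast enough on GPU hardware to interleave with ordinary sequence attention at training scale.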

1 day ago 3 0 1 0
Post image

ByteDance introduces Flash Depth Attention (FDA): a hardware-efficient kernel that speeds depth attention up by >40,000×, making full-expressivity depth retrieval fast enough to train at scale.

1 day ago 29 1 1 1

On my social media feed - Vercel, Vercel, and Vercel; and I don't even use Vercel. 😁

1 day ago 3 0 0 0

Is this at a post level or more global change?

2 days ago 2 0 1 0

How do I mute likes and reposts of my own posts?

2 days ago 3 0 2 0
Preview
Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter
Prefill-decode (PD) disaggregation has become the standard architecture for large-scale LLM serving, but in practice its deployment boundary is still determined by KVCache transfer. In conventional de...

…which reduces KV cache size and makes cross-DC PD practical.

Result: this translates directly into lower token cost.

Paper: arxiv.org/abs/2604.150...

3 days ago 5 0 0 0
Post image

Moonshot AI takes prefill/decode disaggregation beyond a single cluster: cross-datacenter + heterogeneous hardware, unlocking the potential for significantly lower cost per token.

This was previously blocked by KV cache transfer overhead. The key enabler is our hybrid model (Kimi Linear).
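The transfer overhead is easy to size: for a standard transformer, KV cache bytes per token = layers × 2 (K and V) × kv_heads × head_dim × bytes per element. A back-of-envelope sketch with made-up model numbers (not Kimi's actual configuration):

```python
def kv_bytes_per_token(layers, kv_heads, head_dim, dtype_bytes=2):
    # K and V each store kv_heads * head_dim values per layer per token.
    return layers * 2 * kv_heads * head_dim * dtype_bytes

# Hypothetical dense model: 64 layers, 8 KV heads (GQA), head_dim 128, fp16.
per_token = kv_bytes_per_token(64, 8, 128, 2)
context = 128_000
total_gb = per_token * context / 1e9
print(f"{per_token} B/token -> {total_gb:.1f} GB for a {context}-token context")
```

Tens of gigabytes per long-context request is what makes cross-datacenter transfer the bottleneck, and why a hybrid linear-attention design that shrinks the cache changes the economics.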

3 days ago 11 1 2 0

As I use Bluesky, I like knowing that Bluesky is providing my data for AI training to all AI Labs for FREE.

3 days ago 63 6 4 1

Why you should use Claude Code now!

You know, Anthropic has a history of delivering a superior model at release, and then it starts degrading after a few weeks.

3 days ago 12 0 2 0
Preview
GitHub - tailscale/tailscale-rs: Rust implementation of Tailscale (preview, experimental)

tailscale-rs

It is a work-in-progress Tailscale library written in Rust, with language bindings to C, Elixir, and Python.

github.com/tailscale/ta...

3 days ago 18 1 0 0
Preview
Transformer Demo (mni-ml): Interactive transformer token explorer running entirely in your browser.

- Automatic WebGPU fallback for non-NVIDIA devices
- TypeScript API with Rust compute backend
- One npm install to get started, prebuilt binaries for every platform

Demo: mni-ml.github.io/demos/transf...
Repo: github.com/mni-ml/frame...

3 days ago 3 1 0 0
Video

He trained a 12M-parameter LLM on his own ML framework, using a Rust backend and CUDA kernels for flash attention, AdamW, and more.

The framework features:
- Custom CUDA kernels (Flash Attention, fused LayerNorm, fused GELU) for 3× higher throughput
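Fusing means one kernel applies the activation in place instead of launching separate elementwise ops; the math being fused for GELU is usually the tanh approximation. A plain-Python reference of both forms (not the CUDA kernel):

```python
import math

def gelu_exact(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # tanh approximation commonly used inside fused kernels (GPT-2 style).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{v:+.1f}: exact={gelu_exact(v):+.6f} tanh={gelu_tanh(v):+.6f}")
```

The two agree to roughly 1e-3 over the typical activation range, which is why the cheaper tanh form is the one worth fusing.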

3 days ago 10 0 1 0
Video

Oh well, we’re safe from robots taking our jobs for now.

3 days ago 915 178 122 92

Developers complain about Electron, but many still end up migrating their projects from Tauri to Electron.

3 days ago 4 0 2 0

Why the crypto market has no future.

Bitcoin’s problem is that, while the asset itself is fixed in supply, the market keeps creating synthetic exposure through derivatives, which weakens the practical meaning of scarcity.

This is why OG HODLers are selling their bitcoins.

3 days ago 14 3 1 2
Preview
GitHub - vercel-labs/wterm: A terminal emulator for the web.

Vercel's wterm: A terminal emulator for the web, written in Zig.

github.com/vercel-labs/...

4 days ago 15 0 0 0