Chassis looks great now! But the single-cooler solution seems like a bit of a limiting factor for fatter chips. Still, if memory/SSD prices weren't insane, this would be a cool little machine.
Alas...
Posts by A.V.
A table titled "Kimi K2.6 vs K2.5," sub-headed "Generational lift & position among frontier." It compares the **Kimi K2.6** model against its predecessor (**K2.5**) and other frontier models including **GPT-5.4 xhigh**, **Gemini 3.1 Pro**, **Opus 4.6**, **Opus 4.7**, and **Mythos**. The table highlights "Generational Lift" (Δ), which is the performance increase from K2.5 to K2.6.

### Key Sections

**1. Agentic • Search • Tool Use**
* **Top Performance:** Kimi K2.6 shows massive gains in tool use, specifically **Toolathlon** (+22.2) and **MCPMark** (+26.4).
* **Leaders:** Kimi leads in **DeepSearchQA accuracy** (83.0) and **WideSearch** (80.8). However, **Mythos** leads the HLE-Full w/ tools benchmark (64.7).

**2. Coding**
* **Top Performance:** Kimi K2.6 shows a significant lift in **Terminal-Bench 2.0** (+15.9).
* **Leaders:** **Opus 4.7** leads most coding categories, including **SWE-Bench Verified** (87.6) and **Terminal-Bench** (69.4). Kimi leads in **SWE-Bench Pro** (58.6).

**3. Reasoning & Knowledge**
* **Top Performance:** High scores across the board, but the generational lift is smaller (e.g., **AIME 2026** only moved +0.6).
* **Leaders:** **GPT-5.4** leads in **AIME 2026** (99.2) and **HMMT 2026** (97.7). **Mythos** leads **HLE-Full (no tools)** at 56.8.

**4. Vision**
* **Top Performance:** The largest single gain in the chart is **BabyVision w/ python**, where Kimi K2.6 improved by +28.0 points over K2.5.
* **Leaders:** **Gemini 3.1 Pro** leads **MMMU-Pro** (83.0), while **GPT-5.4** leads **MathVision** (92.0) and **V* w/ python** (98.4).

### Biggest Generational Lifts (K2.5 → K2.6)

| Benchmark | K2.5 | K2.6 | Lift (Δ) | Category |
|---|---|---|---|---|
| **BabyVision w/ python** | 40.5 | 68.5 | **+28.0** | Vision (Python-augmented) |
| **MCPMark** | 29.5 | 55.9 | **+26.4** | Agentic (Tool orchestration) |
| **Toolathlon** | 27.8 | 50.0 | **+22.2** | Agentic (Long-horizon tools) |
| **APEX-Agents** | 11.5 | 27.9 | **+16.4** | Ag…
mythos vs opus 4.7 vs cursor composer vs K2.6 on non-cherry-picked benchmarks
result: yup, still looking good
benchmark scores are truly impressive. hope kimi doesn't stop.
Kimi 2.6 is now available on @hf.co 🔥🎉
huggingface.co/moonshotai/K...
✨ 1T MoE / 32B active / 256K context
✨ Agent Swarm: 300 sub-agents × 4,000 steps
✨ Modified MIT
yup, they do sound different. there's probably no objective best, unfortunately, you just gotta roll with the sound signature you like (and the featureset, if it's wireless).
roko's basilisk hits different in the agent era
more of a leaning desk, really.
the cthulhu claude logo, the scary name and the product being an attempt at a hive mind.
I kinda like the combo, shame about the anthropic brown™
Qwen3.6 35B-A3B can now be run locally! 💜
The model is the strongest mid-sized LLM on nearly all benchmarks.
Run on 23GB RAM via Unsloth Dynamic GGUFs.
GGUFs to run: huggingface.co/unsloth/Qwen...
Guide: unsloth.ai/docs/models/...
a cookie for honesty 🍪
MYTHOS SYSTEM CARD PREVIEW!!!
www-cdn.anthropic.com/53566bf5440a...
MYTHOS CONFIRMED!!!!!!
the new shaders are really something
Claude made me a chart comparing benchmarks for the larger Gemma 4 models against similar Qwen3.5 ones
A new Anthropic paper argues for functional emotions in LLMs, claiming a causal link between emotional representations and model behavior. transformer-circuits.pub/2026/emotion...
classifying the new composer tech report as a must read cursor.com/resources/Co...
ah, I didn't expect this to actually be about harry, my bluesky isolation must be exceptional. sorry you have to not care so hard...
what did harry do this time...
thanks, this already looks a bit more promising, but there's room for growth. the polish elevenlabs really looks good here.
thanks for the effort! a knock on tilde that there's no more convenient way to get at this resource...
thanks! I have to say, though, that hugo.lv is an ancient project (especially in the AI era), and without extra funding surely nothing there has been updated much. I'd put more hope in tilde's website: tilde.ai/lv/speech-to...
where's tilde?
they also have speech to text, it'd be interesting to see a comparison with the others, cool test
Inspired by the man who built a personalized cancer vaccine for his dog, I’ve written an open-source guide to DIY mRNA vaccine production:
philfung.github.io/openvaxx
as a naive claudecel, why do you, uhh, not just use codex instead then
Btw, nvidia published cutile-rs not so many days ago, a Rust version of cuTile DSL for programming cuda kernels from inside Rust. Only a research project, but it looks very cool and has quite a few features already.
github.com/NVlabs/cutil...
thx claude for explanation, I was lost there for a second, ngl.
LongCat-Next 🐱 A multimodal foundation model released by Meituan
huggingface.co/meituan-long...
✨ 74B total - 3B active - MIT
✨ One token space for all modalities
✨ DiNA paradigm for unified learning
✨ Seeing, creating, talking all in one
but it's clear by now that if we ever get to space properly, it'll be much weirder than the mainstream space truckers.