By removing host-device round-trips from the autoregressive loop, the port runs unmodified across platforms and reaches up to 64% of memory bandwidth on a single batch on Cloud TPU v6e.
#DeepLearning #OpenSource #Flax #Python
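Not the pre-print's code, just a minimal JAX sketch of the claim above: if the whole autoregressive loop lives inside one compiled program (here via jax.lax.scan), the host never has to sync with the device between tokens. `step_fn`, the state shape, and the token count are placeholder assumptions.

```python
from functools import partial
import jax
import jax.numpy as jnp

def step_fn(state, tok):
    """Placeholder recurrent step: fixed-size state in, fixed-size state out."""
    new_state = 0.9 * state + tok
    return new_state, jnp.tanh(new_state)       # (next state, next-"token" embedding)

@partial(jax.jit, static_argnames="n_tokens")
def decode(state0, tok0, n_tokens):
    # The whole loop compiles to one XLA program; nothing returns to the host
    # until all n_tokens outputs are ready.
    def body(carry, _):
        state, tok = carry
        state, nxt = step_fn(state, tok)
        return (state, nxt), nxt                 # feed the output straight back in
    _, toks = jax.lax.scan(body, (state0, tok0), None, length=n_tokens)
    return toks

state0 = jnp.zeros((64,))
tok0 = jnp.ones((64,))
print(decode(state0, tok0, 8).shape)             # (8, 64)
```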
Posts by Cosmo Santoni
#MachineLearning state-space models are incredible, but custom CUDA kernels lock their performance to NVIDIA hardware.
My latest #JAX port maps the SSD algorithm to XLA passes, achieving true O(1) on-device caching across CPU, GPU & #TPU.
Pre-print: huggingface.co/papers/2603....
#AI #SSM
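A hedged illustration rather than the port's actual code: the reason an SSM decoder needs only an O(1) "cache" is that its recurrent state has a fixed shape, so per-token memory does not grow with sequence length the way a transformer KV cache does. All shapes and parameter values below are toy assumptions.

```python
import jax.numpy as jnp

d_state, d_model = 16, 64                  # toy sizes
h = jnp.zeros((d_state, d_model))          # the entire decode-time "cache"

def ssd_style_step(h, x, a, B, C):
    """Selective-SSM-flavoured update: h <- a*h + B x^T, y = C^T h."""
    h = a * h + jnp.outer(B, x)            # same (d_state, d_model) shape every step
    y = C @ h                              # read out to (d_model,)
    return h, y

B = jnp.ones((d_state,))
C = jnp.ones((d_state,))
for t in range(5):
    x = jnp.full((d_model,), float(t))
    h, y = ssd_style_step(h, x, 0.9, B, C)

print(h.shape)  # (16, 64): constant, whether you've decoded 5 tokens or 5 million
```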
🧵 Pt3. A demo of what it looks like in practice: below you can see the context usage of a snapshot with and without trimming. 50% reduction on this particular task.
🧵 Pt2. A typical 150k-token session is 60-70% tool-result dumps and base64 signatures Claude has already processed. Trim strips that but keeps every message. Observed 50% reduction in real sessions. Unlike /compact, nothing gets summarised away.
Analysis breakdown here github.com/CosmoNaught/...
🧵 Built git-style versioning for Claude Code sessions. Snapshot context, branch for different tasks, trim the bloat. 40 mins of codebase analysis reused across 5 tasks instead of re-explaining from scratch.
github.com/CosmoNaught/...
#ClaudeCode #DevTools #Anthropic #Claude #LLM #AIAgents
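A minimal sketch of the trimming idea from Pt2 above, assuming a deliberately simplified message schema (the real Claude Code session format and the repo's implementation may differ): every message is kept, but oversized tool-result payloads and base64 blobs are replaced with short stubs.

```python
import re

BASE64_BLOB = re.compile(r"[A-Za-z0-9+/=]{400,}")   # crude size heuristic

def trim_message(msg: dict, max_tool_chars: int = 500) -> dict:
    """Return a copy of one message with oversized payloads stubbed out."""
    text = msg.get("content", "")
    if msg.get("role") == "tool" and len(text) > max_tool_chars:
        text = text[:max_tool_chars] + f" [trimmed {len(text) - max_tool_chars} chars]"
    text = BASE64_BLOB.sub("[base64 payload removed]", text)
    return {**msg, "content": text}

def trim_session(messages: list[dict]) -> list[dict]:
    """Every message survives; only its bulk shrinks (unlike summarisation)."""
    return [trim_message(m) for m in messages]

session = [
    {"role": "user", "content": "Analyse the repo layout."},
    {"role": "tool", "content": "file list " * 2000},      # huge tool-result dump
    {"role": "assistant", "content": "Here is the summary."},
]
trimmed = trim_session(session)
print(len(session), len(trimmed))   # same message count, far fewer characters
```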
Took a much-needed screen break this week to answer the age-old question: how many epidemiology PhD students does it take to build a gingerbread house? 🤔
@nderqui.bsky.social @olliesimmons.bsky.social @cosmosantoni.bsky.social Sam Hemmings @mrc-outbreak.bsky.social