OjaKV Enables Online Low‑Rank KV Cache Compression for Long‑Context LLMs
OjaKV compresses the KV cache, letting a 32K-token prompt on Llama‑3.1‑8B (batch size 4) fit in ~16 GB while preserving zero‑shot accuracy; the low‑rank projection subspace is updated online via Oja’s rule. Read more: getnews.me/ojakv-enables-online-low... #ojakv #kvcache
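For context, Oja's subspace rule incrementally tracks the top principal subspace of a stream of vectors, which is what lets the compression basis adapt online as new tokens arrive. Below is a minimal, hypothetical sketch of that update (not OjaKV's actual implementation); the function name, learning rate, and QR re-orthonormalization step are illustrative assumptions.

```python
import numpy as np

def oja_subspace_update(W, x, lr=0.01):
    """One Oja's-rule step tracking a rank-r subspace (illustrative sketch).

    W: (d, r) matrix with orthonormal columns, the current subspace basis.
    x: (d,) incoming vector (e.g., a new key or value vector).
    """
    y = W.T @ x                                          # project x onto the subspace
    W = W + lr * (np.outer(x, y) - W @ np.outer(y, y))   # Oja's subspace rule
    Q, _ = np.linalg.qr(W)                               # keep columns orthonormal
    return Q

# Usage: feed a stream of vectors and let the basis drift toward their
# dominant directions.
rng = np.random.default_rng(0)
W = np.linalg.qr(rng.standard_normal((64, 8)))[0]        # random initial basis
for _ in range(200):
    x = rng.standard_normal(64)
    W = oja_subspace_update(W, x)
```

The periodic QR step is one common way to keep the basis well-conditioned; the learning rate trades off tracking speed against noise sensitivity.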