SentenceKV Improves LLM Inference with Sentence-Level KV Caching
SentenceKV compresses token-level KV pairs into sentence-level vectors, cutting memory use while keeping latency stable; on the PG-19 benchmark it reduced the memory footprint while matching baseline perplexity. getnews.me/sentencekv-improves-llm-... #sentencekv #llminference
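The core idea — collapsing each sentence's token-level key/value pairs into a single pooled vector — can be sketched as below. This is a minimal illustrative sketch assuming mean-pooling over sentence groups; the function name and pooling choice are assumptions, not the paper's exact algorithm.

```python
def sentence_pool_kv(keys, values, sentence_ids):
    """Compress per-token KV pairs into one vector per sentence.

    keys, values: lists of per-token vectors (lists of floats).
    sentence_ids: sentence index for each token.
    Returns dicts mapping sentence id -> mean-pooled key / value vector.
    Illustrative sketch only, assuming simple mean-pooling.
    """
    groups = {}  # sentence id -> token indices in that sentence
    for idx, sid in enumerate(sentence_ids):
        groups.setdefault(sid, []).append(idx)

    def mean(rows):
        n = len(rows)
        return [sum(col) / n for col in zip(*rows)]

    pooled_k = {sid: mean([keys[i] for i in idxs]) for sid, idxs in groups.items()}
    pooled_v = {sid: mean([values[i] for i in idxs]) for sid, idxs in groups.items()}
    return pooled_k, pooled_v

# Toy cache: 3 tokens, the first two in sentence 0, the third in sentence 1
keys = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
values = [[0.0, 0.5], [1.0, 1.5], [2.0, 2.5]]
pk, pv = sentence_pool_kv(keys, values, [0, 0, 1])
print(pk)  # {0: [1.0, 2.0], 1: [4.0, 5.0]} — two pooled keys instead of three token keys
```

The cache thus shrinks from one KV pair per token to one per sentence, which is where the memory savings come from.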