ProxyAttn Introduces Guided Sparse Attention Using Representative Heads
ProxyAttn is a training‑free sparse attention method that uses representative heads to guide attention computation, claiming up to 10.3× faster raw attention and a 2.4× speed‑up in LLM pre‑fill. Read more: getnews.me/proxyattn-introduces-gui... #proxyattn #sparseattention