ElasticMoE: Fast, Zero‑Downtime Scaling for Mixture‑of‑Experts Models
ElasticMoE cuts scale‑up latency by up to 9× and doubles throughput when adding accelerators, using zero‑copy HBM remapping and peer‑to‑peer transfers to keep inference running throughout. getnews.me/elasticmoe-fast-zero-dow... #elasticmoe #moe
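
To make the peer‑to‑peer idea concrete, here is a minimal sketch of copying an expert's weights directly between two CUDA GPUs without staging through host memory. This is an illustration under assumptions, not ElasticMoE's actual code: the 64 MiB buffer size and the `src`/`dst` names are hypothetical, and a real system would overlap many such copies with live inference streams.

```cpp
// Sketch: direct GPU0 -> GPU1 expert-weight copy via CUDA peer-to-peer.
// Hypothetical buffer size; not the ElasticMoE implementation.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err = (call);                                    \
        if (err != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                    cudaGetErrorString(err), __FILE__, __LINE__);    \
            exit(1);                                                 \
        }                                                            \
    } while (0)

int main() {
    const size_t expertBytes = 64ull << 20;  // hypothetical 64 MiB expert shard

    // Verify a direct P2P path (NVLink/PCIe) exists from GPU 1 to GPU 0.
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 1, 0));
    if (!canAccess) {
        fprintf(stderr, "No P2P path between GPU 0 and GPU 1\n");
        return 1;
    }

    // Source expert weights live on GPU 0 (the already-serving device).
    CHECK(cudaSetDevice(0));
    void* src = nullptr;
    CHECK(cudaMalloc(&src, expertBytes));

    // Destination buffer on the newly added GPU 1; enabling peer access
    // lets the copy bypass host memory entirely.
    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceEnablePeerAccess(0, 0));
    void* dst = nullptr;
    CHECK(cudaMalloc(&dst, expertBytes));

    // Asynchronous device-to-device copy on its own stream, so compute
    // streams on GPU 0 are free to keep serving requests meanwhile.
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaMemcpyPeerAsync(dst, 1, src, 0, expertBytes, stream));
    CHECK(cudaStreamSynchronize(stream));

    printf("Copied %zu bytes GPU0 -> GPU1 via P2P\n", expertBytes);

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    return 0;
}
```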