#ScalingLaw
New Scaling Law Guides AI Mixture‑of‑Experts Model Design

The new scaling law shows that the optimal number of active experts (G) and the shared‑expert ratio (S) are independent of architecture and data size, while the optimal activation ratio Nₐ/N becomes sparser as MoE models grow. getnews.me/new-scaling-law-guides-a... #scalinglaw #ai
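
A minimal Python sketch of the quantities the post names (G, S, Nₐ/N). The configuration class and the numbers are hypothetical, chosen only to show how holding G and S fixed while the expert pool N grows yields a sparser activation ratio; none of this is the paper's fitting code.

```python
# Hypothetical sketch, not the paper's method.
# N = total routed experts, N_a = active experts per token (top-k routing),
# G = number of activated experts, S = shared-expert ratio.

from dataclasses import dataclass

@dataclass
class MoEConfig:
    n_experts: int   # N: total routed experts
    n_active: int    # N_a: experts activated per token
    n_shared: int    # always-on shared experts

    @property
    def activation_ratio(self) -> float:
        """N_a / N: fraction of routed experts a token actually uses."""
        return self.n_active / self.n_experts

    @property
    def shared_ratio(self) -> float:
        """S: share of active compute spent on shared experts."""
        return self.n_shared / (self.n_shared + self.n_active)

# Trend from the post: G and S stay fixed while N grows,
# so N_a/N shrinks and the larger model is sparser.
small = MoEConfig(n_experts=64,  n_active=8, n_shared=2)
large = MoEConfig(n_experts=256, n_active=8, n_shared=2)

for cfg in (small, large):
    print(f"N={cfg.n_experts:4d}  N_a/N={cfg.activation_ratio:.3f}  "
          f"S={cfg.shared_ratio:.2f}")
```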

Recurrent Transformers Boost Efficiency of Large Language Models

A September 2025 arXiv paper introduces a recurrent transformer that replaces attention with a sliding‑window layer, achieving linear scaling while matching accuracy with fewer parameters. getnews.me/recurrent-transformers-b... #recurrenttransformer #scalinglaw
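
As a rough illustration of why a sliding window gives linear scaling, here is a minimal NumPy sketch (my own simplification, not the paper's recurrent layer): each token attends to at most `window` predecessors, so total work is O(n·window) rather than the O(n²) of full attention.

```python
# Minimal sliding-window attention sketch; illustrative only.
import numpy as np

def sliding_window_attention(q, k, v, window: int):
    """q, k, v: (n, d) arrays. Causal attention restricted to a local window."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo = max(0, i - window + 1)               # look back at most `window` steps
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())   # stable softmax over the window
        weights /= weights.sum()
        out[i] = weights @ v[lo:i + 1]
    return out

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(sliding_window_attention(q, k, v, window=4).shape)  # (16, 8)
```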


Scaling Law: as data, compute, and parameters increase, performance improves sublinearly. Gains saturate beyond ~10^7 data hours, consistent with NLP and vision models. Larger models such as ViT-110M benefit most but require vast datasets to avoid overfitting. #ScalingLaw
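
The saturation the post describes matches the usual power-law-plus-floor picture, L(D) = E + A/D^α. A tiny Python sketch with invented constants (E, A, and alpha here are placeholders, not values fitted to the study's data) shows the shape:

```python
# Hedged sketch of "sublinear improvement, then saturation".
# Constants are made up for illustration, not fitted to any dataset.

def loss(data_hours: float, E: float = 0.20, A: float = 2.0,
         alpha: float = 0.3) -> float:
    """Irreducible error E plus a power-law term that decays with data."""
    return E + A / data_hours ** alpha

for d in (1e3, 1e5, 1e7, 1e9):
    print(f"{d:>10.0e} data hours -> loss {loss(d):.3f}")
# Each 100x more data buys a shrinking absolute gain; past ~1e7 hours
# the curve is already close to the floor E.
```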


#MSFT #Microsoft CEO #SatyaNadella weighed in on the #ScalingLaw debate at #Ignite2024. ‘If anything, we’re seeing the emergence of a *new* scaling law for test-time or inference-time compute’ (1/2) (VentureBeat)
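
For context on what scaling test-time compute can mean in practice, here is a minimal, hypothetical Python sketch of one common recipe, best-of-N sampling; `generate` and `score` are placeholder stand-ins for a model call and a verifier, and nothing here is drawn from the talk itself.

```python
# Best-of-N sampling: spend more inference compute on the same model by
# drawing N candidates and keeping the best-scoring one. Hypothetical sketch.
import random
from typing import Callable

def best_of_n(generate: Callable[[], str], score: Callable[[str], float],
              n: int) -> str:
    """Larger n = more test-time compute = better expected score."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

random.seed(0)
fake_generate = lambda: f"answer-{random.randint(0, 99)}"
fake_score = lambda s: int(s.split("-")[1])   # toy verifier: prefer higher id
for n in (1, 4, 16):
    print(n, "samples ->", best_of_n(fake_generate, fake_score, n))
```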
