Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models. The key idea: offloading local dependencies between tokens via lookups into a massive embedding t...
#memory #sparsity #LLM
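The linked write-up is truncated above, but the stated idea (handling local token dependencies through lookups into a huge embedding table instead of attention/FFN compute) suggests something like a hashed n-gram lookup. A minimal sketch of that reading; the table size, hashing scheme, and function names here are all my assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, D, TABLE = 50_000, 32, 2**16   # tiny stand-in for a "massive" table
table = rng.normal(0.0, 0.02, (TABLE, D))

def ngram_lookup(token_ids, n=2):
    """Map each length-n window of tokens to one row of the table.

    Each position gets a vector that depends only on its local n-gram:
    a memory lookup rather than computed attention. Only the rows that
    are actually indexed get touched, so cost is sparse in the table size.
    """
    out = np.zeros((len(token_ids), D))
    for i in range(n - 1, len(token_ids)):
        key = 0
        for t in token_ids[i - n + 1 : i + 1]:
            key = (key * VOCAB + int(t)) % TABLE   # rolling hash (my choice)
        out[i] = table[key]
    return out

tokens = rng.integers(0, VOCAB, size=16)
print(ngram_lookup(tokens).shape)   # (16, 32), e.g. added to hidden states
```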
Cardinality Sparsity: Applications in Matrix-Matrix Multiplications and Machine Learning
Ali Mohaddes, Johannes Lederer
Action editor: Pan Xu
https://openreview.net/forum?id=zoSRSpGu9C
#sparse #tensor #sparsity
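If "cardinality sparsity" here means tensors with few distinct values rather than few nonzeros (which the title suggests), a matrix-matrix product can share work across repeated entries. A toy sketch of that trick, my illustration rather than the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

# A is "cardinality-sparse": only k distinct values, all entries nonzero
k, n = 3, 200
values = rng.normal(size=k)
A = values[rng.integers(0, k, size=(n, n))]
B = rng.normal(size=(n, n))

# Write A = sum_j values[j] * M_j with boolean masks M_j, so that
# A @ B = sum_j values[j] * (M_j @ B). The boolean products can use
# cheaper arithmetic and shared structure across repeated entries.
C = np.zeros((n, n))
for j in range(k):
    M = (A == values[j])
    C += values[j] * (M @ B)

assert np.allclose(C, A @ B)
```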
Single-Layer Attention Beats Linear Models on Sparse Tokens
A single‑layer attention model can detect rare signals in long sequences with signal strength growing only logarithmically with length L, while linear classifiers need sqrt(L). Read more: getnews.me/single-layer-attention-b... #attention #sparsity
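The log L vs. sqrt(L) gap has a simple dilution intuition: a mean-pooled (linear) read-out shrinks a single planted token by 1/L, while softmax attention can pile its weight onto that token. A toy numeric check of that intuition, not the paper's construction; the signal strength beta here is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

def pooled_stats(L, beta, planted, trials=2000):
    """Average a linear (mean-pool) and an attention (softmax-pool) statistic
    over sequences of L standard-normal tokens, optionally boosting one
    token by beta."""
    lin, att = [], []
    for _ in range(trials):
        x = rng.normal(size=L)
        if planted:
            x[rng.integers(L)] += beta
        lin.append(x.mean())                    # signal diluted to beta / L
        w = np.exp(x - x.max()); w /= w.sum()   # softmax weights pile on the max
        att.append(float(w @ x))
    return np.mean(lin), np.mean(att)

for L in (64, 4096):
    l0, a0 = pooled_stats(L, beta=4.0, planted=False)
    l1, a1 = pooled_stats(L, beta=4.0, planted=True)
    print(f"L={L}: mean-pool gap {l1 - l0:.4f}, attention gap {a1 - a0:.4f}")
```

The mean-pool gap collapses like beta/L as L grows, while the attention gap stays visible, which is the qualitative picture behind the log L vs. sqrt(L) separation.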
Spiking Neural Networks' Naturally Sparse Gradients Enhance Robustness
Researchers find that spiking neural network designs produce naturally sparse gradients, giving robustness without the explicit regularization that typically reduces generalization on clean data. Read more: getnews.me/spiking-neural-networks-... #spikingneuralnetworks #sparsity
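A plausible mechanism for the sparsity (my framing; the article may argue differently): common surrogate gradients for the spike nonlinearity are exactly zero away from the firing threshold, so most entries of the backward pass vanish. A quick look at the effect:

```python
import numpy as np

rng = np.random.default_rng(3)

def surrogate_grad(v, theta=1.0, width=0.5):
    """Rectangular surrogate for d(spike)/d(membrane potential):
    nonzero only within `width` of the threshold theta."""
    return (np.abs(v - theta) < width).astype(float)

v = rng.normal(0.0, 1.0, size=100_000)   # simulated membrane potentials
g = surrogate_grad(v)
print(f"nonzero gradient entries: {g.mean():.1%}")   # most are exactly zero
```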
TASO: Task-Aligned Sparse Optimization for Efficient Fine‑Tuning
TASO outperforms standard LoRA even with a parameter budget comparable to LoRA rank = 1, trimming unnecessary LoRA weights for more efficient fine‑tuning. Read more: getnews.me/taso-task-aligned-sparse... #taso #lora #sparsity
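The post doesn't say how TASO picks which LoRA weights to trim. Below is a generic sketch of task-aligned trimming with a |weight x gradient| saliency score; the score, the 10% budget, and all names are my assumptions, not TASO's actual criterion:

```python
import torch

torch.manual_seed(0)
d, r = 768, 8
A = torch.nn.Parameter(0.01 * torch.randn(r, d))  # LoRA down-projection
B = torch.nn.Parameter(0.01 * torch.randn(d, r))  # up-projection (random, not
                                                  # the usual zero init, so
                                                  # gradients flow in this demo)

x, y = torch.randn(32, d), torch.randn(32, d)     # stand-in task batch
loss = ((x @ A.T @ B.T - y) ** 2).mean()          # loss through the LoRA path
loss.backward()

# Task-aligned importance: first-order saliency |w * dL/dw|
score = (A * A.grad).abs()
thresh = score.flatten().kthvalue(int(0.9 * score.numel())).values
mask = score >= thresh
with torch.no_grad():
    A *= mask                                     # keep only the top ~10%
print(f"kept {mask.float().mean():.1%} of LoRA-A entries")
```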
Sparse FedAdam Reduces Communication Overhead in Federated Learning
FedAdam‑SSM applies a mask to model updates, cutting uplink traffic to about one‑third of FedAdam and achieving 1.1× faster convergence with 14.5% higher accuracy than quantized variants. getnews.me/sparse-fedadam-reduces-c... #fedadam #sparsity
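The post only says "a mask". A generic top-k magnitude sparsification of a client update, which matches the one-third uplink figure in spirit only; the shared-sparse-mask detail of SSM isn't reconstructable from the summary:

```python
import numpy as np

rng = np.random.default_rng(4)

def sparsify_update(delta, keep=1 / 3):
    """Keep the largest-magnitude `keep` fraction of an update;
    only (indices, values) go over the wire."""
    k = max(1, int(keep * delta.size))
    idx = np.argpartition(np.abs(delta), -k)[-k:]
    return idx, delta[idx]

delta = rng.normal(size=10_000)        # a client's local model update
idx, vals = sparsify_update(delta)

dense = np.zeros_like(delta)           # server-side reconstruction
dense[idx] = vals
print(f"uplink floats: {vals.size} of {delta.size}")
```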
[9/9]
Appreciate any advice, pointers to relevant papers, or even “don’t do this” cautionary tales.
Thanks in advance!
#transformers #sparsity #maskedmodeling #deeplearning #symbolicAI #mlresearch #attentionmodels #structureddata
Slightly lazy, but I feel the need to post this in case it's too late... We will present this at the ICLR Workshop on Sparsity in LLMs (SLLM)! We found that the representation dimension can dominate model performance in structured pruning 🤯
#ICLR2025 #LLM #sparsity
“ #Sparsity is a kind of magic dial that finds the best match of the #AImodel you've got and the compute you have available.
It's the same economic rule of thumb…of personal computers: Either a better result for the same money or the same result for less money.” #AI
www.zdnet.com/article/appl...
How DeepSeek did it: Chinese artificial intelligence (AI) company DeepSeek has sent shockwaves thr...
https://asiatimes.com/2025/01/how-deepseek-did-it/
#Technology #AI #Sparsity #Anthropic #Claude3.5 #ArtificialIntelligence #ChatGPT4o
PLUM: Improving Inference Efficiency By Leveraging Repetition-Sparsity Trade-Off
Sachit Kuhar, Yash Jain, Alexey Tumanov
Action editor: Hongsheng Li
https://openreview.net/forum?id=IEKtMMSblm
#quantization #imagenet #sparsity
mistral's 8x22B is ~260GB
the trend is to get models smaller, not bigger
pruning, sparsity, quantization, distillation
so why such a huge model?
does mistral have no other models?
Yasuhisa Kuroda released a spectral data processing program for chemical analysis called SPANA eonet.ne.jp/~spana-lsq/i.... It uses our BEADS algorithm (baseline estimation & denoising w/ #sparsity) to separate peaks, baseline, and noise! doi.org/10.1016/j.ch... #analyticalchemistry
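BEADS proper solves a joint sparsity-regularized inverse problem; the sketch below is a much simpler alternating scheme in the same spirit (penalized least squares for a smooth baseline, soft-thresholding for sparse peaks), not the published algorithm, and all parameter values are placeholders:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def decompose(y, lam=1e4, mu=0.5, iters=30):
    """Split y into a smooth baseline f and sparse peaks x (residual = noise).

    f-step: argmin_f ||y - x - f||^2 + lam * ||D2 f||^2   (smoothness)
    x-step: soft-threshold(y - f, mu)                      (sparsity)
    """
    n = len(y)
    D2 = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    K = (sparse.eye(n) + lam * (D2.T @ D2)).tocsc()
    x = np.zeros(n)
    for _ in range(iters):
        f = spsolve(K, y - x)
        r = y - f
        x = np.sign(r) * np.maximum(np.abs(r) - mu, 0.0)
    return f, x

# Toy spectrum: slow baseline + two peaks + noise
t = np.linspace(0.0, 1.0, 500)
rng = np.random.default_rng(5)
y = 2 * t + np.exp(-((t - 0.3) / 0.01) ** 2) + np.exp(-((t - 0.7) / 0.02) ** 2)
y += 0.02 * rng.normal(size=t.size)
baseline, peaks = decompose(y)
```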