In spiking neural networks, neurons communicate - as in the brain - via short electrical pulses ⚡ (spikes). But how can we formally quantify the (dis)advantages of using spikes? 🤔
In our new preprint, @pc-pet.bsky.social and I introduce the concept of "Causal Pieces" to approach this question!
Posts by Philipp Petersen
What happens if you lose $10 per share in one week and gain $10 per share the next, alternating for 52 weeks ;)? Is the effect stronger if you replace $10 with $20?
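A quick simulation settles the dollar version of the puzzle; the percentage variant below is my own added contrast (the post itself only mentions dollar amounts), where the alternation famously does not cancel:

```python
# Additive reading, as stated in the post: lose/gain $10 per share each week.
price = 100.0                       # hypothetical starting price
for week in range(52):
    price += -10.0 if week % 2 == 0 else 10.0
final_additive = price              # the +/-$10 pairs cancel exactly: back to 100

# Multiplicative variant (my own contrast): lose/gain 10% or 20% instead.
p10, p20 = 100.0, 100.0
for week in range(52):
    p10 *= 0.9 if week % 2 == 0 else 1.1
    p20 *= 0.8 if week % 2 == 0 else 1.2
# Each -10%/+10% pair multiplies the price by 0.9 * 1.1 = 0.99, so after
# 26 pairs p10 = 100 * 0.99**26 (about 77); with +/-20% each pair is a
# factor 0.96, giving about 35 -- doubling the swing more than doubles the loss.
print(final_additive, p10, p20)
```

So in dollars the swings cancel and doubling them changes nothing, while in percentage terms the loss compounds and grows faster than linearly in the swing size.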
The latest version includes:
✅ Significantly fewer typos
✅ More illustrations and figures
✅ Reorganized sections for better clarity
✅ Sharpened and improved arguments
table of contents
After receiving very helpful feedback from the community, Jakob Zech and I have revised our graduate textbook:
📖 Mathematical Theory of Deep Learning
and uploaded the new version to arxiv:
🔗 arxiv.org/abs/2407.18384
If you have already read itโor plan toโwe would really appreciate your feedback.
Great point in principle, but you seem to be making it at the ideal altitude and in the ideal season.
Other countries will only want to hire the top researchers, who account for only a small part of the budget.
🔑 Key insights:
* The singular values of the query-key matrix product are the most critical parameters for tracking stability.
* Self-attention and softmax operations are the worst offenders for error amplification.
* There are stable (and unstable) methods for normalization.
Behavior of relative error for increasing spectral norm of key and query matrices
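This is not the paper's analysis, just a toy NumPy sketch (the sizes, seed, and scaling scheme are my own choices) of how one might probe the error of low-precision attention as the query/key matrices grow: scaling Q and K by a factor multiplies every singular value of the product QKᵀ by that factor squared.

```python
import numpy as np

def attention(Q, K, V, dtype):
    """Scaled dot-product attention, computed entirely in the given precision."""
    Qd, Kd, Vd = (M.astype(dtype) for M in (Q, K, V))
    S = Qd @ Kd.T / dtype(np.sqrt(Q.shape[-1]))
    S = S - S.max(axis=-1, keepdims=True)   # the usual max-shift before softmax
    P = np.exp(S)
    P = P / P.sum(axis=-1, keepdims=True)
    return (P @ Vd).astype(np.float64)

rng = np.random.default_rng(0)
n, d = 16, 8                                # toy sequence length and head dim
Q0 = rng.standard_normal((n, d))
K0 = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

errors = {}
for scale in (1.0, 4.0, 16.0):
    Q, K = scale * Q0, scale * K0           # grows the spectral norm of Q K^T
    ref = attention(Q, K, V, np.float64)    # high-precision reference
    low = attention(Q, K, V, np.float16)    # low-precision run
    errors[scale] = np.linalg.norm(ref - low) / np.linalg.norm(ref)
    print(f"scale {scale:5.1f}: relative error {errors[scale]:.2e}")
```

Comparing a float16 run against a float64 reference while sweeping the scale gives a crude, empirical counterpart to the kind of stability behavior the preprint studies analytically.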
Would you expect an LLM using over 100 billion floating-point operations in low precision to produce accurate outputs?
Not if you have taken an introductory class on numerics. How bad can things get? To find out, we carried out a numerical stability analysis of the transformer: arxiv.org/abs/2503.10251.