@jannisblrn.bsky.social wrote a very nice teaser about our NeurIPS paper, Quantum Doubly Stochastic Transformers (spotlight). Our co-authors Filip and Kahn will present it in San Diego, and Jannis at EurIPS. Links to the paper, video, and poster are below:
www.linkedin.com/posts/jannis...
Open position at IBM Research Zurich!
Passionate about AI for maths & curious about Quantum Computing?
Join our team & help shape the future of computing!
We are offering internships & master's theses. If you are looking for a PhD, please apply via the same ad!
www.zurich.ibm.com/careers/2025...
After several years of use by the open-source community, our paperscraper package finally has its own docs: jannisborn.github.io/paperscraper/
Use #paperscraper for publication keyword search, PDF downloads, citation statistics, and much more!
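A minimal keyword-search sketch, assuming the query interface described in the paperscraper docs (a query is a list of AND-combined groups, each group a list of OR-combined synonyms); function names and signatures should be checked against the docs linked above:

```python
# Each inner list is OR-combined; the outer list is AND-combined.
covid19 = ["COVID-19", "SARS-CoV-2"]
ai = ["Artificial intelligence", "Deep learning", "Machine learning"]
query = [covid19, ai]  # (COVID-19 OR SARS-CoV-2) AND (AI OR DL OR ML)

def dump_pubmed_hits(path="covid19_ai.jsonl"):
    # Import lazily so the sketch stays readable/runnable without the package.
    from paperscraper.pubmed import get_and_dump_pubmed_papers
    get_and_dump_pubmed_papers(query, output_filepath=path)
```

Running `dump_pubmed_hits()` would write one JSON line of metadata per matching paper, which downstream helpers can use for PDF download and citation statistics.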
Check out our workflow for AI-driven molecular design. We've already validated it experimentally (papers coming soon)!
1/ New paper out in @commsbio.nature.com, led by @marinv.bsky.social: doi.org/10.1038/s420...! Across several past studies, we showed how newborns' degraded vision may benefit human development and inspire more robust deep networks. We have referred to this as Adaptive Initial Degradations (AID).
Jonas Zausinger*, Lars Pennig*, Anamarija Kozina, Sean Sdahl, Julian Sikora, Adrian Dendorfer, Timofey Kuznetsov, Mohamad Hagog, Nina Wiedemann, Kacper Chlodny, Vincent Limbach, Anna Ketteler, Thorben Prein, Vishwa Mohan Singh & Michael Danziger.
GitHub code: ibm.biz/ntl-code
It was an incredible experience to run this project, but it only really came to life through the endless effort of all the amazing co-authors.
Landing page: ibm.biz/ntl-main
5. Text-task friendly: Doesn't interfere with CE on purely textual tasks
6. Scalable: Tested up to 3B parameters, e.g., with #IBMGranite 3.2
7. Plug-and-play: It's "just a loss," so it's super easy to adopt
ICML paper: ibm.biz/ntl-paper
1. Better math performance: NTL consistently boosts accuracy on math benchmarks (e.g., GSM-8K)
2. Lightning-fast: 100× faster to compute than CE, so there's no training overhead
3. Model-agnostic: Works with Transformers, Mamba, etc.
(continued below)
Hugging Face Spaces demo: ibm.biz/ntl-demo
In our upcoming #ICML2025 paper, we introduce the #NumberTokenLoss (NTL) to address this -- see the demo above! NTL is a regression-style loss computed at the token level: no extra regression head needed. We propose adding NTL on top of CE during LLM pretraining. Our experiments show: (see below)
#ICML Why are LLMs so powerful but still suck at math? A key problem is cross-entropy loss: it is nominal-scale, so tokens are unordered. That makes sense for words, but not for numbers. For a "5" label, predicting "6" or "9" gives the same loss. Yes, it's crazy! No, nobody has fixed this yet!
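To make the "regression-style loss at the token level" concrete, here is a minimal pure-Python sketch, not the paper's implementation: it compares the softmax-weighted expected numeric value of the number tokens with the label. The digit-only vocabulary and the absolute-error form are illustrative assumptions.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def number_token_loss(logits, token_values, target):
    # Regression-style loss: distance between the expected numeric
    # value under the model's distribution and the ground-truth number.
    probs = softmax(logits)
    expected = sum(p * v for p, v in zip(probs, token_values))
    return abs(expected - target)

token_values = list(range(10))  # numeric value of each digit token
target = 5.0
confident_6 = [5.0 if v == 6 else 0.0 for v in token_values]
confident_9 = [5.0 if v == 9 else 0.0 for v in token_values]
print(number_token_loss(confident_6, token_values, target))
print(number_token_loss(confident_9, token_values, target))
# Unlike CE, predicting "9" is penalized more than "6" when the label is "5".
```

Because it only needs the numeric value of each number token, this kind of loss can sit on top of CE for any architecture that produces token logits.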
Our new paper: Conditional Optimal Transport generalizes well to unseen drugs. A big step forward, thanks to the conditional Monge Gap! Even better: conditional models often beat local, non-conditional ones. arxiv.org/abs/2504.08328. Code is public! Thanks to all co-authors
@marianna-raps.bsky.social
Great to hear! Let me know if there are questions.
Our next journal club meeting will discuss "A Computational Investigation of Inventive Spelling and the 'Lesen durch Schreiben' Method" by @jannisblrn.bsky.social et al. on 23 Jan 2025, 11am-12pm (GMT+1). Join us by emailing gewonn.contact.us@gmail.com, and stay tuned for more news!
If you're at @neuripsconf.bsky.social and into #OptimalTransport & bio, don't miss Alice Driessen's spotlight talk on the #ConditionalMongeGap for modeling CAR response. Today at the #AIDrugX workshop!
Positive results on OOD perturbations -> accurate gene expression prediction. Paper: ibm.biz/carot-pre
Full poster
Number token loss
A new loss improves math capabilities in language models! The loss is model-agnostic and only requires knowing which tokens represent numbers.
No computational overhead, but better performance.
Poster today @NeurIPS at the MathAI Workshop! Thanks to collaborators from TUM AI!
Paper: arxiv.org/abs/2411.02083
Can we iteratively design small molecules with desired target properties, simply by sending messages on Slack? YES!
Super excited to give a live demo of dZiner during the spotlight talk at #AI4Mat #NeurIPS2024!
Preprint: lnkd.in/e-24AEHC
Code: lnkd.in/egF4hGCg