
Posts by Jannis Born

#transformers #neurips #eurips #ibmresearch | Jannis Born NeurIPS spotlight for our work on "Quantum Doubly Stochastic Transformers" 🔦 Can principles from quantum computing be blended into the most powerful ML models? 🤔 The problem in #Transformers: Transf...

@jannisblrn.bsky.social wrote a very nice teaser about our NeurIPS paper "Quantum Doubly Stochastic Transformers" (spotlight). Our co-authors Filip and Kahn will present it in San Diego, and Jannis at EurIPS. You can find links to the paper, video, and poster below:

www.linkedin.com/posts/jannis...

5 months ago 2 1 0 0

🤓 Open position at IBM Research Zurich!
Passionate about AI for maths & curious about quantum computing?
Join our team & help shape the future of computing!
We are offering internships & master's theses. If you are looking for a PhD, please apply via the same ad!
👉 www.zurich.ibm.com/careers/2025...

7 months ago 0 0 0 0
Paperscraper: documentation for the paperscraper Python package

After several years of use by the open-source community, our paperscraper package finally has its own docs: jannisborn.github.io/paperscraper/
Use #paperscraper for publication keyword search, PDF downloads, citation statistics, and more! 🚀

8 months ago 1 0 0 0

Check out our workflow for AI-driven molecular design. We've already validated it experimentally (papers coming soon)!

8 months ago 3 1 0 0
Potential role of developmental experience in the emergence of the parvo-magno distinction - Communications Biology Developmentally-driven computational modeling study suggests that early sensory experience shapes distinct neuronal response properties in the visual system, providing a potential account of the emerg...

1/ New paper out in @commsbio.nature.com, led by @marinv.bsky.social: doi.org/10.1038/s420...! Across several past studies, we showed how newborns' degraded vision may benefit human development and inspire more robust deep networks. We have referred to this as Adaptive Initial Degradations (AID).

9 months ago 31 13 1 1
GitHub - tum-ai/number-token-loss: A regression-alike loss to improve numerical reasoning in language models

Jonas Zausinger*, Lars Pennig*, Anamarija Kozina, Sean Sdahl, Julian Sikora, Adrian Dendorfer, Timofey Kuznetsov, Mohamad Hagog, Nina Wiedemann, Kacper Chlodny, Vincent Limbach, Anna Ketteler, Thorben Prein, Vishwa Mohan Singh & Michael Danziger.

💻 GitHub code: ibm.biz/ntl-code

9 months ago 0 0 0 0
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models While language models have exceptional capabilities at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving quantitative reasoning, especially ...

It was an incredible experience to run this project 🚀 But it only really came to life through the endless effort of all the amazing co-authors 🔥💪

๐ŸŒ Landing page: ibm.biz/ntl-main

9 months ago 0 0 1 0
Regress, Don't Guess – Number Token Loss: A regression-like loss on number tokens for language models.

5. Text-task friendly: Doesn't interfere with CE on purely textual tasks 📚
6. Scalable: Tested up to 3B, e.g., with #IBMGranite 3.2 🚀
7. Plug-and-play: It's "just a loss," so it's super easy to adopt 🔢
📄 ICML paper: ibm.biz/ntl-paper

9 months ago 0 0 1 0
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models While language models have exceptional capabilities at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving quantitative reasoning, especially ...

1. Better math performance: NTL consistently boosts accuracy on math benchmarks (e.g., GSM-8K) 📊
2. Lightning-fast: 100× faster to compute than CE, so there's no training overhead ⚡
3. Model-agnostic: Works with Transformers, Mamba, etc. 🤖
(continued ⬇️)
🎛️ Hugging Face Spaces demo: ibm.biz/ntl-demo

9 months ago 0 0 1 0
Post image

In our upcoming #ICML2025 paper, we introduce the #NumberTokenLoss (NTL) to address this -- see the demo above! NTL is a regression-style loss computed at the token level; no extra regression head needed. We propose adding NTL on top of CE during LLM pretraining. Our experiments show: (see ⬇️)

9 months ago 1 1 1 0
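The intuition behind an order-aware number loss can be sketched in a few lines of plain Python. Everything here is illustrative (toy digit-token vocabulary, hand-picked logits, a simple expected-value formulation); the paper's actual formulation may differ:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target_idx):
    # Standard CE: only the probability of the target token matters.
    return -math.log(softmax(logits)[target_idx])

def number_token_loss(logits, target_value, token_values):
    # Expected numeric value under the predicted distribution,
    # penalized by its squared distance to the true value.
    probs = softmax(logits)
    expected = sum(p * v for p, v in zip(probs, token_values))
    return (expected - target_value) ** 2

# Toy vocabulary: the digit tokens "0".."9" with their numeric values.
token_values = list(range(10))
target = 5  # true label is the token "5"

# Two predictions that put the same confidence on a wrong digit:
logits_6 = [0.0] * 10; logits_6[6] = 5.0   # confidently predicts "6"
logits_9 = [0.0] * 10; logits_9[9] = 5.0   # confidently predicts "9"

# Cross-entropy cannot rank the two mistakes...
ce6 = cross_entropy(logits_6, target)
ce9 = cross_entropy(logits_9, target)
print(math.isclose(ce6, ce9))  # True

# ...but a number-aware loss penalizes "9" more than "6" for label "5".
ntl6 = number_token_loss(logits_6, target, token_values)
ntl9 = number_token_loss(logits_9, target, token_values)
print(ntl6 < ntl9)  # True
```

In the paper's setup the number loss is added on top of CE during pretraining; the snippet only shows why an ordinal loss distinguishes near-misses from far-misses, which nominal-scale CE cannot.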
Video

#ICML Why are LLMs so powerful but still suck at math? 🤔 A key problem is cross-entropy loss: it is nominal-scale, so tokens are unordered. That makes sense for words, but not for numbers. For a "5" label, predicting "6" or "9" gives the same loss 😱 Yes, it's crazy! No, nobody has fixed this yet! ⬇️

9 months ago 2 0 1 0
Towards generalizable single-cell perturbation modeling via the Conditional Monge Gap Learning the response of single-cells to various treatments offers great potential to enable targeted therapies. In this context, neural optimal transport (OT) has emerged as a principled methodologic...

🚨 Our new paper: Conditional Optimal Transport generalizes well to unseen drugs. Big step forward, thanks to the conditional Monge Gap! Even better: conditional models often beat local, non-conditional ones. arxiv.org/abs/2504.08328. Code is public! Thanks to all co-authors
@marianna-raps.bsky.social

1 year ago 5 1 1 0

Great to hear! 🙃 Let me know if there are any questions

1 year ago 1 0 0 0

Our next journal club meeting will be discussing "A Computational Investigation of Inventive Spelling and the 'Lesen durch Schreiben' Method" by @jannisblrn.bsky.social et al. on 23 Jan 2025, 11am - 12pm (GMT+1). Join us by emailing us at gewonn.contact.us@gmail.com, and stay tuned for more news!

1 year ago 3 3 1 1
Post image

If you're at @neuripsconf.bsky.social and into #OptimalTransport & bio, don't miss Alice Driessen's spotlight talk on the #ConditionalMongeGap for modeling CAR response. Today at the #AIDrugX workshop!

Positive results on OOD perturbations -> accurate gene expression prediction. Paper: ibm.biz/carot-pre

1 year ago 4 1 0 0
Post image

Full poster

1 year ago 0 0 0 0
Number token loss

A new loss improves math capabilities in language models! The loss is model-agnostic and only requires knowing which tokens represent numbers.
No computational overhead, but better performance.
Poster today @NeurIPS - MathAI Workshop! Thanks to collaborators from TUM AI!
Paper: arxiv.org/abs/2411.02083

1 year ago 7 0 1 0
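Since the only prerequisite is knowing which tokens represent numbers, the setup step can be sketched with a toy vocabulary (a real tokenizer's vocab would need per-tokenizer handling; the names here are illustrative):

```python
# Toy vocabulary; in practice this would come from the model's tokenizer.
vocab = ["the", "cat", "7", "3.5", "-2", "sat"]

def token_value(tok):
    """Numeric value of a token, or None if it is not a number."""
    try:
        return float(tok)
    except ValueError:
        return None

# Map every token to its value; the number loss is only active at
# positions whose target token has a numeric value.
values = {tok: token_value(tok) for tok in vocab}
number_tokens = [t for t, v in values.items() if v is not None]
print(number_tokens)  # ['7', '3.5', '-2']
```

Non-number positions fall back to plain cross-entropy, which is why the loss leaves purely textual tasks untouched.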
Post image

Can we iteratively design small molecules with desired target properties, simply by sending messages on Slack? YES!

Super excited to give a live demo of 🤖dZiner🧪 during the SPOTLIGHT 🔦 talk at #AI4Mat #NeurIPS2024!

Preprint: lnkd.in/e-24AEHC
Code: lnkd.in/egF4hGCg

1 year ago 14 3 0 0