
Posts by Jannis Born

#transformers #neurips #eurips #ibmresearch | Jannis Born NeurIPS spotlight for our work on "Quantum Doubly Stochastic Transformers" 🔦 Can principles from quantum computing be blended into the most powerful ML models? 🤔 The problem in #Transformers: Transf...

@jannisblrn.bsky.social wrote a very nice teaser about our NeurIPS paper "Quantum Doubly Stochastic Transformers" (spotlight). Our co-authors Filip and Kahn will present it in San Diego, and Jannis at EurIPS. You can find links to the paper, video, and poster below:

www.linkedin.com/posts/jannis...

5 months ago 2 1 0 0

🤓 Open position at IBM Research Zurich!
Passionate about AI for maths & curious about quantum computing?
Join our team & help shape the future of computing!
We are offering internships & master's theses. If you are looking for a PhD, please apply via the same ad!
👉 www.zurich.ibm.com/careers/2025...

7 months ago 0 0 0 0
Paperscraper: documentation for the paperscraper Python package

After several years of use by the open-source community, our paperscraper package finally has its own docs: jannisborn.github.io/paperscraper/
Use #paperscraper for publication keyword search, PDF downloads, citation statistics, and more! 🚀

8 months ago 1 0 0 0

Check out our workflow for AI-driven molecular design. We've already validated it experimentally (papers coming soon)!

8 months ago 3 1 0 0
Potential role of developmental experience in the emergence of the parvo-magno distinction - Communications Biology Developmentally-driven computational modeling study suggests that early sensory experience shapes distinct neuronal response properties in the visual system, providing a potential account of the emerg...

1/ New paper out in @commsbio.nature.com, led by @marinv.bsky.social: doi.org/10.1038/s420...! Across several past studies, we showed how newborns' degraded vision may benefit human development and inspire more robust deep networks. We have referred to this as Adaptive Initial Degradations (AID).

9 months ago 31 13 1 1
GitHub - tum-ai/number-token-loss: A regression-alike loss to improve numerical reasoning in language models

Jonas Zausinger*, Lars Pennig*, Anamarija Kozina, Sean Sdahl, Julian Sikora, Adrian Dendorfer, Timofey Kuznetsov, Mohamad Hagog, Nina Wiedemann, Kacper Chlodny, Vincent Limbach, Anna Ketteler, Thorben Prein, Vishwa Mohan Singh & Michael Danziger.

💻 GitHub code: ibm.biz/ntl-code

9 months ago 0 0 0 0
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models While language models have exceptional capabilities at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving quantitative reasoning, especially ...

It was an incredible experience to run this project 🚀 But it only really came to life through the endless effort of all the amazing co-authors 🔥💪

๐ŸŒ Landing page: ibm.biz/ntl-main

9 months ago 0 0 1 0
Regress, Don't Guess – Number Token Loss: A regression-like loss on number tokens for language models.

5. Text-task friendly: Doesn't interfere with CE on purely textual tasks 📚
6. Scalable: Tested up to 3B, e.g., with #IBMGranite 3.2 🚀
7. Plug-and-play: It's "just a loss," so it's super easy to adopt 🔢
📄 ICML paper: ibm.biz/ntl-paper

9 months ago 0 0 1 0
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models While language models have exceptional capabilities at text generation, they lack a natural inductive bias for emitting numbers and thus struggle in tasks involving quantitative reasoning, especially ...

1. Better math performance: NTL consistently boosts accuracy on math benchmarks (e.g., GSM-8K) 📊
2. Lightning-fast: 100× faster to compute than CE, so there's no training overhead ⚡
3. Model-agnostic: Works with Transformers, Mamba, etc. 🤖
(continued ⬇️)
🎛️ Hugging Face Spaces demo: ibm.biz/ntl-demo

9 months ago 0 0 1 0
Post image

In our upcoming #ICML2025 paper, we introduce the #NumberTokenLoss (NTL) to address this -- see the demo above! NTL is a regression-style loss computed at the token level; no extra regression head needed. We propose adding NTL on top of CE during LLM pretraining. Our experiments show: (see ⬇️)

9 months ago 1 1 1 0
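The intuition behind an order-aware number loss can be sketched in a few lines of plain Python. Everything here is illustrative (toy digit-token vocabulary, hand-picked logits, a simple expected-value formulation); the paper's actual formulation may differ:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target_idx):
    # Standard CE: only the probability of the target token matters.
    return -math.log(softmax(logits)[target_idx])

def number_token_loss(logits, target_value, token_values):
    # Expected numeric value under the predicted distribution,
    # penalized by its squared distance to the true value.
    probs = softmax(logits)
    expected = sum(p * v for p, v in zip(probs, token_values))
    return (expected - target_value) ** 2

# Toy vocabulary: the digit tokens "0".."9" with their numeric values.
token_values = list(range(10))
target = 5  # true label is the token "5"

# Two predictions that put the same confidence on a wrong digit:
logits_6 = [0.0] * 10; logits_6[6] = 5.0   # confidently predicts "6"
logits_9 = [0.0] * 10; logits_9[9] = 5.0   # confidently predicts "9"

# Cross-entropy cannot rank the two mistakes...
ce6 = cross_entropy(logits_6, target)
ce9 = cross_entropy(logits_9, target)
print(math.isclose(ce6, ce9))  # True

# ...but a number-aware loss penalizes "9" more than "6" for label "5".
ntl6 = number_token_loss(logits_6, target, token_values)
ntl9 = number_token_loss(logits_9, target, token_values)
print(ntl6 < ntl9)  # True
```

In the paper's setup the number loss is added on top of CE during pretraining; the snippet only shows why an ordinal loss distinguishes near-misses from far-misses, which nominal-scale CE cannot.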
Video

#ICML Why are LLMs so powerful but still suck at math? 🤔 A key problem is cross-entropy loss: it is nominal-scale, so tokens are unordered. That makes sense for words, but not for numbers. For a "5" label, predicting "6" or "9" gives the same loss 😱 Yes, it's crazy! No, nobody has fixed this yet! ⬇️

9 months ago 2 0 1 0
Towards generalizable single-cell perturbation modeling via the Conditional Monge Gap Learning the response of single-cells to various treatments offers great potential to enable targeted therapies. In this context, neural optimal transport (OT) has emerged as a principled methodologic...

🚨 Our new paper: Conditional Optimal Transport generalizes well to unseen drugs. Big step forward, thanks to the conditional Monge Gap! Even better: conditional models often beat local, non-conditional ones. arxiv.org/abs/2504.08328. Code is public! Thanks to all co-authors
@marianna-raps.bsky.social

1 year ago 5 1 1 0

Great to hear! 🙃 Let me know if there are any questions

1 year ago 1 0 0 0

Our next journal club meeting will be discussing "A Computational Investigation of Inventive Spelling and the 'Lesen durch Schreiben' Method" by @jannisblrn.bsky.social et al. on 23 Jan 2025, 11am - 12pm (GMT+1). Join us by emailing us at gewonn.contact.us@gmail.com, and stay tuned for more news!

1 year ago 3 3 1 1
Post image

If you're at @neuripsconf.bsky.social and into #OptimalTransport & bio, don't miss Alice Driessen's spotlight talk on the #ConditionalMongeGap for modeling CAR response. Today at the #AIDrugX workshop!

Positive results on OOD perturbations -> accurate gene expression prediction. Paper: ibm.biz/carot-pre

1 year ago 4 1 0 0
Post image

Full poster

1 year ago 0 0 0 0
Number token loss

A new loss improves math capabilities in language models! The loss is model-agnostic and only requires knowing which tokens represent numbers.
No computational overhead, but better performance.
Poster today @NeurIPS - MathAI Workshop! Thanks to collaborators from TUM AI!
Paper: arxiv.org/abs/2411.02083

1 year ago 7 0 1 0
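Since the only prerequisite is knowing which tokens represent numbers, the setup step can be sketched with a toy vocabulary (a real tokenizer's vocab would need per-tokenizer handling; the names here are illustrative):

```python
# Toy vocabulary; in practice this would come from the model's tokenizer.
vocab = ["the", "cat", "7", "3.5", "-2", "sat"]

def token_value(tok):
    """Numeric value of a token, or None if it is not a number."""
    try:
        return float(tok)
    except ValueError:
        return None

# Map every token to its value; the number loss is only active at
# positions whose target token has a numeric value.
values = {tok: token_value(tok) for tok in vocab}
number_tokens = [t for t, v in values.items() if v is not None]
print(number_tokens)  # ['7', '3.5', '-2']
```

Non-number positions fall back to plain cross-entropy, which is why the loss leaves purely textual tasks untouched.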
Post image

Can we iteratively design small molecules with desired target properties, simply by sending messages on Slack? YES!

Super excited to give a live demo of 🤖dZiner🧪 during the SPOTLIGHT 🔦 talk at #AI4Mat #NeurIPS2024!

Preprint: lnkd.in/e-24AEHC
Code: lnkd.in/egF4hGCg

1 year ago 14 3 0 0