
Posts by Tomasz Limisiewicz

Tokenization Workshop (TokShop) ICML 2025

🎥 Videos from our Tokenization Workshop are now live! Watch invited talks, panel discussions, and the best paper presentation at icml.cc/virtual/2025... #Tokenization #NLP #LLMs

7 months ago 16 7 1 1

Check out the BLT poster at @aclmeeting.bsky.social. It’s just a foretaste of the main presentation at @tokshop.bsky.social next week from Artidoro Pagnoni!

9 months ago 10 0 0 0

Looking forward to our panel at 3:30. We’ll talk about the future of tokenization: BLT, SuperBPE @alisawuffles.bsky.social, H-nets Albert Gu and further breakthroughs in tokenization @uvp.bsky.social, Sander Land, Kris Cao

bsky.app/profile/toks...

9 months ago 2 0 0 0

It’d be great to meet at the Tokenization Workshop @tokshop.bsky.social #icml
tomorrow, July 18, starting at 8:45 in Meeting 112-113!

9 months ago 0 0 1 0

The TokShop schedule is now live! Join us at #ICML2025 for invited talks, poster sessions, and a panel on the future of tokenization. tokenization-workshop.github.io/schedule #Tokenization #LLM #NLP

9 months ago 11 3 0 1

I'm pleased to be in Vancouver for @ICML this week 🇨🇦🤖. I'll be happy to chat about multilingual, multimodal LMs and tokenization(free).

9 months ago 5 0 0 0

If you have experience with tokenization (who doesn’t) your help with reviewing will be hugely appreciated! 🔠🔡

10 months ago 2 0 0 0

Got a good tokenization paper under review at COLM, but the scores were a letdown? 😬

Why bother with a rebuttal when the perfect venue is right around the corner?

Submit your paper to the #ICML2025 Tokenization Workshop (TokShop) by May 30! 🚀

10 months ago 11 4 0 0

#NAACL2025 ended more than a week ago & @ufal-cuni.bsky.social folks were there:
Main conf: @kathaem.bsky.social presented joint work w/ @tomlim.bsky.social, @jlibovicky.bsky.social and Alex Fraser: Beyond Literal Token Overlap: Token Alignability for Multilinguality aclanthology.org/2025.naacl-s...

11 months ago 14 3 1 0
ICML 2025 Workshop TokShop Welcome to the OpenReview homepage for ICML 2025 Workshop TokShop

📣 Call for Papers Alert: TokShop @ ICML 2025
TokShop explores tokenization across all data modalities. Topics include: subword NLP techniques, multimodal approaches, multilingual challenges, post-training modification, alternative representations, and statistical perspectives.

11 months ago 18 12 1 2

It’s finally official: the long-awaited Tokenization Workshop is here!

1 year ago 1 1 0 0

So, apparently, confusing these two buttons can ignite a serious flame-war in reviewer-author discussion🔥 @aclmeeting.bsky.social

1 year ago 6 0 0 0

Excited to continue my research adventure as a postdoc at @uwnlp.bsky.social and Meta! I’ve joined @lukezettlemoyer.bsky.social’s fantastic lab. Together, we plan to rethink how LLMs perceive data to extend their capabilities to uncharted languages and, further, beyond text!

1 year ago 15 1 1 0

Paper 👉Beyond Literal Token Overlap: Token Alignability for Multilinguality👈 by @kathaem.bsky.social, @tomlim.bsky.social, @jlibovicky.bsky.social and Alex Fraser will appear at #NAACL2025! arxiv.org/abs/2502.06468 Congratulations to all authors! 🥳

1 year ago 5 1 0 0
Beyond Literal Token Overlap: Token Alignability for Multilinguality Previous work has considered token overlap, or even similarity of token distributions, as predictors for multilinguality and cross-lingual knowledge transfer in language models. However, these very li...

Happy to say that our paper "Beyond Literal Token Overlap: Token Alignability for Multilinguality" will be presented at #NAACL2025!

This is work with @tomlim.bsky.social, @jlibovicky.bsky.social, and Alex Fraser.

arxiv.org/abs/2502.06468

#newpaper #NLP #NLProc

1 year ago 10 2 1 2

It’d be great to stay in touch!

1 year ago 0 0 0 0

Work in progress -- suggestions for NLP-ers based in the EU/Europe & already on Bluesky very welcome!

go.bsky.app/NZDc31B

1 year ago 70 20 51 0

Haha, that's me, both name and surname 😁

1 year ago 0 0 0 0

Ahh, that's a pity to miss that.

1 year ago 0 0 0 0

Thanks, I'm happy to hear that 🙂. Do you have a rough estimate of when to expect a call for workshop proposals?

1 year ago 0 0 1 0

How about workshops before or after the main conference?

1 year ago 0 0 1 0

Good to see you here! #nlp

1 year ago 0 0 0 0
Lexically Grounded Subword Segmentation Jindřich Libovický, Jindřich Helcl. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.

Also this one:

Lexically Grounded Subword Segmentation
aclanthology.org/2024.emnlp-m...

Poster Session Nov 12 (Tue) 2 pm 🙂

1 year ago 1 0 0 0

Fantastic list, thank you!

1 year ago 0 0 0 0

Tokenization is so back at #EMNLP!

1 year ago 7 0 0 0

Also, if you are in Miami for EMNLP this week, don’t miss Hila Gonen's MRL keynote about fair multilingual tokenization (including MYTE).

Happening on Saturday (Nov 16) at 9:50 am ET at the MRL workshop (room: Jasmine).

1 year ago 3 0 0 0
# Requires a transformers release that ships MyT5Tokenizer.
from transformers import MyT5Tokenizer, T5ForConditionalGeneration

MODEL_SIZE = "large"  # one of: "small", "base", "large"
MODEL = f"Tomlim/myt5_{MODEL_SIZE}"

# Load the MyT5 checkpoint and its matching MYTE tokenizer from the Hugging Face Hub.
model = T5ForConditionalGeneration.from_pretrained(MODEL, use_safetensors=True)
tokenizer = MyT5Tokenizer.from_pretrained(MODEL)

#firstpost

Are you working on NLP for low-resource or non-Latin script languages?

If yes, I have great news for you! Our MYTE tokenizer and MyT5 models 🪲 are now easily available through 🤗. It’s easy to try:

1 year ago 8 0 1 0

If you are interested in AI, follow the folks in this starter pack! I have just updated it to include a few new arrivals here, but please let me know who else is missing

go.bsky.app/SipA7it

1 year ago 61 25 26 1

Great list, thanks for making the start at 🦋 easier. I’d also love to be added to the list!

1 year ago 1 0 1 0

That's awesome. Time for a fresh start at 🦋

1 year ago 1 0 0 0