🎥 Videos of our invited talks and the panel discussion are now also available on YouTube: www.youtube.com/@tokenizatio... ▶️
Posts by Tokenization Workshop (TokShop) @ICML2025
🎥 Videos from our Tokenization Workshop are now live! Watch invited talks, panel discussions, and the best paper presentation at icml.cc/virtual/2025... #Tokenization #NLP #LLMs
Announcing our Best Paper Awards!
🥇 Winner: "BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization" openreview.net/forum?id=AO7...
🥈 Runner-up: "One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression" openreview.net/forum?id=lC4...
Congrats!
The Tokenization Workshop is happening NOW, and we have a packed room! It's great to see so much interest in tokenization research. #ICML2025 #Tokenization #LLM #NLP
Three invited speakers will share their insights at TokShop! Hear from Yuval Pinter @uvp.bsky.social, Desmond Elliott @delliott.bsky.social, and Adrian Łańcucki on cutting-edge tokenization research. Don't miss these keynote presentations! #ICML2025 tokenization-workshop.github.io/speakers
Meet our expert panelists! Join Albert Gu, Alisa Liu, Kris Cao, Sander Land, and Yuval Pinter as they discuss the Future of Tokenization on July 18 at 3:30 PM at TokShop at #ICML2025.
The TokShop schedule is now live! Join us at #ICML2025 for invited talks, poster sessions, and a panel on the future of tokenization. tokenization-workshop.github.io/schedule #Tokenization #LLM #NLP
TokShop @ #ICML2025 got way more submissions than expected! We could really use a few more reviewers to help out. If you have the capacity to review a #tokenization paper by Saturday, please fill out this form: forms.gle/32A6sQHQrMSb...
📣 We are extending the submission deadline by 24 hours to avoid a conflict with the ACL camera-ready deadline.
New Submission Deadline: May 31, 2025 (23:59 AoE)
📩 OpenReview: openreview.net/group?id=ICM...
Got a good tokenization paper under review at COLM, but the scores were a letdown? 😬
Why bother with a rebuttal when the perfect venue is right around the corner?
Submit your paper to the #ICML2025 Tokenization Workshop (TokShop) by May 30!
Beyond text: Modern AI tokenizes images too! Vision models split photos into patches, treating each 16x16 pixel square as a "token." 🖼️➡️🤖 #VisualTokenization
Interested in tokenization? Join our workshop tokenization-workshop.github.io
The submission deadline is already May 30!
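To make the patch idea above concrete, here is a minimal sketch in plain Python. The function name `patchify`, the 224x224 input size, and the synthetic pixel values are illustrative assumptions, not taken from any particular vision model; real models also project each flattened patch through a learned embedding, which is omitted here.

```python
def patchify(image, patch_size=16):
    """Split a 2D grid of pixel values into flattened, non-overlapping patches."""
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    # Walk the image in row-major order, one patch_size x patch_size square at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical 224x224 grayscale image filled with synthetic values.
image = [[(row * 224 + col) % 256 for col in range(224)] for row in range(224)]
tokens = patchify(image)
print(len(tokens), len(tokens[0]))  # 196 patches, 256 pixel values each
```

Under these assumptions, a 224x224 image becomes a sequence of 14 x 14 = 196 "tokens", each holding the 256 pixel values of one square.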
Got a tokenization paper rejected from ACL? Didn't submit to EMNLP/NeurIPS? Want to present your ACL/EMNLP/NeurIPS work non-archivally? Submit to TokShop @ ICML 2025!
The deadline is already May 30!
openreview.net/group?id=ICM...
tokenization-workshop.github.io
Language matters: low-resource languages are severely overtokenized. While English uses ~1.2 tokens per word, Tamil, for example, requires more tokens than characters, making #LLMs much costlier for billions of speakers! 💸
Check out our ICML workshop: tokenization-workshop.github.io
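One way a script can end up with more tokens than characters: if a tokenizer falls back to raw UTF-8 bytes and has learned no merges for that script, every character costs as many tokens as it has bytes. The sketch below measures only that worst case. It is an assumption for illustration, not a measurement of any real tokenizer, and the helper name `bytes_per_character` is made up for this example.

```python
def bytes_per_character(text):
    """UTF-8 bytes per Unicode code point: the tokens-per-character ratio
    of a byte-level tokenizer that has learned no merges for this text."""
    return len(text.encode("utf-8")) / len(text)


english = "tokenization"
# The word "Tamil" written in Tamil script, as five code points (U+0BA4 ... U+0BCD).
tamil = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

print(bytes_per_character(english))  # 1.0
print(bytes_per_character(tamil))    # 3.0
```

Learned merges bring the ratio down for well-represented languages, which is why the gap depends so heavily on how much of each script was in the tokenizer's training data.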
Did you know BPE (Byte Pair Encoding), the most common LLM tokenizer, was originally a compression algorithm from 1994? #Tokenization #LLM #NLP
Want to find out more about tokenization? Attend our workshop at ICML! tokenization-workshop.github.io
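For the curious, the compression idea behind BPE fits in a few lines: repeatedly replace the most frequent adjacent pair of symbols with a single new symbol. This is a simplified sketch, not the original 1994 implementation or any LLM tokenizer: it works on characters rather than bytes, represents each merged symbol as the concatenation of its parts, and the name `bpe_compress` is an assumption for this example.

```python
from collections import Counter


def bpe_compress(text, num_merges):
    """Greedily merge the most frequent adjacent symbol pair, num_merges times."""
    symbols = list(text)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter(zip(symbols, symbols[1:]))
        if not pair_counts:
            break
        (left, right), count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so merging would not shorten anything
        merges.append((left, right))
        merged = []
        i = 0
        # Scan left to right, replacing each occurrence of the chosen pair.
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols, merges


text = "aaabdaaabac"
symbols, merges = bpe_compress(text, num_merges=3)
print(len(text), "->", len(symbols))  # 11 -> 5
```

LLM tokenizers reuse the same loop for a different purpose: the merge list is learned once on a training corpus and then applied to new text to produce subword tokens.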
Submit papers (up to 9 pages, shorter submissions welcome) via OpenReview: openreview.net/group?id=ICM...
🗓️ Important dates:
Deadline: May 30, 2025
Notifications: June 9, 2025
Workshop: July 18, 2025
Both archival and non-archival options available! #ICML2025 #TokShop #ML #NLP
📣 Call for Papers Alert: TokShop @ ICML 2025
TokShop explores tokenization across all data modalities. Topics include: subword NLP techniques, multimodal approaches, multilingual challenges, post-training modification, alternative representations, and statistical perspectives.
Got a tokenization paper that just didn't make the cut for ICML? Submit it to the Tokenization Workshop (TokShop) at #ICML2025. We'd love to see it there!
tokenization-workshop.github.io
TokShop is organized by an amazing team of researchers passionate about tokenization: @tomlim.bsky.social, @valentinhofmann.bsky.social, @shocheen.bsky.social, @jlibovicky.bsky.social, @jindrahelcl.bsky.social, @orevaahia.bsky.social, @esalesky.bsky.social, @smfsamir.bsky.social
In the upcoming weeks, we will announce an exciting line-up of invited talks and panelists. Follow our account @tokshop.bsky.social to stay tuned.
Join us at TokShop at #ICML2025!
We're looking for papers on tokenization in text, vision, audio, multimodal, and more.
Up to 9 pages (shorter welcome!)
Double-blind review
Archival and non-archival options available
There has been a lot of chatter about tokenization for LLMs over the last few months, but tokenization goes beyond text-based models.
It's time we bring the NLP and ML communities together to explore this foundational topic. Let's talk about tokenization at TokShop!
🚨 NEW WORKSHOP ALERT 🚨
We're thrilled to announce the first-ever Tokenization Workshop (TokShop) at #ICML2025 @icmlconf.bsky.social!
Submissions are open for work on tokenization across all areas of machine learning.
Submission deadline: May 30, 2025
tokenization-workshop.github.io