8/ Paper preprint:
Mohammad Hassan Vali, Tom Bäckström, and Arno Solin (2026). DiVeQ: Differentiable vector quantization using the reparameterization trick. ICLR 2026.
arxiv.org/abs/2509.26469
Posts by Arno Solin
7/ DiVeQ is also included in the popular vector-quantize-pytorch package.
To use it there, enable:
directional_reparam=True
6/ We have also released a PyTorch package on PyPI:
pip install diveq
It implements the methods and variants from the paper and makes integration into training pipelines straightforward.
5/ The result is a direct and general way to do end-to-end trainable quantization, without many of the complications of earlier approaches.
We also see improved performance in image compression, image generation, and speech coding.
4/ We do this by modelling quantization as adding a carefully constructed error vector.
So the forward pass still uses hard assignments, while training gets meaningful gradient flow.
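To make the "error vector" framing concrete, here is a plain-Python toy (the names `nearest` and `quantize` are mine, not the paper's): hard quantization to the nearest codeword is rewritten as adding the error e = c* − z, the quantity that gets a differentiable surrogate during training.

```python
import math

def nearest(z, codebook):
    """Return the codeword closest to z in Euclidean distance."""
    return min(codebook, key=lambda c: math.dist(c, z))

def quantize(z, codebook):
    """Hard quantization written as adding an error vector:
    z_q = z + e with e = c* - z, so the forward pass outputs
    exactly the nearest codeword."""
    c_star = nearest(z, codebook)
    e = [ci - zi for ci, zi in zip(c_star, z)]
    z_q = [zi + ei for zi, ei in zip(z, e)]
    return z_q, e

codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
z_q, e = quantize((0.75, 1.25), codebook)
# z_q is the nearest codeword [1.0, 1.0]; e = [0.25, -0.25]
```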
3/ In our #ICLR2026 paper, we introduce DiVeQ.
The idea is simple: keep the hard quantization behaviour we want, but make training behave as if gradients can still flow through it.
2/ The challenge is that VQ uses a hard nearest-codeword decision. That makes learning awkward, because the quantization step is non-differentiable and gradients stop flowing. Existing fixes often introduce gradient bias and extra tuning.
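A one-dimensional toy (my illustration, not from the paper) of why the hard assignment blocks gradients: the output is piecewise constant in the input, so its derivative is zero almost everywhere.

```python
# Toy 1-D codebook; hard_quantize picks the nearest codeword.
codebook = [-1.0, 0.0, 1.0]

def hard_quantize(z):
    return min(codebook, key=lambda c: abs(c - z))

# Small perturbations leave the output unchanged...
assert hard_quantize(0.10) == hard_quantize(0.15) == 0.0
# ...until z crosses a decision boundary, where the output jumps.
assert hard_quantize(0.51) == 1.0
```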
1/ 🔥 New paper: Differentiable Vector Quantization (DiVeQ) 🔥
Vector quantization (VQ) is a core tool in modern AI. It connects continuous data like images and audio to discrete tokens used by transformers. It underpins compression, generation, and multimodal modelling.
Statement from #AISTATS2026 organizers regarding the @openreview.bsky.social API Security Incident
I'm feeling grateful to colleagues, students, collaborators, and everyone who joined the talk – and excited about the next steps in research on machines that learn, and maybe one day, truly make sense. 🙏✨
4/n
My own research, together with my group, focuses less on building the giant models and more on designing the building blocks behind them: model components, inductive biases, training principles, and inference methods that make AI systems more robust, data-efficient, and uncertainty-aware.
3/n
I talked about "Making Sense of Learning Machines":
• How modern machine learning has learned to cope with natural, “chaotic” data – images, text, sound
• Why the big breakthroughs of the last 10–15 years matter
• What we lack and what we would like to understand
2/n
I recently gave my installation talk after being tenured. The video of the talk is now available on the university's YouTube channel: youtu.be/R1UQoflPTDg 1/n
Yes. The easiest way to find it will be on the website virtual.aistats.org. We are in the process of adding material there and will add a link.

We will go public with it as soon as everything is set up with the venue.
I'm thrilled to be Program Chairing AISTATS 2026 together with Aaditya Ramdas. AISTATS has a special feel to it, and it has been described by many colleagues as their "favourite conference". We aim to preserve that spirit while introducing some fresh elements for 2026. [3/3]
Accepted papers will be presented in person in Morocco, May 2–5, 2026. The full Call for Papers is available here: virtual.aistats.org/Conferences/... [2/3]
📣 Please share: We invite submissions to the 29th International Conference on Artificial Intelligence and Statistics (#AISTATS 2026) and welcome paper submissions at the intersection of AI, machine learning, statistics, and related areas. [1/3]
BitVI on 1D Gaussian mixture models.
Remember that computers use bitstrings to represent numbers? We exploit this in our recent @auai.org paper and introduce #BitVI.
#BitVI directly learns an approximation in the space of bitstring representations, thus capturing complex distributions under varying numerical precision regimes.
Check our #CVPR paper and project page for more results, videos, and code!
📄 arxiv.org/abs/2411.19756
🎈 aaltoml.github.io/desplat/
Qualitative visualization of static distractor elements achieved by our model, DeSplat. [3/n]
Compared to Splatfacto, DeSplat models distractors explicitly and can ignore them, improving 3DGS reconstruction quality. [2/n]
Real-world #3DGS scenes are messy—occluders, moving objects, and clutter often ruin reconstruction. This #CVPR2025 paper presents DeSplat, which separates static scene content from distractors, all without requiring external semantic models. [1/n]
I’m visiting the Isaac Newton Institute for Mathematical Sciences in Cambridge this week.
I’m giving an invited talk in the “Calibrating prediction uncertainty: statistics and machine learning perspectives” workshop on Thursday.
Our method addresses the pressing question of probabilistic modelling in quantized large-scale ML models. See the workshop paper below. [3/3]
📄 Paper: openreview.net/forum?id=Sai...
We introduce BitVI, a novel approach for variational inference with discrete bitstring representations of continuous parameters. We use a deterministic probabilistic circuit structure to model the distribution over bitstrings, allowing for exact and efficient probabilistic inference. [2/3]
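A toy stand-in (my simplification, not the BitVI circuit): a fully factorized distribution over bits already illustrates how a distribution over bitstrings induces a distribution over quantized parameter values; the names `bit_probs_to_value_dist` and `grid` are hypothetical.

```python
from itertools import product

def bit_probs_to_value_dist(p, grid):
    """p[i] = P(bit i = 1); grid maps each bitstring to a real value.
    Returns the induced distribution over quantized values."""
    dist = {}
    for bits in product([0, 1], repeat=len(p)):
        prob = 1.0
        for b, pi in zip(bits, p):
            prob *= pi if b else (1.0 - pi)
        dist[grid(bits)] = dist.get(grid(bits), 0.0) + prob
    return dist

# 2-bit toy grid: sign bit times magnitude bit -> values in {-1, 0, 1}
grid = lambda bits: (-1) ** bits[0] * bits[1]
dist = bit_probs_to_value_dist([0.5, 0.8], grid)
# dist is a normalized distribution over the quantized values
```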
Have you ever considered that in computer memory, model weights are stored as discrete values in any case? So why not do probabilistic inference directly on the discrete (quantized) parameters? @trappmartin.bsky.social is presenting our work at #AABI2025 today. [1/3]
We show that externalising reasoning as a DAG at test time leads to more accurate, efficient multi-hop retrieval – and integrates seamlessly with RAG systems like Self-RAG.
📄 Paper: openreview.net/pdf?id=gi9aq...
3/3
This work was born out of Prakhar's internship with Microsoft Research (w/ Sukruta Prakash Midigeshi, Gaurav Sinha, Arno Solin, Nagarajan Natarajan, and Amit Sharma).
2/3