arxiv.org/abs/2603.15045 LLMs and Speech: Integration vs. Combination. A comparison: shallow fusion of CTC + external LLM with delayed fusion performs best. Also proposes CTC + speech LLM (prefix LM), with novel optimizations.
Posts by Albert Zeyer
arxiv.org/abs/2512.13576 Denoising Language Models for Speech Recognition. Outperforms standard LMs in a data-constrained setting given enough compute, similar to diffusion LMs. More efficient decoding compared to std LMs. Public recipe, state-of-the-art, comprehensive studies.
askubuntu.com/questions/15... Debugging an annoying problem on #Linux, namely constant "Authentication Required" dialogues, involving polkit/policykit, pkcheck, loginctl, NetworkManager, ... and finally chrome-remote-desktop, which triggered the problem in the first place.
Nix wants to clean up all processes of the build user, via setuid(uid) + kill(-1, SIGKILL) (github.com/NixOS/nix/bl...). When Nix is run via Apptainer/Singularity with --fakeroot, it will kill all running processes of your current user, even outside of Apptainer. --pid is a good idea here...
github.com/albertz/wiki... Why game development is a great learning playground. Updated and resurrected article.
github.com/albertz/py_b... www.reddit.com/r/Python/com... Some updates to my #Python better_exchook, which semi-intelligently prints variables in stack traces: better selection of which variables to print, multi-line statements in stack trace output, full qualified function name (not just co_name)
I also see a bit of the frustration in continuing to do research with small models and questioning the relevance of that nowadays, or in not being able to train large models with limited compute.
But I do see a bit of the frustration that some of the domain-specific knowledge (e.g. subtleties of speech recognition models) seems to become somewhat irrelevant. But that was always clear. I'm not sure whether large models make this more irrelevant than was to be expected anyway.
So I think working for 5 years or so full-time on this, involving research and publications in top conferences, still provides a much better level of experience than any bachelor or master student could possibly gain. Bachelor or master students are not better versed at training and deploying models.
At least in our group, the work we did was still very practical - always building actual speech recognition, translation, or other kinds of models, i.e. building things which could actually be used like this in production (and they are).
Google Scholar is messed up right now? The Transformer paper PDF links to some weird host, doesn't show other versions, and only shows the first author as sole author?
The same also happens for the LSTM paper after you click on 'cite'.
I often split this already in the beginning, so the loop goes like:
`for msg_type, msg_opts in msg_queue: ...`
`msg_type` is just a string, `msg_opts` is a dict.
Whether you use `match`/`case` or `if` on the msg_type is a matter of preference. Match/case is still quite young (Python 3.10).
I just learned that Torch ctc_loss calculates the wrong gradient (but when there is a log_softmax before it, it does not matter).
For the grad ctc_loss w.r.t. log_probs, it calculates exp(log_probs) - y, but correct would be -y. Some workaround: github.com/pytorch/pyto...
PS: First Bluesky post.