
Posts by Mobius Labs


Our re-distilled Deepseek R1 (1.5B) outperforms the original distilled model! Get it at huggingface.co/mobiuslabsgm.... We’re distilling more models and look forward to releasing them soon!
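Distillation trains a small student model to match a larger teacher's output distribution rather than only the hard labels. As a minimal sketch of the classic soft-target objective (temperature-softened KL divergence; not Mobius Labs' actual training code), the loss can be written as:

```python
import math

def softmax(logits, temperature=1.0):
    # Softened probabilities: a higher temperature flattens the distribution,
    # exposing more of the teacher's "dark knowledge" about non-top classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 0.5, -1.0]
# A student that matches the teacher exactly has zero loss;
# any mismatch makes the KL term strictly positive.
print(distillation_kl(teacher, teacher))            # → 0.0
print(distillation_kl(teacher, [0.0, 0.0, 0.0]) > 0)  # → True
```

In practice this term is computed per token over the vocabulary and combined with the standard cross-entropy loss.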

1 year ago
Lecture 34: Low Bit Triton Kernels (YouTube video by GPU MODE)

Watch this video for insights into our experience during development:
www.youtube.com/watch?v=7c3c...

1 year ago
Release 0.4.0 · mobiusml/gemlite: Improved performance on the A100 and H100. Flexible bitpacking support (32-bit / 8-bit, over cols or rows). Best config caching over all kernels. Helper functions for easier usage. GEMV_SPLITK kern...

Introduced new kernels, max-autotuning, and several other improvements. Check out the release details at github.com/mobiusml/gem...

1 year ago
GitHub - mobiusml/gemlite: Fast low-bit matmul kernels in Triton

Releasing a new version of Gemlite (github.com/mobiusml/gem...) with significantly improved performance on datacenter GPUs (A100/H100): up to 7–8x faster prefill and 3–6x faster batch decoding compared to PyTorch's tinygemm.
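The release notes above mention flexible bitpacking, i.e. storing several low-bit weights inside one wider word (for example, eight 4-bit values per 32-bit word, packed along rows or columns). This is not gemlite's actual implementation, just a minimal Python sketch of the idea:

```python
def pack_4bit(values):
    # Pack eight 4-bit unsigned values (0..15) into one 32-bit word,
    # with value i occupying bits [4*i, 4*i + 4).
    assert len(values) == 8 and all(0 <= v < 16 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)
    return word

def unpack_4bit(word):
    # Recover the eight 4-bit values by shifting and masking.
    return [(word >> (4 * i)) & 0xF for i in range(8)]

vals = [3, 15, 0, 7, 1, 9, 12, 5]
packed = pack_4bit(vals)
print(unpack_4bit(packed) == vals)  # → True
```

A real kernel does the unpacking on the fly inside the matmul inner loop, trading a few shift/mask instructions for an 8x reduction in weight-memory traffic.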

1 year ago
Release faster-whisper 1.1.0 · SYSTRAN/faster-whisper: New batched inference that is 4x faster and more accurate; refer to the README for usage instructions. Support for the new large-v3-turbo model. VAD filter is now 3x faster on CPU. Feature Extr...

Really happy to contribute to the batched version of faster-whisper that is 4x faster and more accurate 🚀🚀🚀
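The speedup from batched inference comes from running many speech chunks through the model in a single forward pass instead of one at a time. As a toy illustration of the grouping step (all names hypothetical; this is not faster-whisper's internal code), variable-length segments can be batched and padded like this:

```python
def make_batches(segments, batch_size=8, pad_value=0.0):
    # Group variable-length audio segments into fixed-size batches,
    # padding each batch to its longest segment so the model can
    # process batch_size segments in one forward pass.
    batches = []
    for start in range(0, len(segments), batch_size):
        chunk = segments[start:start + batch_size]
        max_len = max(len(s) for s in chunk)
        padded = [s + [pad_value] * (max_len - len(s)) for s in chunk]
        batches.append(padded)
    return batches

# Five segments of lengths 5, 3, 7, 2, 6 → three batches of
# sizes 2, 2, 1, each padded to its longest member.
segs = [[0.1] * n for n in (5, 3, 7, 2, 6)]
print([len(b) for b in make_batches(segs, batch_size=2)])  # → [2, 2, 1]
```

In faster-whisper the segments come from the VAD filter, which is why a faster VAD (also in this release) compounds the end-to-end gain.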

github.com/SYSTRAN/fast...

1 year ago