You can now fine-tune Gemma 4 with our free notebooks! 🔥
You just need 8GB VRAM to train Gemma 4 locally!
Unsloth trains Gemma 4 1.5x faster with 50% less VRAM.
GitHub: github.com/unslothai/un...
Guide: unsloth.ai/docs/models/...
Gemma-4-E4B Colab: colab.research.google.com/github/unslo...
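A minimal sketch of what the notebook does, assuming Unsloth's usual FastLanguageModel API; the model ID below is a placeholder, so check the guide for the real one:

from unsloth import FastLanguageModel

# Load a 4-bit quantized base model (the ID below is hypothetical).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-4-e4b",  # placeholder; see the guide
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)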
Gemma 4 E4B (4-bit) completed a full repo audit by executing Bash code and tool calls locally.
Runs on just 6GB RAM.
Google releases Gemma 4. ✨
Gemma 4 introduces 4 models: E2B, E4B, 26B-A4B, 31B.
The multimodal reasoning models are under Apache 2.0.
Run E2B and E4B on ~6GB RAM, and on phones.
Run 26B-A4B and 31B on ~18GB RAM.
GGUFs: huggingface.co/collections/...
Guide: unsloth.ai/docs/models/...
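If you grab one of the GGUFs, a quick way to try it from Python is llama-cpp-python; the file name below is a placeholder for whichever quant you download:

from llama_cpp import Llama

# n_gpu_layers=-1 offloads every layer to the GPU; drop it to run CPU-only.
llm = Llama(model_path="gemma-4-E4B-Q4_K_M.gguf", n_gpu_layers=-1, n_ctx=8192)
print(llm("Explain LoRA in one sentence.", max_tokens=64)["choices"][0]["text"])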
Qwen3.5-4B searched 20+ websites, cited its sources, and found the best answer! 🔥
Try this locally with just 4GB RAM via Unsloth Studio.
The 4B model did this by executing tool calls + web search directly during its thinking trace.
Introducing Unsloth Studio ✨
A new open-source web UI to train and run LLMs
• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio
• Create datasets from PDFs
• Tool calling, code execution + export
GitHub: github.com/unslothai/un...
We created a repo with 250+ notebooks for LLM training.
Train locally on your device with 3GB VRAM or free on Colab.
Learn the entire fine-tuning and inference workflow.
Supports RL, vision, audio, embedding & TTS models
GitHub: github.com/unslothai/no...
Learn how to run Qwen3.5 locally using Claude Code.
Our guide walks you through serving Qwen3.5 on your own server for local agentic coding.
We then build a Qwen3.5 agent that autonomously fine-tunes models using Unsloth.
Works on 24GB RAM or less.
Guide: unsloth.ai/docs/basics/...
You can now fine-tune Qwen3.5 with our free notebook! 🔥
You just need 5GB VRAM to train Qwen3.5-2B LoRA locally!
Unsloth trains Qwen3.5 1.5x faster with 50% less VRAM.
GitHub: github.com/unslothai/un...
Guide: unsloth.ai/docs/models/...
Qwen3.5-4B Colab: colab.research.google.com/github/unslo...
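Roughly what the notebook's training cell looks like with TRL's SFTTrainer; the model ID and toy dataset are placeholders:

from datasets import Dataset
from trl import SFTTrainer, SFTConfig
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/qwen3.5-2b", load_in_4bit=True)  # placeholder ID
model = FastLanguageModel.get_peft_model(model, r=16)  # LoRA adapters

# Tiny stand-in dataset; the notebook uses a real instruct dataset.
dataset = Dataset.from_dict({"text": ["### Instruction: Say hi\n### Response: Hi!"]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # `processing_class=` in newer TRL versions
    train_dataset=dataset,
    args=SFTConfig(dataset_text_field="text", per_device_train_batch_size=2,
                   gradient_accumulation_steps=4, max_steps=60, output_dir="out"),
)
trainer.train()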
Qwen releases 4 new Qwen3.5 Small models!
Qwen3.5: 0.8B • 2B • 4B • 9B
Run Qwen3.5-0.8B, 2B and 4B on your phone. Run 9B on 6GB RAM.
The vision reasoning LLMs outperform models 4x their size.
GGUFs: huggingface.co/collections/...
Guide: unsloth.ai/docs/models/...
apparently Unsloth is on bluesky?
You can now train MoE models 12x faster with 35% less VRAM via our new Triton kernels (no accuracy loss).
Train gpt-oss locally on 12.8GB VRAM.
In collab with @hf.co, Unsloth trains DeepSeek, Qwen3, GLM faster.
Repo: github.com/unslothai/un...
Blog: unsloth.ai/docs/new/fas...
You can now fine-tune embedding models in our free notebook!
Improve retrieval and RAG with better semantic search & similarity.
Unsloth trains 2x faster with 20% less VRAM, 2x longer context & no accuracy loss
Blog: unsloth.ai/docs/new/emb...
EmbeddingGemma (300M): colab.research.google.com/github/unslo...
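Embedding fine-tuning of this kind is typically built on the sentence-transformers trainer; a minimal sketch of that style of setup (the model ID and dataset are illustrative, not necessarily the notebook's exact choices):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("google/embeddinggemma-300m")  # check the notebook for the exact ID
train = load_dataset("sentence-transformers/all-nli", "pair", split="train[:10000]")
# Contrastive loss over (anchor, positive) pairs with in-batch negatives.
loss = MultipleNegativesRankingLoss(model)
SentenceTransformerTrainer(model=model, train_dataset=train, loss=loss).train()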
You can now train LLMs 3x faster with no accuracy loss, via our new RoPE and MLP kernels.
Our Triton kernels plus smart auto packing deliver ~3x faster training & 30% less VRAM vs optimized FA3 setups.
Train Qwen3-4B 3x faster on just 3.9GB VRAM.
Blog: docs.unsloth.ai/new/3x-faste...
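Unsloth's auto packing is its own kernel-level implementation, but the generic idea, concatenating short examples so no tokens are wasted on padding, is a one-flag change in TRL; a sketch under that assumption:

from trl import SFTConfig

# packing=True concatenates short samples to fill each context window,
# which is where much of the throughput win over padded batches comes from.
args = SFTConfig(output_dir="out", packing=True)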
You can now train OpenAI gpt-oss with Reinforcement Learning in our free notebook!
This notebook automatically creates faster kernels via RL.
Unsloth RL achieves the fastest inference & lowest VRAM use of any setup, with 0 accuracy loss.
gpt-oss-20b GRPO Colab: colab.research.google.com/github/unslo...
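A hedged sketch of the GRPO shape with TRL; the reward below is a stand-in (the notebook's real reward compiles and times the generated kernels), and the model ID is a placeholder:

from datasets import Dataset
from trl import GRPOTrainer, GRPOConfig

def speed_reward(completions, **kwargs):
    # Stand-in: the real notebook benchmarks each generated kernel;
    # here we just penalize longer completions as a placeholder signal.
    return [-len(c) / 1000.0 for c in completions]

dataset = Dataset.from_dict({"prompt": ["Write a fast vector-add kernel."]})
trainer = GRPOTrainer(
    model="unsloth/gpt-oss-20b",  # placeholder ID
    reward_funcs=speed_reward,
    args=GRPOConfig(output_dir="grpo-out"),
    train_dataset=dataset,
)
trainer.train()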
We're teaming up with Mistral and NVIDIA for an Unsloth event on Tues, Oct 21 at Y Combinator's office! 🦥
Join us in San Francisco for a night of talks, merch and more.
Food & drinks provided. RSVP required!
⭐ lu.ma/unsloth-yc
Run DeepSeek-V3.1 locally on 170GB RAM with our Dynamic 1-bit GGUFs!🐋
The 715GB model gets reduced to 170GB (-80% size) by smartly quantizing layers.
The 1-bit GGUF passes all our code tests & we fixed the chat template.
Guide: docs.unsloth.ai/basics/deeps...
GGUF: huggingface.co/unsloth/Deep...
Learn to fine-tune OpenAI gpt-oss with our new step-by-step guide! ✨
Learn about:
• Local gpt-oss training + inference FAQ & tips
• Evaluation, hyperparameters & overfitting
• Reasoning effort & data prep
• Running & saving your model to llama.cpp GGUF, HF etc. (see the sketch below)
🔗Guide: docs.unsloth.ai/basics/gpt-o...
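The export step maps to Unsloth's save helpers; a sketch assuming `model` and `tokenizer` come from an Unsloth fine-tuning run, with q4_k_m as one common quant choice:

# Export to a llama.cpp GGUF (q4_k_m is a common quality/size trade-off).
model.save_pretrained_gguf("gpt-oss-finetune", tokenizer, quantization_method="q4_k_m")
# Or push merged 16-bit weights to the Hugging Face Hub:
model.push_to_hub_merged("your-name/gpt-oss-finetune", tokenizer, save_method="merged_16bit")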
We made a complete Guide on Reinforcement Learning (RL) for LLMs!
Learn about:
• RL's goal & why it's key to building intelligent AI agents
• Why o3, Claude 4 & R1 use RL
• GRPO, RLHF, DPO, reward functions
• Training your own local R1 model with Unsloth
🔗 docs.unsloth.ai/basics/reinf...
You can now run DeepSeek-R1-0528 locally with our Dynamic 1-bit GGUFs! 🐋
We shrank the full 715GB model to just 168GB (-80% size).
We achieve optimal accuracy by selectively quantizing layers.
DeepSeek-R1-0528-Qwen3-8B is also supported.
GGUFs: huggingface.co/unsloth/Deep...
Introducing Unsloth Dynamic v2.0 GGUFs!
v2.0 sets new benchmarks on 5-shot MMLU + KL Divergence, so you can now run quantized LLMs with minimal accuracy loss.
For the benchmarks, we built an evaluation framework that matches the official MMLU scores of Llama 4 & Gemma 3.
Blog: docs.unsloth.ai/basics/dynam...
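For intuition, the KL Divergence number measures how far the quantized model's next-token distribution drifts from the full-precision one; a sketch of the per-token computation:

import torch.nn.functional as F

def mean_token_kl(logits_fp, logits_q):
    # KL(P_full || P_quant) per token, averaged over the sequence;
    # both logits tensors have shape (seq_len, vocab_size).
    log_p = F.log_softmax(logits_fp, dim=-1)
    log_q = F.log_softmax(logits_q, dim=-1)
    return (log_p.exp() * (log_p - log_q)).sum(-1).mean()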
You can now run Llama 4 on your local device! 🦙
We shrank Maverick (402B) from 400GB to 122GB (-70%).
Our Dynamic 1.78-bit iMatrix GGUFs ensure optimal accuracy & size by selectively quantizing layers.
Scout + Maverick GGUFs: huggingface.co/collections/...
Guide: docs.unsloth.ai/basics/tutor...
The 1.58-bit quant fits in 131GB VRAM (2x H100s) for fast throughput inference at ~140 tokens/s.
For best results, use the 2.51-bit Dynamic quant & at least 160GB of combined VRAM + RAM.
Basic 1-bit & 2-bit quantization causes the model to produce repetition and poor code. Our dynamic quants solve this.
You can now run DeepSeek-V3-0324 locally using our 1.58 & 2.51-bit Dynamic GGUFs! 🐋
We shrank the 720GB model to 131GB (-80%) by selectively quantizing layers for the best performance. This fixes basic quants breaking & producing bad outputs: www.unsloth.ai/blog/deepsee...
GGUF: huggingface.co/unsloth/Deep...
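A model this size won't fit in most VRAM, so the usual trick is a partial offload, keeping some layers on GPU and the rest in system RAM; a llama-cpp-python sketch with placeholder file name and layer count:

from llama_cpp import Llama

# Offload only as many layers as fit in VRAM; the rest run from system RAM.
llm = Llama(model_path="DeepSeek-V3-0324-UD-IQ1_S.gguf",  # placeholder file
            n_gpu_layers=20, n_ctx=4096)
print(llm("Write Flappy Bird in Python.", max_tokens=128)["choices"][0]["text"])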
We teamed up with 🤗Hugging Face to release a free notebook for fine-tuning Gemma 3 with GRPO
Learn to:
• Enable reasoning in Gemma 3 (1B)
• Prepare/understand reward functions
• Make GRPO work for tiny LLMs
Notebook: colab.research.google.com/github/unslo...
Details: huggingface.co/reasoning-co...
We made a Guide to teach you how to fine-tune LLMs correctly!
Learn about:
• Choosing the right parameters & training method
• RL, GRPO, DPO, CPT
• Data prep, Overfitting, Evaluation
• Training with Unsloth & deploy on vLLM, Ollama, Open WebUI
🔗https://docs.unsloth.ai/get-started/fine-tuning-guide
Tutorial: Train your own Reasoning LLM for free!
Give Llama 3.1 (8B) chain-of-thought reasoning with DeepSeek's GRPO algorithm. Unsloth cuts VRAM use by 90%.
Learn about:
• Reward functions + dataset prep (see the sketch below)
• Training on free Colab GPUs
• Running + Evaluating
Guide: docs.unsloth.ai/basics/reaso...
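A GRPO reward function can be as simple as checking the final answer; a sketch of the kind of verifier the tutorial describes (the "####" answer marker is illustrative, GSM8K-style):

import re

def correctness_reward(completions, answer, **kwargs):
    # Reward 2.0 when the model's extracted answer matches the reference.
    rewards = []
    for completion, ref in zip(completions, answer):
        match = re.search(r"####\s*(.+)", completion)
        rewards.append(2.0 if match and match.group(1).strip() == str(ref).strip() else 0.0)
    return rewards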
For our benchmarks, a standard GRPO QLoRA setup (TRL + FA2) for Llama 3.1 (8B) at 20K context required 510.8GB VRAM. Unsloth’s GRPO algorithms reduce this to just 54.3GB.
The 5GB VRAM requirement for Qwen2.5 (1.5B) is down from 7GB in our previous GRPO release two weeks ago!
Today, we’re launching new algorithms that enable 10x longer context lengths & 90% less VRAM for training Reasoning Models (GRPO).
Using Unsloth, you can now train your own reasoning model with just 5GB VRAM for Qwen2.5-1.5B with no accuracy loss.
Blog: unsloth.ai/blog/grpo
You can now reproduce DeepSeek-R1's reasoning on your own local device!
Experience the "Aha" moment with just 7GB VRAM.
Unsloth reduces GRPO training memory use by 80%.
15GB VRAM can transform Llama-3.1 (8B) & Phi-4 (14B) into reasoning models.
Blog: unsloth.ai/blog/r1-reas...
You can now fine-tune Phi-4 for free on Colab!
Unsloth fine-tunes LLMs 2x faster with 70% less VRAM and 12x longer context, with no accuracy loss.
Documentation: docs.unsloth.ai
We also fixed 4 bugs in Phi-4: unsloth.ai/blog/phi4
Phi-4 Colab: colab.research.google.com/github/unslo...