You can now fine-tune Gemma 4 with our free notebooks! 🔥
You just need 8GB VRAM to train Gemma 4 locally!
Unsloth trains Gemma 4 1.5x faster with 50% less VRAM.
GitHub: github.com/unslothai/un...
Guide: unsloth.ai/docs/models/...
Gemma-4-E4B Colab: colab.research.google.com/github/unslo...
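A minimal sketch of what the notebook does, assuming Unsloth's usual FastLanguageModel API; the model ID below is a placeholder, so check the guide for the real one:

from unsloth import FastLanguageModel

# Load a 4-bit quantized base model (the ID below is hypothetical).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-4-e4b",  # placeholder; see the guide
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)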
Gemma 4 E4B (4-bit) completed a full repo audit by executing Bash code and tool calls locally.
Runs on just 6GB RAM.
Google releases Gemma 4. ✨
Gemma 4 introduces 4 models: E2B, E4B, 26B-A4B, 31B.
The multimodal reasoning models are under Apache 2.0.
Run E2B and E4B on ~6GB RAM, and on phones.
Run 26B-A4B and 31B on ~18GB RAM.
GGUFs: huggingface.co/collections/...
Guide: unsloth.ai/docs/models/...
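If you grab one of the GGUFs, a quick way to try it from Python is llama-cpp-python; the file name below is a placeholder for whichever quant you download:

from llama_cpp import Llama

# n_gpu_layers=-1 offloads every layer to the GPU; drop it to run CPU-only.
llm = Llama(model_path="gemma-4-E4B-Q4_K_M.gguf", n_gpu_layers=-1, n_ctx=8192)
print(llm("Explain LoRA in one sentence.", max_tokens=64)["choices"][0]["text"])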
Qwen3.5-4B searched 20+ websites, cited its sources, and found the best answer! 🔥
Try this locally with just 4GB RAM via Unsloth Studio.
The 4B model did this by executing tool calls + web search directly during its thinking trace.
Introducing Unsloth Studio ✨
A new open-source web UI to train and run LLMs
• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio
• Create datasets from PDFs
• Tool calling, code execution + export
GitHub: github.com/unslothai/un...
We created a repo with 250+ notebooks for LLM training.
Train locally on your device with 3GB VRAM or free on Colab.
Learn the entire fine-tuning and inference workflow.
Supports RL, vision, audio, embedding & TTS models
GitHub: github.com/unslothai/no...
Learn how to run Qwen3.5 locally using Claude Code.
Our guide walks you through serving Qwen3.5 on your own server for local agentic coding.
We then build a Qwen3.5 agent that autonomously fine-tunes models using Unsloth.
Works on 24GB RAM or less.
Guide: unsloth.ai/docs/basics/...
You can now fine-tune Qwen3.5 with our free notebook! 🔥
You just need 5GB VRAM to train Qwen3.5-2B LoRA locally!
Unsloth trains Qwen3.5 1.5x faster with 50% less VRAM.
GitHub: github.com/unslothai/un...
Guide: unsloth.ai/docs/models/...
Qwen3.5-4B Colab: colab.research.google.com/github/unslo...
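Roughly what the notebook's training cell looks like with TRL's SFTTrainer; the model ID and toy dataset are placeholders:

from datasets import Dataset
from trl import SFTTrainer, SFTConfig
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/qwen3.5-2b", load_in_4bit=True)  # placeholder ID
model = FastLanguageModel.get_peft_model(model, r=16)  # LoRA adapters

# Tiny stand-in dataset; the notebook uses a real instruct dataset.
dataset = Dataset.from_dict({"text": ["### Instruction: Say hi\n### Response: Hi!"]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # `processing_class=` in newer TRL versions
    train_dataset=dataset,
    args=SFTConfig(dataset_text_field="text", per_device_train_batch_size=2,
                   gradient_accumulation_steps=4, max_steps=60, output_dir="out"),
)
trainer.train()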
Qwen releases 4 new Qwen3.5 Small models!
Qwen3.5: 0.8B • 2B • 4B • 9B
Run Qwen3.5-0.8B, 2B and 4B on your phone. Run 9B on 6GB RAM.
The vision reasoning LLMs outperform models 4x their size.
GGUFs: huggingface.co/collections/...
Guide: unsloth.ai/docs/models/...
apparently Unsloth is on bluesky?
You can now train MoE models 12x faster with 35% less VRAM via our new Triton kernels (no accuracy loss).
Train gpt-oss locally on 12.8GB VRAM.
In collab with @hf.co, Unsloth trains DeepSeek, Qwen3, GLM faster.
Repo: github.com/unslothai/un...
Blog: unsloth.ai/docs/new/fas...
You can now fine-tune embedding models in our free notebook!
Improve retrieval and RAG with better semantic search & similarity.
Unsloth trains 2x faster with 20% less VRAM, 2x longer context & no accuracy loss
Blog: unsloth.ai/docs/new/emb...
EmbeddingGemma (300M): colab.research.google.com/github/unslo...
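Embedding fine-tuning of this kind is typically built on the sentence-transformers trainer; a minimal sketch of that style of setup (the model ID and dataset are illustrative, not necessarily the notebook's exact choices):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("google/embeddinggemma-300m")  # check the notebook for the exact ID
train = load_dataset("sentence-transformers/all-nli", "pair", split="train[:10000]")
# Contrastive loss over (anchor, positive) pairs with in-batch negatives.
loss = MultipleNegativesRankingLoss(model)
SentenceTransformerTrainer(model=model, train_dataset=train, loss=loss).train()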
You can now train LLMs 3x faster with no accuracy loss, via our new RoPE and MLP kernels.
Our Triton kernels plus smart auto packing deliver ~3x faster training & 30% less VRAM vs optimized FA3 setups.
Train Qwen3-4B 3x faster on just 3.9GB VRAM.
Blog: docs.unsloth.ai/new/3x-faste...
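Unsloth's auto packing is its own kernel-level implementation, but the generic idea, concatenating short examples so no tokens are wasted on padding, is a one-flag change in TRL; a sketch under that assumption:

from trl import SFTConfig

# packing=True concatenates short samples to fill each context window,
# which is where much of the throughput win over padded batches comes from.
args = SFTConfig(output_dir="out", packing=True)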
You can now train OpenAI gpt-oss with Reinforcement Learning in our free notebook!
This notebook automatically creates faster kernels via RL.
Unsloth RL achieves the fastest inference & lowest VRAM use of any setup, with 0 accuracy loss.
gpt-oss-20b GRPO Colab: colab.research.google.com/github/unslo...
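A hedged sketch of the GRPO shape with TRL; the reward below is a stand-in (the notebook's real reward compiles and times the generated kernels), and the model ID is a placeholder:

from datasets import Dataset
from trl import GRPOTrainer, GRPOConfig

def speed_reward(completions, **kwargs):
    # Stand-in: the real notebook benchmarks each generated kernel;
    # here we just penalize longer completions as a placeholder signal.
    return [-len(c) / 1000.0 for c in completions]

dataset = Dataset.from_dict({"prompt": ["Write a fast vector-add kernel."]})
trainer = GRPOTrainer(
    model="unsloth/gpt-oss-20b",  # placeholder ID
    reward_funcs=speed_reward,
    args=GRPOConfig(output_dir="grpo-out"),
    train_dataset=dataset,
)
trainer.train()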
We're teaming up with Mistral and NVIDIA for an Unsloth event on Tues, Oct 21 at Y Combinator's office! 🦥
Join us in San Francisco for a night of talks, merch and more.
Food & drinks provided. RSVP required!
⭐ lu.ma/unsloth-yc
Run DeepSeek-V3.1 locally on 170GB RAM with our Dynamic 1-bit GGUFs!🐋
The 715GB model gets reduced to 170GB (-80% size) by smartly quantizing layers.
The 1-bit GGUF passes all our code tests & we fixed the chat template.
Guide: docs.unsloth.ai/basics/deeps...
GGUF: huggingface.co/unsloth/Deep...
Learn to fine-tune OpenAI gpt-oss with our new step-by-step guide! ✨
Learn about:
• Local gpt-oss training + inference FAQ & tips
• Evaluation, hyperparameters & overfitting
• Reasoning effort & data prep
• Running & saving your model to llama.cpp GGUF, HF etc. (see the sketch below)
🔗Guide: docs.unsloth.ai/basics/gpt-o...
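The export step maps to Unsloth's save helpers; a sketch assuming `model` and `tokenizer` come from an Unsloth fine-tuning run, with q4_k_m as one common quant choice:

# Export to a llama.cpp GGUF (q4_k_m is a common quality/size trade-off).
model.save_pretrained_gguf("gpt-oss-finetune", tokenizer, quantization_method="q4_k_m")
# Or push merged 16-bit weights to the Hugging Face Hub:
model.push_to_hub_merged("your-name/gpt-oss-finetune", tokenizer, save_method="merged_16bit")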
We made a complete Guide on Reinforcement Learning (RL) for LLMs!
Learn about:
• RL's goal & why it's key to building intelligent AI agents
• Why o3, Claude 4 & R1 use RL
• GRPO, RLHF, DPO, reward functions
• Training your own local R1 model with Unsloth
🔗 docs.unsloth.ai/basics/reinf...
You can now run DeepSeek-R1-0528 locally with our Dynamic 1-bit GGUFs! 🐋
We shrank the full 715GB model to just 168GB (-80% size).
We achieve optimal accuracy by selectively quantizing layers.
DeepSeek-R1-0528-Qwen3-8B is also supported.
GGUFs: huggingface.co/unsloth/Deep...
Introducing Unsloth Dynamic v2.0 GGUFs!
v2.0 sets new benchmarks on 5-shot MMLU + KL Divergence, so you can now run quantized LLMs with minimal accuracy loss.
For the benchmarks, we built an evaluation framework that matches the official MMLU scores of Llama 4 & Gemma 3.
Blog: docs.unsloth.ai/basics/dynam...
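For intuition, the KL Divergence number measures how far the quantized model's next-token distribution drifts from the full-precision one; a sketch of the per-token computation:

import torch.nn.functional as F

def mean_token_kl(logits_fp, logits_q):
    # KL(P_full || P_quant) per token, averaged over the sequence;
    # both logits tensors have shape (seq_len, vocab_size).
    log_p = F.log_softmax(logits_fp, dim=-1)
    log_q = F.log_softmax(logits_q, dim=-1)
    return (log_p.exp() * (log_p - log_q)).sum(-1).mean()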
You can now run Llama 4 on your local device! 🦙
We shrank Maverick (402B) from 400GB to 122GB (-70%).
Our Dynamic 1.78-bit iMatrix GGUFs ensure optimal accuracy & size by selectively quantizing layers.
Scout + Maverick GGUFs: huggingface.co/collections/...
Guide: docs.unsloth.ai/basics/tutor...
The 1.58-bit quant fits in 131GB VRAM (2x H100s) for fast throughput inference at ~140 tokens/s.
For best results, use the 2.51-bit Dynamic quant & at least 160GB of combined VRAM + RAM.
Basic 1-bit & 2-bit quantization causes the model to produce repetition and poor code. Our dynamic quants solve this.
You can now run DeepSeek-V3-0324 locally using our 1.58 & 2.51-bit Dynamic GGUFs! 🐋
We shrank the 720GB model to 131GB (-80%) by selectively quantizing layers for the best performance. This fixes basic quants breaking & producing bad outputs: www.unsloth.ai/blog/deepsee...
GGUF: huggingface.co/unsloth/Deep...
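A model this size won't fit in most VRAM, so the usual trick is a partial offload, keeping some layers on GPU and the rest in system RAM; a llama-cpp-python sketch with placeholder file name and layer count:

from llama_cpp import Llama

# Offload only as many layers as fit in VRAM; the rest run from system RAM.
llm = Llama(model_path="DeepSeek-V3-0324-UD-IQ1_S.gguf",  # placeholder file
            n_gpu_layers=20, n_ctx=4096)
print(llm("Write Flappy Bird in Python.", max_tokens=128)["choices"][0]["text"])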
We teamed up with 🤗Hugging Face to release a free notebook for fine-tuning Gemma 3 with GRPO
Learn to:
• Enable reasoning in Gemma 3 (1B)
• Prepare/understand reward functions
• Make GRPO work for tiny LLMs
Notebook: colab.research.google.com/github/unslo...
Details: huggingface.co/reasoning-co...
We made a Guide to teach you how to fine-tune LLMs correctly!
Learn about:
• Choosing the right parameters & training method
• RL, GRPO, DPO, CPT
• Data prep, Overfitting, Evaluation
• Training with Unsloth & deploy on vLLM, Ollama, Open WebUI
🔗https://docs.unsloth.ai/get-started/fine-tuning-guide
Tutorial: Train your own Reasoning LLM for free!
Give Llama 3.1 (8B) chain-of-thought reasoning with DeepSeek's GRPO algorithm. Unsloth cuts VRAM use by 90%.
Learn about:
• Reward functions + dataset prep (see the sketch below)
• Training on free Colab GPUs
• Running + Evaluating
Guide: docs.unsloth.ai/basics/reaso...
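A GRPO reward function can be as simple as checking the final answer; a sketch of the kind of verifier the tutorial describes (the "####" answer marker is illustrative, GSM8K-style):

import re

def correctness_reward(completions, answer, **kwargs):
    # Reward 2.0 when the model's extracted answer matches the reference.
    rewards = []
    for completion, ref in zip(completions, answer):
        match = re.search(r"####\s*(.+)", completion)
        rewards.append(2.0 if match and match.group(1).strip() == str(ref).strip() else 0.0)
    return rewards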
For our benchmarks, a standard GRPO QLoRA setup (TRL + FA2) for Llama 3.1 (8B) at 20K context required 510.8GB VRAM. Unsloth’s GRPO algorithms reduce this to just 54.3GB.
The 5GB VRAM requirement for Qwen2.5 (1.5B) is down from 7GB in our previous GRPO release two weeks ago!
Today, we’re launching new algorithms that enable 10x longer context lengths & 90% less VRAM for training Reasoning Models (GRPO).
Using Unsloth, you can now train your own reasoning model with just 5GB VRAM for Qwen2.5-1.5B with no accuracy loss.
Blog: unsloth.ai/blog/grpo
You can now reproduce DeepSeek-R1's reasoning on your own local device!
Experience the "Aha" moment with just 7GB VRAM.
Unsloth reduces GRPO training memory use by 80%.
15GB VRAM can transform Llama-3.1 (8B) & Phi-4 (14B) into reasoning models.
Blog: unsloth.ai/blog/r1-reas...
You can now fine-tune Phi-4 for free on Colab!
Unsloth fine-tunes LLMs 2x faster with 70% less VRAM and 12x longer context, with no accuracy loss.
Documentation: docs.unsloth.ai
We also fixed 4 bugs in Phi-4: unsloth.ai/blog/phi4
Phi-4 Colab: colab.research.google.com/github/unslo...