You can now run DeepSeek-V3-0324 locally using our 2.71-bit Dynamic GGUF! We shrank 720GB to 231GB (-70%) by selectively quantizing layers. The 2.71-bit version passes many code tests, producing nearly identical results to the full 8-bit model. Guide + examples: https://lnkd.in/g8_D9-fz Model upload: https://lnkd.in/gUfp5-xm
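For reference, a minimal sketch of grabbing just the 2.71-bit shards from Hugging Face before running them with llama.cpp (the repo id and the "UD-Q2_K_XL" filename pattern below are assumptions; check the model upload link for the exact names):

from huggingface_hub import snapshot_download

# Download only the 2.71-bit dynamic quant shards (repo id and filename pattern are
# assumptions; verify them against the model upload linked above).
snapshot_download(
    repo_id = "unsloth/DeepSeek-V3-0324-GGUF",
    local_dir = "DeepSeek-V3-0324-GGUF",
    allow_patterns = ["*UD-Q2_K_XL*"],
)
# Point llama.cpp (or Ollama) at the first downloaded shard to run the model.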
Unsloth AI
Technology, Information and Internet
San Francisco, California · 11,799 followers
Making AI accessible for everyone! 🦥
About us
- Website: https://unsloth.ai
- Industry: Technology, Information and Internet
- Company size: 2-10 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2023
- Specialties: artificial intelligence, ai, llms, language models, and finetuning
Locations
- Primary: San Francisco, California 94107, US
Updates
Unsloth AI reposted this
We teamed up with Hugging Face to release a free GRPO notebook that fine-tunes Gemma 3 into a powerful reasoning model! Using Unsloth AI, OpenAI’s math dataset and custom reward functions, we fine-tune Google’s Gemma 3 (1B) to generate chain-of-thought reasoning. Free Colab Notebook: https://lnkd.in/e94SKJz4 Summary of what you'll learn: • Implement chain-of-thought reasoning in Google's Gemma 3 (1B) using 16-bit LoRA • Make tiny LLMs benefit from GRPO • Understand reward functions • Prepare your data + evaluate your LLM Join HF's Course: https://lnkd.in/e_PhX4tc Thank you Ben Burtenshaw for being patient and working with us on this collab! 🤗
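For a feel of what the custom reward functions look like, here is a rough sketch in the style of TRL's GRPOTrainer reward API (the tags and scores are illustrative assumptions, not the notebook's exact rewards):

import re

# Illustrative GRPO format reward: completions that wrap their working in
# <reasoning>...</reasoning> and the final answer in <answer>...</answer> score 1.0.
# (Tags and weights are assumptions; the notebook's actual reward functions may differ.)
def format_reward(completions, **kwargs):
    # TRL passes chat-style completions as lists of messages; plain datasets pass strings.
    texts = [c[0]["content"] if isinstance(c, list) else c for c in completions]
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, t, re.DOTALL) else 0.0 for t in texts]

# A list of such functions is passed to GRPOTrainer via its reward_funcs argument.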
Unsloth AI reposted this
The unit we’re all waiting for is here! Unsloth AI + Hugging Face on GRPO in the reasoning course. 🔗 https://lnkd.in/enr3adQ5 In this unit, you’ll build on the earlier units by implementing GRPO in Unsloth, and this time we’re also levelling things up: - run on limited hardware with Unsloth optimizations - expand GRPO reward functions to format and beyond - explore a wider range of model sizes, up to 7B This should help way more students without serious hardware. Can’t wait to hear how it goes. Follow the org to join in: https://lnkd.in/enr3adQ5
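One way to go beyond format rewards, sketched under the assumption that each dataset row carries a ground-truth "answer" column (the column name and tags here are hypothetical):

import re

# Hypothetical correctness reward: compare the extracted answer against a
# ground-truth "answer" column that TRL forwards from the dataset as a kwarg.
def correctness_reward(completions, answer, **kwargs):
    texts = [c[0]["content"] if isinstance(c, list) else c for c in completions]
    rewards = []
    for text, gold in zip(texts, answer):
        match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
        predicted = match.group(1).strip() if match else ""
        rewards.append(2.0 if predicted == str(gold).strip() else 0.0)
    return rewards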
Unsloth now works on Windows! 🦥 Fine-tune LLMs locally on Windows without Linux or WSL. Just install prerequisites & run our pip command. Tutorial: https://lnkd.in/gWC4AcMV
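Once the prerequisites and the pip install are done, a quick sanity check like this (an illustrative snippet, not part of the tutorial) confirms Unsloth imports cleanly and the GPU is visible on Windows:

import unsloth  # should import without errors once the prerequisites are installed
import torch

# Illustrative post-install check; see the tutorial for the actual setup steps.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))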
Tutorial: Train your own Reasoning LLM for free! Transform Llama 3.1 (8B) to have chain-of-thought using DeepSeek's GRPO algorithm. Unsloth makes GRPO use 90% less VRAM: https://docs.unsloth.ai/ You'll learn about: • Reward Functions + dataset prep • GRPO Basics + tips & tricks • Training on free Colab GPUs • Running + evaluating + saving your model Tutorial Link: https://lnkd.in/gxYGrFhd
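As a rough sketch of the dataset-prep step, assuming the GSM8K math dataset and a simple reasoning/answer prompt format (the tutorial's exact template may differ):

from datasets import load_dataset

# Illustrative GRPO dataset prep: each example becomes a chat prompt that asks for
# chain-of-thought plus a final answer (the format is an assumption, not the tutorial's exact one).
SYSTEM_PROMPT = "Think step by step inside <reasoning> tags, then give the final answer inside <answer> tags."

def to_grpo_example(example):
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": example["question"]},
        ],
        # GSM8K stores the final numeric answer after "#### "
        "answer": example["answer"].split("####")[-1].strip(),
    }

dataset = load_dataset("openai/gsm8k", "main", split="train").map(to_grpo_example)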
Today, we’re launching new algorithms that enable 10x longer context lengths & 90% less VRAM for training reasoning models (GRPO). Using Unsloth, you can now train your own reasoning model with just 5GB VRAM for Qwen2.5 (1.5B), with no accuracy loss. Blog: https://lnkd.in/gnvEjxMm Free Colab Notebook for Llama 3.1 (8B) GRPO: https://lnkd.in/g7deg5Uw For our benchmarks, a standard GRPO QLoRA setup (TRL + FA2) for Llama 3.1 (8B) at 20K context required 510.8GB VRAM. Unsloth’s GRPO algorithms reduce this to just 54.3GB. The 5GB VRAM requirement for Qwen2.5 (1.5B) is down from 7GB in our previous GRPO release two weeks ago!
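The context-length and memory trade-offs are controlled through TRL's GRPOConfig when training with Unsloth; a hedged sketch with placeholder values (not our benchmark settings):

from trl import GRPOConfig

# Illustrative GRPO training arguments -- values are placeholders, not the benchmark
# configuration; longer completion lengths are what the new memory-efficient
# algorithms make practical on small GPUs.
training_args = GRPOConfig(
    learning_rate = 5e-6,
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 4,
    num_generations = 8,             # completions sampled per prompt by GRPO
    max_prompt_length = 1024,
    max_completion_length = 8192,    # room for long chain-of-thought
    max_steps = 250,
    output_dir = "outputs",
)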
You can now reproduce DeepSeek-R1's reasoning on your own local device! Introducing reasoning in Unsloth. You'll just need 7GB VRAM to experience your own "Aha" moment 100% locally or free on Colab. Unsloth makes GRPO RL use 80% less memory. With 15GB VRAM, you can convert Llama 3.1 (8B), Phi-4 (14B), Mistral (7B), or any model up to 15B parameters into reasoning models. Guide + Blog: https://lnkd.in/gdzMDsYF
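A minimal sketch of the Unsloth side of that setup, loading a 4-bit model and attaching LoRA adapters before GRPO training (argument values are placeholders; the guide above has the exact configuration):

from unsloth import FastLanguageModel

# Illustrative setup for GRPO on a ~15GB GPU; values are placeholders, see the guide.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    max_seq_length = 1024,
    load_in_4bit = True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)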
Introducing 1.58-bit DeepSeek-R1 GGUFs! 🐋 R1 can now run in 1.58-bit while being fully functional. We shrank the 671B-parameter model from 720GB to just 131GB - an 80% size reduction. Naively quantizing all layers breaks the model entirely, causing endless loops & gibberish outputs. Our dynamic quants solve this. The 1.58-bit quant fits in 160GB VRAM (2x H100 80GB) for fast inference at ~140 tokens/sec throughput. By studying DeepSeek AI's R1 architecture, we selectively quantized certain layers to higher bits (like 4-bit) and left most MoE layers at 1.58-bit. Benchmarks + Blog: https://lnkd.in/g5uA3855 Dynamic GGUFs (131GB–212GB) on Hugging Face: https://lnkd.in/gP7ysgfe
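Once the shards are downloaded, a rough sketch of loading them with the llama-cpp-python bindings (the file path and layer-offload count are placeholders; the blog covers the exact llama.cpp flags):

from llama_cpp import Llama

# Illustrative: point the bindings at the first 1.58-bit shard (placeholder path);
# llama.cpp picks up the remaining shards in the same folder automatically.
llm = Llama(
    model_path = "DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf",
    n_gpu_layers = 40,   # tune to your available VRAM
    n_ctx = 4096,
)
out = llm("Why is the sky blue? Think step by step.", max_tokens=256)
print(out["choices"][0]["text"])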
Unsloth AI reposted this
Running Phi 4 w/ Ollama & Unsloth AI on Mac, 100% local and fully private! 🔥 ollama run hf.co/unsloth/phi-4-GGUF:Q8_0 That's it! 🤗