HuggingFace
SmolLM2 135M
Tiny 135M model. Default LLM - guaranteed to run on any iPhone. Only 145MB download. Perfect for quick experiments.
About This Model
SmolLM2 135M is a lightweight language model from HuggingFace, designed for efficient local deployment on devices with limited resources. With just 135 million parameters, it balances output quality against resource consumption, making it well suited to text generation tasks that need quick responses without heavy computational overhead. Its 8192-token context length lets it track more of the input than many models in its size class, helping it generate coherent, contextually relevant text.
Despite its small size, SmolLM2 135M holds its own against other models in its class, delivering solid text quality and coherence in applications where real-time performance and low resource usage are crucial. The model ships in Q8_0 and FP16 builds; the Q8_0 quantization roughly halves memory use at near-lossless quality. Devices with as little as 0.6–0.8 GB of VRAM can run it comfortably, making it a practical option for developers, hobbyists, and businesses deploying text generation on phones, laptops, and other resource-constrained environments.
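If you want to try the model outside the app, a minimal sketch using the Hugging Face transformers library might look like the following. The `HuggingFaceTB/SmolLM2-135M-Instruct` repo id and the generation settings are assumptions for illustration, not RunThisModel.com's pipeline:

```python
# Minimal sketch: run SmolLM2 135M locally via transformers.
# Assumes the HuggingFaceTB/SmolLM2-135M-Instruct checkpoint and
# roughly 1 GB of free RAM; adjust for your environment.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Build a chat-formatted prompt and generate a short completion.
messages = [{"role": "user", "content": "Explain quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```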
Check Your Hardware
See which quantizations of SmolLM2 135M your hardware can run.
Quantization Options
| Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality |
|---|---|---|---|---|---|
| Q8_0 | 8 | 0.135 GB | 0.64 GB | 1.14 GB | 98% |
| FP16 | 16 | 0.252 GB | 0.75 GB | 1.25 GB | 100% |
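As a rough sanity check on these figures, on-disk size scales with parameter count times bits per weight. The sketch below is an approximation, not the exact sizing formula behind the table; real GGUF files carry per-block quantization metadata, so actual sizes differ slightly:

```python
# Back-of-the-envelope file size estimate: params * bits / 8 bytes.
def approx_file_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB (decimal)."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 135e6  # SmolLM2 135M

for name, bits in [("Q8_0", 8), ("FP16", 16)]:
    print(f"{name}: ~{approx_file_size_gb(PARAMS, bits):.3f} GB on disk")
# Prints ~0.135 GB for Q8_0 and ~0.270 GB for FP16, in the same
# ballpark as the table; VRAM/RAM needs add runtime and KV-cache overhead.
```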
See It In Action
Real model outputs generated via RunThisModel.com, streamed in real time. Note that the generation speed shown comes from cloud inference; local speeds vary by hardware, so check your device.
Frequently Asked Questions
How much VRAM do I need to run SmolLM2 135M?
SmolLM2 135M requires a minimum of 0.64 GB of VRAM with Q8_0 quantization. The unquantized FP16 build needs 0.75 GB of VRAM.
What is the best quantization for SmolLM2 135M?
For this model, Q8_0 offers the best balance: it is near-lossless (98% quality) at roughly half the memory footprint of FP16. Choose FP16 only if you want the original weights and have the extra headroom.
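If you are running a GGUF build yourself, a minimal llama-cpp-python sketch might look like this. The local file path is hypothetical; point it at whichever GGUF file you downloaded:

```python
# Minimal sketch: load a Q8_0 GGUF build with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./smollm2-135m-instruct-q8_0.gguf",  # hypothetical local path
    n_ctx=8192,  # matches the model's 8192-token context length
)

out = llm(
    "Q: What is quantization? A:",
    max_tokens=64,
    stop=["Q:"],  # stop before the model invents a follow-up question
)
print(out["choices"][0]["text"])
```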