HuggingFace

SmolLM2 135M

Tiny 135M model. Default LLM - guaranteed to run on any iPhone. Only 145MB download. Perfect for quick experiments.

0.135B parameters · smollm · apache-2.0 · 8K context · 0.64 GB – 0.75 GB VRAM

About This Model

SmolLM2 135M is a lightweight language model from HuggingFace, designed for efficient local deployment on devices with limited resources. With just 135 million parameters, it balances output quality against resource consumption, making it well suited to text generation tasks that need quick responses without heavy computational overhead. Its 8192-token context window lets it track far more of the input than most models of comparable size, helping it produce coherent, contextually relevant text.

Despite its small size, SmolLM2 135M holds its own against larger models in its class, punching above its weight in text quality and coherence. That makes it a solid choice for applications where real-time performance and low resource usage are critical. The model is available in Q8_0 and FP16 quantizations, with Q8_0 roughly halving the memory footprint at near-lossless quality. Devices with as little as 0.6–0.8 GB of VRAM can run it comfortably, making it an excellent option for developers, hobbyists, and businesses deploying text generation on edge devices, laptops, and other resource-constrained environments.
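The file sizes quoted for each quantization follow almost directly from the parameter count times the bits per weight. A minimal sketch in plain Python (no model download needed; small deviations from published file sizes come from GGUF metadata, mixed-precision tensors, and GB-vs-GiB unit conventions):

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough on-disk size of a quantized model: params x bits, converted to GB."""
    return n_params * bits_per_weight / 8 / 1e9

# SmolLM2 135M has ~135 million parameters.
print(quantized_size_gb(135e6, 8))    # Q8_0 (8 bits/weight)  -> 0.135 GB
print(quantized_size_gb(135e6, 16))   # FP16 (16 bits/weight) -> 0.27 GB
```

The same back-of-the-envelope formula works for any model: a 7B model at 4 bits per weight, for example, lands near 3.5 GB on disk before metadata.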

Check Your Hardware

See which quantizations of SmolLM2 135M your hardware can run.

Quantization Options

Quantization | Bits | File Size | VRAM Needed | RAM Needed | Quality
Q8_0         | 8    | 0.135 GB  | 0.64 GB     | 1.14 GB    | 98%
FP16         | 16   | 0.252 GB  | 0.75 GB     | 1.25 GB    | 100%
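Choosing between the two quantizations can be automated: given a device's free VRAM, pick the highest-quality option that fits. A small sketch using the figures from the table above (the thresholds are this page's numbers, not an official API):

```python
# (quant name, VRAM needed in GB, quality %) -- figures from the table above,
# listed best-quality first.
QUANTS = [
    ("FP16", 0.75, 100),
    ("Q8_0", 0.64, 98),
]

def pick_quant(free_vram_gb: float):
    """Return the highest-quality quantization that fits in free_vram_gb."""
    for name, vram_gb, quality in QUANTS:
        if vram_gb <= free_vram_gb:
            return name
    return None  # nothing fits in VRAM; fall back to CPU RAM

print(pick_quant(0.70))  # -> Q8_0 (FP16's 0.75 GB doesn't fit)
print(pick_quant(2.00))  # -> FP16
```

The same pattern extends naturally to models that ship many quantization levels: keep the list sorted by quality and take the first entry that fits the budget.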

See It In Action

Real model outputs generated via RunThisModel.com — watch responses stream in real time.


Outputs generated by real AI models via RunThisModel.com. Generation speed shown is from cloud inference. Local speeds vary by hardware — check your device.

Frequently Asked Questions

How much VRAM do I need to run SmolLM2 135M?

SmolLM2 135M requires 0.64GB VRAM minimum with Q8_0 quantization. For full precision, you need 0.75GB VRAM.
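The VRAM figure is larger than the on-disk file because the runtime also holds the KV cache and activation buffers alongside the weights. A rough sketch of the KV-cache term (the layer and head figures here are assumptions taken from the published SmolLM2-135M config, not from this page):

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx_len: int, bytes_per_elt: int = 2) -> float:
    """KV cache size: K and V tensors, each ctx_len * n_kv_heads * head_dim
    elements per layer, at bytes_per_elt precision (2 bytes for fp16)."""
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elt / 1e9

# Assumed from the published SmolLM2-135M config: 30 layers,
# 3 KV heads (grouped-query attention), head dim 64, fp16 cache.
print(round(kv_cache_gb(30, 3, 64, 8192), 3))  # -> 0.189 GB at the full 8K context
```

Add the 0.135 GB of Q8_0 weights plus framework overhead, and a figure in the neighborhood of 0.64 GB at full context is plausible; shorter contexts need proportionally less cache.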

What is the best quantization for SmolLM2 135M?

For SmolLM2 135M, Q8_0 offers the best balance of quality and memory usage: near-lossless (98% quality) at just 0.64 GB VRAM. FP16 gives full precision if you have 0.75 GB of VRAM to spare. Smaller quantizations aren't listed for this model; at 135M parameters the additional savings would be marginal.