Kokoro 82M TTS

High-quality text-to-speech (TTS) model with 82M parameters. Produces natural-sounding speech with multiple voice options. 86MB download.

Parameters: 0.082B | Architecture: kokoro | License: apache-2.0 | VRAM: 0.58GB

About This Model

Kokoro 82M TTS is a compact text-to-speech model that converts written text into natural-sounding speech. At 82 million parameters it is lightweight for the audio quality it delivers, which makes it a good fit for applications where compute and memory are constrained. The model is built on the Kokoro architecture and is released under the Apache 2.0 license, which permits both personal and commercial use. Despite its small size, it produces clear, intelligible speech that can be fine-tuned for various accents and intonations.

Among models of similar size, Kokoro 82M TTS stands out for its efficiency and low memory footprint. It requires only 0.58 GB of VRAM, so it can run on a wide range of devices, from low-end laptops to powerful desktops. That suits developers and hobbyists who need a reliable TTS solution without high-end hardware. The ONNX-Q8F16 quantization brings the model file down to 0.08 GB and lowers computational requirements while keeping a listed quality rating of 95%. Users balancing performance against resource usage, particularly on embedded systems or edge devices, will find Kokoro 82M TTS a practical option.
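The relationship between parameter count and file size can be checked with simple arithmetic. The sketch below is a rough estimate only: it assumes every weight is stored at a single bit width and ignores file metadata and any tensors kept at higher precision. The function name `estimate_weight_size_gb` is illustrative and not part of any Kokoro tooling.

```python
def estimate_weight_size_gb(num_params: int, bits_per_weight: int) -> float:
    """Estimate raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    if num_params < 0 or bits_per_weight <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_weight must be > 0")
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


KOKORO_PARAMS = 82_000_000  # 0.082B parameters, as listed for this model

# Compare a few common storage widths (assumed uniform across all weights).
for bits in (8, 16, 32):
    size_gb = estimate_weight_size_gb(KOKORO_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{size_gb:.3f} GB")
```

At 8 bits per weight the estimate is about 0.082 GB, which is in line with the 0.08 GB file size listed for ONNX-Q8F16. The VRAM figure is larger than the file because inference also needs memory for activations and runtime buffers, which this estimate does not model.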

Quantization Options

Quantization  Bits  File Size  VRAM Needed  RAM Needed  Quality
ONNX-Q8F16    8     0.08 GB    0.58 GB      1.08 GB     95%
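
A check against these requirements can be sketched in a few lines. This is a minimal illustration, assuming you already know how much free VRAM and RAM your machine has, and assuming that both the VRAM and RAM figures must be satisfied at once. The `QuantOption` class and `can_run` function are hypothetical names; the numbers are copied from the ONNX-Q8F16 row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantOption:
    """One row of the quantization table (all sizes in GB)."""
    name: str
    bits: int
    file_size_gb: float
    vram_gb: float
    ram_gb: float


# Figures copied from the ONNX-Q8F16 row of the table.
ONNX_Q8F16 = QuantOption(
    name="ONNX-Q8F16", bits=8, file_size_gb=0.08, vram_gb=0.58, ram_gb=1.08
)


def can_run(option: QuantOption, free_vram_gb: float, free_ram_gb: float) -> bool:
    """Return True if both the VRAM and RAM requirements are met (assumed rule)."""
    return free_vram_gb >= option.vram_gb and free_ram_gb >= option.ram_gb


print(can_run(ONNX_Q8F16, free_vram_gb=2.0, free_ram_gb=8.0))  # True
print(can_run(ONNX_Q8F16, free_vram_gb=0.5, free_ram_gb=8.0))  # False
```

Keeping the requirements in a small record makes it easy to add rows if more quantizations are published later.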

Frequently Asked Questions

How much VRAM do I need to run Kokoro 82M TTS?

Kokoro 82M TTS needs at least 0.58 GB of VRAM with ONNX-Q8F16 quantization, along with 1.08 GB of system RAM. The full-precision requirement is listed as the same 0.58 GB.

What is the best quantization for Kokoro 82M TTS?

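ONNX-Q8F16 is the only quantization listed for Kokoro 82M TTS, so it is the practical choice: an 8-bit variant with a 0.08 GB file, 0.58 GB of VRAM, and a 95% quality rating. The Q4_K_M and Q8_0 formats often recommended for other models are not listed for this one.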