Whisper Models

VoicePad uses faster-whisper, a CTranslate2-optimised implementation of OpenAI Whisper.

Default Model

The default model is turbo, a pruned and fine-tuned version of large-v3 with a much smaller decoder. It is significantly faster than large-v3 with only a minor loss in accuracy, and is the recommended model for most users with an NVIDIA GPU.
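Under the hood, loading and running a model with faster-whisper looks roughly like the sketch below. This is an illustration of the library's public API, not VoicePad's actual code; the audio filename is a placeholder, and the `"turbo"` model alias assumes a recent faster-whisper release.

```python
from faster_whisper import WhisperModel  # pip install faster-whisper

# Load the turbo model on the GPU; float16 keeps VRAM use low.
model = WhisperModel("turbo", device="cuda", compute_type="float16")

# transcribe() returns a lazy generator of segments plus language info.
segments, info = model.transcribe("audio.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:6.2f}s -> {segment.end:6.2f}s] {segment.text}")
```

On machines without a CUDA GPU, passing `device="cpu"` with `compute_type="int8"` is the usual fallback.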

Model Comparison

| Model            | Parameters | VRAM   | Speed     | Accuracy  | Language     |
|------------------|------------|--------|-----------|-----------|--------------|
| tiny             | 39 M       | < 1 GB | Very fast | Low       | Multilingual |
| base             | 74 M       | < 1 GB | Fast      | Fair      | Multilingual |
| small            | 244 M      | ~1 GB  | Moderate  | Good      | Multilingual |
| medium           | 769 M      | ~2 GB  | Slow      | Very good | Multilingual |
| large-v2         | 1.5 B      | ~5 GB  | Very slow | Excellent | Multilingual |
| large-v3         | 1.5 B      | ~5 GB  | Very slow | Excellent | Multilingual |
| turbo            | 809 M      | ~3 GB  | Moderate  | Excellent | Multilingual |
| distil-small.en  | 134 M      | < 1 GB | Fast      | Good      | English only |
| distil-medium.en | 394 M      | ~1 GB  | Moderate  | Very good | English only |
| distil-large-v2  | 756 M      | ~3 GB  | Slow      | Excellent | English only |
| distil-large-v3  | 756 M      | ~3 GB  | Slow      | Excellent | English only |

Which Model Should I Use?

| Your Hardware            | Recommended Model         |
|--------------------------|---------------------------|
| NVIDIA GPU, 4 GB VRAM    | turbo                     |
| NVIDIA GPU, 6-8 GB VRAM  | turbo or large-v3         |
| NVIDIA GPU, < 4 GB VRAM  | small or distil-small.en  |
| CPU only                 | small or base             |
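The table above can be expressed as a small selection helper. This is a hypothetical sketch for illustration, not a VoicePad API; the function name and parameters are invented here.

```python
def recommend_model(has_nvidia_gpu: bool, vram_gb: float = 0.0) -> str:
    """Map the hardware table to a recommended model name (hypothetical helper)."""
    if not has_nvidia_gpu:
        return "small"      # CPU only: small (or base for more speed)
    if vram_gb < 4:
        return "small"      # or distil-small.en for English-only use
    if vram_gb >= 6:
        return "large-v3"   # turbo also fits here and is faster
    return "turbo"          # 4 GB VRAM

print(recommend_model(has_nvidia_gpu=True, vram_gb=4))   # turbo
print(recommend_model(has_nvidia_gpu=False))             # small
```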

Changing the Model

Open the Settings tab in VoicePad, select a model from the dropdown, and press Save. VoicePad reloads the model immediately.

The first time you use a new model, it is downloaded from Hugging Face. Download sizes range from ~75 MB (tiny) to ~3 GB (large-v3). After the first download, the model is cached locally and no further network access is required.