Speech-to-Text

Configure Speech-to-Text settings in MultitaskAI for voice input and audio file transcription

MultitaskAI provides speech-to-text powered by OpenAI's Whisper model, so you can dictate messages and transcribe audio files directly in your browser. All processing happens locally on your device, which keeps your audio private and lets the feature work offline once a model is downloaded.

Setting Up Speech-to-Text

1. Navigate to https://app.multitaskai.com/settings/speech-to-text
2. Enable the speech-to-text feature
3. Choose and download your preferred Whisper model

Available Models

MultitaskAI offers two categories of models:

Regular Models

These are not quantized. Larger models in this list are generally more accurate but require more storage space:

Tiny (43.5 MB)
Tiny English (43.6 MB)
Base (81.8 MB)
Base English (81.8 MB)
Small (264 MB)
Small English (264 MB)

Quantized Models

These are optimized for efficiency with slightly reduced accuracy:

Medium English (823 MB)
Medium English Quantized (823 MB)
Large English Turbo (874 MB)
Large English (3100 MB)
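
If storage is the deciding factor, the sizes listed above can be compared against a budget. The following is a minimal sketch, not part of MultitaskAI: the function name `largest_model_within` and the `english_only` flag are hypothetical, and the only data used is the model names and sizes from the lists above.

```python
# Model names and download sizes in MB, copied from the lists above.
MODEL_SIZES_MB = {
    "Tiny": 43.5,
    "Tiny English": 43.6,
    "Base": 81.8,
    "Base English": 81.8,
    "Small": 264,
    "Small English": 264,
    "Medium English": 823,
    "Medium English Quantized": 823,
    "Large English Turbo": 874,
    "Large English": 3100,
}


def largest_model_within(budget_mb, english_only=False):
    """Return the name of the largest listed model that fits the budget.

    Hypothetical helper for illustration only. Returns None when no
    listed model fits.
    """
    candidates = [
        (size, name)
        for name, size in MODEL_SIZES_MB.items()
        if size <= budget_mb and (not english_only or "English" in name)
    ]
    if not candidates:
        return None
    # Tuples compare by size first; on a size tie the alphabetically
    # later name wins.
    return max(candidates)[1]
```

Calling `largest_model_within(1000)` picks the largest entry at or below 1000 MB from the table. Size is only a rough proxy for accuracy, so treat the result as a starting point rather than a recommendation.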

Using Speech-to-Text

Once the feature is enabled and a model is downloaded, you can:

Dictate Messages: Use voice input directly in the chat input field
Transcribe Audio Files: Upload audio files (.mp3, .wav, .m4a, .ogg, .aac) for automatic transcription
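
Before uploading a batch of recordings, it can help to check that each file uses one of the extensions listed above. This is a small sketch under that assumption only: `is_supported_audio` is a hypothetical helper, it looks at the file name rather than the audio data, and it does not reflect how MultitaskAI validates uploads internally.

```python
from pathlib import Path

# Extensions listed in this section as accepted for transcription.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".aac"}


def is_supported_audio(filename):
    """Return True if the file name ends in one of the listed extensions.

    Hypothetical helper: checks only the final extension, ignoring case.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_audio("memo.MP3")` is true because the comparison ignores case, while a `.flac` file would be rejected and need converting first.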

Language Support

Multilingual models are available, but the English-specific models are optimized for English and are the better choice when you work primarily with English content.
