Speech-to-Text
Configure Speech-to-Text settings in MultitaskAI for voice input and audio file transcription
MultitaskAI provides speech-to-text powered by OpenAI's Whisper model, so you can dictate messages and transcribe audio files directly in your browser. All processing happens locally on your device, which keeps your audio private and lets transcription work offline once a model has been downloaded.
Setting Up Speech-to-Text
- Navigate to https://app.multitaskai.com/settings/speech-to-text
- Enable the speech-to-text feature
- Choose and download your preferred Whisper model
Available Models
MultitaskAI offers two categories of models:
Regular Models
These provide the best accuracy but require more storage space:
- Tiny (43.5 MB)
- Tiny English (43.6 MB)
- Base (81.8 MB)
- Base English (81.8 MB)
- Small (264 MB)
- Small English (264 MB)
Quantized Models
These are optimized for efficiency with slightly reduced accuracy:
- Medium English (823 MB)
- Medium English Quantized (823 MB)
- Large English Turbo (874 MB)
- Large English (3100 MB)
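If storage is the main constraint, the sizes listed above can guide the choice. The following is a minimal sketch, not part of MultitaskAI: the `WhisperModel` type, the `MODELS` table, and `largestModelWithin` are hypothetical names, the `englishOnly` flag is inferred from the word "English" in each model name, and picking the largest model that fits is only an assumed rule of thumb.

```typescript
// Hypothetical helper: the names below are illustrative, not a MultitaskAI API.
interface WhisperModel {
  name: string;
  sizeMB: number;       // download size as listed in this document
  englishOnly: boolean; // assumption: inferred from "English" in the model name
}

const MODELS: WhisperModel[] = [
  { name: "Tiny", sizeMB: 43.5, englishOnly: false },
  { name: "Tiny English", sizeMB: 43.6, englishOnly: true },
  { name: "Base", sizeMB: 81.8, englishOnly: false },
  { name: "Base English", sizeMB: 81.8, englishOnly: true },
  { name: "Small", sizeMB: 264, englishOnly: false },
  { name: "Small English", sizeMB: 264, englishOnly: true },
  { name: "Medium English", sizeMB: 823, englishOnly: true },
  { name: "Medium English Quantized", sizeMB: 823, englishOnly: true },
  { name: "Large English Turbo", sizeMB: 874, englishOnly: true },
  { name: "Large English", sizeMB: 3100, englishOnly: true },
];

// Return the largest model of the requested kind that fits the storage budget,
// or undefined when nothing fits.
function largestModelWithin(
  budgetMB: number,
  englishOnly: boolean
): WhisperModel | undefined {
  return MODELS
    .filter((m) => m.englishOnly === englishOnly && m.sizeMB <= budgetMB)
    .sort((a, b) => b.sizeMB - a.sizeMB)[0];
}
```

For example, `largestModelWithin(100, false)` selects the Base model, while a budget smaller than 43.5 MB yields `undefined`.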
Using Speech-to-Text
Once the feature is enabled and a model has been downloaded, you can:
- Dictate Messages: Use voice input directly in the chat input field
- Transcribe Audio Files: Upload audio files (.mp3, .wav, .m4a, .ogg, .aac) for automatic transcription
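If you prepare files in your own scripts before uploading, a quick extension check can catch unsupported formats early. This is a minimal sketch under stated assumptions: `SUPPORTED_EXTENSIONS` and `isSupportedAudioFile` are hypothetical names, the check looks only at the file name (not the audio content), and case-insensitive matching is an assumption rather than documented MultitaskAI behavior.

```typescript
// Extensions listed in this document; the constant and function are illustrative only.
const SUPPORTED_EXTENSIONS = [".mp3", ".wav", ".m4a", ".ogg", ".aac"];

// Return true when the file name ends in one of the listed extensions.
// Assumption: matching is case-insensitive, so "TALK.MP3" is accepted.
function isSupportedAudioFile(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) {
    return false; // no extension, or a bare dotfile such as ".mp3"
  }
  const extension = fileName.slice(dot).toLowerCase();
  return SUPPORTED_EXTENSIONS.includes(extension);
}
```

Only the final extension is considered, so a name like `archive.tar.wav` passes while `notes.txt` does not.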
Language Support
The general models support multiple languages, while the English-specific models are optimized for content that is primarily in English.