Speech-to-Text

Configure Speech-to-Text settings in MultitaskAI for voice input and audio file transcription

MultitaskAI provides speech-to-text powered by OpenAI's Whisper model, so you can dictate messages and transcribe audio files directly in your browser. All processing happens locally on your device: your audio stays private, and transcription works offline once a model has been downloaded.

Setting Up Speech-to-Text

  1. Navigate to https://app.multitaskai.com/settings/speech-to-text
  2. Enable the speech-to-text feature
  3. Choose and download your preferred Whisper model

Available Models

MultitaskAI offers two categories of models:

Regular Models

These provide the best accuracy but require more storage space:

  • Tiny (43.5 MB)
  • Tiny English (43.6 MB)
  • Base (81.8 MB)
  • Base English (81.8 MB)
  • Small (264 MB)
  • Small English (264 MB)

Quantized Models

These are optimized for efficiency with slightly reduced accuracy:

  • Medium English (823 MB)
  • Medium English Quantized (823 MB)
  • Large English Turbo (874 MB)
  • Large English (3100 MB)
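
If storage is the main constraint, the size figures above can be turned into a small lookup. The following is a minimal sketch, not part of MultitaskAI: the function name `largest_model_within` and the `english_only` flag are hypothetical, and it assumes that a larger download is a reasonable proxy for better accuracy, which the lists above do not guarantee.

```python
from typing import Optional

# Download sizes in MB, copied from the model lists above.
MODEL_SIZES_MB = {
    "Tiny": 43.5,
    "Tiny English": 43.6,
    "Base": 81.8,
    "Base English": 81.8,
    "Small": 264,
    "Small English": 264,
    "Medium English": 823,
    "Medium English Quantized": 823,
    "Large English Turbo": 874,
    "Large English": 3100,
}


def largest_model_within(budget_mb: float, english_only: bool = False) -> Optional[str]:
    """Return the largest listed model that fits in budget_mb, or None.

    Assumption: size is used as a rough stand-in for accuracy.
    """
    candidates = [
        (size, name)
        for name, size in MODEL_SIZES_MB.items()
        if size <= budget_mb and (not english_only or "English" in name)
    ]
    if not candidates:
        return None
    # max() compares by size first; equal sizes are broken by name order.
    return max(candidates)[1]


if __name__ == "__main__":
    print(largest_model_within(1000))
    print(largest_model_within(10))
```

With a 1000 MB budget this picks Large English Turbo (874 MB); with a 10 MB budget nothing fits and the function returns `None`.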

Using Speech-to-Text

Once the feature is enabled and a model has been downloaded, you can:

  1. Dictate Messages: Use voice input directly in the chat input field
  2. Transcribe Audio Files: Upload audio files (.mp3, .wav, .m4a, .ogg, .aac) for automatic transcription
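
Before uploading a batch of recordings, it can help to check which files use one of the supported formats listed above. This is a minimal sketch that runs outside MultitaskAI; the helper name `is_supported_audio` is hypothetical, and it only inspects the file extension, not the actual audio encoding.

```python
from pathlib import Path

# Extensions taken from the list of supported upload formats above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".aac"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file's extension is one of the supported formats.

    The comparison is case-insensitive, so "memo.MP3" is accepted.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["memo.MP3", "interview.wav", "notes.flac"]:
        print(name, is_supported_audio(name))
```

Files that fail this check, such as `notes.flac`, would need to be converted to a supported format first.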

Language Support

Models are available for multiple languages. If you work primarily with English content, choose one of the English-specific models, which are optimized for English.