What is STT?
Speech-to-Text (STT) is the process of converting spoken words into written text. In Kuralit, STT:
- Streams audio continuously - Processes audio in real time
- Provides interim results - Shows the transcription as the user speaks
- Delivers final transcripts - Provides a complete transcription of each utterance
- Supports multiple languages - Accepts various language codes
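The difference between interim and final results can be illustrated with a small, self-contained simulation. This is only a sketch: `Transcript` and `transcribe_stream` are hypothetical names invented for this example, not part of the Kuralit API, and the "recognizer" simply treats each input string as one recognized word, with an empty string standing in for a pause.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Transcript:
    """One transcription event (hypothetical type, for illustration only)."""
    text: str
    is_final: bool


def transcribe_stream(chunks: Iterable[str]) -> Iterator[Transcript]:
    """Toy stand-in for a streaming recognizer.

    Each non-empty chunk is a word "recognized" from one audio chunk.
    An empty chunk marks a pause that ends the current utterance.
    """
    words: List[str] = []
    for chunk in chunks:
        if chunk:
            # More speech arrived: emit an updated interim result.
            words.append(chunk)
            yield Transcript(" ".join(words), is_final=False)
        elif words:
            # Pause detected: emit the final transcript and start over.
            yield Transcript(" ".join(words), is_final=True)
            words = []
    if words:
        # The stream ended mid-utterance: flush what we have as final.
        yield Transcript(" ".join(words), is_final=True)


results = list(transcribe_stream(["turn", "on", "the", "lights", "", "thanks"]))
finals = [r.text for r in results if r.is_final]
print(finals)  # ['turn on the lights', 'thanks']
```

An application would typically display interim results as live captions and act only on final transcripts, since interim text may still change.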
How STT Works
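STT plugins process audio as a continuous stream: incoming audio is transcribed in real time, interim results are emitted while the user is speaking, and a final transcript is delivered for each complete utterance.
Configuration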
Basic Configuration
Environment Variables
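As a rough sketch of how STT settings could be read from environment variables, the snippet below loads a small configuration object. Everything here is an assumption made for illustration: the `STTConfig` class, the `load_stt_config` function, and the variable names `STT_API_KEY`, `STT_PROVIDER`, `STT_LANGUAGE`, and `STT_SAMPLE_RATE` are hypothetical and are not Kuralit's actual names. Check the provider pages for the real ones.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class STTConfig:
    """Hypothetical settings container; not the Kuralit configuration class."""
    provider: str
    api_key: str
    language: str = "en-US"
    sample_rate: int = 16000


def load_stt_config(env: Mapping[str, str] = os.environ) -> STTConfig:
    """Build an STTConfig from environment-style key/value pairs.

    All variable names below are placeholders chosen for this example.
    """
    api_key = env.get("STT_API_KEY", "")
    if not api_key:
        # Fail early rather than at the first transcription request.
        raise RuntimeError("STT_API_KEY is not set")
    return STTConfig(
        provider=env.get("STT_PROVIDER", "deepgram"),
        api_key=api_key,
        language=env.get("STT_LANGUAGE", "en-US"),
        sample_rate=int(env.get("STT_SAMPLE_RATE", "16000")),
    )


# Passing a plain dict makes the loader easy to try without touching the shell.
config = load_stt_config({"STT_API_KEY": "example-key", "STT_LANGUAGE": "en-GB"})
print(config.provider, config.language, config.sample_rate)
```

Keeping the API key in the environment rather than in source code avoids committing credentials to version control.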
Available Providers
- Deepgram - High accuracy, real-time streaming, multiple languages
- Google Cloud STT - Google ecosystem integration, high accuracy
Language Codes
Common language codes:
- en-US - English (United States)
- en-GB - English (United Kingdom)
- es-ES - Spanish (Spain)
- fr-FR - French (France)
- de-DE - German (Germany)
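Each of the codes listed here pairs a two-letter language with a two-letter region. A minimal sketch of validating and normalizing such a code before passing it to a provider is shown below. The helper name `split_language_code` is hypothetical, and it deliberately handles only this simple `language-REGION` shape, not every tag a provider may accept.

```python
from typing import Tuple


def split_language_code(code: str) -> Tuple[str, str]:
    """Split a code such as "en-US" into ("en", "US").

    Only the two-letter language plus two-letter region form is accepted.
    """
    language, sep, region = code.strip().partition("-")
    well_formed = (
        sep == "-"
        and len(language) == 2 and language.isalpha()
        and len(region) == 2 and region.isalpha()
    )
    if not well_formed:
        raise ValueError(f"expected a code like 'en-US', got {code!r}")
    # Conventional casing: lowercase language, uppercase region.
    return language.lower(), region.upper()


print(split_language_code("en-US"))  # ('en', 'US')
print(split_language_code("fr-fr"))  # ('fr', 'FR')
```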
Sample Rates
STT plugins support various sample rates:
- 8000 Hz - Telephone quality
- 16000 Hz - Standard quality (recommended)
- 44100 Hz - High quality
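The sample rate determines how much data each streamed audio chunk carries. The sketch below works through that arithmetic under stated assumptions: 16-bit (2-byte) mono PCM audio and a 20 ms chunk duration. Neither value comes from the Kuralit documentation, and the function name `chunk_size_bytes` is hypothetical.

```python
def chunk_size_bytes(
    sample_rate_hz: int,
    chunk_ms: int = 20,         # assumed chunk duration
    bytes_per_sample: int = 2,  # assumed 16-bit PCM
    channels: int = 1,          # assumed mono
) -> int:
    """Return the size in bytes of one audio chunk of the given duration."""
    samples_per_chunk = sample_rate_hz * chunk_ms // 1000
    return samples_per_chunk * bytes_per_sample * channels


for rate in (8000, 16000, 44100):
    print(rate, chunk_size_bytes(rate))
# 8000 320
# 16000 640
# 44100 1764
```

Under these assumptions, moving from 16000 Hz to 44100 Hz nearly triples the bandwidth per chunk, which is one reason a moderate rate is a common default for speech.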
Next Steps
- STT Providers - Choose and configure STT providers
- VAD - Voice Activity Detection
- Turn Detection - Detecting when the user has finished speaking

