Voice

Real-time voice enables natural spoken conversations with your AI Voice Agents. Incoming audio passes through Voice Activity Detection (VAD), Speech-to-Text (STT), and turn detection before the transcribed text reaches the agent.

What is Voice Streaming?

Voice streaming enables your AI Voice Agent to:

Receive voice input - Process audio in real time
Convert speech to text - Use STT to transcribe audio
Detect speech activity - Use VAD to know when users are speaking
Detect conversation turns - Know when users have finished speaking
Respond naturally - Reply only after the user's turn ends, so turn-taking feels natural

Core Components

Speech-to-Text (STT)

Convert speech to text in real time

Voice Activity Detection (VAD)

Detect when users are speaking

Turn Detection

Determine when users finish speaking

Audio Streaming

Stream audio in real time
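
To make the VAD component more concrete, here is a minimal sketch of energy-based speech detection. It is a deliberately simplified stand-in: production detectors such as Silero use a trained model, not a fixed energy threshold. The function names (`frame_rms`, `detect_speech`) and the threshold value are assumptions made for illustration and are not part of the Kuralit API.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    frames: Sequence[Sequence[float]], threshold: float = 0.1
) -> List[bool]:
    """Flag each frame as speech (True) or silence (False) by energy alone.

    The threshold is an illustrative value, not a recommended setting.
    """
    return [frame_rms(frame) >= threshold for frame in frames]


# Toy input: two silent frames surrounding two loud frames.
silence = [0.0] * 160
speech = [0.5, -0.5] * 80

flags = detect_speech([silence, speech, speech, silence])
print(flags)  # prints [False, True, True, False]
```

The transitions from False to True and back correspond to the speech start and end events that the later stages rely on.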

Audio Pipeline

The complete audio processing flow:

Audio Input
    ↓
VAD (Voice Activity Detection)
    ↓ (detects speech start/end)
STT (Speech-to-Text)
    ↓ (converts audio to text)
Turn Detection
    ↓ (determines end of turn)
AI Agent (LLM)
    ↓ (processes text and generates response)
Response to User
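
The flow above can be sketched as a small generator that chains the four stages. This is only an illustration of the ordering, under stated assumptions: `run_pipeline`, the stub stages (`toy_vad`, `toy_stt`, `toy_agent`), and the silence-count rule for ending a turn are all hypothetical and do not reflect how Kuralit implements these steps internally. Frames are represented as strings so the example stays runnable without audio.

```python
from typing import Callable, Iterable, Iterator, List


def run_pipeline(
    frames: Iterable[str],
    is_speech: Callable[[str], bool],
    transcribe: Callable[[str], str],
    respond: Callable[[str], str],
    end_of_turn_silence: int = 2,
) -> Iterator[str]:
    """Yield one agent response per detected user turn."""
    transcript: List[str] = []
    silent_run = 0
    for frame in frames:
        if is_speech(frame):                       # VAD: speech detected
            transcript.append(transcribe(frame))   # STT: audio to text
            silent_run = 0
        elif transcript:
            silent_run += 1
            if silent_run >= end_of_turn_silence:  # turn detection
                yield respond(" ".join(transcript))  # agent (LLM) reply
                transcript = []
                silent_run = 0
    if transcript:  # flush a turn that was still open when audio ended
        yield respond(" ".join(transcript))


# Stub stages: an empty string stands for a silent frame.
def toy_vad(frame: str) -> bool:
    return bool(frame)


def toy_stt(frame: str) -> str:
    return frame


def toy_agent(text: str) -> str:
    return f"agent heard: {text}"


frames = ["hello", "there", "", "", "how", "are", "you", ""]
responses = list(run_pipeline(frames, toy_vad, toy_stt, toy_agent))
print(responses)
# prints ['agent heard: hello there', 'agent heard: how are you']
```

Counting silent frames is the simplest possible end-of-turn rule. A model-based turn detector can also use the transcript itself to decide whether the user has finished a thought.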

Quick Example

from kuralit.server.agent_session import AgentSession

# Voice-enabled agent
agent = AgentSession(
    stt="deepgram/nova-2:en-US",          # Speech-to-Text
    vad="silero/v3",                      # Voice Activity Detection
    turn_detection="multilingual/v1",     # Turn Detection
    llm="gemini/gemini-2.0-flash-001",    # AI Agent (LLM)
)
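
The identifier strings in the example appear to follow a `provider/model` pattern with an optional `:language` suffix (as in `deepgram/nova-2:en-US`). That reading is inferred from the example alone and is not confirmed by this page. The sketch below shows how such a string could be split under that assumption. `ProviderSpec` and `parse_spec` are hypothetical helpers, not part of the `kuralit` package.

```python
from typing import NamedTuple, Optional


class ProviderSpec(NamedTuple):
    provider: str
    model: str
    language: Optional[str]


def parse_spec(spec: str) -> ProviderSpec:
    """Split 'provider/model[:language]' into its parts (assumed format)."""
    provider, _, rest = spec.partition("/")
    if not provider or not rest:
        raise ValueError(f"expected 'provider/model', got {spec!r}")
    model, _, language = rest.partition(":")
    return ProviderSpec(provider, model, language or None)


for spec in (
    "deepgram/nova-2:en-US",
    "silero/v3",
    "gemini/gemini-2.0-flash-001",
):
    print(parse_spec(spec))
```

Validating identifiers this way before constructing a session can surface typos such as a missing provider prefix early.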

Next Steps

STT - Speech-to-Text
VAD - Voice Activity Detection
Turn Detection - Detecting when users finish speaking
Audio Streaming - Real-time streaming
Integrations - Choose providers
