Protocol Overview
The protocol supports:
- Text messages - Send and receive text
- Audio streaming - Stream audio in real-time
- Event notifications - Connection, tool calls, errors
- Session management - Session IDs and state
Message Types
Client Messages
- client_text - Text message from client
- client_audio_start - Begin audio streaming
- client_audio_chunk - Audio data chunks
- client_audio_end - End audio streaming
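A minimal sketch of serializing the client messages listed above. It assumes a hypothetical envelope in which the message name travels in a `"type"` field and any payload fields (such as the assumed `text` field) sit alongside it; the real field names are not specified here.

```python
import json

# Client message types from the list above.
CLIENT_MESSAGE_TYPES = {
    "client_text",
    "client_audio_start",
    "client_audio_chunk",
    "client_audio_end",
}


def encode_client_message(msg_type: str, **payload) -> str:
    """Serialize one client message as a JSON object.

    Assumption: the message name is stored under "type"; payload keys are
    merged alongside it, and "type" always wins if a payload key collides.
    """
    if msg_type not in CLIENT_MESSAGE_TYPES:
        raise ValueError(f"unknown client message type: {msg_type}")
    return json.dumps({**payload, "type": msg_type})


# Usage: a text message ("text" is an assumed field name).
frame = encode_client_message("client_text", text="Hello")
```

Rejecting unknown names locally catches typos (for example sending a `server_*` name) before anything goes over the wire.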
Server Messages
- server_connected - Connection confirmation
- server_text - Final text response
- server_partial - Streaming (partial) text response
- server_stt - Speech-to-text transcription
- server_tool_call - Tool execution notification
- server_tool_result - Tool execution result
- server_error - Error message
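A client usually has to branch on which of the server messages above just arrived. The sketch below routes each incoming frame to a handler registered for its type, assuming (as an illustration, not a documented guarantee) that every frame is a JSON object carrying the message name in a `"type"` field. The class `ServerMessageRouter` and its methods are hypothetical.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


class ServerMessageRouter:
    """Route decoded server messages to per-type handlers.

    Assumes each frame is a JSON object whose "type" field holds one of the
    server message names listed above.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, msg_type: str, handler: Handler) -> None:
        """Register the handler for one message type, replacing any earlier one."""
        self._handlers[msg_type] = handler

    def dispatch(self, raw: str) -> bool:
        """Decode one frame and call its handler; return False if none applies."""
        msg = json.loads(raw)
        if not isinstance(msg, dict):
            return False
        handler = self._handlers.get(msg.get("type"))
        if handler is None:
            return False
        handler(msg)
        return True


# Usage: print final responses and errors, ignore everything else.
router = ServerMessageRouter()
router.on("server_text", lambda m: print("final:", m.get("text")))
router.on("server_error", lambda m: print("error:", m))
```

Returning `False` for unhandled types lets a client log unfamiliar messages instead of failing if the server starts sending types it does not yet handle.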
Connection Flow
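One plausible text round trip can be sketched as follows. The ordering is an assumption inferred from the message names: the server sends `server_connected` first, the client sends `client_text`, and the server replies with zero or more intermediate messages before a final `server_text` or a `server_error`. The `text` and `message` field names are also assumptions. The function accepts any object with async `send`/`recv` methods (connections from the third-party `websockets` package have this shape); an in-memory stub stands in for a real socket here.

```python
import asyncio
import json


async def run_text_exchange(ws, text: str) -> str:
    """Send one text message and return the final response text (assumed flow)."""
    # Step 1 (assumed): the server confirms the connection first.
    first = json.loads(await ws.recv())
    if first.get("type") != "server_connected":
        raise RuntimeError(f"expected server_connected, got {first.get('type')!r}")

    # Step 2: send the user's text.
    await ws.send(json.dumps({"type": "client_text", "text": text}))

    # Step 3: read until the final response or an error arrives.
    while True:
        msg = json.loads(await ws.recv())
        kind = msg.get("type")
        if kind == "server_text":
            return msg.get("text", "")
        if kind == "server_error":
            raise RuntimeError(msg.get("message", "server_error received"))
        # server_partial, server_stt, server_tool_call and server_tool_result
        # are skipped in this sketch.


class StubConnection:
    """In-memory stand-in for a WebSocket that replays scripted server frames."""

    def __init__(self, frames):
        self._frames = list(frames)
        self.sent = []

    async def send(self, data: str) -> None:
        self.sent.append(data)

    async def recv(self) -> str:
        return self._frames.pop(0)


stub = StubConnection([
    '{"type": "server_connected"}',
    '{"type": "server_partial", "text": "Hel"}',
    '{"type": "server_text", "text": "Hello"}',
])
reply = asyncio.run(run_text_exchange(stub, "hi"))
```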
Message Format
All messages are encoded as JSON.
Audio Streaming
Audio is streamed over the WebSocket connection in three stages:
- client_audio_start - Begin the stream (includes sample rate and encoding)
- client_audio_chunk - Audio data (base64-encoded, sent continuously while the stream is open)
- client_audio_end - End the stream (may carry an optional final chunk)
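The three stages above can be sketched as a generator that turns raw audio bytes into a sequence of JSON frames. The field names `sample_rate`, `encoding` and `data`, the `"pcm16"` label, and the 3200-byte default chunk size (100 ms of 16 kHz, 16-bit mono audio) are all assumptions for illustration. This sketch sends `client_audio_end` without a final chunk, although the protocol allows one.

```python
import base64
import json
from typing import Iterator


def audio_stream_messages(
    pcm: bytes,
    sample_rate: int = 16000,
    encoding: str = "pcm16",
    chunk_size: int = 3200,  # 16000 samples/s * 2 bytes * 0.1 s
) -> Iterator[str]:
    """Yield the JSON frames for one audio stream: start, chunks, end."""
    # Stage 1: announce the stream parameters.
    yield json.dumps({"type": "client_audio_start",
                      "sample_rate": sample_rate,
                      "encoding": encoding})
    # Stage 2: one base64-encoded chunk per chunk_size bytes (last may be shorter).
    for offset in range(0, len(pcm), chunk_size):
        chunk = pcm[offset:offset + chunk_size]
        yield json.dumps({"type": "client_audio_chunk",
                          "data": base64.b64encode(chunk).decode("ascii")})
    # Stage 3: close the stream.
    yield json.dumps({"type": "client_audio_end"})
```

Base64 inflates binary audio by about a third, so fixed-size chunks keep each JSON frame small and predictable while the stream is live.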
Next Steps
- Streaming - Real-time streaming
- Error Handling - Error handling
- Connection Management - Connection management

