Bidirectional realtime voice conversations with AI agents.
assistant-ui supports realtime bidirectional voice via the RealtimeVoiceAdapter interface. This enables live voice conversations where the user speaks into their microphone and the AI agent responds with audio, with transcripts appearing in the thread in real time.
How It Works
Unlike Speech Synthesis (text-to-speech) and Dictation (speech-to-text), the voice adapter handles both directions simultaneously: the user's microphone audio is streamed to the agent, the agent's audio response is played back, and transcripts are appended to the message thread throughout.
| Feature | Adapter | Direction |
|---|---|---|
| Speech Synthesis | SpeechSynthesisAdapter | Text → Audio (one message at a time) |
| Dictation | DictationAdapter | Audio → Text (into composer) |
| Realtime Voice | RealtimeVoiceAdapter | Audio ↔ Audio (bidirectional, live) |
Configuration
Pass a RealtimeVoiceAdapter implementation to the runtime via adapters.voice:
```ts
const runtime = useChatRuntime({
  adapters: {
    voice: new MyVoiceAdapter({ /* ... */ }),
  },
});
```

When a voice adapter is provided, `capabilities.voice` is automatically set to true.
Hooks
useVoiceState
Returns the current voice session state, or undefined when no session is active.
```ts
import { useVoiceState, useVoiceVolume } from "@assistant-ui/react";

const voiceState = useVoiceState();
// voiceState?.status.type — "starting" | "running" | "ended"
// voiceState?.isMuted — boolean
// voiceState?.mode — "listening" | "speaking"

const volume = useVoiceVolume();
// volume — number (0–1, real-time audio level via a separate subscription)
```

useVoiceControls
Returns methods to control the voice session.
```ts
import { useVoiceControls } from "@assistant-ui/react";

const { connect, disconnect, mute, unmute } = useVoiceControls();
```

UI Example
```tsx
import { useVoiceState, useVoiceControls } from "@assistant-ui/react";
import { PhoneIcon, PhoneOffIcon, MicIcon, MicOffIcon } from "lucide-react";

function VoiceControls() {
  const voiceState = useVoiceState();
  const { connect, disconnect, mute, unmute } = useVoiceControls();

  const isRunning = voiceState?.status.type === "running";
  const isStarting = voiceState?.status.type === "starting";
  const isMuted = voiceState?.isMuted ?? false;

  if (!isRunning && !isStarting) {
    return (
      <button onClick={() => connect()}>
        <PhoneIcon /> Connect
      </button>
    );
  }

  return (
    <>
      <button onClick={() => (isMuted ? unmute() : mute())} disabled={!isRunning}>
        {isMuted ? <MicOffIcon /> : <MicIcon />}
        {isMuted ? "Unmute" : "Mute"}
      </button>
      <button onClick={() => disconnect()}>
        <PhoneOffIcon /> Disconnect
      </button>
    </>
  );
}
```

Custom Adapters
Implement the RealtimeVoiceAdapter interface to integrate with any voice provider.
RealtimeVoiceAdapter Interface
```ts
import type { RealtimeVoiceAdapter } from "@assistant-ui/react";

class MyVoiceAdapter implements RealtimeVoiceAdapter {
  connect(options: {
    abortSignal?: AbortSignal;
  }): RealtimeVoiceAdapter.Session {
    // Establish connection to your voice service
    return {
      get status() { /* ... */ },
      get isMuted() { /* ... */ },
      disconnect: () => { /* ... */ },
      mute: () => { /* ... */ },
      unmute: () => { /* ... */ },
      onStatusChange: (callback) => {
        // Status: { type: "starting" } → { type: "running" } → { type: "ended", reason }
        return () => {}; // Return unsubscribe
      },
      onTranscript: (callback) => {
        // callback({ role: "user" | "assistant", text: "...", isFinal: true })
        // Transcripts are automatically appended as messages in the thread.
        return () => {};
      },
      // Report who is speaking (drives the VoiceOrb speaking animation)
      onModeChange: (callback) => {
        // callback("listening") — user's turn
        // callback("speaking") — agent's turn
        return () => {};
      },
      // Report real-time audio level (0–1) for visual feedback
      onVolumeChange: (callback) => {
        // callback(0.72) — drives VoiceOrb amplitude and waveform bar heights
        return () => {};
      },
    };
  }
}
```

Session Lifecycle
The session status follows the same pattern as other adapters:
```
starting → running → ended
```

The `ended` status includes a `reason`:

- `"finished"`: session ended normally
- `"cancelled"`: session was cancelled by the user
- `"error"`: session ended due to an error (includes an `error` field)
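To make the status shape concrete, here is a minimal sketch. The `VoiceStatus` type and `statusLabel` helper below are local illustrations that mirror the documented shape; they are not exports of @assistant-ui/react.

```typescript
// Illustrative local type mirroring the documented status shape.
type VoiceStatus =
  | { type: "starting" }
  | { type: "running" }
  | { type: "ended"; reason: "finished" | "cancelled" | "error"; error?: unknown };

// Map a status to a short label, e.g. for a connection indicator.
function statusLabel(status: VoiceStatus): string {
  switch (status.type) {
    case "starting":
      return "Connecting";
    case "running":
      return "Live";
    case "ended":
      return status.reason === "error" ? "Disconnected (error)" : "Disconnected";
  }
}
```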
Mode and Volume
All adapters must implement onModeChange and onVolumeChange. If your provider doesn't support these, return a no-op unsubscribe:
- `onModeChange`: reports `"listening"` (user's turn) or `"speaking"` (agent's turn). The `VoiceOrb` switches to the active speaking animation.
- `onVolumeChange`: reports a real-time audio level (0–1). The `VoiceOrb` modulates its amplitude and glow, and waveform bars scale to match.
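For providers that expose neither turn-taking nor audio-level events, no-op subscriptions satisfy the contract. A minimal sketch (the `stubSubscriptions` object name is illustrative):

```typescript
type Unsubscribe = () => void;

// Stub implementations for providers without mode or volume events.
// The VoiceOrb simply stays in its resting state.
const stubSubscriptions = {
  onModeChange: (_cb: (m: "listening" | "speaking") => void): Unsubscribe => {
    return () => {}; // provider has no turn-taking events; nothing to clean up
  },
  onVolumeChange: (_cb: (v: number) => void): Unsubscribe => {
    return () => {}; // provider has no audio-level events
  },
};
```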
When using `createVoiceSession`, these are handled automatically: call `session.emitMode()` and `session.emitVolume()` when your provider delivers data.
Transcript Handling
Transcripts emitted via `onTranscript` are automatically appended to the message thread:

- User transcripts (`role: "user"`, `isFinal: true`) are appended as user messages.
- Assistant transcripts (`role: "assistant"`) are streamed into an assistant message. The message shows a "running" status until `isFinal: true` is received.
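The streaming behavior can be sketched from inside an adapter: partial assistant transcripts are emitted with `isFinal: false`, and a final item completes the message. The `TranscriptItem` type here is a local stand-in matching the documented shape.

```typescript
// Local stand-in for the documented transcript shape.
type TranscriptItem = {
  role: "user" | "assistant";
  text: string;
  isFinal: boolean;
};

// Callback set as maintained by an adapter's onTranscript subscription.
const transcriptCallbacks = new Set<(t: TranscriptItem) => void>();
const received: TranscriptItem[] = [];
transcriptCallbacks.add((t) => received.push(t));

const emit = (item: TranscriptItem) => {
  for (const cb of transcriptCallbacks) cb(item);
};

// Partial updates stream into the assistant message...
emit({ role: "assistant", text: "Hello", isFinal: false });
emit({ role: "assistant", text: "Hello, how can I help?", isFinal: false });
// ...and the final item completes it.
emit({ role: "assistant", text: "Hello, how can I help?", isFinal: true });
```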
Example: ElevenLabs Conversational AI
ElevenLabs Conversational AI provides realtime voice agents via WebRTC.
Install Dependencies
```bash
npm install @elevenlabs/client
```

Adapter
```ts
import type { RealtimeVoiceAdapter, Unsubscribe } from "@assistant-ui/react";
import { VoiceConversation } from "@elevenlabs/client";

export class ElevenLabsVoiceAdapter implements RealtimeVoiceAdapter {
  private _agentId: string;

  constructor(options: { agentId: string }) {
    this._agentId = options.agentId;
  }

  connect(options: {
    abortSignal?: AbortSignal;
  }): RealtimeVoiceAdapter.Session {
    const statusCallbacks = new Set<(s: RealtimeVoiceAdapter.Status) => void>();
    const transcriptCallbacks = new Set<(t: RealtimeVoiceAdapter.TranscriptItem) => void>();
    const modeCallbacks = new Set<(m: RealtimeVoiceAdapter.Mode) => void>();
    const volumeCallbacks = new Set<(v: number) => void>();

    let currentStatus: RealtimeVoiceAdapter.Status = { type: "starting" };
    let isMuted = false;
    let conversation: VoiceConversation | null = null;
    let disposed = false;

    const updateStatus = (status: RealtimeVoiceAdapter.Status) => {
      if (disposed) return;
      currentStatus = status;
      for (const cb of statusCallbacks) cb(status);
    };

    const cleanup = () => {
      disposed = true;
      conversation = null;
      statusCallbacks.clear();
      transcriptCallbacks.clear();
      modeCallbacks.clear();
      volumeCallbacks.clear();
    };

    const session: RealtimeVoiceAdapter.Session = {
      get status() { return currentStatus; },
      get isMuted() { return isMuted; },
      disconnect: () => { conversation?.endSession(); cleanup(); },
      mute: () => { conversation?.setMicMuted(true); isMuted = true; },
      unmute: () => { conversation?.setMicMuted(false); isMuted = false; },
      onStatusChange: (cb): Unsubscribe => {
        statusCallbacks.add(cb);
        return () => statusCallbacks.delete(cb);
      },
      onTranscript: (cb): Unsubscribe => {
        transcriptCallbacks.add(cb);
        return () => transcriptCallbacks.delete(cb);
      },
      onModeChange: (cb): Unsubscribe => {
        modeCallbacks.add(cb);
        return () => modeCallbacks.delete(cb);
      },
      onVolumeChange: (cb): Unsubscribe => {
        volumeCallbacks.add(cb);
        return () => volumeCallbacks.delete(cb);
      },
    };

    if (options.abortSignal) {
      options.abortSignal.addEventListener("abort", () => {
        conversation?.endSession();
        cleanup();
      }, { once: true });
    }

    const doConnect = async () => {
      if (disposed) return;
      try {
        conversation = await VoiceConversation.startSession({
          agentId: this._agentId,
          onConnect: () => updateStatus({ type: "running" }),
          onDisconnect: () => {
            updateStatus({ type: "ended", reason: "finished" });
            cleanup();
          },
          onError: (msg) => {
            updateStatus({ type: "ended", reason: "error", error: new Error(msg) });
            cleanup();
          },
          onModeChange: ({ mode }) => {
            if (disposed) return;
            for (const cb of modeCallbacks) cb(mode === "speaking" ? "speaking" : "listening");
          },
          onMessage: (msg) => {
            if (disposed) return;
            for (const cb of transcriptCallbacks) {
              cb({ role: msg.role === "user" ? "user" : "assistant", text: msg.message, isFinal: true });
            }
          },
        });
      } catch (error) {
        updateStatus({ type: "ended", reason: "error", error });
        cleanup();
      }
    };
    doConnect();

    return session;
  }
}
```

Usage
```ts
import { ElevenLabsVoiceAdapter } from "@/lib/elevenlabs-voice-adapter";

const runtime = useChatRuntime({
  adapters: {
    voice: new ElevenLabsVoiceAdapter({
      agentId: process.env.NEXT_PUBLIC_ELEVENLABS_AGENT_ID!,
    }),
  },
});
```

Example: LiveKit
LiveKit provides realtime voice via WebRTC rooms with transcription support. Unlike fully-hosted agent services, LiveKit follows a "bring-your-own-agent" model: the browser adapter only joins a room, and you run a separate agent worker that joins the same room and handles STT, LLM, and TTS. Without an agent in the room, the client will connect successfully but have nothing to talk to.
Prerequisites
- A LiveKit server: create a LiveKit Cloud project (grab the URL, API Key, and API Secret from the project settings) or self-host `livekit-server`.
- An agent worker: built with the LiveKit Agents SDK (Python or Node). The worker connects to your LiveKit server and is automatically dispatched into new rooms.
Install Dependencies
```bash
npm install livekit-client livekit-server-sdk
```

`livekit-client` powers the browser adapter; `livekit-server-sdk` is used server-side to mint access tokens.
Usage
```ts
import { LiveKitVoiceAdapter } from "@/lib/livekit-voice-adapter";

const runtime = useChatRuntime({
  adapters: {
    voice: new LiveKitVoiceAdapter({
      url: process.env.NEXT_PUBLIC_LIVEKIT_URL!,
      token: async () => {
        const res = await fetch("/api/livekit-token", { method: "POST" });
        const { token } = await res.json();
        return token;
      },
    }),
  },
});
```

See the with-livekit example for a complete implementation: the browser adapter, the token endpoint, and a minimal Python agent worker using the OpenAI Realtime API.