Bidirectional realtime voice conversations with AI agents.
assistant-ui supports realtime bidirectional voice via the `RealtimeVoiceAdapter` interface. This enables live voice conversations where the user speaks into their microphone and the AI agent responds with audio, with transcripts appearing in the thread in real time.
## How It Works
Unlike Speech Synthesis (text-to-speech) and Dictation (speech-to-text), the voice adapter handles both directions simultaneously — the user's microphone audio is streamed to the agent, and the agent's audio response is played back, all while transcripts are appended to the message thread.
| Feature | Adapter | Direction |
|---|---|---|
| Speech Synthesis | SpeechSynthesisAdapter | Text → Audio (one message at a time) |
| Dictation | DictationAdapter | Audio → Text (into composer) |
| Realtime Voice | RealtimeVoiceAdapter | Audio ↔ Audio (bidirectional, live) |
## Configuration

Pass a `RealtimeVoiceAdapter` implementation to the runtime via `adapters.voice`:
```ts
const runtime = useChatRuntime({
  adapters: {
    voice: new MyVoiceAdapter({ /* ... */ }),
  },
});
```

When a voice adapter is provided, `capabilities.voice` is automatically set to `true`.
## Hooks

### useVoiceState

Returns the current voice session state, or `undefined` when no session is active.
```ts
import { useVoiceState, useVoiceVolume } from "@assistant-ui/react";

const voiceState = useVoiceState();
// voiceState?.status.type — "starting" | "running" | "ended"
// voiceState?.isMuted — boolean
// voiceState?.mode — "listening" | "speaking"

const volume = useVoiceVolume();
// volume — number (0–1, real-time audio level via separate subscription)
```

### useVoiceControls
Returns methods to control the voice session.

```ts
import { useVoiceControls } from "@assistant-ui/react";

const { connect, disconnect, mute, unmute } = useVoiceControls();
```

### UI Example
```tsx
import { useVoiceState, useVoiceControls } from "@assistant-ui/react";
import { PhoneIcon, PhoneOffIcon, MicIcon, MicOffIcon } from "lucide-react";

function VoiceControls() {
  const voiceState = useVoiceState();
  const { connect, disconnect, mute, unmute } = useVoiceControls();

  const isRunning = voiceState?.status.type === "running";
  const isStarting = voiceState?.status.type === "starting";
  const isMuted = voiceState?.isMuted ?? false;

  if (!isRunning && !isStarting) {
    return (
      <button onClick={() => connect()}>
        <PhoneIcon /> Connect
      </button>
    );
  }

  return (
    <>
      <button onClick={() => (isMuted ? unmute() : mute())} disabled={!isRunning}>
        {isMuted ? <MicOffIcon /> : <MicIcon />}
        {isMuted ? "Unmute" : "Mute"}
      </button>
      <button onClick={() => disconnect()}>
        <PhoneOffIcon /> Disconnect
      </button>
    </>
  );
}
```

## Custom Adapters
Implement the `RealtimeVoiceAdapter` interface to integrate with any voice provider.

### RealtimeVoiceAdapter Interface
```ts
import type { RealtimeVoiceAdapter } from "@assistant-ui/react";

class MyVoiceAdapter implements RealtimeVoiceAdapter {
  connect(options: {
    abortSignal?: AbortSignal;
  }): RealtimeVoiceAdapter.Session {
    // Establish connection to your voice service
    return {
      get status() { /* ... */ },
      get isMuted() { /* ... */ },
      disconnect: () => { /* ... */ },
      mute: () => { /* ... */ },
      unmute: () => { /* ... */ },
      onStatusChange: (callback) => {
        // Status: { type: "starting" } → { type: "running" } → { type: "ended", reason }
        return () => {}; // Return unsubscribe
      },
      onTranscript: (callback) => {
        // callback({ role: "user" | "assistant", text: "...", isFinal: true })
        // Transcripts are automatically appended as messages in the thread.
        return () => {};
      },
      // Report who is speaking (drives the VoiceOrb speaking animation)
      onModeChange: (callback) => {
        // callback("listening") — user's turn
        // callback("speaking") — agent's turn
        return () => {};
      },
      // Report real-time audio level (0–1) for visual feedback
      onVolumeChange: (callback) => {
        // callback(0.72) — drives VoiceOrb amplitude and waveform bar heights
        return () => {};
      },
    };
  }
}
```

## Session Lifecycle
The session status follows the same pattern as other adapters:

```
starting → running → ended
```

The `ended` status includes a `reason`:

- `"finished"` — session ended normally
- `"cancelled"` — session was cancelled by the user
- `"error"` — session ended due to an error (includes an `error` field)
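A small status formatter can drive a connection indicator from these transitions. This is a minimal sketch: the `VoiceStatus` union below is mirrored locally from the shapes described above, and `describeStatus` is a hypothetical helper, not a library export.

```typescript
// Local mirror of the session status union described above (assumed shape).
type VoiceStatus =
  | { type: "starting" }
  | { type: "running" }
  | { type: "ended"; reason: "finished" | "cancelled" | "error"; error?: unknown };

// Map a status to a human-readable label for a connection indicator.
function describeStatus(status: VoiceStatus): string {
  switch (status.type) {
    case "starting":
      return "Connecting...";
    case "running":
      return "Live";
    case "ended":
      return status.reason === "error"
        ? `Ended with error: ${String(status.error)}`
        : `Ended (${status.reason})`;
  }
}
```

You would typically call something like this from an `onStatusChange` subscription to keep UI text in sync with the session.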
## Mode and Volume
All adapters must implement `onModeChange` and `onVolumeChange`. If your provider doesn't support these, return a no-op unsubscribe:

- `onModeChange` — Reports `"listening"` (user's turn) or `"speaking"` (agent's turn). The `VoiceOrb` switches to the active speaking animation.
- `onVolumeChange` — Reports a real-time audio level (0–1). The `VoiceOrb` modulates its amplitude and glow, and waveform bars scale to match.
When using `createVoiceSession`, these are handled automatically: call `session.emitMode()` and `session.emitVolume()` when your provider delivers data.
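To illustrate where a 0–1 level can come from, here is a hedged sketch that derives it from raw PCM samples via root mean square. The `rmsVolume` helper and the assumption that audio arrives as a `Float32Array` of samples in the -1..1 range are ours, not part of the library.

```typescript
// Compute a 0–1 volume level from raw PCM samples via root mean square.
// The result is suitable for feeding into onVolumeChange subscribers.
function rmsVolume(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  return Math.min(1, rms); // clamp to the adapter's expected 0–1 range
}
```

In practice you would compute this in an analyser or audio worklet callback and throttle emissions so the UI does not re-render on every audio frame.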
## Transcript Handling
Transcripts emitted via `onTranscript` are automatically appended to the message thread:

- User transcripts (`role: "user"`, `isFinal: true`) are appended as user messages.
- Assistant transcripts (`role: "assistant"`) are streamed into an assistant message. The message shows a "running" status until `isFinal: true` is received.
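To make the streaming contract concrete, here is a hedged sketch of the adapter-side emission pattern: each partial chunk carries the accumulated text so far, and only the last one is marked final. The `TranscriptItem` type is mirrored locally from the shape described above, and `buildAssistantTranscripts` is a hypothetical helper.

```typescript
// Local mirror of the transcript item shape described above (assumed).
type TranscriptItem = {
  role: "user" | "assistant";
  text: string;
  isFinal: boolean;
};

// Turn a sequence of text deltas into the transcript items an adapter
// would emit: accumulated text each time, final only on the last delta.
function buildAssistantTranscripts(deltas: string[]): TranscriptItem[] {
  const items: TranscriptItem[] = [];
  let text = "";
  deltas.forEach((delta, i) => {
    text += delta;
    items.push({ role: "assistant", text, isFinal: i === deltas.length - 1 });
  });
  return items;
}
```

Each emitted item would be passed to the `onTranscript` callbacks; the thread keeps the assistant message in a "running" state until the final item arrives.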
## Example: ElevenLabs Conversational AI
ElevenLabs Conversational AI provides realtime voice agents via WebRTC.
### Install Dependencies

```bash
npm install @elevenlabs/client
```

### Adapter
```ts
import type { RealtimeVoiceAdapter, Unsubscribe } from "@assistant-ui/react";
import { VoiceConversation } from "@elevenlabs/client";

export class ElevenLabsVoiceAdapter implements RealtimeVoiceAdapter {
  private _agentId: string;

  constructor(options: { agentId: string }) {
    this._agentId = options.agentId;
  }

  connect(options: {
    abortSignal?: AbortSignal;
  }): RealtimeVoiceAdapter.Session {
    const statusCallbacks = new Set<(s: RealtimeVoiceAdapter.Status) => void>();
    const transcriptCallbacks = new Set<(t: RealtimeVoiceAdapter.TranscriptItem) => void>();
    const modeCallbacks = new Set<(m: RealtimeVoiceAdapter.Mode) => void>();
    const volumeCallbacks = new Set<(v: number) => void>();

    let currentStatus: RealtimeVoiceAdapter.Status = { type: "starting" };
    let isMuted = false;
    let conversation: VoiceConversation | null = null;
    let disposed = false;

    const updateStatus = (status: RealtimeVoiceAdapter.Status) => {
      if (disposed) return;
      currentStatus = status;
      for (const cb of statusCallbacks) cb(status);
    };

    const cleanup = () => {
      disposed = true;
      conversation = null;
      statusCallbacks.clear();
      transcriptCallbacks.clear();
      modeCallbacks.clear();
      volumeCallbacks.clear();
    };

    const session: RealtimeVoiceAdapter.Session = {
      get status() { return currentStatus; },
      get isMuted() { return isMuted; },
      disconnect: () => { conversation?.endSession(); cleanup(); },
      mute: () => { conversation?.setMicMuted(true); isMuted = true; },
      unmute: () => { conversation?.setMicMuted(false); isMuted = false; },
      onStatusChange: (cb): Unsubscribe => {
        statusCallbacks.add(cb);
        return () => statusCallbacks.delete(cb);
      },
      onTranscript: (cb): Unsubscribe => {
        transcriptCallbacks.add(cb);
        return () => transcriptCallbacks.delete(cb);
      },
      onModeChange: (cb): Unsubscribe => {
        modeCallbacks.add(cb);
        return () => modeCallbacks.delete(cb);
      },
      onVolumeChange: (cb): Unsubscribe => {
        volumeCallbacks.add(cb);
        return () => volumeCallbacks.delete(cb);
      },
    };

    if (options.abortSignal) {
      options.abortSignal.addEventListener("abort", () => {
        conversation?.endSession(); cleanup();
      }, { once: true });
    }

    const doConnect = async () => {
      if (disposed) return;
      try {
        conversation = await VoiceConversation.startSession({
          agentId: this._agentId,
          onConnect: () => updateStatus({ type: "running" }),
          onDisconnect: () => { updateStatus({ type: "ended", reason: "finished" }); cleanup(); },
          onError: (msg) => { updateStatus({ type: "ended", reason: "error", error: new Error(msg) }); cleanup(); },
          onModeChange: ({ mode }) => {
            if (disposed) return;
            for (const cb of modeCallbacks) cb(mode === "speaking" ? "speaking" : "listening");
          },
          onMessage: (msg) => {
            if (disposed) return;
            for (const cb of transcriptCallbacks) {
              cb({ role: msg.role === "user" ? "user" : "assistant", text: msg.message, isFinal: true });
            }
          },
        });
      } catch (error) {
        updateStatus({ type: "ended", reason: "error", error }); cleanup();
      }
    };
    doConnect();

    return session;
  }
}
```

### Usage
```ts
import { ElevenLabsVoiceAdapter } from "@/lib/elevenlabs-voice-adapter";

const runtime = useChatRuntime({
  adapters: {
    voice: new ElevenLabsVoiceAdapter({
      agentId: process.env.NEXT_PUBLIC_ELEVENLABS_AGENT_ID!,
    }),
  },
});
```

## Example: LiveKit
LiveKit provides realtime voice via WebRTC rooms with transcription support.
### Install Dependencies

```bash
npm install livekit-client
```

### Usage
```ts
import { LiveKitVoiceAdapter } from "@/lib/livekit-voice-adapter";

const runtime = useChatRuntime({
  adapters: {
    voice: new LiveKitVoiceAdapter({
      url: process.env.NEXT_PUBLIC_LIVEKIT_URL!,
      token: async () => {
        const res = await fetch("/api/livekit-token", { method: "POST" });
        const { token } = await res.json();
        return token;
      },
    }),
  },
});
```

See the `examples/with-livekit` directory in the repository for a complete implementation including the adapter and token endpoint.