Realtime Voice

Bidirectional realtime voice conversations with AI agents.

assistant-ui supports realtime bidirectional voice via the RealtimeVoiceAdapter interface. This enables live voice conversations where the user speaks into their microphone and the AI agent responds with audio, with transcripts appearing in the thread in real time.


How It Works

Unlike Speech Synthesis (text-to-speech) and Dictation (speech-to-text), the voice adapter handles both directions simultaneously — the user's microphone audio is streamed to the agent, and the agent's audio response is played back, all while transcripts are appended to the message thread.

Feature           Adapter                 Direction
Speech Synthesis  SpeechSynthesisAdapter  Text → Audio (one message at a time)
Dictation         DictationAdapter        Audio → Text (into composer)
Realtime Voice    RealtimeVoiceAdapter    Audio ↔ Audio (bidirectional, live)

Configuration

Pass a RealtimeVoiceAdapter implementation to the runtime via adapters.voice:

const runtime = useChatRuntime({
  adapters: {
    voice: new MyVoiceAdapter({ /* ... */ }),
  },
});

When a voice adapter is provided, capabilities.voice is automatically set to true.

Hooks

useVoiceState

Returns the current voice session state, or undefined when no session is active.

import { useVoiceState, useVoiceVolume } from "@assistant-ui/react";

const voiceState = useVoiceState();
// voiceState?.status.type — "starting" | "running" | "ended"
// voiceState?.isMuted — boolean
// voiceState?.mode — "listening" | "speaking"

const volume = useVoiceVolume();
// volume — number (0–1, real-time audio level via separate subscription)

useVoiceControls

Returns methods to control the voice session.

import { useVoiceControls } from "@assistant-ui/react";

const { connect, disconnect, mute, unmute } = useVoiceControls();

UI Example

import { useVoiceState, useVoiceControls } from "@assistant-ui/react";
import { PhoneIcon, PhoneOffIcon, MicIcon, MicOffIcon } from "lucide-react";

function VoiceControls() {
  const voiceState = useVoiceState();
  const { connect, disconnect, mute, unmute } = useVoiceControls();

  const isRunning = voiceState?.status.type === "running";
  const isStarting = voiceState?.status.type === "starting";
  const isMuted = voiceState?.isMuted ?? false;

  if (!isRunning && !isStarting) {
    return (
      <button onClick={() => connect()}>
        <PhoneIcon /> Connect
      </button>
    );
  }

  return (
    <>
      <button onClick={() => (isMuted ? unmute() : mute())} disabled={!isRunning}>
        {isMuted ? <MicOffIcon /> : <MicIcon />}
        {isMuted ? "Unmute" : "Mute"}
      </button>
      <button onClick={() => disconnect()}>
        <PhoneOffIcon /> Disconnect
      </button>
    </>
  );
}

Custom Adapters

Implement the RealtimeVoiceAdapter interface to integrate with any voice provider.

RealtimeVoiceAdapter Interface

import type { RealtimeVoiceAdapter } from "@assistant-ui/react";

class MyVoiceAdapter implements RealtimeVoiceAdapter {
  connect(options: {
    abortSignal?: AbortSignal;
  }): RealtimeVoiceAdapter.Session {
    // Establish connection to your voice service
    return {
      get status() { /* ... */ },
      get isMuted() { /* ... */ },

      disconnect: () => { /* ... */ },
      mute: () => { /* ... */ },
      unmute: () => { /* ... */ },

      onStatusChange: (callback) => {
        // Status: { type: "starting" } → { type: "running" } → { type: "ended", reason }
        return () => {}; // Return unsubscribe
      },

      onTranscript: (callback) => {
        // callback({ role: "user" | "assistant", text: "...", isFinal: true })
        // Transcripts are automatically appended as messages in the thread.
        return () => {};
      },

      // Report who is speaking (drives the VoiceOrb speaking animation)
      onModeChange: (callback) => {
        // callback("listening") — user's turn
        // callback("speaking") — agent's turn
        return () => {};
      },

      // Report real-time audio level (0–1) for visual feedback
      onVolumeChange: (callback) => {
        // callback(0.72) — drives VoiceOrb amplitude and waveform bar heights
        return () => {};
      },
    };
  }
}

Session Lifecycle

The session status follows the same pattern as other adapters:

starting → running → ended

The ended status includes a reason:

  • "finished" — session ended normally
  • "cancelled" — session was cancelled by the user
  • "error" — session ended due to an error (includes error field)
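The lifecycle above can be modeled as a discriminated union. This is an illustrative sketch (the exact exported types may differ), with a hypothetical `describeStatus` helper showing how the `reason` field is typically consumed:

```typescript
// Sketch of the status state machine described above; field names mirror
// the lifecycle but are not guaranteed to match the library's exact types.
type VoiceStatus =
  | { type: "starting" }
  | { type: "running" }
  | { type: "ended"; reason: "finished" | "cancelled" }
  | { type: "ended"; reason: "error"; error: unknown };

// Hypothetical helper: turn a status into a human-readable label.
function describeStatus(status: VoiceStatus): string {
  switch (status.type) {
    case "starting":
      return "Connecting...";
    case "running":
      return "Live";
    case "ended":
      return status.reason === "error"
        ? `Ended with error: ${String(status.error)}`
        : `Ended (${status.reason})`;
  }
}
```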

Mode and Volume

All adapters must implement onModeChange and onVolumeChange. If your provider doesn't support these, return a no-op unsubscribe:

  • onModeChange — Reports "listening" (user's turn) or "speaking" (agent's turn). The VoiceOrb switches to the active speaking animation.
  • onVolumeChange — Reports a real-time audio level (0–1). The VoiceOrb modulates its amplitude and glow, and waveform bars scale to match.
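For providers that don't report mode or volume, a subscription that never fires but still returns an unsubscribe function satisfies the interface. A minimal sketch:

```typescript
// No-op subscription handler for providers without mode/volume signals.
// It returns an unsubscribe function, as the interface requires, but
// never invokes the callback.
const noopSubscribe = <T,>(_callback: (value: T) => void): (() => void) => {
  return () => {}; // unsubscribing is also a no-op
};

// Usage inside a session object (sketch):
const partialSession = {
  onModeChange: noopSubscribe<"listening" | "speaking">,
  onVolumeChange: noopSubscribe<number>,
};
```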

When using createVoiceSession, these are handled automatically — call session.emitMode() and session.emitVolume() when your provider delivers data.

Transcript Handling

Transcripts emitted via onTranscript are automatically appended to the message thread:

  • User transcripts (role: "user", isFinal: true) are appended as user messages.
  • Assistant transcripts (role: "assistant") are streamed into an assistant message. The message shows a "running" status until isFinal: true is received.
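The streaming behavior above can be sketched as a fan-out from provider events to `onTranscript` subscribers. The `TranscriptItem` shape follows the callback shown earlier; whether partial chunks are cumulative text or deltas depends on your provider (cumulative is assumed here):

```typescript
// Sketch: fan provider transcript events out to onTranscript subscribers.
// Partial assistant chunks stream with isFinal: false; the final text
// closes the message with isFinal: true.
type TranscriptItem = {
  role: "user" | "assistant";
  text: string;
  isFinal: boolean;
};

const transcriptCallbacks = new Set<(t: TranscriptItem) => void>();

function emitTranscript(item: TranscriptItem) {
  for (const cb of transcriptCallbacks) cb(item);
}

// Example provider event sequence (cumulative text assumed):
const received: TranscriptItem[] = [];
transcriptCallbacks.add((t) => received.push(t));

emitTranscript({ role: "assistant", text: "Sure, I", isFinal: false });
emitTranscript({ role: "assistant", text: "Sure, I can help.", isFinal: true });
```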

Example: ElevenLabs Conversational AI

ElevenLabs Conversational AI provides realtime voice agents via WebRTC.

Install Dependencies

npm install @elevenlabs/client

Adapter

lib/elevenlabs-voice-adapter.ts
import type { RealtimeVoiceAdapter, Unsubscribe } from "@assistant-ui/react";
import { VoiceConversation } from "@elevenlabs/client";

export class ElevenLabsVoiceAdapter implements RealtimeVoiceAdapter {
  private _agentId: string;

  constructor(options: { agentId: string }) {
    this._agentId = options.agentId;
  }

  connect(options: {
    abortSignal?: AbortSignal;
  }): RealtimeVoiceAdapter.Session {
    const statusCallbacks = new Set<(s: RealtimeVoiceAdapter.Status) => void>();
    const transcriptCallbacks = new Set<(t: RealtimeVoiceAdapter.TranscriptItem) => void>();
    const modeCallbacks = new Set<(m: RealtimeVoiceAdapter.Mode) => void>();
    const volumeCallbacks = new Set<(v: number) => void>();

    let currentStatus: RealtimeVoiceAdapter.Status = { type: "starting" };
    let isMuted = false;
    let conversation: VoiceConversation | null = null;
    let disposed = false;

    const updateStatus = (status: RealtimeVoiceAdapter.Status) => {
      if (disposed) return;
      currentStatus = status;
      for (const cb of statusCallbacks) cb(status);
    };

    const cleanup = () => {
      disposed = true;
      conversation = null;
      statusCallbacks.clear();
      transcriptCallbacks.clear();
      modeCallbacks.clear();
      volumeCallbacks.clear();
    };

    const session: RealtimeVoiceAdapter.Session = {
      get status() { return currentStatus; },
      get isMuted() { return isMuted; },
      disconnect: () => { conversation?.endSession(); cleanup(); },
      mute: () => { conversation?.setMicMuted(true); isMuted = true; },
      unmute: () => { conversation?.setMicMuted(false); isMuted = false; },
      onStatusChange: (cb): Unsubscribe => {
        statusCallbacks.add(cb);
        return () => statusCallbacks.delete(cb);
      },
      onTranscript: (cb): Unsubscribe => {
        transcriptCallbacks.add(cb);
        return () => transcriptCallbacks.delete(cb);
      },
      onModeChange: (cb): Unsubscribe => {
        modeCallbacks.add(cb);
        return () => modeCallbacks.delete(cb);
      },
      onVolumeChange: (cb): Unsubscribe => {
        volumeCallbacks.add(cb);
        return () => volumeCallbacks.delete(cb);
      },
    };

    if (options.abortSignal) {
      options.abortSignal.addEventListener("abort", () => {
        conversation?.endSession();
        updateStatus({ type: "ended", reason: "cancelled" });
        cleanup();
      }, { once: true });
    }

    const doConnect = async () => {
      if (disposed) return;
      try {
        conversation = await VoiceConversation.startSession({
          agentId: this._agentId,
          onConnect: () => updateStatus({ type: "running" }),
          onDisconnect: () => { updateStatus({ type: "ended", reason: "finished" }); cleanup(); },
          onError: (msg) => { updateStatus({ type: "ended", reason: "error", error: new Error(msg) }); cleanup(); },
          onModeChange: ({ mode }) => {
            if (disposed) return;
            for (const cb of modeCallbacks) cb(mode === "speaking" ? "speaking" : "listening");
          },
          onMessage: (msg) => {
            if (disposed) return;
            for (const cb of transcriptCallbacks) {
              cb({ role: msg.role === "user" ? "user" : "assistant", text: msg.message, isFinal: true });
            }
          },
        });
        // Guard against disconnect/abort racing the pending startSession call
        if (disposed) { conversation.endSession(); conversation = null; return; }
      } catch (error) {
        const err = error instanceof Error ? error : new Error(String(error));
        updateStatus({ type: "ended", reason: "error", error: err }); cleanup();
      }
    };

    doConnect();
    return session;
  }
}

Usage

import { ElevenLabsVoiceAdapter } from "@/lib/elevenlabs-voice-adapter";

const runtime = useChatRuntime({
  adapters: {
    voice: new ElevenLabsVoiceAdapter({
      agentId: process.env.NEXT_PUBLIC_ELEVENLABS_AGENT_ID!,
    }),
  },
});

Example: LiveKit

LiveKit provides realtime voice via WebRTC rooms with transcription support. Unlike fully-hosted agent services, LiveKit follows a "bring-your-own-agent" model: the browser adapter only joins a room, and you run a separate agent worker that joins the same room and handles STT, LLM, and TTS. Without an agent in the room, the client will connect successfully but have nothing to talk to.

Prerequisites

  1. A LiveKit server — create a LiveKit Cloud project (grab the URL, API Key, and API Secret from the project settings) or self-host livekit-server.
  2. An agent worker — built with the LiveKit Agents SDK (Python or Node). The worker connects to your LiveKit server and is automatically dispatched into new rooms.

Install Dependencies

npm install livekit-client livekit-server-sdk

livekit-client powers the browser adapter; livekit-server-sdk is used server-side to mint access tokens.
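The token endpoint can be a small server route that mints a short-lived JWT with livekit-server-sdk. A sketch, assuming a Next.js route handler at app/api/livekit-token/route.ts (to match the fetch call below), LIVEKIT_API_KEY / LIVEKIT_API_SECRET env vars, and a "voice-session" room name chosen for illustration:

```typescript
// Sketch of a token-minting endpoint using livekit-server-sdk.
// The room name and identity scheme here are illustrative choices.
import { AccessToken } from "livekit-server-sdk";

export async function POST() {
  const token = new AccessToken(
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!,
    { identity: `user-${crypto.randomUUID()}` },
  );
  token.addGrant({ roomJoin: true, room: "voice-session" });
  return Response.json({ token: await token.toJwt() });
}
```

In production you would derive the identity from your auth session and scope the grant to a per-conversation room rather than a fixed name.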

Usage

import { LiveKitVoiceAdapter } from "@/lib/livekit-voice-adapter";

const runtime = useChatRuntime({
  adapters: {
    voice: new LiveKitVoiceAdapter({
      url: process.env.NEXT_PUBLIC_LIVEKIT_URL!,
      token: async () => {
        const res = await fetch("/api/livekit-token", { method: "POST" });
        const { token } = await res.json();
        return token;
      },
    }),
  },
});

See the with-livekit example for a complete implementation: the browser adapter, the token endpoint, and a minimal Python agent worker using the OpenAI Realtime API.