Realtime Voice

Bidirectional realtime voice conversations with AI agents.

assistant-ui supports realtime bidirectional voice via the RealtimeVoiceAdapter interface. This enables live voice conversations where the user speaks into their microphone and the AI agent responds with audio, with transcripts appearing in the thread in real time.


How It Works

Unlike Speech Synthesis (text-to-speech) and Dictation (speech-to-text), the voice adapter handles both directions simultaneously — the user's microphone audio is streamed to the agent, and the agent's audio response is played back, all while transcripts are appended to the message thread.

| Feature | Adapter | Direction |
| --- | --- | --- |
| Speech Synthesis | SpeechSynthesisAdapter | Text → Audio (one message at a time) |
| Dictation | DictationAdapter | Audio → Text (into composer) |
| Realtime Voice | RealtimeVoiceAdapter | Audio ↔ Audio (bidirectional, live) |

Configuration

Pass a RealtimeVoiceAdapter implementation to the runtime via adapters.voice:

const runtime = useChatRuntime({
  adapters: {
    voice: new MyVoiceAdapter({ /* ... */ }),
  },
});

When a voice adapter is provided, capabilities.voice is automatically set to true.

Hooks

useVoiceState

Returns the current voice session state, or undefined when no session is active.

import { useVoiceState, useVoiceVolume } from "@assistant-ui/react";

const voiceState = useVoiceState();
// voiceState?.status.type — "starting" | "running" | "ended"
// voiceState?.isMuted — boolean
// voiceState?.mode — "listening" | "speaking"

const volume = useVoiceVolume();
// volume — number (0–1, real-time audio level via separate subscription)
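Since useVoiceVolume delivers a 0–1 level, a common pattern is mapping it to a discrete number of active waveform bars before rendering. A minimal sketch of that mapping — the helper name `volumeToBars` is illustrative, not part of @assistant-ui/react:

```typescript
// Map a 0–1 volume level to a number of active waveform bars.
// `volumeToBars` is a hypothetical helper, not a library export.
const volumeToBars = (volume: number, barCount = 5): number => {
  const clamped = Math.min(1, Math.max(0, volume)); // guard out-of-range input
  return Math.round(clamped * barCount);
};
```

A component can then call this with the value from `useVoiceVolume()` on each render to size its bars.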

useVoiceControls

Returns methods to control the voice session.

import { useVoiceControls } from "@assistant-ui/react";

const { connect, disconnect, mute, unmute } = useVoiceControls();

UI Example

import { useVoiceState, useVoiceControls } from "@assistant-ui/react";
import { PhoneIcon, PhoneOffIcon, MicIcon, MicOffIcon } from "lucide-react";

function VoiceControls() {
  const voiceState = useVoiceState();
  const { connect, disconnect, mute, unmute } = useVoiceControls();

  const isRunning = voiceState?.status.type === "running";
  const isStarting = voiceState?.status.type === "starting";
  const isMuted = voiceState?.isMuted ?? false;

  if (!isRunning && !isStarting) {
    return (
      <button onClick={() => connect()}>
        <PhoneIcon /> Connect
      </button>
    );
  }

  return (
    <>
      <button onClick={() => (isMuted ? unmute() : mute())} disabled={!isRunning}>
        {isMuted ? <MicOffIcon /> : <MicIcon />}
        {isMuted ? "Unmute" : "Mute"}
      </button>
      <button onClick={() => disconnect()}>
        <PhoneOffIcon /> Disconnect
      </button>
    </>
  );
}

Custom Adapters

Implement the RealtimeVoiceAdapter interface to integrate with any voice provider.

RealtimeVoiceAdapter Interface

import type { RealtimeVoiceAdapter } from "@assistant-ui/react";

class MyVoiceAdapter implements RealtimeVoiceAdapter {
  connect(options: {
    abortSignal?: AbortSignal;
  }): RealtimeVoiceAdapter.Session {
    // Establish connection to your voice service
    return {
      get status() { /* ... */ },
      get isMuted() { /* ... */ },

      disconnect: () => { /* ... */ },
      mute: () => { /* ... */ },
      unmute: () => { /* ... */ },

      onStatusChange: (callback) => {
        // Status: { type: "starting" } → { type: "running" } → { type: "ended", reason }
        return () => {}; // Return unsubscribe
      },

      onTranscript: (callback) => {
        // callback({ role: "user" | "assistant", text: "...", isFinal: true })
        // Transcripts are automatically appended as messages in the thread.
        return () => {};
      },

      // Report who is speaking (drives the VoiceOrb speaking animation)
      onModeChange: (callback) => {
        // callback("listening") — user's turn
        // callback("speaking") — agent's turn
        return () => {};
      },

      // Report real-time audio level (0–1) for visual feedback
      onVolumeChange: (callback) => {
        // callback(0.72) — drives VoiceOrb amplitude and waveform bar heights
        return () => {};
      },
    };
  }
}

Session Lifecycle

The session status follows the same pattern as other adapters:

starting → running → ended

The ended status includes a reason:

  • "finished" — session ended normally
  • "cancelled" — session was cancelled by the user
  • "error" — session ended due to an error (includes error field)
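A UI will typically branch on the reason when a session ends. A sketch of that branching, using a local type that mirrors the ended-status shape described above (the real type lives on RealtimeVoiceAdapter.Status; the `describeEnd` helper is illustrative):

```typescript
// Local mirror of the ended-status shape documented above.
type EndedStatus =
  | { type: "ended"; reason: "finished" }
  | { type: "ended"; reason: "cancelled" }
  | { type: "ended"; reason: "error"; error: unknown };

// Turn an ended status into a user-facing message (hypothetical helper).
const describeEnd = (status: EndedStatus): string => {
  switch (status.reason) {
    case "finished":
      return "Session ended";
    case "cancelled":
      return "Session cancelled";
    case "error":
      return `Session error: ${String(status.error)}`;
  }
};
```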

Mode and Volume

All adapters must implement onModeChange and onVolumeChange. If your provider doesn't support these, return a no-op unsubscribe:

  • onModeChange — Reports "listening" (user's turn) or "speaking" (agent's turn). The VoiceOrb switches to the active speaking animation.
  • onVolumeChange — Reports a real-time audio level (0–1). The VoiceOrb modulates its amplitude and glow, and waveform bars scale to match.
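For a provider that reports neither mode nor volume, the two subscriptions can simply hand back a no-op unsubscribe, as noted above. A minimal sketch (the `Unsubscribe` alias mirrors the library's type):

```typescript
type Unsubscribe = () => void;

// Stubs for providers with no mode or volume reporting: never invoke the
// callback, and return an unsubscribe that has nothing to clean up.
const onModeChange = (
  _cb: (mode: "listening" | "speaking") => void,
): Unsubscribe => {
  return () => {};
};

const onVolumeChange = (_cb: (volume: number) => void): Unsubscribe => {
  return () => {};
};
```

The session still satisfies the interface; the VoiceOrb just stays in its idle animation.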

When using createVoiceSession, these are handled automatically — call session.emitMode() and session.emitVolume() when your provider delivers data.

Transcript Handling

Transcripts emitted via onTranscript are automatically appended to the message thread:

  • User transcripts (role: "user", isFinal: true) are appended as user messages.
  • Assistant transcripts (role: "assistant") are streamed into an assistant message. The message shows a "running" status until isFinal: true is received.

Example: ElevenLabs Conversational AI

ElevenLabs Conversational AI provides realtime voice agents via WebRTC.

Install Dependencies

npm install @elevenlabs/client

Adapter

lib/elevenlabs-voice-adapter.ts
import type { RealtimeVoiceAdapter, Unsubscribe } from "@assistant-ui/react";
import { VoiceConversation } from "@elevenlabs/client";

export class ElevenLabsVoiceAdapter implements RealtimeVoiceAdapter {
  private _agentId: string;

  constructor(options: { agentId: string }) {
    this._agentId = options.agentId;
  }

  connect(options: {
    abortSignal?: AbortSignal;
  }): RealtimeVoiceAdapter.Session {
    const statusCallbacks = new Set<(s: RealtimeVoiceAdapter.Status) => void>();
    const transcriptCallbacks = new Set<(t: RealtimeVoiceAdapter.TranscriptItem) => void>();
    const modeCallbacks = new Set<(m: RealtimeVoiceAdapter.Mode) => void>();
    const volumeCallbacks = new Set<(v: number) => void>();

    let currentStatus: RealtimeVoiceAdapter.Status = { type: "starting" };
    let isMuted = false;
    let conversation: VoiceConversation | null = null;
    let disposed = false;

    const updateStatus = (status: RealtimeVoiceAdapter.Status) => {
      if (disposed) return;
      currentStatus = status;
      for (const cb of statusCallbacks) cb(status);
    };

    const cleanup = () => {
      disposed = true;
      conversation = null;
      statusCallbacks.clear();
      transcriptCallbacks.clear();
      modeCallbacks.clear();
      volumeCallbacks.clear();
    };

    const session: RealtimeVoiceAdapter.Session = {
      get status() { return currentStatus; },
      get isMuted() { return isMuted; },
      disconnect: () => { conversation?.endSession(); updateStatus({ type: "ended", reason: "cancelled" }); cleanup(); },
      mute: () => { conversation?.setMicMuted(true); isMuted = true; },
      unmute: () => { conversation?.setMicMuted(false); isMuted = false; },
      onStatusChange: (cb): Unsubscribe => {
        statusCallbacks.add(cb);
        return () => statusCallbacks.delete(cb);
      },
      onTranscript: (cb): Unsubscribe => {
        transcriptCallbacks.add(cb);
        return () => transcriptCallbacks.delete(cb);
      },
      onModeChange: (cb): Unsubscribe => {
        modeCallbacks.add(cb);
        return () => modeCallbacks.delete(cb);
      },
      onVolumeChange: (cb): Unsubscribe => {
        volumeCallbacks.add(cb);
        return () => volumeCallbacks.delete(cb);
      },
    };

    if (options.abortSignal) {
      options.abortSignal.addEventListener("abort", () => {
        conversation?.endSession();
        updateStatus({ type: "ended", reason: "cancelled" });
        cleanup();
      }, { once: true });
    }

    const doConnect = async () => {
      if (disposed) return;
      try {
        conversation = await VoiceConversation.startSession({
          agentId: this._agentId,
          onConnect: () => updateStatus({ type: "running" }),
          onDisconnect: () => { updateStatus({ type: "ended", reason: "finished" }); cleanup(); },
          onError: (msg) => { updateStatus({ type: "ended", reason: "error", error: new Error(msg) }); cleanup(); },
          onModeChange: ({ mode }) => {
            if (disposed) return;
            for (const cb of modeCallbacks) cb(mode === "speaking" ? "speaking" : "listening");
          },
          onMessage: (msg) => {
            if (disposed) return;
            for (const cb of transcriptCallbacks) {
              cb({ role: msg.role === "user" ? "user" : "assistant", text: msg.message, isFinal: true });
            }
          },
        });
      } catch (error) {
        updateStatus({
          type: "ended",
          reason: "error",
          // Normalize the unknown catch value to an Error instance
          error: error instanceof Error ? error : new Error(String(error)),
        });
        cleanup();
      }
    };

    doConnect();
    return session;
  }
}

Usage

import { ElevenLabsVoiceAdapter } from "@/lib/elevenlabs-voice-adapter";

const runtime = useChatRuntime({
  adapters: {
    voice: new ElevenLabsVoiceAdapter({
      agentId: process.env.NEXT_PUBLIC_ELEVENLABS_AGENT_ID!,
    }),
  },
});

Example: LiveKit

LiveKit provides realtime voice via WebRTC rooms with transcription support.

Install Dependencies

npm install livekit-client

Usage

import { LiveKitVoiceAdapter } from "@/lib/livekit-voice-adapter";

const runtime = useChatRuntime({
  adapters: {
    voice: new LiveKitVoiceAdapter({
      url: process.env.NEXT_PUBLIC_LIVEKIT_URL!,
      token: async () => {
        const res = await fetch("/api/livekit-token", { method: "POST" });
        const { token } = await res.json();
        return token;
      },
    }),
  },
});

See the examples/with-livekit directory in the repository for a complete implementation including the adapter and token endpoint.