# Realtime Voice Chat
URL: /docs/guides/voice

Build bidirectional voice conversations with AI in React — realtime audio streaming, interruption handling, and visual state, integrated via assistant-ui.

> For AI agents: a documentation index is available at [llms.txt](/llms.txt). Use `.md` for canonical markdown pages; `.mdx` is kept as a backwards-compatible alias on supported URL paths.

assistant-ui supports realtime bidirectional voice via the `RealtimeVoiceAdapter` interface. This enables live voice conversations where the user speaks into their microphone and the AI agent responds with audio, with transcripts appearing in the thread in real time.

\[interactive preview omitted]

## How It Works

Unlike [Speech Synthesis](/docs/guides/speech) (text-to-speech) and [Dictation](/docs/guides/dictation) (speech-to-text), the voice adapter handles **both directions simultaneously** — the user's microphone audio is streamed to the agent, and the agent's audio response is played back, all while transcripts are appended to the message thread.

| Feature                                 | Adapter                  | Direction                            |
| --------------------------------------- | ------------------------ | ------------------------------------ |
| [Speech Synthesis](/docs/guides/speech) | `SpeechSynthesisAdapter` | Text → Audio (one message at a time) |
| [Dictation](/docs/guides/dictation)     | `DictationAdapter`       | Audio → Text (into composer)         |
| **Realtime Voice**                      | `RealtimeVoiceAdapter`   | Audio ↔ Audio (bidirectional, live)  |

## Configuration

Pass a `RealtimeVoiceAdapter` implementation to the runtime via `adapters.voice`:

```
const runtime = useChatRuntime({
  adapters: {
    voice: new MyVoiceAdapter({ /* ... */ }),
  },
});
```

When a voice adapter is provided, `capabilities.voice` is automatically set to `true`.

## Hooks

### useVoiceState

Returns the current voice session state, or `undefined` when no session is active.

```
import { useVoiceState, useVoiceVolume } from "@assistant-ui/react";

const voiceState = useVoiceState();
// voiceState?.status.type — "starting" | "running" | "ended"
// voiceState?.isMuted — boolean
// voiceState?.mode — "listening" | "speaking"

const volume = useVoiceVolume();
// volume — number (0–1, real-time audio level via separate subscription)
```

### useVoiceControls

Returns methods to control the voice session.

```
import { useVoiceControls } from "@assistant-ui/react";

const { connect, disconnect, mute, unmute } = useVoiceControls();
```

## UI Example

> [!info]
>
> For a ready-made control bar with a voice orb and call controls, see the [Voice component](/docs/ui/voice).

```
import { useVoiceState, useVoiceControls } from "@assistant-ui/react";
import { PhoneIcon, PhoneOffIcon, MicIcon, MicOffIcon } from "lucide-react";

function VoiceControls() {
  const voiceState = useVoiceState();
  const { connect, disconnect, mute, unmute } = useVoiceControls();

  const isRunning = voiceState?.status.type === "running";
  const isStarting = voiceState?.status.type === "starting";
  const isMuted = voiceState?.isMuted ?? false;

  if (!isRunning && !isStarting) {
    return (
      <button onClick={() => connect()}>
        <PhoneIcon /> Connect
      </button>
    );
  }

  return (
    <>
      <button onClick={() => (isMuted ? unmute() : mute())} disabled={!isRunning}>
        {isMuted ? <MicOffIcon /> : <MicIcon />}
        {isMuted ? "Unmute" : "Mute"}
      </button>
      <button onClick={() => disconnect()}>
        <PhoneOffIcon /> Disconnect
      </button>
    </>
  );
}
```

## Custom Adapters

Implement the `RealtimeVoiceAdapter` interface to integrate with any voice provider.

### RealtimeVoiceAdapter Interface

```
import type { RealtimeVoiceAdapter } from "@assistant-ui/react";

class MyVoiceAdapter implements RealtimeVoiceAdapter {
  connect(options: {
    abortSignal?: AbortSignal;
  }): RealtimeVoiceAdapter.Session {
    // Establish connection to your voice service
    return {
      get status() { /* ... */ },
      get isMuted() { /* ... */ },

      disconnect: () => { /* ... */ },
      mute: () => { /* ... */ },
      unmute: () => { /* ... */ },

      onStatusChange: (callback) => {
        // Status: { type: "starting" } → { type: "running" } → { type: "ended", reason }
        return () => {}; // Return unsubscribe
      },

      onTranscript: (callback) => {
        // callback({ role: "user" | "assistant", text: "...", isFinal: true })
        // Transcripts are automatically appended as messages in the thread.
        return () => {};
      },

      // Report who is speaking (drives the VoiceOrb speaking animation)
      onModeChange: (callback) => {
        // callback("listening") — user's turn
        // callback("speaking") — agent's turn
        return () => {};
      },

      // Report real-time audio level (0–1) for visual feedback
      onVolumeChange: (callback) => {
        // callback(0.72) — drives VoiceOrb amplitude and waveform bar heights
        return () => {};
      },
    };
  }
}
```

### Session Lifecycle

The session status follows the same pattern as other adapters:

The `ended` status includes a `reason`:

- `"finished"` — session ended normally
- `"cancelled"` — session was cancelled by the user
- `"error"` — session ended due to an error (includes `error` field)

### Mode and Volume

All adapters must implement `onModeChange` and `onVolumeChange`. If your provider doesn't support these, return a no-op unsubscribe:

- **`onModeChange`** — Reports `"listening"` (user's turn) or `"speaking"` (agent's turn). The `VoiceOrb` switches to the active speaking animation.
- **`onVolumeChange`** — Reports a real-time audio level (`0`–`1`). The `VoiceOrb` modulates its amplitude and glow, and waveform bars scale to match.

When using `createVoiceSession`, these are handled automatically — call `session.emitMode()` and `session.emitVolume()` when your provider delivers data.

### Transcript Handling

Transcripts emitted via `onTranscript` are automatically appended to the message thread:

- **User transcripts** (`role: "user"`, `isFinal: true`) are appended as user messages.
- **Assistant transcripts** (`role: "assistant"`) are streamed into an assistant message. The message shows a "running" status until `isFinal: true` is received.

## Example: ElevenLabs Conversational AI

[ElevenLabs Conversational AI](https://elevenlabs.io/docs/agents-platform/overview) provides realtime voice agents via WebRTC.

### Install Dependencies

```
npm install @elevenlabs/client
```

### Adapter

```
import type { RealtimeVoiceAdapter, Unsubscribe } from "@assistant-ui/react";
import { VoiceConversation } from "@elevenlabs/client";

export class ElevenLabsVoiceAdapter implements RealtimeVoiceAdapter {
  private _agentId: string;

  constructor(options: { agentId: string }) {
    this._agentId = options.agentId;
  }

  connect(options: {
    abortSignal?: AbortSignal;
  }): RealtimeVoiceAdapter.Session {
    const statusCallbacks = new Set<(s: RealtimeVoiceAdapter.Status) => void>();
    const transcriptCallbacks = new Set<(t: RealtimeVoiceAdapter.TranscriptItem) => void>();
    const modeCallbacks = new Set<(m: RealtimeVoiceAdapter.Mode) => void>();
    const volumeCallbacks = new Set<(v: number) => void>();

    let currentStatus: RealtimeVoiceAdapter.Status = { type: "starting" };
    let isMuted = false;
    let conversation: VoiceConversation | null = null;
    let disposed = false;

    const updateStatus = (status: RealtimeVoiceAdapter.Status) => {
      if (disposed) return;
      currentStatus = status;
      for (const cb of statusCallbacks) cb(status);
    };

    const cleanup = () => {
      disposed = true;
      conversation = null;
      statusCallbacks.clear();
      transcriptCallbacks.clear();
      modeCallbacks.clear();
      volumeCallbacks.clear();
    };

    const session: RealtimeVoiceAdapter.Session = {
      get status() { return currentStatus; },
      get isMuted() { return isMuted; },
      disconnect: () => { conversation?.endSession(); cleanup(); },
      mute: () => { conversation?.setMicMuted(true); isMuted = true; },
      unmute: () => { conversation?.setMicMuted(false); isMuted = false; },
      onStatusChange: (cb): Unsubscribe => {
        statusCallbacks.add(cb);
        return () => statusCallbacks.delete(cb);
      },
      onTranscript: (cb): Unsubscribe => {
        transcriptCallbacks.add(cb);
        return () => transcriptCallbacks.delete(cb);
      },
      onModeChange: (cb): Unsubscribe => {
        modeCallbacks.add(cb);
        return () => modeCallbacks.delete(cb);
      },
      onVolumeChange: (cb): Unsubscribe => {
        volumeCallbacks.add(cb);
        return () => volumeCallbacks.delete(cb);
      },
    };

    if (options.abortSignal) {
      options.abortSignal.addEventListener("abort", () => {
        conversation?.endSession(); cleanup();
      }, { once: true });
    }

    const doConnect = async () => {
      if (disposed) return;
      try {
        conversation = await VoiceConversation.startSession({
          agentId: this._agentId,
          onConnect: () => updateStatus({ type: "running" }),
          onDisconnect: () => { updateStatus({ type: "ended", reason: "finished" }); cleanup(); },
          onError: (msg) => { updateStatus({ type: "ended", reason: "error", error: new Error(msg) }); cleanup(); },
          onModeChange: ({ mode }) => {
            if (disposed) return;
            for (const cb of modeCallbacks) cb(mode === "speaking" ? "speaking" : "listening");
          },
          onMessage: (msg) => {
            if (disposed) return;
            for (const cb of transcriptCallbacks) {
              cb({ role: msg.role === "user" ? "user" : "assistant", text: msg.message, isFinal: true });
            }
          },
        });
      } catch (error) {
        updateStatus({ type: "ended", reason: "error", error }); cleanup();
      }
    };

    doConnect();
    return session;
  }
}
```

### Usage

```
import { ElevenLabsVoiceAdapter } from "@/lib/elevenlabs-voice-adapter";

const runtime = useChatRuntime({
  adapters: {
    voice: new ElevenLabsVoiceAdapter({
      agentId: process.env.NEXT_PUBLIC_ELEVENLABS_AGENT_ID!,
    }),
  },
});
```

## Using `createVoiceSession`

`createVoiceSession` is a helper that eliminates the manual `Set<callback>` boilerplate shown in the ElevenLabs example above. Pass it an async `setup` function that receives a `helpers` object and returns `{ disconnect, mute, unmute }`. The helper wires up all callback sets, status tracking, and abort-signal handling for you.

```
import { createVoiceSession, type RealtimeVoiceAdapter } from "@assistant-ui/react";

export class MyVoiceAdapter implements RealtimeVoiceAdapter {
  connect(options: { abortSignal?: AbortSignal }): RealtimeVoiceAdapter.Session {
    return createVoiceSession(options, async (helpers) => {
      // Connect to your provider
      const client = await MyVoiceClient.connect();

      client.on("open", () => helpers.setStatus({ type: "running" }));
      client.on("close", () => helpers.end("finished"));
      client.on("error", (err) => helpers.end("error", err));

      client.on("transcript", (item) => helpers.emitTranscript(item));
      client.on("mode", (mode) => helpers.emitMode(mode));
      client.on("volume", (v) => helpers.emitVolume(v));

      // Return controls — createVoiceSession calls these on disconnect/mute/unmute
      return {
        disconnect: () => client.close(),
        mute: () => client.setMuted(true),
        unmute: () => client.setMuted(false),
      };
    });
  }
}
```

The `helpers` object exposes `setStatus`, `end`, `emitTranscript`, `emitMode`, `emitVolume`, and `isDisposed`. When `isDisposed()` is true the session has been torn down and you can skip further event handling.

## Example: LiveKit

[LiveKit](https://livekit.io/) provides realtime voice via WebRTC rooms with transcription support. Unlike fully-hosted agent services, LiveKit follows a "bring-your-own-agent" model: the browser adapter only joins a room, and you run a separate **agent worker** that joins the same room and handles STT, LLM, and TTS. Without an agent in the room, the client will connect successfully but have nothing to talk to.

### Prerequisites

1. **A LiveKit server** — create a [LiveKit Cloud](https://cloud.livekit.io/) project (grab the URL, API Key, and API Secret from the project settings) or self-host `livekit-server`.
2. **An agent worker** — built with the [LiveKit Agents SDK](https://docs.livekit.io/agents/) (Python or Node). The worker connects to your LiveKit server and is automatically dispatched into new rooms.

### Install Dependencies

```
npm install livekit-client livekit-server-sdk
```

`livekit-client` powers the browser adapter; `livekit-server-sdk` is used server-side to mint access tokens.

### Usage

```
import { LiveKitVoiceAdapter } from "@/lib/livekit-voice-adapter";

const runtime = useChatRuntime({
  adapters: {
    voice: new LiveKitVoiceAdapter({
      url: process.env.NEXT_PUBLIC_LIVEKIT_URL!,
      token: async () => {
        const res = await fetch("/api/livekit-token", { method: "POST" });
        const { token } = await res.json();
        return token;
      },
    }),
  },
});
```

See the [`with-livekit` example](https://github.com/assistant-ui/assistant-ui/tree/main/examples/with-livekit) for a complete implementation: the browser adapter, the token endpoint, and a minimal Python agent worker using the OpenAI Realtime API.