# Speech-to-Text (Dictation)
URL: /docs/guides/Dictation
import { DictationSample } from "@/components/docs/samples/dictation-sample";

assistant-ui supports speech-to-text (dictation) via the `DictationAdapter` interface. This allows users to input messages using their voice.
## DictationAdapter
Currently, the following dictation adapters are supported:
* `WebSpeechDictationAdapter`: Uses the browser's `Web Speech API` (SpeechRecognition)
The `WebSpeechDictationAdapter` is supported in Chrome, Edge, and Safari. Check [browser compatibility](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition#browser_compatibility) for details.
## Configuration
```tsx
import { WebSpeechDictationAdapter } from "@assistant-ui/react";
import { useChatRuntime } from "@assistant-ui/react-ai-sdk";

const runtime = useChatRuntime({
  api: "/api/chat",
  adapters: {
    dictation: new WebSpeechDictationAdapter({
      // Optional configuration
      language: "en-US", // Language for recognition (default: browser language)
      continuous: true, // Keep recording after user stops (default: true)
      interimResults: true, // Return interim results (default: true)
    }),
  },
});
```
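The configured runtime is then passed to your app like any other assistant-ui runtime. A minimal sketch (the `Thread` import path and the `/api/chat` route are placeholders for your own setup):

```tsx
import { AssistantRuntimeProvider, WebSpeechDictationAdapter } from "@assistant-ui/react";
import { useChatRuntime } from "@assistant-ui/react-ai-sdk";
import { Thread } from "@/components/assistant-ui/thread";

const MyAssistant = () => {
  const runtime = useChatRuntime({
    api: "/api/chat",
    adapters: { dictation: new WebSpeechDictationAdapter({ language: "en-US" }) },
  });

  return (
    <AssistantRuntimeProvider runtime={runtime}>
      <Thread />
    </AssistantRuntimeProvider>
  );
};
```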
## UI
Dictation controls are built with the `ComposerPrimitive.Dictate` and `ComposerPrimitive.StopDictation` components, placed inside your composer alongside the usual `ComposerPrimitive.Root` and `ComposerPrimitive.Input`:
```tsx
import { ComposerPrimitive } from "@assistant-ui/react";
import { MicIcon, SquareIcon } from "lucide-react";
const ComposerWithDictation = () => (
  <ComposerPrimitive.Root>
    <ComposerPrimitive.Input placeholder="Write a message..." />
    {/* Show Dictate button when not dictating */}
    <ComposerPrimitive.Dictate><MicIcon /></ComposerPrimitive.Dictate>
    {/* Show Stop button when dictating */}
    <ComposerPrimitive.StopDictation><SquareIcon /></ComposerPrimitive.StopDictation>
  </ComposerPrimitive.Root>
);
```
## Browser Compatibility Check
You can check if the browser supports dictation:
```tsx
import { WebSpeechDictationAdapter } from "@assistant-ui/react";
if (WebSpeechDictationAdapter.isSupported()) {
  // Dictation is available
}
```
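For example, you could register the dictation adapter only when the API is available, so unsupported browsers simply fall back to a plain composer (a sketch, not the only option):

```tsx
import { WebSpeechDictationAdapter } from "@assistant-ui/react";
import { useChatRuntime } from "@assistant-ui/react-ai-sdk";

const runtime = useChatRuntime({
  api: "/api/chat",
  adapters: {
    // Only register dictation where the browser supports it
    ...(WebSpeechDictationAdapter.isSupported()
      ? { dictation: new WebSpeechDictationAdapter({ language: "en-US" }) }
      : {}),
  },
});
```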
## Disabling Input During Dictation
Some dictation services (like ElevenLabs Scribe) return cumulative transcripts that conflict with simultaneous typing. You can disable the text input during dictation:
```tsx
import type { DictationAdapter } from "@assistant-ui/react";
class MyAdapter implements DictationAdapter {
  // Set to true to disable typing while dictating
  disableInputDuringDictation = true;

  listen() { /* ... */ }
}
```
When a message is sent during an active dictation session, the session is automatically stopped.
## Custom Adapters
You can create custom adapters to integrate with any dictation service by implementing the `DictationAdapter` interface.
### DictationAdapter Interface
```tsx
import type { DictationAdapter } from "@assistant-ui/react";
class MyCustomDictationAdapter implements DictationAdapter {
  // Optional: disable text input while dictating (default: false)
  disableInputDuringDictation?: boolean;

  listen(): DictationAdapter.Session {
    // Return a session object that manages the dictation
    return {
      status: { type: "starting" },
      stop: async () => {
        // Stop recognition and finalize results
      },
      cancel: () => {
        // Cancel recognition without finalizing
      },
      onSpeechStart: (callback) => {
        // Called when speech is detected
        return () => {}; // Return unsubscribe function
      },
      onSpeechEnd: (callback) => {
        // Called when recognition ends with final result
        return () => {};
      },
      onSpeech: (callback) => {
        // Called with transcription results
        // callback({ transcript: "text", isFinal: true })
        //
        // isFinal: true → Append to composer input (default)
        // isFinal: false → Show as preview only
        return () => {};
      },
    };
  }
}
```
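A custom adapter is registered under `adapters.dictation`, exactly like the built-in one:

```tsx
const runtime = useChatRuntime({
  api: "/api/chat",
  adapters: {
    dictation: new MyCustomDictationAdapter(),
  },
});
```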
### Interim vs Final Results
The `onSpeech` callback receives results with an optional `isFinal` flag:
```tsx
onSpeech: (callback) => {
  // callback({ transcript: "text", isFinal: true })
  // - isFinal: true → Text is committed to the input
  // - isFinal: false → Text is shown as preview in the input
  return () => {};
},
```
**Both interim and final results are displayed directly in the input field**, just like native dictation on iOS/Android. Interim results replace each other until a final result commits the text. This provides seamless real-time feedback while the user speaks.
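For illustration, here is a hypothetical sequence of `onSpeech` events for a single utterance (`callback` is the listener registered via `onSpeech`; the transcript values are made up):

```tsx
// Interim results replace the current preview in the input…
callback({ transcript: "hello", isFinal: false });
callback({ transcript: "hello wor", isFinal: false });
callback({ transcript: "hello world", isFinal: false });

// …until a final result commits the text to the composer input.
callback({ transcript: "hello world", isFinal: true });
```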
### Example: ElevenLabs Scribe v2 Realtime
[ElevenLabs Scribe](https://elevenlabs.io/docs/capabilities/speech-to-text) provides ultra-low latency (~150ms) real-time transcription via WebSocket.
#### Install Dependencies
```bash
npm install @elevenlabs/client
```
#### Backend API Route
Create an API route to generate single-use tokens:
```ts title="app/api/scribe-token/route.ts"
export async function POST() {
  const response = await fetch(
    "https://api.elevenlabs.io/v1/single-use-token/realtime_scribe",
    {
      method: "POST",
      headers: {
        "xi-api-key": process.env.ELEVENLABS_API_KEY!,
      },
    },
  );
  const data = await response.json();
  return Response.json({ token: data.token });
}
```
#### Frontend Adapter
```tsx title="lib/elevenlabs-scribe-adapter.ts"
import type { DictationAdapter } from "@assistant-ui/react";
import { Scribe, RealtimeEvents } from "@elevenlabs/client";
export class ElevenLabsScribeAdapter implements DictationAdapter {
  private tokenEndpoint: string;
  private languageCode: string;
  // ElevenLabs returns cumulative transcripts, so we disable typing during dictation
  public disableInputDuringDictation: boolean;

  constructor(options: {
    tokenEndpoint: string;
    languageCode?: string;
    disableInputDuringDictation?: boolean;
  }) {
    this.tokenEndpoint = options.tokenEndpoint;
    this.languageCode = options.languageCode ?? "en";
    this.disableInputDuringDictation = options.disableInputDuringDictation ?? true;
  }

  listen(): DictationAdapter.Session {
    const callbacks = {
      start: new Set<() => void>(),
      end: new Set<(r: DictationAdapter.Result) => void>(),
      speech: new Set<(r: DictationAdapter.Result) => void>(),
    };
    let connection: ReturnType<typeof Scribe.connect> | null = null;
    let fullTranscript = "";

    const session: DictationAdapter.Session = {
      status: { type: "starting" },
      stop: async () => {
        if (connection) {
          connection.commit();
          await new Promise((r) => setTimeout(r, 500));
          connection.close();
        }
        if (fullTranscript) {
          for (const cb of callbacks.end) cb({ transcript: fullTranscript });
        }
      },
      cancel: () => {
        connection?.close();
      },
      onSpeechStart: (cb) => {
        callbacks.start.add(cb);
        return () => callbacks.start.delete(cb);
      },
      onSpeechEnd: (cb) => {
        callbacks.end.add(cb);
        return () => callbacks.end.delete(cb);
      },
      onSpeech: (cb) => {
        callbacks.speech.add(cb);
        return () => callbacks.speech.delete(cb);
      },
    };

    this.connect(session, callbacks, {
      setConnection: (c) => { connection = c; },
      getFullTranscript: () => fullTranscript,
      setFullTranscript: (t) => { fullTranscript = t; },
    });

    return session;
  }

  private async connect(
    session: DictationAdapter.Session,
    callbacks: {
      start: Set<() => void>;
      end: Set<(r: DictationAdapter.Result) => void>;
      speech: Set<(r: DictationAdapter.Result) => void>;
    },
    refs: {
      setConnection: (c: ReturnType<typeof Scribe.connect>) => void;
      getFullTranscript: () => string;
      setFullTranscript: (t: string) => void;
    },
  ) {
    try {
      // 1. Get token from backend
      const tokenRes = await fetch(this.tokenEndpoint, { method: "POST" });
      const { token } = await tokenRes.json();

      // 2. Connect to Scribe with microphone
      const connection = Scribe.connect({
        token,
        modelId: "scribe_v2_realtime",
        languageCode: this.languageCode,
        microphone: {
          echoCancellation: true,
          noiseSuppression: true,
        },
      });
      refs.setConnection(connection);

      // 3. Handle events
      connection.on(RealtimeEvents.SESSION_STARTED, () => {
        (session as { status: DictationAdapter.Status }).status = {
          type: "running",
        };
        for (const cb of callbacks.start) cb();
      });

      // Partial transcripts → preview (isFinal: false)
      connection.on(RealtimeEvents.PARTIAL_TRANSCRIPT, (data) => {
        if (data.text) {
          for (const cb of callbacks.speech)
            cb({ transcript: data.text, isFinal: false });
        }
      });

      // Committed transcripts → append to input (isFinal: true)
      connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, (data) => {
        if (data.text?.trim()) {
          refs.setFullTranscript(refs.getFullTranscript() + data.text + " ");
          for (const cb of callbacks.speech)
            cb({ transcript: data.text, isFinal: true });
        }
      });

      connection.on(RealtimeEvents.ERROR, (error) => {
        console.error("Scribe error:", error);
        (session as { status: DictationAdapter.Status }).status = {
          type: "ended",
          reason: "error",
        };
      });
    } catch (error) {
      console.error("ElevenLabs Scribe connection failed:", error);
      (session as { status: DictationAdapter.Status }).status = {
        type: "ended",
        reason: "error",
      };
    }
  }
}
```
#### Usage
```tsx
import { useChatRuntime } from "@assistant-ui/react-ai-sdk";
import { ElevenLabsScribeAdapter } from "@/lib/elevenlabs-scribe-adapter";

const runtime = useChatRuntime({
  api: "/api/chat",
  adapters: {
    dictation: new ElevenLabsScribeAdapter({
      tokenEndpoint: "/api/scribe-token",
      languageCode: "en", // Optional: supports 90+ languages
      disableInputDuringDictation: true, // Default: true (recommended for ElevenLabs)
    }),
  },
});
```
#### Real-time Preview
The transcription is displayed directly in the input field as the user speaks — just like native dictation. No additional UI components are needed for basic use cases.
For advanced customization, `composer.dictation?.transcript` contains the current interim transcript, and `ComposerPrimitive.DictationTranscript` can display it separately if desired.
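A sketch of a composer that also renders the in-progress transcript in a separate element (assuming `ComposerPrimitive.DictationTranscript` accepts standard element props; the styling class is illustrative):

```tsx
import { ComposerPrimitive } from "@assistant-ui/react";

const ComposerWithTranscriptPreview = () => (
  <ComposerPrimitive.Root>
    <ComposerPrimitive.Input placeholder="Write a message..." />
    {/* Shows the current interim transcript while dictation is running */}
    <ComposerPrimitive.DictationTranscript className="text-sm text-muted-foreground" />
  </ComposerPrimitive.Root>
);
```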
For more details, see the [ElevenLabs Scribe documentation](https://elevenlabs.io/docs/capabilities/speech-to-text).