Quickest path to a working chat. Handles state while you handle the API.
LocalRuntime is the simplest way to connect a custom backend. You implement a single ChatModelAdapter (one run function) and the runtime handles everything else: messages, threads, branching, editing, regeneration, cancellation.
State lives inside the runtime by default. Multi-thread persistence and shared adapters are added through the standard interfaces; see adapters and threads.
When to use it
Pick LocalRuntime when:
- You want assistant-ui to manage chat state for you.
- Your backend exposes a function-call-shaped API (REST, OpenAI SDK, your own model client).
- Branching, editing, and regeneration should work without you writing extra code.
- You want to compose adapters (attachments, speech, feedback, history, suggestions).
If you already keep messages in redux, zustand, tanstack-query, or another store, use ExternalStoreRuntime instead.
Quickstart
Create a project
Next.js:
npx create-next-app@latest my-app
cd my-app
Expo (React Native):
npx create-expo-app@latest my-app
cd my-app
Ink (terminal):
mkdir my-app
cd my-app
npm init -y
Install dependencies
Next.js:
npm install @assistant-ui/react
Expo (React Native):
npx expo install @assistant-ui/react-native
Ink (terminal):
npm install @assistant-ui/react-ink ink react
Add the Thread component
Next.js:
npx assistant-ui@latest add thread
Expo (React Native): use your React Native thread component from the React Native setup.
Ink (terminal): use your terminal thread component from the Ink setup.
Define a MyRuntimeProvider
Replace the MyModelAdapter body with your backend call.
Next.js:
"use client";
import type { ReactNode } from "react";
import {
AssistantRuntimeProvider,
useLocalRuntime,
type ChatModelAdapter,
} from "@assistant-ui/react";
const MyModelAdapter: ChatModelAdapter = {
async run({ messages, abortSignal }) {
const result = await fetch("<YOUR_API_ENDPOINT>", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages }),
signal: abortSignal,
});
const data = await result.json();
return {
content: [{ type: "text", text: data.text }],
};
},
};
export function MyRuntimeProvider({
children,
}: Readonly<{ children: ReactNode }>) {
const runtime = useLocalRuntime(MyModelAdapter);
return (
<AssistantRuntimeProvider runtime={runtime}>
{children}
</AssistantRuntimeProvider>
);
}
Expo (React Native):
import type { ReactNode } from "react";
import {
AssistantRuntimeProvider,
useLocalRuntime,
type ChatModelAdapter,
} from "@assistant-ui/react-native";
const MyModelAdapter: ChatModelAdapter = {
async run({ messages, abortSignal }) {
const result = await fetch("<YOUR_API_ENDPOINT>", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages }),
signal: abortSignal,
});
const data = await result.json();
return {
content: [{ type: "text", text: data.text }],
};
},
};
export function MyRuntimeProvider({
children,
}: Readonly<{ children: ReactNode }>) {
const runtime = useLocalRuntime(MyModelAdapter);
return (
<AssistantRuntimeProvider runtime={runtime}>
{children}
</AssistantRuntimeProvider>
);
}
Ink (terminal):
import type { ReactNode } from "react";
import {
AssistantRuntimeProvider,
useLocalRuntime,
type ChatModelAdapter,
} from "@assistant-ui/react-ink";
const MyModelAdapter: ChatModelAdapter = {
async run({ messages, abortSignal }) {
const result = await fetch("<YOUR_API_ENDPOINT>", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages }),
signal: abortSignal,
});
const data = await result.json();
return {
content: [{ type: "text", text: data.text }],
};
},
};
export function MyRuntimeProvider({
children,
}: Readonly<{ children: ReactNode }>) {
const runtime = useLocalRuntime(MyModelAdapter);
return (
<AssistantRuntimeProvider runtime={runtime}>
{children}
</AssistantRuntimeProvider>
);
}
Wrap your app
Next.js:
import type { ReactNode } from "react";
import { MyRuntimeProvider } from "@/app/MyRuntimeProvider";
export default function RootLayout({ children }: { children: ReactNode }) {
return (
<MyRuntimeProvider>
<html lang="en">
<body>{children}</body>
</html>
</MyRuntimeProvider>
);
}
Expo (React Native):
import { Stack } from "expo-router";
import { MyRuntimeProvider } from "@/runtime/MyRuntimeProvider";
export default function RootLayout() {
return (
<MyRuntimeProvider>
<Stack />
</MyRuntimeProvider>
);
}
Ink (terminal):
import { Box } from "ink";
import { Thread } from "./components/thread.js";
import { MyRuntimeProvider } from "./runtime/MyRuntimeProvider.js";
export function App() {
return (
<MyRuntimeProvider>
<Box flexDirection="column">
<Thread />
</Box>
</MyRuntimeProvider>
);
}
Render the Thread
Next.js:
import { Thread } from "@/components/assistant-ui/thread";
export default function Page() {
return <Thread />;
}
Expo (React Native):
import { View } from "react-native";
import { Thread } from "@/components/assistant-ui/thread";
export default function Page() {
return (
<View style={{ flex: 1 }}>
<Thread />
</View>
);
}
Ink (terminal):
import { render } from "ink";
import { App } from "./app.js";
render(<App />);
Streaming responses
Declare run as an async * generator and yield the full cumulative content on each iteration:
import type {
  ChatModelAdapter,
  ThreadMessage,
  ModelContext,
} from "@assistant-ui/react";
import { OpenAI } from "openai";
const openai = new OpenAI();
const MyModelAdapter: ChatModelAdapter = {
  async *run({ messages, abortSignal, context }) {
    // convertToOpenAIMessages is your own helper that maps ThreadMessage[]
    // to the OpenAI message format; it is not defined in this snippet.
    const stream = await openai.chat.completions.create(
      {
        model: "gpt-5.4-mini",
        messages: convertToOpenAIMessages(messages),
        stream: true,
      },
      // The OpenAI SDK takes the abort signal as a request option.
      { signal: abortSignal },
    );
    let text = "";
    for await (const part of stream) {
      // Append the delta, then yield the full text so far.
      text += part.choices[0]?.delta?.content ?? "";
      yield {
        content: [{ type: "text", text }],
      };
    }
  },
};
Each yield replaces the previous content. Yield the full state every time, not deltas.
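To make the cumulative rule concrete, here is a minimal, self-contained sketch. It uses local stand-in types instead of the assistant-ui imports, and toCumulative, fakeDeltas, and collectTexts are hypothetical names introduced only for illustration. The wrapper turns a stream of text deltas into the full-state snapshots that run is expected to yield.

```typescript
// Local stand-ins for the assistant-ui result shape (assumption: text only).
type TextPart = { type: "text"; text: string };
type RunResult = { content: TextPart[] };

// Hypothetical helper: wraps a stream of text deltas and yields the
// full text accumulated so far on every iteration.
async function* toCumulative(
  deltas: AsyncIterable<string>,
): AsyncGenerator<RunResult> {
  let text = "";
  for await (const delta of deltas) {
    text += delta;
    yield { content: [{ type: "text", text }] };
  }
}

// Stand-in for a model stream that emits three deltas.
async function* fakeDeltas(): AsyncGenerator<string> {
  yield "Hel";
  yield "lo, ";
  yield "world";
}

// Collects the text of every snapshot, in order.
async function collectTexts(): Promise<string[]> {
  const texts: string[] = [];
  for await (const result of toCumulative(fakeDeltas())) {
    texts.push(result.content.map((part) => part.text).join(""));
  }
  return texts;
}
```

With the three deltas above, the snapshots are "Hel", "Hello, ", and "Hello, world". Yielding the raw deltas instead would leave only "world" in the message after the last chunk, because each yield replaces the previous content.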
Streaming with tool calls
Accumulate tool calls in a Map outside the streaming loop so they persist across chunks:
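async *run({ messages, abortSignal, context }) {
  const stream = await openai.chat.completions.create(
    {
      model: "gpt-5.4-mini",
      messages: convertToOpenAIMessages(messages),
      tools: context.tools,
      stream: true,
    },
    // The OpenAI SDK takes the abort signal as a request option.
    { signal: abortSignal },
  );
  let text = "";
  // Keyed by index: only the first fragment of a tool call carries its id,
  // and the arguments arrive as partial JSON spread over several chunks.
  const toolCallsMap = new Map();
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";
    for (const toolCall of chunk.choices[0]?.delta?.tool_calls ?? []) {
      const prev = toolCallsMap.get(toolCall.index);
      const argsText =
        (prev?.argsText ?? "") + (toolCall.function?.arguments ?? "");
      let args = prev?.args ?? {};
      try {
        args = JSON.parse(argsText);
      } catch {
        // Arguments are still streaming; keep the last parsed value.
      }
      toolCallsMap.set(toolCall.index, {
        type: "tool-call",
        toolName: toolCall.function?.name ?? prev?.toolName,
        toolCallId: toolCall.id ?? prev?.toolCallId,
        args,
        argsText,
      });
    }
    yield {
      content: [
        ...(text ? [{ type: "text" as const, text }] : []),
        ...Array.from(toolCallsMap.values()),
      ],
    };
  }
}
If you build the content array fresh from the current chunk each iteration, tool calls from earlier chunks will disappear when a later chunk carries only text. The Map outside the loop is the fix.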
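The following simulation runs without any SDK and may help show why the Map has to outlive individual chunks. It assumes the OpenAI-style streaming shape, where a tool call's arguments arrive as JSON fragments and only the first fragment carries the id. The types, mergeToolCallDelta, and parseArgs are hypothetical stand-ins, not assistant-ui or OpenAI APIs.

```typescript
// Local stand-ins (assumptions) for a streamed delta and a content part.
type ToolCallDelta = {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};
type ToolCallPart = {
  type: "tool-call";
  toolCallId: string;
  toolName: string;
  argsText: string;
};

// Merges one delta into the Map. The key is `index` because later
// fragments of the same call do not repeat the id.
function mergeToolCallDelta(
  calls: Map<number, ToolCallPart>,
  delta: ToolCallDelta,
): void {
  const prev = calls.get(delta.index);
  calls.set(delta.index, {
    type: "tool-call",
    toolCallId: delta.id ?? prev?.toolCallId ?? "",
    toolName: delta.function?.name ?? prev?.toolName ?? "",
    argsText: (prev?.argsText ?? "") + (delta.function?.arguments ?? ""),
  });
}

// Parses the arguments; returns {} while the JSON is still incomplete.
function parseArgs(argsText: string): Record<string, unknown> {
  try {
    return JSON.parse(argsText) as Record<string, unknown>;
  } catch {
    return {};
  }
}

// Hand-written chunks: two argument fragments, then a text-only chunk.
const chunks: ToolCallDelta[][] = [
  [{ index: 0, id: "call_1", function: { name: "getWeather", arguments: '{"loc' } }],
  [{ index: 0, function: { arguments: 'ation":"Paris"}' } }],
  [], // text-only chunk: carries no tool-call deltas
];

// The Map lives outside the loop, so the text-only chunk cannot erase it.
const calls = new Map<number, ToolCallPart>();
for (const deltas of chunks) {
  for (const delta of deltas) mergeToolCallDelta(calls, delta);
}
const finalCalls = Array.from(calls.values());
```

After the last chunk, finalCalls still holds the single getWeather call with its complete arguments, even though that chunk carried no tool-call data.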
Tool calling
LocalRuntime supports OpenAI-compatible function calling. Register tools through useAui so the runtime exposes them to your adapter via context.tools:
"use generative";
import { defineToolkit } from "@assistant-ui/react";
import { z } from "zod";
export default defineToolkit({
getWeather: {
description: "Get the current weather in a location",
parameters: z.object({
location: z.string(),
unit: z.enum(["celsius", "fahrenheit"]).default("celsius"),
}),
execute: async ({ location, unit }) => {
"use client";
return fetchWeather(location, unit);
},
renderText: { running: "Checking the weather…", complete: "Weather ready" },
},
});
import { AssistantRuntimeProvider, Tools, useAui, useLocalRuntime } from "@assistant-ui/react";
import toolkit from "./toolkit";
function MyRuntimeProvider({ children }: { children: React.ReactNode }) {
const runtime = useLocalRuntime(MyModelAdapter);
const aui = useAui({ tools: Tools({ toolkit }) });
return (
<AssistantRuntimeProvider aui={aui} runtime={runtime}>
{children}
</AssistantRuntimeProvider>
);
}
See the tools guide for advanced patterns.
Human-in-the-loop tools
Tools listed in unstable_humanToolNames are not executed by code. The run pauses on the tool call and the user supplies the result through the tool UI:
const runtime = useLocalRuntime(MyModelAdapter, {
unstable_humanToolNames: ["send_email"],
});
The pause is driven by the message status your adapter returns; LocalRuntime never sets it for you. When the model requests a human tool call, end the run with status: { type: "requires-action", reason: "tool-calls" }. Without that status, the runtime marks the message complete and nothing waits. Only return it while a listed tool call is missing its result: unresolved tool calls that are not listed do not hold the run, so the runtime would invoke your adapter again immediately.
const MyModelAdapter: ChatModelAdapter = {
async run({ messages, abortSignal, unstable_getMessage }) {
const toolResults = unstable_getMessage().content.flatMap((part) =>
part.type === "tool-call" && part.result !== undefined
? [{ toolCallId: part.toolCallId, result: part.result }]
: [],
);
const result = await fetch("<YOUR_API_ENDPOINT>", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages, toolResults }),
signal: abortSignal,
});
const data = await result.json();
if (data.toolCall) {
return {
content: [
{
type: "tool-call",
toolCallId: data.toolCall.id,
toolName: data.toolCall.name,
args: data.toolCall.args,
argsText: JSON.stringify(data.toolCall.args),
},
],
status: { type: "requires-action", reason: "tool-calls" },
};
}
return { content: [{ type: "text", text: data.text }] };
},
};
The full loop:
- The run pauses. While a listed tool call has no result, the runtime stops invoking your adapter. The unresolved tool call part reports status.type === "requires-action" to its renderer.
- The user responds. The tool UI completes the call with addResult(...). The stock ToolFallback component handles this out of the box: in the requires-action state it shows Allow and Deny buttons that record the decision as the tool result.
- The run resumes. Once every listed tool call has a result, the runtime invokes your adapter again. The resumed call receives the same messages array as before (it ends at the user message; the in-progress assistant message is not part of it), so read the recorded results from unstable_getMessage().content as shown above. Content returned by the resumed call is appended to the same assistant message.
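One way to decide when to return the requires-action status is a small helper that looks for listed tool calls still missing a result. This is a sketch under assumptions: the part types are simplified local stand-ins, and pendingHumanToolCalls and statusFor are hypothetical names, not part of assistant-ui.

```typescript
// Simplified local stand-ins for assistant-ui message parts (assumption).
type TextPart = { type: "text"; text: string };
type ToolCallPart = {
  type: "tool-call";
  toolCallId: string;
  toolName: string;
  result?: unknown;
};
type Part = TextPart | ToolCallPart;
type RequiresAction = { type: "requires-action"; reason: "tool-calls" };

// Returns the listed (human) tool calls that still have no result.
function pendingHumanToolCalls(
  content: readonly Part[],
  humanToolNames: readonly string[],
): ToolCallPart[] {
  return content.filter(
    (part): part is ToolCallPart =>
      part.type === "tool-call" &&
      humanToolNames.includes(part.toolName) &&
      part.result === undefined,
  );
}

// Returns the pause status only while a listed tool call is unresolved;
// otherwise returns undefined so the message can complete normally.
function statusFor(
  content: readonly Part[],
  humanToolNames: readonly string[],
): RequiresAction | undefined {
  return pendingHumanToolCalls(content, humanToolNames).length > 0
    ? { type: "requires-action", reason: "tool-calls" }
    : undefined;
}

const humanTools = ["send_email"];
// A listed tool call with no result yet: the run should pause.
const waiting: Part[] = [
  { type: "tool-call", toolCallId: "c1", toolName: "send_email" },
];
// The same call after the user responded: nothing left to wait for.
const resolved: Part[] = [
  { type: "tool-call", toolCallId: "c1", toolName: "send_email", result: { approved: true } },
];
// An unresolved call that is not listed: it does not hold the run.
const unlisted: Part[] = [
  { type: "tool-call", toolCallId: "c2", toolName: "getWeather" },
];
```

In an adapter, the return value could be spread into the result, for example status: statusFor(content, humanTools), so the status is only set in the waiting case.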
For a custom confirmation UI, register a human tool whose render completes the call with addResult. The shape of the result payload is yours to define; the adapter receives it verbatim and translates it for your backend:
const toolkit = defineToolkit({
send_email: {
type: "human",
description: "Send an email after the user confirms",
parameters: z.object({ to: z.string(), subject: z.string() }),
render: ({ args, result, addResult }) => {
if (result) {
return <p>{result.approved ? "Sent" : `Cancelled: ${result.reason}`}</p>;
}
return (
<div>
<p>
Send "{args.subject}" to {args.to}?
</p>
<button onClick={() => addResult({ approved: true })}>Allow</button>
<button
onClick={() =>
addResult({ approved: false, reason: "User declined" })
}
>
Deny
</button>
</div>
);
},
},
});
unstable_humanToolNames is unstable; see stability.
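Because the result payload is yours to define, the adapter usually needs a small translation step before it calls the backend. The sketch below assumes the { approved, reason } payload from the send_email example above and a made-up backend format; BackendToolResult and toBackendResult are hypothetical and only illustrate the idea.

```typescript
// Payload recorded by the confirmation UI (shape from the example above).
type ConfirmationResult = { approved: boolean; reason?: string };

// Hypothetical backend format; replace it with whatever your API expects.
type BackendToolResult = {
  toolCallId: string;
  outcome: "confirmed" | "rejected";
  note?: string;
};

// Translates the UI payload into the backend format.
function toBackendResult(
  toolCallId: string,
  result: ConfirmationResult,
): BackendToolResult {
  if (result.approved) {
    return { toolCallId, outcome: "confirmed" };
  }
  return {
    toolCallId,
    outcome: "rejected",
    note: result.reason ?? "No reason given",
  };
}

const allowed = toBackendResult("call_1", { approved: true });
const denied = toBackendResult("call_2", {
  approved: false,
  reason: "User declined",
});
```

Keeping the translation in one function means the tool UI and the backend contract can change independently.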
Approval gates
The server-side approval gate is also supported on LocalRuntime, for actions your backend executes after the user authorizes them. Where a human tool asks the user to supply the tool result, an approval gate asks the user to allow or block an action the adapter performs. Emit approval: { id } on the tool call part and end the run with the same requires-action status:
return {
content: [
{
type: "tool-call",
toolCallId: data.toolCall.id,
toolName: data.toolCall.name,
args: data.toolCall.args,
argsText: JSON.stringify(data.toolCall.args),
approval: { id: data.toolCall.id },
},
],
status: { type: "requires-action", reason: "tool-calls" },
};
A tool call with a pending approval pauses the run, whether or not the tool is listed in unstable_humanToolNames. Always emit gates in the pending state (approval: { id } with no approved field); a part that arrives already decided is treated as resolved and the runtime invokes the adapter again immediately. The stock ToolFallback Allow and Deny buttons, or a custom renderer calling respondToApproval({ approved, reason? }), record the decision:
- Deny sets approval.approved: false and synthesizes an error result ({ error: reason || "Tool approval denied" } with isError: true), so the model sees the denial.
- Allow sets approval.approved: true and leaves the result empty; performing the action is your adapter's job.
Once every pending approval on the message is decided and every listed human tool has a result, the runtime invokes your adapter again. Read the decisions from unstable_getMessage().content, perform the approved actions, and return the follow-up response. A tool call that carries an approval is owned by the gate: it does not additionally require a result, even when its name is listed in unstable_humanToolNames.
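When the adapter is invoked again, it has to sort the gated tool calls by decision before acting. The sketch below shows one way to do that on a plain content array. The part types are simplified local stand-ins built from the fields described above, and groupByDecision is a hypothetical helper, not an assistant-ui API.

```typescript
// Simplified local stand-ins for the parts described above (assumption).
type Approval = { id: string; approved?: boolean };
type TextPart = { type: "text"; text: string };
type ToolCallPart = {
  type: "tool-call";
  toolCallId: string;
  toolName: string;
  approval?: Approval;
  result?: unknown;
  isError?: boolean;
};
type Part = TextPart | ToolCallPart;

type Decisions = {
  approved: ToolCallPart[]; // the adapter should perform these actions
  denied: ToolCallPart[]; // already carry a synthesized error result
  pending: ToolCallPart[]; // no decision recorded yet
};

// Groups gated tool calls by the decision recorded on their approval.
function groupByDecision(content: readonly Part[]): Decisions {
  const decisions: Decisions = { approved: [], denied: [], pending: [] };
  for (const part of content) {
    // Skip text parts and tool calls that are not behind a gate.
    if (part.type !== "tool-call" || !part.approval) continue;
    if (part.approval.approved === true) decisions.approved.push(part);
    else if (part.approval.approved === false) decisions.denied.push(part);
    else decisions.pending.push(part);
  }
  return decisions;
}

const sampleContent: Part[] = [
  { type: "text", text: "I can do both." },
  {
    type: "tool-call",
    toolCallId: "c1",
    toolName: "send_email",
    approval: { id: "c1", approved: true },
  },
  {
    type: "tool-call",
    toolCallId: "c2",
    toolName: "delete_file",
    approval: { id: "c2", approved: false },
    result: { error: "Tool approval denied" },
    isError: true,
  },
];
const decisions = groupByDecision(sampleContent);
```

In a real adapter the input would come from unstable_getMessage().content, and only the approved group would be executed.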
Resuming a run
resumeRun reconnects to an in-progress assistant run. Useful for page refresh, network reconnect, tab backgrounding, or thread switching when the backend is still generating.
Unlike startRun (which uses the ChatModelAdapter), resumeRun requires a stream parameter; you provide the async generator that produces the response.
import { useAui } from "@assistant-ui/react";
import type { ChatModelRunResult } from "@assistant-ui/core";
const aui = useAui();
async function* createCustomStream(): AsyncGenerator<ChatModelRunResult> {
yield { content: [{ type: "text", text: "Initial response" }] };
await new Promise((r) => setTimeout(r, 500));
yield {
content: [
{ type: "text", text: "Initial response. And here's more content..." },
],
};
}
aui.thread().resumeRun({
parentId: "message-id",
stream: createCustomStream,
});
A common pattern is to check whether the backend is still running on mount, then reconnect:
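import { useEffect, useRef } from "react";
import { useAui } from "@assistant-ui/react";
function useStreamReconnect(threadId: string) {
  const aui = useAui();
  const checkedRef = useRef(false);
  useEffect(() => {
    // Check the backend only once per mount.
    if (checkedRef.current) return;
    checkedRef.current = true;
    (async () => {
      const status = await fetch(`/api/status/${threadId}`).then((r) =>
        r.json(),
      );
      if (status.isRunning) {
        const parentId = aui.thread().getState().messages.at(-1)?.id ?? null;
        // resumeRun requires a stream; createCustomStream is the generator
        // from the previous example.
        aui.thread().resumeRun({ parentId, stream: createCustomStream });
      }
    })();
  }, [aui, threadId]);
}
Queueing messages during a run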
Set unstable_enableMessageQueue to keep the composer usable while a run is in progress. A message sent during a run is held in composer.queue and sent once the run settles; steering a queued message runs it next.
const runtime = useLocalRuntime(MyModelAdapter, {
unstable_enableMessageQueue: true,
});
Render the pending messages with ComposerPrimitive.Queue and QueueItemPrimitive.
Adapters
Attachments, speech, feedback, history, and suggestions are wired through the standard adapter contracts (see adapters):
const runtime = useLocalRuntime(MyModelAdapter, {
adapters: {
attachments: myAttachmentAdapter,
speech: mySpeechAdapter,
feedback: myFeedbackAdapter,
history: myHistoryAdapter,
suggestion: mySuggestionAdapter,
},
});
Multi-thread
LocalRuntime supports multi-thread either via AssistantCloud or via a custom RemoteThreadListAdapter. See threads for the contract and full examples.
// managed (see "AssistantCloud" in /docs/runtimes/concepts/threads for cloud setup)
const runtime = useLocalRuntime(MyModelAdapter, { cloud });
Integration examples
OpenAI
import type { ChatModelAdapter } from "@assistant-ui/react";
import { OpenAI } from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const OpenAIAdapter: ChatModelAdapter = {
  async *run({ messages, abortSignal }) {
    const stream = await openai.chat.completions.create(
      {
        model: "gpt-5.4-mini",
        // Flatten each message to plain text; non-text parts are dropped.
        messages: messages.map((m) => ({
          role: m.role,
          content: m.content
            .filter((c) => c.type === "text")
            .map((c) => c.text)
            .join("\n"),
        })),
        stream: true,
      },
      // The abort signal is a request option, not a body parameter.
      { signal: abortSignal },
    );
    let fullText = "";
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        fullText += content;
        // Yield the cumulative text, not the delta.
        yield { content: [{ type: "text", text: fullText }] };
      }
    }
  },
};
Custom REST API
const CustomAPIAdapter: ChatModelAdapter = {
async run({ messages, abortSignal, unstable_threadId }) {
const response = await fetch("/api/chat", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
messages: messages.map((m) => ({
role: m.role,
content: m.content,
})),
threadId: unstable_threadId,
}),
signal: abortSignal,
});
if (!response.ok) throw new Error(`API error: ${response.statusText}`);
const data = await response.json();
return { content: [{ type: "text", text: data.message }] };
},
};
Best practices
- Always pass abortSignal to fetch and SDK calls so cancel works: fetch(url, { signal: abortSignal });
- Handle errors gracefully. Swallow AbortError (it is the user cancelling); rethrow others to surface in the UI.
- Yield cumulative state, not deltas. Each yield replaces the previous content; if you yield deltas the UI flickers.
- Accumulate tool calls outside the streaming loop, otherwise they vanish on the first text-only chunk.
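The error-handling advice above can be sketched as a small wrapper. This is an illustration under assumptions, not an assistant-ui API: isAbortError, withCancelHandling, and cancelledRequest are hypothetical names, and what to return on cancellation is left to the caller.

```typescript
// True when the value looks like the error thrown for a cancelled request.
function isAbortError(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { name?: unknown }).name === "AbortError"
  );
}

// Runs a request. A cancellation resolves to `onCancel()`; every other
// failure is rethrown so it can surface in the UI.
async function withCancelHandling<T>(
  request: () => Promise<T>,
  onCancel: () => T,
): Promise<T> {
  try {
    return await request();
  } catch (error) {
    if (isAbortError(error)) return onCancel();
    throw error;
  }
}

// Stand-in for a fetch call that the user cancelled.
function cancelledRequest(): Promise<string> {
  const error = new Error("The operation was aborted");
  error.name = "AbortError";
  return Promise.reject(error);
}

// Stand-in for a fetch call that failed for another reason.
function failingRequest(): Promise<string> {
  return Promise.reject(new Error("boom"));
}
```

Inside run, the request callback would wrap the fetch or SDK call that receives abortSignal.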
Troubleshooting
Messages not appearing. Ensure your adapter returns the correct shape: { content: [{ type: "text", text: "..." }] }.
Streaming not working. Use async *run (with the asterisk). A plain async run cannot yield.
Tool UI flickers and disappears. State is being reset between chunks. Accumulate tool calls in a Map declared outside the for await loop.
API reference
ChatModelAdapter
- run (ChatModelRunOptions => ChatModelRunResult | AsyncGenerator<ChatModelRunResult>): Function that sends messages to your API and returns the response.
ChatModelRunOptions
- messages (readonly ThreadMessage[]): The conversation history to send to your API.
- runConfig (RunConfig): Run configuration with optional custom metadata. RunConfig is { readonly custom?: Record<string, unknown> }.
- abortSignal (AbortSignal): Signal to cancel the request if the user interrupts.
- context (ModelContext): Additional context including configuration and tools.
- unstable_assistantMessageId? (string | undefined, unstable): ID of the assistant message being generated. Useful for tracking or updating specific messages.
- unstable_threadId? (string | undefined, unstable): Current thread/conversation identifier. Useful for passing to your backend API.
- unstable_parentId? (string | null | undefined, unstable): ID of the parent message this response is replying to. null if this is the first message.
- unstable_getMessage? (() => ThreadMessage, unstable): Returns the current assistant message being generated. Useful during streaming.
LocalRuntimeOptions
- initialMessages? (readonly ThreadMessageLike[]): Pre-populate the thread with messages.
- maxSteps (number, default 2): Maximum number of sequential tool calls before requiring user input.
- cloud? (AssistantCloud): Enable Assistant Cloud integration for multi-thread support and persistence.
- adapters? (LocalRuntimeAdapters): Capability adapters. UI features automatically enable based on which adapters are provided. See /docs/runtimes/concepts/adapters.
- unstable_humanToolNames? (string[], unstable): Tool names that pause the run until the user supplies a result via addResult (unstable).