LocalRuntime

Quickest path to a working chat. Handles state while you handle the API.

LocalRuntime is the simplest way to connect a custom backend. You implement a single ChatModelAdapter (one run function) and the runtime handles everything else: messages, threads, branching, editing, regeneration, cancellation.

State lives inside the runtime by default. Multi-thread persistence and shared adapters are added through the standard interfaces; see adapters and threads.

When to use it

Pick LocalRuntime when:

You want assistant-ui to manage chat state for you.
Your backend exposes a function-call shaped API (REST, OpenAI SDK, your own model client).
Branching, editing, and regeneration should work without you writing extra code.
You want to compose adapters (attachments, speech, feedback, history, suggestions).

If you already keep messages in redux, zustand, tanstack-query, or another store, use ExternalStoreRuntime instead.

Quickstart

Create a Next.js project

npx create-next-app@latest my-app
cd my-app

Install `@assistant-ui/react`

npm install @assistant-ui/react

Add the Thread component

npx assistant-ui@latest add thread

Define a `MyRuntimeProvider`

Replace the MyModelAdapter body with your backend call.

app/MyRuntimeProvider.tsx

"use client";

import type { ReactNode } from "react";
import {
  AssistantRuntimeProvider,
  useLocalRuntime,
  type ChatModelAdapter,
} from "@assistant-ui/react";

const MyModelAdapter: ChatModelAdapter = {
  async run({ messages, abortSignal }) {
    const result = await fetch("<YOUR_API_ENDPOINT>", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages }),
      signal: abortSignal,
    });
    const data = await result.json();
    return {
      content: [{ type: "text", text: data.text }],
    };
  },
};

export function MyRuntimeProvider({
  children,
}: Readonly<{ children: ReactNode }>) {
  const runtime = useLocalRuntime(MyModelAdapter);
  return (
    <AssistantRuntimeProvider runtime={runtime}>
      {children}
    </AssistantRuntimeProvider>
  );
}

Wrap your app

app/layout.tsx

import type { ReactNode } from "react";
import { MyRuntimeProvider } from "@/app/MyRuntimeProvider";

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <MyRuntimeProvider>
      <html lang="en">
        <body>{children}</body>
      </html>
    </MyRuntimeProvider>
  );
}

Render the Thread

app/page.tsx

import { Thread } from "@/components/assistant-ui/thread";

export default function Page() {
  return <Thread />;
}

Streaming responses

Declare run as an async generator (async *run) and yield the full cumulative content on each iteration:

import type { ChatModelAdapter } from "@assistant-ui/react";
import { OpenAI } from "openai";

const openai = new OpenAI();

// convertToOpenAIMessages is assumed to be your own helper that maps
// ThreadMessage[] to the message format the OpenAI SDK expects.
const MyModelAdapter: ChatModelAdapter = {
  async *run({ messages, abortSignal }) {
    // Request options such as signal go in the second argument, not the body.
    const stream = await openai.chat.completions.create(
      {
        model: "gpt-4o",
        messages: convertToOpenAIMessages(messages),
        stream: true,
      },
      { signal: abortSignal },
    );

    let text = "";
    for await (const part of stream) {
      text += part.choices[0]?.delta?.content || "";
      yield {
        content: [{ type: "text", text }],
      };
    }
  },
};

Each yield replaces the previous content. Yield the full state every time, not deltas.
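To see why cumulative yields matter, here is a minimal sketch that swaps the OpenAI stream for a simulated one. fakeChunks, streamText, and collectTexts are hypothetical names, not part of assistant-ui; the result shape mirrors what the adapter above yields.

```typescript
// Shape of what the adapter yields, reduced to text parts for this sketch.
type TextResult = { content: { type: "text"; text: string }[] };

// Hypothetical stand-in for a model stream: emits text deltas one at a time.
async function* fakeChunks(): AsyncGenerator<string> {
  for (const delta of ["Hel", "lo, ", "world"]) {
    yield delta;
  }
}

// Accumulates deltas and yields the full text so far on every iteration.
async function* streamText(
  chunks: AsyncIterable<string>,
): AsyncGenerator<TextResult> {
  let text = "";
  for await (const delta of chunks) {
    text += delta;
    yield { content: [{ type: "text", text }] };
  }
}

// Collects the text of each yield, i.e. what the UI would display over time.
async function collectTexts(): Promise<string[]> {
  const seen: string[] = [];
  for await (const result of streamText(fakeChunks())) {
    seen.push(result.content.map((part) => part.text).join(""));
  }
  return seen;
}
```

collectTexts() resolves to ["Hel", "Hello, ", "Hello, world"]. Yielding the raw deltas instead would leave only "world" in the message after the last chunk.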

Streaming with tool calls

Accumulate tool calls in a Map outside the streaming loop so they persist across chunks:

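async *run({ messages, abortSignal, context }) {
  const stream = await openai.chat.completions.create(
    {
      model: "gpt-4o",
      messages: convertToOpenAIMessages(messages),
      tools: context.tools,
      stream: true,
    },
    // Request options such as signal go in the second argument, not the body.
    { signal: abortSignal },
  );

  let text = "";
  // Keyed by index: only the first delta of a tool call carries its id and name.
  const toolCallsMap = new Map<
    number,
    { id: string; name: string; argsText: string }
  >();

  // Arguments arrive as JSON fragments; fall back to {} until they parse.
  const parseArgs = (argsText: string) => {
    try {
      return JSON.parse(argsText || "{}");
    } catch {
      return {};
    }
  };

  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";

    for (const toolCall of chunk.choices[0]?.delta?.tool_calls ?? []) {
      const entry = toolCallsMap.get(toolCall.index) ?? {
        id: "",
        name: "",
        argsText: "",
      };
      if (toolCall.id) entry.id = toolCall.id;
      if (toolCall.function?.name) entry.name = toolCall.function.name;
      entry.argsText += toolCall.function?.arguments ?? "";
      toolCallsMap.set(toolCall.index, entry);
    }

    yield {
      content: [
        ...(text ? [{ type: "text" as const, text }] : []),
        ...Array.from(toolCallsMap.values(), (call) => ({
          type: "tool-call" as const,
          toolName: call.name,
          toolCallId: call.id,
          args: parseArgs(call.argsText),
        })),
      ],
    };
  }
}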

If you build the content array fresh from the current chunk each iteration, tool calls from earlier chunks will disappear when a later chunk carries only text. The Map outside the loop is the fix.
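OpenAI-style streams usually send a tool call's id and name only in its first delta and deliver the arguments as JSON string fragments, each tagged with an index. The sketch below shows one way to merge such deltas without any network call. ToolCallDelta is a simplified stand-in for the SDK's chunk type, and mergeDelta, parseArgs, and toParts are hypothetical helper names.

```typescript
// Simplified stand-in for one entry of delta.tool_calls in a streamed chunk.
type ToolCallDelta = {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
};

type PendingCall = { id: string; name: string; argsText: string };

// Merges one delta into the map, keyed by index so later fragments
// (which carry no id) still land on the right tool call.
function mergeDelta(
  calls: Map<number, PendingCall>,
  delta: ToolCallDelta,
): void {
  const entry = calls.get(delta.index) ?? { id: "", name: "", argsText: "" };
  if (delta.id) entry.id = delta.id;
  if (delta.function?.name) entry.name = delta.function.name;
  entry.argsText += delta.function?.arguments ?? "";
  calls.set(delta.index, entry);
}

// Parses accumulated arguments; incomplete JSON yields {} until more arrives.
function parseArgs(argsText: string): Record<string, unknown> {
  try {
    return JSON.parse(argsText || "{}") as Record<string, unknown>;
  } catch {
    return {};
  }
}

// Builds the tool-call content parts to include in each yield.
function toParts(calls: Map<number, PendingCall>) {
  return Array.from(calls.values(), (call) => ({
    type: "tool-call" as const,
    toolCallId: call.id,
    toolName: call.name,
    args: parseArgs(call.argsText),
  }));
}

// Two deltas for the same call: the first has id and name, the second only text.
const pendingCalls = new Map<number, PendingCall>();
mergeDelta(pendingCalls, {
  index: 0,
  id: "call_1",
  function: { name: "getWeather", arguments: '{"loc' },
});
const partialParts = toParts(pendingCalls);
mergeDelta(pendingCalls, { index: 0, function: { arguments: 'ation":"Paris"}' } });
const completeParts = toParts(pendingCalls);
```

After the first delta the arguments are still an empty object; after the second, the single tool call carries { location: "Paris" } under the id "call_1".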

Tool calling

LocalRuntime supports OpenAI-compatible function calling. Register tools through useAui so the runtime exposes them to your adapter via context.tools:

import type { ReactNode } from "react";
import {
  AssistantRuntimeProvider,
  useAui,
  useLocalRuntime,
  Tools,
  type Toolkit,
} from "@assistant-ui/react";
import { z } from "zod";

const myToolkit: Toolkit = {
  getWeather: {
    description: "Get the current weather in a location",
    parameters: z.object({
      location: z.string(),
      unit: z.enum(["celsius", "fahrenheit"]).default("celsius"),
    }),
    // fetchWeather is a placeholder for your own implementation.
    execute: async ({ location, unit }) => fetchWeather(location, unit),
  },
};

function MyRuntimeProvider({ children }: { children: ReactNode }) {
  const runtime = useLocalRuntime(MyModelAdapter);
  const aui = useAui({ tools: Tools({ toolkit: myToolkit }) });

  return (
    <AssistantRuntimeProvider aui={aui} runtime={runtime}>
      {children}
    </AssistantRuntimeProvider>
  );
}

See the tools guide for advanced patterns.

Human-in-the-loop approval

Require user confirmation before specific tools execute:

const runtime = useLocalRuntime(MyModelAdapter, {
  unstable_humanToolNames: ["delete_file", "send_email"],
});

unstable_humanToolNames is unstable; see stability.

Resuming a run

resumeRun reconnects to an in-progress assistant run. Useful for page refresh, network reconnect, tab backgrounding, or thread switching when the backend is still generating.

Unlike startRun (which uses the ChatModelAdapter), resumeRun requires a stream parameter; you provide the async generator that produces the response.

import { useAui } from "@assistant-ui/react";
import type { ChatModelRunResult } from "@assistant-ui/core";

const aui = useAui();

async function* createCustomStream(): AsyncGenerator<ChatModelRunResult> {
  yield { content: [{ type: "text", text: "Initial response" }] };
  await new Promise((r) => setTimeout(r, 500));
  yield {
    content: [
      { type: "text", text: "Initial response. And here's more content..." },
    ],
  };
}

aui.thread().resumeRun({
  parentId: "message-id",
  stream: createCustomStream,
});

A common pattern is to check whether the backend is still running on mount, then reconnect:

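import { useEffect, useRef } from "react";
import { useAui } from "@assistant-ui/react";

function useStreamReconnect(threadId: string) {
  const aui = useAui();
  const checkedRef = useRef(false);

  useEffect(() => {
    // Check only once per mount, even if the effect re-runs.
    if (checkedRef.current) return;
    checkedRef.current = true;

    (async () => {
      const status = await fetch(`/api/status/${threadId}`).then((r) =>
        r.json(),
      );
      if (status.isRunning) {
        const parentId = aui.thread().getState().messages.at(-1)?.id ?? null;
        // resumeRun requires a stream. createCustomStream is the generator from
        // the previous example; in practice it would read from your backend.
        aui.thread().resumeRun({ parentId, stream: createCustomStream });
      }
    })();
  }, [aui, threadId]);
}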
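
Because each yield replaces the message content, a resumed stream has to start from the text the backend has already produced, not from an empty string. A minimal sketch, assuming your backend can return the text generated so far followed by the remaining deltas; resumeFrom, remainingDeltas, and collectResumed are hypothetical names:

```typescript
type TextResult = { content: { type: "text"; text: string }[] };

// Yields the already-generated text first, then keeps appending new deltas.
async function* resumeFrom(
  textSoFar: string,
  remaining: AsyncIterable<string>,
): AsyncGenerator<TextResult> {
  let text = textSoFar;
  // Show what exists immediately so the message is not blank while waiting.
  yield { content: [{ type: "text", text }] };
  for await (const delta of remaining) {
    text += delta;
    yield { content: [{ type: "text", text }] };
  }
}

// Simulated tail of a backend stream, standing in for a real network reader.
async function* remainingDeltas(): AsyncGenerator<string> {
  yield " And here's";
  yield " more content...";
}

// Collects the text of every yield to show what the UI would render.
async function collectResumed(): Promise<string[]> {
  const seen: string[] = [];
  const stream = resumeFrom("Initial response.", remainingDeltas());
  for await (const result of stream) {
    seen.push(result.content.map((part) => part.text).join(""));
  }
  return seen;
}
```

Following the shape used in the resumeRun example, such a generator could be supplied as stream by wrapping it in a function, for example () => resumeFrom(textSoFar, deltas).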

Adapters

Attachments, speech, feedback, history, and suggestions are wired through the standard adapter contracts (see adapters):

const runtime = useLocalRuntime(MyModelAdapter, {
  adapters: {
    attachments: myAttachmentAdapter,
    speech: mySpeechAdapter,
    feedback: myFeedbackAdapter,
    history: myHistoryAdapter,
    suggestion: mySuggestionAdapter,
  },
});

Multi-thread

LocalRuntime supports multi-thread either via AssistantCloud or via a custom RemoteThreadListAdapter. See threads for the contract and full examples.

// managed (see "AssistantCloud" in /docs/runtimes/concepts/threads for cloud setup)
const runtime = useLocalRuntime(MyModelAdapter, { cloud });

Integration examples

OpenAI

import type { ChatModelAdapter } from "@assistant-ui/react";
import { OpenAI } from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const OpenAIAdapter: ChatModelAdapter = {
  async *run({ messages, abortSignal }) {
    const stream = await openai.chat.completions.create(
      {
        model: "gpt-4o",
        messages: messages.map((m) => ({
          role: m.role,
          // Flatten each message to plain text; non-text parts are dropped.
          content: m.content
            .filter((c) => c.type === "text")
            .map((c) => c.text)
            .join("\n"),
        })),
        stream: true,
      },
      // Request options such as signal go in the second argument, not the body.
      { signal: abortSignal },
    );

    let fullText = "";
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        fullText += content;
        yield { content: [{ type: "text", text: fullText }] };
      }
    }
  },
};
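
The inline mapping above can be factored into a helper such as the convertToOpenAIMessages referenced in the streaming examples. This is a sketch, not a library function: SimpleMessage and SimplePart are reduced stand-ins for ThreadMessage and its content parts, and only text parts are carried over.

```typescript
// Reduced stand-in for a message content part.
type SimplePart =
  | { type: "text"; text: string }
  | { type: "image"; image: string };

// Reduced stand-in for ThreadMessage: a role plus a list of content parts.
type SimpleMessage = {
  role: "system" | "user" | "assistant";
  content: readonly SimplePart[];
};

type OpenAITextMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

// Flattens each message to plain text, dropping non-text parts.
function convertToOpenAIMessages(
  messages: readonly SimpleMessage[],
): OpenAITextMessage[] {
  return messages.map((m) => ({
    role: m.role,
    content: m.content
      // The type guard tells TypeScript that only text parts remain.
      .filter(
        (c): c is Extract<SimplePart, { type: "text" }> => c.type === "text",
      )
      .map((c) => c.text)
      .join("\n"),
  }));
}

const converted = convertToOpenAIMessages([
  {
    role: "user",
    content: [
      { type: "text", text: "Hi" },
      { type: "image", image: "data:..." },
    ],
  },
  {
    role: "assistant",
    content: [
      { type: "text", text: "Hello" },
      { type: "text", text: "there" },
    ],
  },
]);
```

Here converted holds a user message with content "Hi" and an assistant message whose two text parts are joined with a newline.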

Custom REST API

const CustomAPIAdapter: ChatModelAdapter = {
  async run({ messages, abortSignal, unstable_threadId }) {
    const response = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        messages: messages.map((m) => ({
          role: m.role,
          content: m.content,
        })),
        threadId: unstable_threadId,
      }),
      signal: abortSignal,
    });
    if (!response.ok) throw new Error(`API error: ${response.statusText}`);
    const data = await response.json();
    return { content: [{ type: "text", text: data.message }] };
  },
};

Best practices

Always pass abortSignal to fetch and SDK calls so cancel works:
```
fetch(url, { signal: abortSignal });
```
Handle errors gracefully. Swallow AbortError (it is the user cancelling); rethrow others to surface in the UI.
Yield cumulative state, not deltas. Each yield replaces the previous content; if you yield deltas the UI flickers.
Accumulate tool calls outside the streaming loop, otherwise they vanish on the first text-only chunk.
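
The error-handling advice above can be sketched as a small wrapper. The helper names are hypothetical, and the check relies only on the error's name being "AbortError", which is the name fetch uses when its signal is aborted.

```typescript
// True when the error comes from an aborted request.
function isAbortError(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { name?: unknown }).name === "AbortError"
  );
}

// Runs a request and returns null on cancellation; other errors propagate.
async function ignoreAbort<T>(request: () => Promise<T>): Promise<T | null> {
  try {
    return await request();
  } catch (error) {
    if (isAbortError(error)) return null; // user cancelled: nothing to report
    throw error; // real failure: let it surface in the UI
  }
}

// Simulated requests standing in for fetch or SDK calls.
async function cancelledRequest(): Promise<string> {
  const error = new Error("The operation was aborted");
  error.name = "AbortError";
  throw error;
}

async function failingRequest(): Promise<string> {
  throw new Error("API error: 500");
}
```

ignoreAbort(cancelledRequest) resolves to null, while ignoreAbort(failingRequest) rejects with the original error.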

Troubleshooting

Messages not appearing. Ensure your adapter returns the correct shape: { content: [{ type: "text", text: "..." }] }.

Streaming not working. Use async *run (with the asterisk). A plain async run cannot yield.

Tool UI flickers and disappears. State is being reset between chunks. Accumulate tool calls in a Map declared outside the for await loop.

API reference

`ChatModelAdapter`

run: (options: ChatModelRunOptions) => Promise<ChatModelRunResult> | AsyncGenerator<ChatModelRunResult, void>

Function that sends messages to your API and returns the response.

`ChatModelRunOptions`

messages: readonly ThreadMessage[]

The conversation history to send to your API.

runConfig: RunConfig

Run configuration with optional custom metadata. RunConfig is { readonly custom?: Record<string, unknown> }.

abortSignal: AbortSignal

Signal to cancel the request if user interrupts.

context: ModelContext

Additional context including configuration and tools.

unstable_assistantMessageId?: string | undefined (unstable)

ID of the assistant message being generated. Useful for tracking or updating specific messages.

unstable_threadId?: string | undefined (unstable)

Current thread/conversation identifier. Useful for passing to your backend API.

unstable_parentId?: string | null | undefined (unstable)

ID of the parent message this response is replying to. null if this is the first message.

unstable_getMessage?: () => ThreadMessage (unstable)

Returns the current assistant message being generated. Useful during streaming.

`LocalRuntimeOptions`

initialMessages?: readonly ThreadMessageLike[]

Pre-populate the thread with messages.

maxSteps: number = 2

Maximum number of sequential tool calls before requiring user input.

cloud?: AssistantCloud

Enable Assistant Cloud integration for multi-thread support and persistence.

adapters?: LocalRuntimeAdapters

Capability adapters. UI features automatically enable based on which adapters are provided. See /docs/runtimes/concepts/adapters.

unstable_humanToolNames?: string[] (unstable)

Tool names that require human approval before execution (unstable).

ExternalStoreRuntime: Bring your own state store.
Adapters: Attachments, speech, feedback, history, suggestions.
Threads: Cloud, custom database, ExternalStore-based.