Custom Backend

LocalRuntime

Quickest path to a working chat. Handles state while you handle the API.

LocalRuntime is the simplest way to connect a custom backend. You implement a single ChatModelAdapter (one run function) and the runtime handles everything else: messages, threads, branching, editing, regeneration, cancellation.

State lives inside the runtime by default. Multi-thread persistence and shared adapters are added through the standard interfaces; see adapters and threads.

When to use it

Pick LocalRuntime when:

  • You want assistant-ui to manage chat state for you.
  • Your backend exposes a function-call shaped API (REST, OpenAI SDK, your own model client).
  • Branching, editing, and regeneration should work without you writing extra code.
  • You want to compose adapters (attachments, speech, feedback, history, suggestions).

If you already keep messages in Redux, Zustand, TanStack Query, or another store, use ExternalStoreRuntime instead.

Quickstart

Create a Next.js project

npx create-next-app@latest my-app
cd my-app

Install @assistant-ui/react

npm install @assistant-ui/react

Add the Thread component

npx assistant-ui@latest add thread

Define a MyRuntimeProvider

Replace the MyModelAdapter body with your backend call.

app/MyRuntimeProvider.tsx
"use client";

import type { ReactNode } from "react";
import {
  AssistantRuntimeProvider,
  useLocalRuntime,
  type ChatModelAdapter,
} from "@assistant-ui/react";

const MyModelAdapter: ChatModelAdapter = {
  async run({ messages, abortSignal }) {
    const result = await fetch("<YOUR_API_ENDPOINT>", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages }),
      signal: abortSignal,
    });
    const data = await result.json();
    return {
      content: [{ type: "text", text: data.text }],
    };
  },
};

export function MyRuntimeProvider({
  children,
}: Readonly<{ children: ReactNode }>) {
  const runtime = useLocalRuntime(MyModelAdapter);
  return (
    <AssistantRuntimeProvider runtime={runtime}>
      {children}
    </AssistantRuntimeProvider>
  );
}

Wrap your app

app/layout.tsx
import type { ReactNode } from "react";
import { MyRuntimeProvider } from "@/app/MyRuntimeProvider";

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <MyRuntimeProvider>
      <html lang="en">
        <body>{children}</body>
      </html>
    </MyRuntimeProvider>
  );
}

Render the Thread

app/page.tsx
import { Thread } from "@/components/assistant-ui/thread";

export default function Page() {
  return <Thread />;
}

Streaming responses

Declare run as an async generator function (async *run) and yield the full cumulative content on each iteration:

import type { ChatModelAdapter } from "@assistant-ui/react";
import { OpenAI } from "openai";

const openai = new OpenAI();

const MyModelAdapter: ChatModelAdapter = {
  async *run({ messages, abortSignal }) {
    const stream = await openai.chat.completions.create(
      {
        model: "gpt-4o",
        // convertToOpenAIMessages: your own mapping from ThreadMessage[] to
        // OpenAI's message format (the OpenAI example below inlines one).
        messages: convertToOpenAIMessages(messages),
        stream: true,
      },
      // The abort signal is a request option, not part of the request body.
      { signal: abortSignal },
    );

    let text = "";
    for await (const part of stream) {
      text += part.choices[0]?.delta?.content || "";
      yield {
        content: [{ type: "text", text }],
      };
    }
  },
};

Each yield replaces the previous content. Yield the full state every time, not deltas.
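
A minimal contrast, assuming delta holds the text fragment from the current chunk:

// ✗ Yields only the newest fragment; the UI flashes one word at a time
yield { content: [{ type: "text", text: delta }] };

// ✓ Accumulates first, then yields everything so far
text += delta;
yield { content: [{ type: "text", text }] };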

Streaming with tool calls

Accumulate tool calls in a Map outside the streaming loop so they persist across chunks:

async *run({ messages, abortSignal, context }) {
  // Parse possibly-incomplete JSON while tool arguments are still streaming.
  const parseJsonSafe = (text: string) => {
    try {
      return JSON.parse(text);
    } catch {
      return {};
    }
  };

  const stream = await openai.chat.completions.create(
    {
      model: "gpt-4o",
      messages: convertToOpenAIMessages(messages),
      // convertToOpenAITools: your mapping from context.tools to OpenAI's
      // function-tool schema.
      tools: convertToOpenAITools(context.tools),
      stream: true,
    },
    // The abort signal is a request option, not part of the request body.
    { signal: abortSignal },
  );

  let text = "";
  // Keyed by the delta's index: streamed tool calls arrive in fragments, and
  // only the first fragment carries the id and function name.
  const toolCallsMap = new Map();

  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";

    for (const toolCall of chunk.choices[0]?.delta?.tool_calls ?? []) {
      const existing = toolCallsMap.get(toolCall.index) ?? {
        type: "tool-call" as const,
        toolCallId: "",
        toolName: "",
        argsText: "",
      };
      if (toolCall.id) existing.toolCallId = toolCall.id;
      if (toolCall.function?.name) existing.toolName = toolCall.function.name;
      // Arguments arrive as partial JSON fragments; concatenate them.
      existing.argsText += toolCall.function?.arguments ?? "";
      toolCallsMap.set(toolCall.index, existing);
    }

    yield {
      content: [
        ...(text ? [{ type: "text" as const, text }] : []),
        ...Array.from(toolCallsMap.values(), (tc) => ({
          ...tc,
          args: parseJsonSafe(tc.argsText),
        })),
      ],
    };
  }
}

If you build the content array fresh from the current chunk each iteration, tool calls from earlier chunks will disappear when a later chunk carries only text. The Map outside the loop is the fix. Tool-call deltas also arrive in fragments keyed by index, so accumulate the id, name, and argument text as shown above.

Tool calling

LocalRuntime supports OpenAI-compatible function calling. Register tools through useAui so the runtime exposes them to your adapter via context.tools:

import { useAui, Tools, type Toolkit } from "@assistant-ui/react";
import { z } from "zod";

const myToolkit: Toolkit = {
  getWeather: {
    description: "Get the current weather in a location",
    parameters: z.object({
      location: z.string(),
      unit: z.enum(["celsius", "fahrenheit"]).default("celsius"),
    }),
    // fetchWeather: a placeholder for your own API call
    execute: async ({ location, unit }) => fetchWeather(location, unit),
  },
};

function MyRuntimeProvider({ children }: { children: React.ReactNode }) {
  const runtime = useLocalRuntime(MyModelAdapter);
  const aui = useAui({ tools: Tools({ toolkit: myToolkit }) });

  return (
    <AssistantRuntimeProvider aui={aui} runtime={runtime}>
      {children}
    </AssistantRuntimeProvider>
  );
}

See the tools guide for advanced patterns.

Human-in-the-loop approval

Require user confirmation before specific tools execute:

const runtime = useLocalRuntime(MyModelAdapter, {
  unstable_humanToolNames: ["delete_file", "send_email"],
});

unstable_humanToolNames is unstable; see stability.

Resuming a run

resumeRun reconnects to an in-progress assistant run. Useful for page refresh, network reconnect, tab backgrounding, or thread switching when the backend is still generating.

Unlike startRun (which uses the ChatModelAdapter), resumeRun requires a stream parameter; you provide the async generator that produces the response.

import { useAui } from "@assistant-ui/react";
import type { ChatModelRunResult } from "@assistant-ui/react";

const aui = useAui();

async function* createCustomStream(): AsyncGenerator<ChatModelRunResult> {
  yield { content: [{ type: "text", text: "Initial response" }] };
  await new Promise((r) => setTimeout(r, 500));
  yield {
    content: [
      { type: "text", text: "Initial response. And here's more content..." },
    ],
  };
}

aui.thread().resumeRun({
  parentId: "message-id",
  stream: createCustomStream,
});

A common pattern is to check whether the backend is still running on mount, then reconnect:

import { useEffect, useRef } from "react";
import { useAui } from "@assistant-ui/react";

function useStreamReconnect(threadId: string) {
  const aui = useAui();
  const checkedRef = useRef(false);

  useEffect(() => {
    if (checkedRef.current) return;
    checkedRef.current = true;

    (async () => {
      const status = await fetch(`/api/status/${threadId}`).then((r) =>
        r.json(),
      );
      if (status.isRunning) {
        const parentId = aui.thread().getState().messages.at(-1)?.id ?? null;
        aui.thread().resumeRun({
          parentId,
          // reconnectToRunStream: a placeholder for your own async generator
          // that re-attaches to the backend's in-progress response.
          stream: () => reconnectToRunStream(threadId),
        });
      }
    })();
  }, [aui, threadId]);
}
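
Call the hook from any component rendered inside the provider; the component below is an illustrative sketch:

function ChatPage({ threadId }: { threadId: string }) {
  useStreamReconnect(threadId);
  return <Thread />;
}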

Adapters

Attachments, speech, feedback, history, and suggestions are wired through the standard adapter contracts; see adapters:

const runtime = useLocalRuntime(MyModelAdapter, {
  adapters: {
    attachments: myAttachmentAdapter,
    speech: mySpeechAdapter,
    feedback: myFeedbackAdapter,
    history: myHistoryAdapter,
    suggestion: mySuggestionAdapter,
  },
});
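
For example, the built-in attachment adapters that ship with @assistant-ui/react can be composed to accept images and plain text files (a minimal sketch; substitute your own adapters as needed):

import {
  CompositeAttachmentAdapter,
  SimpleImageAttachmentAdapter,
  SimpleTextAttachmentAdapter,
} from "@assistant-ui/react";

const runtime = useLocalRuntime(MyModelAdapter, {
  adapters: {
    attachments: new CompositeAttachmentAdapter([
      new SimpleImageAttachmentAdapter(),
      new SimpleTextAttachmentAdapter(),
    ]),
  },
});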

Multi-thread

LocalRuntime supports multi-thread either via AssistantCloud or via a custom RemoteThreadListAdapter. See threads for the contract and full examples.

// managed (see "AssistantCloud" in /docs/runtimes/concepts/threads for cloud setup)
const runtime = useLocalRuntime(MyModelAdapter, { cloud });

Integration examples

OpenAI

import { OpenAI } from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const OpenAIAdapter: ChatModelAdapter = {
  async *run({ messages, abortSignal }) {
    const stream = await openai.chat.completions.create(
      {
        model: "gpt-4o",
        messages: messages.map((m) => ({
          role: m.role,
          content: m.content
            .filter((c) => c.type === "text")
            .map((c) => c.text)
            .join("\n"),
        })),
        stream: true,
      },
      // Abort signal goes in the SDK's request options, not the request body.
      { signal: abortSignal },
    );

    let fullText = "";
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        fullText += content;
        yield { content: [{ type: "text", text: fullText }] };
      }
    }
  },
};

Custom REST API

const CustomAPIAdapter: ChatModelAdapter = {
  async run({ messages, abortSignal, unstable_threadId }) {
    const response = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        messages: messages.map((m) => ({
          role: m.role,
          content: m.content,
        })),
        threadId: unstable_threadId,
      }),
      signal: abortSignal,
    });
    if (!response.ok) throw new Error(`API error: ${response.statusText}`);
    const data = await response.json();
    return { content: [{ type: "text", text: data.message }] };
  },
};
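
For reference, a minimal Next.js route handler that satisfies this adapter could look like the following sketch (generateReply stands in for your own model call):

// app/api/chat/route.ts
export async function POST(req: Request) {
  const { messages, threadId } = await req.json();

  // generateReply: a placeholder for your own model or backend call.
  const text = await generateReply(messages, threadId);

  return Response.json({ message: text });
}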

Best practices

  1. Always pass abortSignal to fetch and SDK calls so cancel works:
    fetch(url, { signal: abortSignal });
  2. Handle errors gracefully. Swallow AbortError (it means the user cancelled); rethrow everything else so it surfaces in the UI (see the sketch after this list).
  3. Yield cumulative state, not deltas. Each yield replaces the previous content; if you yield deltas the UI flickers.
  4. Accumulate tool calls outside the streaming loop, otherwise they vanish on the first text-only chunk.
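
A sketch of the error handling from point 2, inside a run implementation:

try {
  const response = await fetch("/api/chat", {
    method: "POST",
    body: JSON.stringify({ messages }),
    signal: abortSignal,
  });
  // ... process the response
} catch (err) {
  if (err instanceof Error && err.name === "AbortError") {
    return { content: [] }; // the user cancelled; nothing to surface
  }
  throw err; // let the runtime display the error in the UI
}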

Troubleshooting

Messages not appearing. Ensure your adapter returns the correct shape: { content: [{ type: "text", text: "..." }] }.

Streaming not working. Use async *run (with the asterisk). A plain async run cannot yield.

Tool UI flickers and disappears. The content array is being rebuilt from the current chunk alone, so earlier tool calls drop out. Accumulate tool calls in a Map declared outside the for await loop.

API reference

ChatModelAdapter

ChatModelAdapter
run: (options: ChatModelRunOptions) => Promise<ChatModelRunResult> | AsyncGenerator<ChatModelRunResult>

Function that sends messages to your API and returns the response.

ChatModelRunOptions

ChatModelRunOptions
messages: readonly ThreadMessage[]

The conversation history to send to your API.

runConfig: RunConfig

Run configuration with optional custom metadata. RunConfig is { readonly custom?: Record<string, unknown> }.

abortSignal: AbortSignal

Signal to cancel the request if user interrupts.

context: ModelContext

Additional context including configuration and tools.

unstable_assistantMessageId?: string

ID of the assistant message being generated. Useful for tracking or updating specific messages.

unstable_threadId?: string

Current thread/conversation identifier. Useful for passing to your backend API.

unstable_parentId?: string | null

ID of the parent message this response is replying to. null if this is the first message.

unstable_getMessage?: () => ThreadMessage

Returns the current assistant message being generated. Useful during streaming.

LocalRuntimeOptions

LocalRuntimeOptions
initialMessages?: readonly ThreadMessageLike[]

Pre-populate the thread with messages.

maxSteps?: number = 2

Maximum number of sequential tool calls before requiring user input.
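
For example (values are illustrative):

const runtime = useLocalRuntime(MyModelAdapter, {
  initialMessages: [{ role: "assistant", content: "Hi! How can I help?" }],
  maxSteps: 5, // allow up to five sequential tool calls per user turn
});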

cloud?: AssistantCloud

Enable Assistant Cloud integration for multi-thread support and persistence.

adapters?: LocalRuntimeAdapters

Capability adapters. UI features automatically enable based on which adapters are provided. See /docs/runtimes/concepts/adapters.

unstable_humanToolNames?: string[]

Tool names that require human approval before execution (unstable).