Custom Backend

LocalRuntime

Quickest path to a working chat. Handles state while you handle the API.

LocalRuntime is the simplest way to connect a custom backend. You implement a single ChatModelAdapter (one run function) and the runtime handles everything else: messages, threads, branching, editing, regeneration, cancellation.

State lives inside the runtime by default. Multi-thread persistence and shared adapters are added via the standard interfaces; see adapters and threads.

When to use it

Pick LocalRuntime when:

  • You want assistant-ui to manage chat state for you.
  • Your backend exposes a function-call-shaped API (REST, OpenAI SDK, your own model client).
  • Branching, editing, and regeneration should work without you writing extra code.
  • You want to compose adapters (attachments, speech, feedback, history, suggestions).

If you already keep messages in redux, zustand, tanstack-query, or another store, use ExternalStoreRuntime instead.

Quickstart

Create a project

npx create-next-app@latest my-app
cd my-app

Install dependencies

npm install @assistant-ui/react

Add the Thread component

npx assistant-ui@latest add thread

Define a MyRuntimeProvider

Replace the MyModelAdapter body with your backend call.

app/MyRuntimeProvider.tsx
"use client";

import type { ReactNode } from "react";
import {
  AssistantRuntimeProvider,
  useLocalRuntime,
  type ChatModelAdapter,
} from "@assistant-ui/react";

const MyModelAdapter: ChatModelAdapter = {
  async run({ messages, abortSignal }) {
    const result = await fetch("<YOUR_API_ENDPOINT>", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages }),
      signal: abortSignal,
    });
    const data = await result.json();
    return {
      content: [{ type: "text", text: data.text }],
    };
  },
};

export function MyRuntimeProvider({
  children,
}: Readonly<{ children: ReactNode }>) {
  const runtime = useLocalRuntime(MyModelAdapter);
  return (
    <AssistantRuntimeProvider runtime={runtime}>
      {children}
    </AssistantRuntimeProvider>
  );
}

Wrap your app

app/layout.tsx
import type { ReactNode } from "react";
import { MyRuntimeProvider } from "@/app/MyRuntimeProvider";

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <MyRuntimeProvider>
      <html lang="en">
        <body>{children}</body>
      </html>
    </MyRuntimeProvider>
  );
}

Render the Thread

app/page.tsx
import { Thread } from "@/components/assistant-ui/thread";

export default function Page() {
  return <Thread />;
}

Streaming responses

Declare run as an async generator (async *run) and yield the full cumulative content on each iteration:

import type {
  ChatModelAdapter,
  ModelContext,
  ThreadMessage,
} from "@assistant-ui/react";
import { OpenAI } from "openai";

const openai = new OpenAI();

const MyModelAdapter: ChatModelAdapter = {
  async *run({ messages, abortSignal, context }) {
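    const stream = await openai.chat.completions.create(
      {
        model: "gpt-5.4-mini",
        // convertToOpenAIMessages is your own helper that maps ThreadMessage[] to OpenAI's format
        messages: convertToOpenAIMessages(messages),
        stream: true,
      },
      // Request options, including the abort signal, go in the second argument
      { signal: abortSignal },
    );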

    let text = "";
    for await (const part of stream) {
      text += part.choices[0]?.delta?.content || "";
      yield {
        content: [{ type: "text", text }],
      };
    }
  },
};

Each yield replaces the previous content. Yield the full state every time, not deltas.
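
To make the replace-not-append rule concrete, here is a minimal, self-contained sketch that turns a list of text deltas into the cumulative snapshots an adapter should yield. The TextPart and RunResult types are local stand-ins for illustration, not imports from assistant-ui, and toCumulativeResults is a hypothetical helper name.

```typescript
// Local stand-ins for the assistant-ui types; only the fields used here.
type TextPart = { type: "text"; text: string };
type RunResult = { content: TextPart[] };

// Turn a sequence of text deltas into cumulative snapshots.
function* toCumulativeResults(deltas: Iterable<string>): Generator<RunResult> {
  let text = "";
  for (const delta of deltas) {
    text += delta;
    // Yield everything received so far, not just the new fragment.
    yield { content: [{ type: "text", text }] };
  }
}

const snapshots = Array.from(toCumulativeResults(["Hel", "lo", " world"]));
console.log(snapshots.map((s) => s.content[0].text));
// → ["Hel", "Hello", "Hello world"]
```

Because the last snapshot already contains the whole message, the runtime can simply replace what it showed before.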

Streaming with tool calls

Accumulate tool calls in a Map outside the streaming loop so they persist across chunks:

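async *run({ messages, abortSignal, context }) {
  const stream = await openai.chat.completions.create(
    {
      model: "gpt-5.4-mini",
      messages: convertToOpenAIMessages(messages),
      tools: context.tools,
      stream: true,
    },
    // Request options, including the abort signal, go in the second argument
    { signal: abortSignal },
  );

  let text = "";
  // Keyed by index: OpenAI sends id and name only on a tool call's first chunk
  const toolCallsMap = new Map();

  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? "";

    for (const toolCall of chunk.choices[0]?.delta?.tool_calls ?? []) {
      const prev = toolCallsMap.get(toolCall.index);
      // Arguments arrive as JSON fragments; append them instead of parsing each one
      const argsText =
        (prev?.argsText ?? "") + (toolCall.function?.arguments ?? "");
      toolCallsMap.set(toolCall.index, {
        type: "tool-call",
        toolName: toolCall.function?.name ?? prev?.toolName,
        toolCallId: toolCall.id ?? prev?.toolCallId,
        argsText,
        args: tryParseJson(argsText) ?? prev?.args ?? {},
      });
    }

    yield {
      content: [
        ...(text ? [{ type: "text" as const, text }] : []),
        ...Array.from(toolCallsMap.values()),
      ],
    };
  }
}

// Returns undefined while the accumulated JSON is still incomplete
function tryParseJson(text) {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}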

If you build the content array fresh from the current chunk each iteration, tool calls from earlier chunks will disappear when a later chunk carries only text. The Map outside the loop is the fix.
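
The effect can be reproduced without a network call. The sketch below runs the accumulation over three simulated chunks. The Chunk and ToolCallDelta shapes are simplified, hypothetical stand-ins modelled on OpenAI-style deltas (id and name on the first fragment only, arguments split across fragments), and the part types are local rather than imported from assistant-ui.

```typescript
// Hypothetical, simplified chunk shape for illustration only.
type ToolCallDelta = { index: number; id?: string; name?: string; argsFragment?: string };
type Chunk = { text?: string; toolCalls?: ToolCallDelta[] };

type TextPart = { type: "text"; text: string };
type ToolCallPart = {
  type: "tool-call";
  toolCallId: string;
  toolName: string;
  argsText: string;
};
type Snapshot = { content: (TextPart | ToolCallPart)[] };

function* accumulate(chunks: Iterable<Chunk>): Generator<Snapshot> {
  let text = "";
  // Declared outside the loop so entries survive text-only chunks.
  const toolCalls = new Map<number, ToolCallPart>();

  for (const chunk of chunks) {
    text += chunk.text ?? "";
    for (const delta of chunk.toolCalls ?? []) {
      const existing = toolCalls.get(delta.index);
      toolCalls.set(delta.index, {
        type: "tool-call",
        toolCallId: delta.id ?? existing?.toolCallId ?? "",
        toolName: delta.name ?? existing?.toolName ?? "",
        // Append the new fragment to whatever arrived earlier.
        argsText: (existing?.argsText ?? "") + (delta.argsFragment ?? ""),
      });
    }
    const parts: (TextPart | ToolCallPart)[] = [];
    if (text) parts.push({ type: "text", text });
    parts.push(...Array.from(toolCalls.values()));
    yield { content: parts };
  }
}

const demo = Array.from(
  accumulate([
    { toolCalls: [{ index: 0, id: "call_1", name: "getWeather", argsFragment: '{"loc' }] },
    { toolCalls: [{ index: 0, argsFragment: 'ation":"Oslo"}' }] },
    { text: "Checking..." }, // text-only chunk: the tool call must still be present
  ]),
);
console.log(demo[2].content.map((p) => p.type));
// → ["text", "tool-call"]
```

The third snapshot still carries the tool call even though its chunk contained only text, which is the behaviour the Map is there to guarantee.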

Tool calling

LocalRuntime supports OpenAI-compatible function calling. Register tools through useAui so the runtime exposes them to your adapter via context.tools:

app/toolkit.tsx
"use generative";

import { defineToolkit } from "@assistant-ui/react";
import { z } from "zod";

export default defineToolkit({
  getWeather: {
    description: "Get the current weather in a location",
    parameters: z.object({
      location: z.string(),
      unit: z.enum(["celsius", "fahrenheit"]).default("celsius"),
    }),
    execute: async ({ location, unit }) => {
      "use client";
      return fetchWeather(location, unit);
    },
    renderText: { running: "Checking the weather…", complete: "Weather ready" },
  },
});

app/MyRuntimeProvider.tsx
import {
  AssistantRuntimeProvider,
  Tools,
  useAui,
  useLocalRuntime,
} from "@assistant-ui/react";
import toolkit from "./toolkit";

function MyRuntimeProvider({ children }: { children: React.ReactNode }) {
  const runtime = useLocalRuntime(MyModelAdapter);
  const aui = useAui({ tools: Tools({ toolkit }) });

  return (
    <AssistantRuntimeProvider aui={aui} runtime={runtime}>
      {children}
    </AssistantRuntimeProvider>
  );
}

See the tools guide for advanced patterns.

Human-in-the-loop tools

Tools listed in unstable_humanToolNames are not executed by code. The run pauses on the tool call and the user supplies the result through the tool UI:

const runtime = useLocalRuntime(MyModelAdapter, {
  unstable_humanToolNames: ["send_email"],
});

The pause is driven by the message status your adapter returns; LocalRuntime never sets it for you. When the model requests a human tool call, end the run with status: { type: "requires-action", reason: "tool-calls" }. Without that status, the runtime marks the message complete and nothing waits. Only return it while a listed tool call is missing its result: unresolved tool calls that are not listed do not hold the run, so the runtime would invoke your adapter again immediately.

const MyModelAdapter: ChatModelAdapter = {
  async run({ messages, abortSignal, unstable_getMessage }) {
    const toolResults = unstable_getMessage().content.flatMap((part) =>
      part.type === "tool-call" && part.result !== undefined
        ? [{ toolCallId: part.toolCallId, result: part.result }]
        : [],
    );

    const result = await fetch("<YOUR_API_ENDPOINT>", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages, toolResults }),
      signal: abortSignal,
    });
    const data = await result.json();

    if (data.toolCall) {
      return {
        content: [
          {
            type: "tool-call",
            toolCallId: data.toolCall.id,
            toolName: data.toolCall.name,
            args: data.toolCall.args,
            argsText: JSON.stringify(data.toolCall.args),
          },
        ],
        status: { type: "requires-action", reason: "tool-calls" },
      };
    }

    return { content: [{ type: "text", text: data.text }] };
  },
};

The full loop:

  1. The run pauses. While a listed tool call has no result, the runtime stops invoking your adapter. The unresolved tool call part reports status.type === "requires-action" to its renderer.
  2. The user responds. The tool UI completes the call with addResult(...). The stock ToolFallback component handles this out of the box: in the requires-action state it shows Allow and Deny buttons that record the decision as the tool result.
  3. The run resumes. Once every listed tool call has a result, the runtime invokes your adapter again. The resumed call receives the same messages array as before (it ends at the user message; the in-progress assistant message is not part of it), so read the recorded results from unstable_getMessage().content as shown above. Content returned by the resumed call is appended to the same assistant message.
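
The rule that decides whether the adapter should return requires-action can be written as a small predicate. This is a sketch of the condition described above, not the runtime's own implementation; the part types are local stand-ins, and hasPendingHumanToolCall and pauseStatus are hypothetical helper names.

```typescript
// Local stand-ins for the message part shapes used on this page.
type TextPart = { type: "text"; text: string };
type ToolCallPart = {
  type: "tool-call";
  toolCallId: string;
  toolName: string;
  result?: unknown;
};
type Part = TextPart | ToolCallPart;

type RequiresAction = { type: "requires-action"; reason: "tool-calls" };

// True while at least one listed human tool call has no recorded result.
function hasPendingHumanToolCall(
  content: readonly Part[],
  humanToolNames: readonly string[],
): boolean {
  return content.some(
    (part) =>
      part.type === "tool-call" &&
      humanToolNames.includes(part.toolName) &&
      part.result === undefined,
  );
}

// Status for the adapter to return: pause only when a listed call is unresolved.
// Returning undefined means "omit status", which lets the message complete.
function pauseStatus(
  content: readonly Part[],
  humanToolNames: readonly string[],
): RequiresAction | undefined {
  return hasPendingHumanToolCall(content, humanToolNames)
    ? { type: "requires-action", reason: "tool-calls" }
    : undefined;
}
```

An adapter could spread the result into its return value, for example `{ content, status: pauseStatus(content, ["send_email"]) }`, so that unlisted tool calls never hold the run.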

For a custom confirmation UI, register a human tool whose render completes the call with addResult. The shape of the result payload is yours to define; the adapter receives it verbatim and translates it for your backend:

const toolkit = defineToolkit({
  send_email: {
    type: "human",
    description: "Send an email after the user confirms",
    parameters: z.object({ to: z.string(), subject: z.string() }),
    render: ({ args, result, addResult }) => {
      if (result) {
        return <p>{result.approved ? "Sent" : `Cancelled: ${result.reason}`}</p>;
      }
      return (
        <div>
          <p>
            Send "{args.subject}" to {args.to}?
          </p>
          <button onClick={() => addResult({ approved: true })}>Allow</button>
          <button
            onClick={() =>
              addResult({ approved: false, reason: "User declined" })
            }
          >
            Deny
          </button>
        </div>
      );
    },
  },
});

unstable_humanToolNames is unstable; see stability.

Approval gates

The server-side approval gate is also supported on LocalRuntime, for actions your backend executes after the user authorizes them. Where a human tool asks the user to supply the tool result, an approval gate asks the user to allow or block an action the adapter performs. Emit approval: { id } on the tool call part and end the run with the same requires-action status:

return {
  content: [
    {
      type: "tool-call",
      toolCallId: data.toolCall.id,
      toolName: data.toolCall.name,
      args: data.toolCall.args,
      argsText: JSON.stringify(data.toolCall.args),
      approval: { id: data.toolCall.id },
    },
  ],
  status: { type: "requires-action", reason: "tool-calls" },
};

A tool call with a pending approval pauses the run, whether or not the tool is listed in unstable_humanToolNames. Always emit gates in the pending state (approval: { id } with no approved field); a part that arrives already decided is treated as resolved and the runtime invokes the adapter again immediately. The stock ToolFallback Allow and Deny buttons, or a custom renderer calling respondToApproval({ approved, reason? }), record the decision:

  • Deny sets approval.approved: false and synthesizes an error result ({ error: reason || "Tool approval denied" } with isError: true), so the model sees the denial.
  • Allow sets approval.approved: true and leaves the result empty; performing the action is your adapter's job.

Once every pending approval on the message is decided and every listed human tool has a result, the runtime invokes your adapter again. Read the decisions from unstable_getMessage().content, perform the approved actions, and return the follow-up response. A tool call that carries an approval is owned by the gate: it does not additionally require a result, even when its name is listed in unstable_humanToolNames.
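
When the adapter is invoked again, it has to sort the tool calls by decision before acting. The sketch below groups tool call IDs into approved, denied, and still pending, using only the approval: { id, approved? } shape described above. The types are local stand-ins and summarizeApprovals is a hypothetical helper name, not part of assistant-ui.

```typescript
// Local stand-ins mirroring the fields described in this section.
type Approval = { id: string; approved?: boolean };
type TextPart = { type: "text"; text: string };
type ToolCallPart = {
  type: "tool-call";
  toolCallId: string;
  toolName: string;
  approval?: Approval;
};
type Part = TextPart | ToolCallPart;

type ApprovalSummary = { approved: string[]; denied: string[]; pending: string[] };

// Group gated tool calls by the user's decision. Parts without an approval are ignored.
function summarizeApprovals(content: readonly Part[]): ApprovalSummary {
  const summary: ApprovalSummary = { approved: [], denied: [], pending: [] };
  for (const part of content) {
    if (part.type !== "tool-call" || !part.approval) continue;
    if (part.approval.approved === true) {
      summary.approved.push(part.toolCallId);
    } else if (part.approval.approved === false) {
      summary.denied.push(part.toolCallId);
    } else {
      // No approved field yet: the gate is still waiting on the user.
      summary.pending.push(part.toolCallId);
    }
  }
  return summary;
}
```

In an adapter, the input would be `unstable_getMessage().content`; the approved list tells you which actions to perform before returning the follow-up response.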

Resuming a run

resumeRun reconnects to an in-progress assistant run. Useful for page refresh, network reconnect, tab backgrounding, or thread switching when the backend is still generating.

Unlike startRun (which uses the ChatModelAdapter), resumeRun requires a stream parameter; you provide the async generator that produces the response.

import { useAui } from "@assistant-ui/react";
import type { ChatModelRunResult } from "@assistant-ui/core";

// Call useAui inside a React component or custom hook
const aui = useAui();

async function* createCustomStream(): AsyncGenerator<ChatModelRunResult> {
  yield { content: [{ type: "text", text: "Initial response" }] };
  await new Promise((r) => setTimeout(r, 500));
  yield {
    content: [
      { type: "text", text: "Initial response. And here's more content..." },
    ],
  };
}

aui.thread().resumeRun({
  parentId: "message-id",
  stream: createCustomStream,
});

A common pattern is to check whether the backend is still running on mount, then reconnect:

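import { useEffect, useRef } from "react";
import { useAui } from "@assistant-ui/react";

function useStreamReconnect(threadId: string) {
  const aui = useAui();
  const checkedRef = useRef(false);

  useEffect(() => {
    // Check once per mount, even if the effect re-runs
    if (checkedRef.current) return;
    checkedRef.current = true;

    (async () => {
      const status = await fetch(`/api/status/${threadId}`).then((r) =>
        r.json(),
      );
      if (status.isRunning) {
        const parentId = aui.thread().getState().messages.at(-1)?.id ?? null;
        // resumeRun needs a stream on LocalRuntime; createCustomStream is the
        // generator from the previous example, standing in for your backend stream
        aui.thread().resumeRun({ parentId, stream: createCustomStream });
      }
    })();
  }, [aui, threadId]);
}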

Queueing messages during a run

Set unstable_enableMessageQueue to keep the composer usable while a run is in progress. A message sent during a run is held in composer.queue and sent once the run settles; steering a queued message runs it next.

const runtime = useLocalRuntime(MyModelAdapter, {
  unstable_enableMessageQueue: true,
});

Render the pending messages with ComposerPrimitive.Queue and QueueItemPrimitive.

Adapters

Attachments, speech, feedback, history, and suggestions are wired through the standard adapter contracts; see adapters:

const runtime = useLocalRuntime(MyModelAdapter, {
  adapters: {
    attachments: myAttachmentAdapter,
    speech: mySpeechAdapter,
    feedback: myFeedbackAdapter,
    history: myHistoryAdapter,
    suggestion: mySuggestionAdapter,
  },
});

Multi-thread

LocalRuntime supports multi-thread either via AssistantCloud or via a custom RemoteThreadListAdapter. See threads for the contract and full examples.

// managed (see "AssistantCloud" in /docs/runtimes/concepts/threads for cloud setup)
const runtime = useLocalRuntime(MyModelAdapter, { cloud });

Integration examples

OpenAI

import type { ChatModelAdapter } from "@assistant-ui/react";
import { OpenAI } from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const OpenAIAdapter: ChatModelAdapter = {
  async *run({ messages, abortSignal, context }) {
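    const stream = await openai.chat.completions.create(
      {
        model: "gpt-5.4-mini",
        // Flatten each message to plain text; non-text parts are dropped
        messages: messages.map((m) => ({
          role: m.role,
          content: m.content
            .filter((c) => c.type === "text")
            .map((c) => c.text)
            .join("\n"),
        })),
        stream: true,
      },
      // Request options, including the abort signal, go in the second argument
      { signal: abortSignal },
    );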

    let fullText = "";
    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        fullText += content;
        yield { content: [{ type: "text", text: fullText }] };
      }
    }
  },
};
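
The inline messages.map in the adapter above is the kind of conversion that the convertToOpenAIMessages call in the streaming examples stands for; that helper is not defined on this page. A minimal sketch of one possible version follows, assuming text-only conversations. The Message and Part types are simplified local stand-ins rather than the real ThreadMessage type, and non-text parts are silently dropped.

```typescript
// Simplified local stand-ins; the real ThreadMessage carries more fields.
type Part =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolCallId: string; toolName: string };
type Message = { role: "system" | "user" | "assistant"; content: readonly Part[] };
type OpenAIMessage = { role: "system" | "user" | "assistant"; content: string };

// Flatten each message to a role plus newline-joined text.
function convertToOpenAIMessages(messages: readonly Message[]): OpenAIMessage[] {
  return messages.map((m) => ({
    role: m.role,
    // flatMap keeps only text parts and narrows the type in one step.
    content: m.content
      .flatMap((part) => (part.type === "text" ? [part.text] : []))
      .join("\n"),
  }));
}
```

A production version would also need to translate tool calls, tool results, and attachments into the formats your model provider expects.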

Custom REST API

const CustomAPIAdapter: ChatModelAdapter = {
  async run({ messages, abortSignal, unstable_threadId }) {
    const response = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        messages: messages.map((m) => ({
          role: m.role,
          content: m.content,
        })),
        threadId: unstable_threadId,
      }),
      signal: abortSignal,
    });
    if (!response.ok) throw new Error(`API error: ${response.statusText}`);
    const data = await response.json();
    return { content: [{ type: "text", text: data.message }] };
  },
};

Best practices

  1. Always pass abortSignal to fetch and SDK calls so cancel works:
    fetch(url, { signal: abortSignal });
  2. Handle errors gracefully. Swallow AbortError (it is the user cancelling); rethrow others to surface in the UI.
  3. Yield cumulative state, not deltas. Each yield replaces the previous content, so yielding deltas leaves only the latest fragment on screen.
  4. Accumulate tool calls outside the streaming loop, otherwise they vanish on the first text-only chunk.
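
Practice 2 can be sketched as a small pair of helpers. This is one possible approach, not something assistant-ui provides: isAbortError and ignoreAbort are hypothetical names, and the check relies only on the standard convention that aborted fetch calls reject with an error whose name is "AbortError".

```typescript
// True for errors raised by an aborted request (fetch rejects with name "AbortError").
function isAbortError(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { name?: unknown }).name === "AbortError"
  );
}

// Runs a request; resolves to undefined when the user cancelled, rethrows anything else.
async function ignoreAbort<T>(request: () => Promise<T>): Promise<T | undefined> {
  try {
    return await request();
  } catch (error) {
    if (isAbortError(error)) return undefined; // user cancelled: nothing to report
    throw error; // real failure: let it surface in the UI
  }
}
```

Inside an adapter, the fetch or SDK call would go in the callback passed to ignoreAbort, with abortSignal still forwarded to that call.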

Troubleshooting

Messages not appearing. Ensure your adapter returns the correct shape: { content: [{ type: "text", text: "..." }] }.

Streaming not working. Use async *run (with the asterisk). A plain async run cannot yield.

Tool UI flickers and disappears. State is being reset between chunks. Accumulate tool calls in a Map declared outside the for await loop.

API reference

ChatModelAdapter

run : (options: ChatModelRunOptions) => Promise<ChatModelRunResult> | AsyncGenerator<ChatModelRunResult>

Function that sends messages to your API and returns the response.

ChatModelRunOptions

messages : readonly ThreadMessage[]

The conversation history to send to your API.

runConfig : RunConfig

Run configuration with optional custom metadata. RunConfig is { readonly custom?: Record<string, unknown> }.

abortSignal : AbortSignal

Signal to cancel the request if user interrupts.

context : ModelContext

Additional context including configuration and tools.

unstable_assistantMessageId (unstable) ?: string | undefined

ID of the assistant message being generated. Useful for tracking or updating specific messages.

unstable_threadId (unstable) ?: string | undefined

Current thread/conversation identifier. Useful for passing to your backend API.

unstable_parentId (unstable) ?: string | null | undefined

ID of the parent message this response is replying to. null if this is the first message.

unstable_getMessage (unstable) ?: () => ThreadMessage

Returns the current assistant message being generated. Useful during streaming.

LocalRuntimeOptions

initialMessages ?: readonly ThreadMessageLike[]

Pre-populate the thread with messages.

maxSteps : number = 2

Maximum number of sequential tool calls before requiring user input.

cloud ?: AssistantCloud

Enable Assistant Cloud integration for multi-thread support and persistence.

adapters ?: LocalRuntimeAdapters

Capability adapters. UI features automatically enable based on which adapters are provided. See /docs/runtimes/concepts/adapters.

unstable_humanToolNames (unstable) ?: string[]

Tool names that pause the run until the user supplies a result via addResult (unstable).