# Resumable Stream Deployment

URL: /docs/guides/resumable-stream-deployment

Production hardening for resumable streams. Authorization, serverless lifetimes, TTLs, key isolation, observability, resource limits, and incident response.

This guide assumes you have the basic wiring from [Resumable Streams](/docs/guides/resumable-streams) in place and focuses on what to add before serving production traffic.

## Authentication and authorization \[#authentication-and-authorization]

The default resume endpoint serves any caller that knows the `streamId`. Treat the id as opaque, not as a credential.

Bind every newly created `streamId` to the requesting user at acquire time and verify the binding on every resume. Store the binding next to the rest of your session state, or in Redis under a separate key. The example below uses a parallel `:owner:` entry that mirrors the TTL of the underlying stream.

```ts title="/lib/resumable-context.ts"
import { createResumableStreamContext } from "assistant-stream/resumable";
import { redis } from "@/lib/redis";
import { store } from "@/lib/resumable-store";

const OWNER_PREFIX = "aui:resumable:owner";
const OWNER_TTL_SEC = 24 * 60 * 60;

export const resumableContext = createResumableStreamContext({ store });

export async function bindStreamToUser(streamId: string, userId: string) {
  await redis.set(`${OWNER_PREFIX}:${streamId}`, userId, { EX: OWNER_TTL_SEC });
}

export async function assertStreamOwner(streamId: string, userId: string) {
  const owner = await redis.get(`${OWNER_PREFIX}:${streamId}`);
  if (owner !== userId) {
    throw new Response("Not Found", { status: 404 });
  }
}
```

Wrap `resume` with the ownership check. Returning 404 (not 403) avoids confirming the existence of a stream the caller does not own. Note that Next.js does not turn thrown `Response` objects into HTTP responses, so the handler catches the throw from `assertStreamOwner` and returns it explicitly.

```ts title="/app/api/chat/resume/[streamId]/route.ts"
import { assertStreamOwner, resumableContext } from "@/lib/resumable-context";
import { getSessionUserId } from "@/lib/auth";

export async function GET(
  req: Request,
  ctx: { params: Promise<{ streamId: string }> },
) {
  const userId = await getSessionUserId(req);
  if (!userId) return new Response("Unauthorized", { status: 401 });

  const { streamId } = await ctx.params;
  try {
    await assertStreamOwner(streamId, userId);
  } catch (response) {
    // Thrown Responses are not returned automatically; surface them here.
    if (response instanceof Response) return response;
    throw response;
  }

  const stream = await resumableContext.resume(streamId);
  if (!stream) return new Response("Not Found", { status: 404 });

  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```
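The acquire side writes the binding before the response goes out, so a resume that races the first disconnect already finds it. A minimal sketch — `startChatStream` is a hypothetical stand-in for your existing acquire wiring from the basic guide, assumed here to return the new stream's id along with the streaming response:

```ts title="/app/api/chat/route.ts"
import { bindStreamToUser } from "@/lib/resumable-context";
import { getSessionUserId } from "@/lib/auth";
// Hypothetical helper standing in for the acquire wiring from the
// Resumable Streams guide.
import { startChatStream } from "@/lib/chat";

export async function POST(req: Request) {
  const userId = await getSessionUserId(req);
  if (!userId) return new Response("Unauthorized", { status: 401 });

  const { streamId, response } = await startChatStream(req);

  // Record the owner before returning the response so the binding is
  // always in place by the time a resume can arrive.
  await bindStreamToUser(streamId, userId);

  return response;
}
```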
## `waitUntil` on serverless \[#waituntil-on-serverless]

On Vercel and Cloudflare the request handler is torn down once the response is returned, taking the producer task with it. Without a `waitUntil` hook the persisted stream stops growing the moment the originating request unwinds, so reconnects only see the bytes that happened to land before the response flushed.

On Vercel, pass `after` from `next/server`:

```ts title="/lib/resumable-context.ts"
import { after } from "next/server";
import { createResumableStreamContext } from "assistant-stream/resumable";
import { store } from "@/lib/resumable-store";

export const resumableContext = createResumableStreamContext({
  store,
  waitUntil: after,
});
```

On Cloudflare Workers, take the `ExecutionContext` from your handler and forward `ctx.waitUntil`:

```ts title="/src/worker.ts"
import { createResumableStreamContext } from "assistant-stream/resumable";
import { store } from "./resumable-store";

export default {
  async fetch(req: Request, env: Env, ctx: ExecutionContext) {
    const resumableContext = createResumableStreamContext({
      store,
      waitUntil: (promise) => ctx.waitUntil(promise),
    });
    // handle() stands for your worker's own request routing.
    return handle(req, resumableContext);
  },
};
```

In long-lived Node servers (a custom Express app, a container) `waitUntil` can be omitted; the producer task runs on the same event loop as the handler and is not preempted.

## TTL strategy \[#ttl-strategy]

By default, streams expire 24 hours after the last write. That suits typical chat workloads where a user might reload after lunch, but every deployment should pick a number deliberately.

* Shorten when chunks contain sensitive payloads (PII, drafts, internal documents). A 5 to 30 minute window usually covers reload survival without leaving recoverable bytes around.
* Extend for long-running agent tasks that may legitimately stretch past a day. Set the TTL above the worst-case task duration so the producer can still finalize.
* Match TTLs across layers. The store TTL, the owner-binding TTL, and any signed cookie that references `streamId` should expire together; otherwise one outlives the other and either leaks or 404s unexpectedly.

Configure on the store for the global default and on the context for a per-deployment override:

```ts
import {
  createInMemoryResumableStreamStore,
  createResumableStreamContext,
} from "assistant-stream/resumable";

const store = createInMemoryResumableStreamStore({
  defaultTtlMs: 30 * 60 * 1000,
});

export const resumableContext = createResumableStreamContext({
  store,
  ttlMs: 30 * 60 * 1000,
});
```

The Redis adapters accept the same `defaultTtlMs` option.

## Multi-tenant key isolation \[#multi-tenant-key-isolation]

When multiple apps or tenants share a Redis instance, set `keyPrefix` per environment so a misconfigured stream in one tenant cannot collide with, or be deleted alongside, another's. The prefix becomes the leading segment of every meta and data key.

```ts title="/lib/resumable-store.ts"
import { createClient } from "redis";
import { createRedisResumableStreamStore } from "assistant-stream/resumable/redis";

const client = createClient({ url: process.env.REDIS_URL });
await client.connect();

export const store = createRedisResumableStreamStore(client, {
  keyPrefix: `aui:${process.env.APP_NAME}:${process.env.TENANT_ID}`,
});
```

Per-tenant prefixes also make incident response cheaper. A `SCAN` with `MATCH aui:app:tenant-42:*` lets you audit or purge a single tenant without touching the rest, as sketched below.
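A minimal purge script under those assumptions — node-redis v4, where `scanIterator` yields one key at a time (v5 yields batches), with the example prefix hardcoded for clarity:

```ts
import { createClient } from "redis";

const client = createClient({ url: process.env.REDIS_URL });
await client.connect();

// SCAN walks the keyspace incrementally, so this one-off maintenance
// script stays safe on a busy instance where a blocking KEYS would not.
const prefix = "aui:app:tenant-42";
for await (const key of client.scanIterator({
  MATCH: `${prefix}:*`,
  COUNT: 500,
})) {
  await client.del(key);
}

await client.quit();
```

For an audit rather than a purge, log the keys instead of deleting them.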
## Observability hooks \[#observability-hooks]

`ResumableStreamContextOptions` exposes lifecycle hooks for structured logging, metrics, and tracing. Each hook is invoked synchronously around the underlying store call; throwing inside a hook surfaces as a producer error.

```ts title="/lib/resumable-context.ts"
import { createResumableStreamContext } from "assistant-stream/resumable";
import { logger, metrics } from "@/lib/observability";
import { store } from "@/lib/resumable-store";

export const resumableContext = createResumableStreamContext({
  store,
  onAcquire: (streamId, role) => {
    metrics.increment("resumable.acquire", { role });
    logger.info("resumable.acquire", { streamId, role });
  },
  onAppend: (streamId, byteLength) => {
    metrics.histogram("resumable.append.bytes", byteLength);
  },
  onFinalize: (streamId, status, error) => {
    metrics.increment("resumable.finalize", { status });
    logger.info("resumable.finalize", { streamId, status, error });
  },
  onError: (streamId, error) => {
    const message = error instanceof Error ? error.message : String(error);
    logger.error("resumable.error", { streamId, error: message });
  },
});
```

Keep hook bodies cheap. They run on the producer's hot path, and any latency they add becomes streaming latency seen by the client.

## Resource limits \[#resource-limits]

The in-memory store enforces three caps that the Redis adapters intentionally leave to the underlying database. Set them whenever your process can be reached by untrusted callers.

```ts
import { createInMemoryResumableStreamStore } from "assistant-stream/resumable";

const store = createInMemoryResumableStreamStore({
  maxChunkBytes: 64 * 1024,
  maxEntriesPerStream: 5000,
  maxStreams: 10_000,
});
```

* `maxChunkBytes` rejects oversized writes from a misbehaving producer (a runaway tool result, a base64 blob accidentally piped through). The producer task fails fast instead of pinning memory.
* `maxEntriesPerStream` caps the per-stream entry count. This bounds how much any single stream can grow before it starts erroring; pair it with TTLs so finalized streams clear quickly.
* `maxStreams` caps total live streams. Useful as a backstop in shared development environments and in single-tenant containers; in serverless deployments the platform already constrains concurrency.

These limits exist only on the in-memory store. For Redis, configure `maxmemory` and an eviction policy on the server, and rely on application-level rate limiting upstream.

## Incident response \[#incident-response]

The `streamId` leaks through response headers, browser session storage, server access logs, and (in some setups) error reports. If you suspect any of those channels were compromised, treat all in-flight stream ids as exposed.

What to log up front, so you have it when you need it:

* The acquiring user id, request id, and IP for every `acquire` call (via `onAcquire`).
* The finalize status (and any error) for every stream (via `onFinalize`).
* The owner-binding writes and reads, with the user id and the `streamId`.

What to rotate or invalidate during an incident:

* Bump `keyPrefix` on the store (sketched at the end of this section). Existing streams become unreachable and new ones land under the rotated namespace.
* Invalidate signed session cookies that reference any cached `streamId`.
* Drop the owner-binding keys for affected users so resumes are forced through a fresh acquire. Redis `DEL` does not accept glob patterns, so `SCAN` for `aui:resumable:owner:*` and delete the matches whose value is an affected user id.
* Shorten `defaultTtlMs` temporarily so any orphaned stream rolls off quickly.

After rotation, reissue stream ids server-side and redirect clients through a fresh acquire; do not trust any `streamId` the client already holds.
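Rotation is cheaper to pull off if the levers are parameterized ahead of time. A sketch building on the Redis store options shown earlier; `KEY_EPOCH` and `RESUMABLE_TTL_MS` are hypothetical environment variables introduced here, not library configuration:

```ts title="/lib/resumable-store.ts"
import { createClient } from "redis";
import { createRedisResumableStreamStore } from "assistant-stream/resumable/redis";

const client = createClient({ url: process.env.REDIS_URL });
await client.connect();

export const store = createRedisResumableStreamStore(client, {
  // Bumping KEY_EPOCH (hypothetical env var) rotates the namespace: existing
  // keys become unreachable immediately and age out under their TTL.
  keyPrefix: `aui:${process.env.APP_NAME}:${process.env.TENANT_ID}:${
    process.env.KEY_EPOCH ?? "v1"
  }`,
  // RESUMABLE_TTL_MS (also hypothetical) shortens the retention window
  // during an incident without a code change.
  defaultTtlMs: Number(process.env.RESUMABLE_TTL_MS ?? 24 * 60 * 60 * 1000),
});
```

Because rotation deletes nothing, the previous epoch remains readable to operators for forensics until its TTL expires.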