Log and monitor LLM calls by routing them through the Helicone proxy.
Helicone is an LLM observability proxy. Point your provider client at Helicone's URL, add an auth header, and every request and response is recorded with cost, latency, and prompt-level diffs.
Helicone is independent of which assistant-ui runtime you use. It slots in at the LLM-client layer (OpenAI SDK, AI SDK provider, or any HTTP-based provider client) on the server, not at the runtime layer in the browser. So it pairs with any backend: AI SDK, LangGraph, Mastra, custom.
## How it works
```text
your server ──► Helicone proxy ──► OpenAI / Anthropic / etc.
                      │
                      └─ logs request, response, tokens, cost
```

Calls pass through Helicone's edge before reaching the upstream provider. The proxy is transparent: response shape and streaming behavior are unchanged; you just gain a dashboard of every call.
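Concretely, the only change on the client side is the base URL plus one header. A minimal before/after sketch with the OpenAI SDK (the full route handlers are in Setup below):

```ts
import OpenAI from "openai";

// Before: the client talks straight to the provider.
const direct = new OpenAI(); // baseURL defaults to https://api.openai.com/v1

// After: same client, same calls, now routed (and logged) via Helicone.
const proxied = new OpenAI({
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});
```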
## Setup
### Get a Helicone API key
Sign up at [helicone.ai](https://helicone.ai) and copy the key from the dashboard. Add it to your environment alongside the provider key:

```bash
HELICONE_API_KEY=sk-helicone-...
OPENAI_API_KEY=sk-...
```

### Point the provider client at the proxy
Swap the provider's `baseURL` for Helicone's proxy URL and add the `Helicone-Auth` header.
With the AI SDK:

```ts
// app/api/chat/route.ts
import { createOpenAI } from "@ai-sdk/openai";
import { streamText, convertToModelMessages } from "ai";
import type { UIMessage } from "ai";

// Route every OpenAI call through Helicone's proxy.
const openai = createOpenAI({
  baseURL: "https://oai.helicone.ai/v1",
  headers: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: openai("gpt-4o"),
    messages: convertToModelMessages(messages),
  });

  return result.toUIMessageStreamResponse();
}
```

With the OpenAI SDK directly:

```ts
// app/api/chat/route.ts
import OpenAI from "openai";

// Same idea: only the base URL and one default header change.
const openai = new OpenAI({
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });

  return new Response(stream.toReadableStream());
}
```

If you're calling the OpenAI SDK directly, you'll also need to adapt the response into a stream `useChatRuntime` understands; the AI SDK variant above handles this for you.
### Verify in the dashboard
Send a message through your assistant. The call should appear in the Helicone dashboard within a few seconds, with token counts, latency, and the full prompt and completion text.
If nothing appears, inspect the outbound request from your server. The host should be `oai.helicone.ai`, not `api.openai.com`, and the request should carry both `Helicone-Auth` (added explicitly above) and `Authorization` (added automatically by the OpenAI client from `OPENAI_API_KEY`).
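To rule out your app code entirely, you can call the proxy endpoint directly. A minimal sketch using `fetch` (the model name and prompt are placeholders):

```ts
// Standalone sanity check: hit Helicone's OpenAI-compatible endpoint
// with both keys set in the environment.
const res = await fetch("https://oai.helicone.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [{ role: "user", content: "ping" }],
  }),
});
console.log(res.status, await res.json());
```

If this call shows up in the dashboard but your assistant's calls don't, your app's client is still pointing at `api.openai.com`.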
## Notes
- Server-side only. Never set the Helicone key in browser code; the proxy receives your provider key and must run server-side.
- Other providers. For Anthropic, Gemini, and others, swap the base URL: see Helicone's provider docs.
- Custom metadata. Add `Helicone-User-Id`, `Helicone-Property-*`, or session headers per request to filter and aggregate in the dashboard; the full list is in Helicone's request headers reference. See the sketch after this list.
- Streaming, tools, attachments all keep working unchanged because Helicone wraps the underlying provider transparently.
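As a sketch of the custom-metadata note above, assuming the AI SDK route from Setup: `streamText` accepts a `headers` map that is forwarded with the underlying HTTP request, so Helicone headers can be attached per call (the user ID and property values here are hypothetical):

```ts
import { createOpenAI } from "@ai-sdk/openai";
import { streamText, convertToModelMessages, type UIMessage } from "ai";

const openai = createOpenAI({
  baseURL: "https://oai.helicone.ai/v1",
  headers: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: openai("gpt-4o"),
    messages: convertToModelMessages(messages),
    // Forwarded with the request so Helicone can segment the logs.
    headers: {
      "Helicone-User-Id": "user-123", // hypothetical end-user ID
      "Helicone-Property-Feature": "support-chat", // custom property
    },
  });

  return result.toUIMessageStreamResponse();
}
```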