Log and monitor LLM calls by routing them through the Helicone proxy.
Helicone is an LLM observability proxy. Point your provider client at Helicone's URL, add an auth header, and every request and response is recorded with cost, latency, and prompt-level diffs.
Helicone is independent of which assistant-ui runtime you use. It slots in at the LLM-client layer (OpenAI SDK, AI SDK provider, or any HTTP-based provider client) on the server, not at the runtime layer in the browser. So it pairs with any backend: AI SDK, LangGraph, Mastra, custom.
## How it works
```text
your server ──► Helicone proxy ──► OpenAI / Anthropic / etc.
                      │
                      └─ logs request, response, tokens, cost
```

Calls pass through Helicone's edge before reaching the upstream provider. The proxy is transparent: response shape and streaming behavior are unchanged; you just gain a dashboard of every call.
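Concretely, the only change on the client side is the base URL plus one header. A minimal before/after sketch with the OpenAI SDK (the full route handlers are in Setup below):

```ts
import OpenAI from "openai";

// Before: the client talks straight to the provider.
const direct = new OpenAI(); // baseURL defaults to https://api.openai.com/v1

// After: same client, same calls, now routed (and logged) via Helicone.
const proxied = new OpenAI({
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});
```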
## Setup
### Get a Helicone API key
Sign up at [helicone.ai](https://helicone.ai) and copy the key from the dashboard. Add it to your environment alongside the provider key:

```bash
HELICONE_API_KEY=sk-helicone-...
OPENAI_API_KEY=sk-...
```

### Point the provider client at the proxy
Swap the provider's `baseURL` for Helicone's proxy URL and add the `Helicone-Auth` header.
With the AI SDK:

```ts
// app/api/chat/route.ts
import { createOpenAI } from "@ai-sdk/openai";
import { streamText, convertToModelMessages } from "ai";
import type { UIMessage } from "ai";

// Route every OpenAI call through Helicone's proxy.
const openai = createOpenAI({
  baseURL: "https://oai.helicone.ai/v1",
  headers: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: openai("gpt-4o"),
    messages: convertToModelMessages(messages),
  });

  return result.toUIMessageStreamResponse();
}
```

With the OpenAI SDK directly:

```ts
// app/api/chat/route.ts
import OpenAI from "openai";

// Same idea: only the base URL and one default header change.
const openai = new OpenAI({
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });

  return new Response(stream.toReadableStream());
}
```

If you're calling the OpenAI SDK directly, you'll also need to adapt the response into a stream `useChatRuntime` understands; the AI SDK variant above handles this for you.
### Verify in the dashboard
Send a message through your assistant. The call should appear in the Helicone dashboard within a few seconds, with token counts, latency, and the full prompt and completion text.
If nothing appears, inspect the outbound request from your server. The host should be `oai.helicone.ai`, not `api.openai.com`, and the request should carry both `Helicone-Auth` (added explicitly above) and `Authorization` (added automatically by the OpenAI client from `OPENAI_API_KEY`).
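To rule out your app code entirely, you can call the proxy endpoint directly. A minimal sketch using `fetch` (the model name and prompt are placeholders):

```ts
// Standalone sanity check: hit Helicone's OpenAI-compatible endpoint
// with both keys set in the environment.
const res = await fetch("https://oai.helicone.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [{ role: "user", content: "ping" }],
  }),
});
console.log(res.status, await res.json());
```

If this call shows up in the dashboard but your assistant's calls don't, your app's client is still pointing at `api.openai.com`.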
## Notes
- Server-side only. Never set the Helicone key in browser code; the proxy receives your provider key and must run server-side.
- Other providers. For Anthropic, Gemini, and others, swap the base URL: see Helicone's provider docs.
- Custom metadata. Add `Helicone-User-Id`, `Helicone-Property-*`, or session headers per request to filter and aggregate in the dashboard; the full list is in Helicone's request headers reference. See the sketch after this list.
- Streaming, tools, attachments all keep working unchanged because Helicone wraps the underlying provider transparently.
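As a sketch of the custom-metadata note above, assuming the AI SDK route from Setup: `streamText` accepts a `headers` map that is forwarded with the underlying HTTP request, so Helicone headers can be attached per call (the user ID and property values here are hypothetical):

```ts
import { createOpenAI } from "@ai-sdk/openai";
import { streamText, convertToModelMessages, type UIMessage } from "ai";

const openai = createOpenAI({
  baseURL: "https://oai.helicone.ai/v1",
  headers: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  const result = streamText({
    model: openai("gpt-4o"),
    messages: convertToModelMessages(messages),
    // Forwarded with the request so Helicone can segment the logs.
    headers: {
      "Helicone-User-Id": "user-123", // hypothetical end-user ID
      "Helicone-Property-Feature": "support-chat", // custom property
    },
  });

  return result.toUIMessageStreamResponse();
}
```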