Helix is the unified gateway for OpenAI, Anthropic, Google, Mistral, and 20+ other LLM providers. Smart routing, semantic caching, automatic fallback — through a single endpoint your team already knows.
// Drop-in for any OpenAI-compatible client.
import { Helix } from "@helix/sdk";

const helix = new Helix({ apiKey: process.env.HELIX_KEY });

const reply = await helix.chat.complete({
  model: "auto", // or "gpt-4o", "claude-opus-4", "gemini-2.5"
  messages: [{ role: "user", content: "Summarize Q3." }],
  fallback: ["claude-opus-4", "gpt-4o"],
  cache: true,
});
// → routed via Anthropic. cached. logged. billed.
Every team using LLMs in production hits the same wall. New models ship weekly. Pricing changes monthly. One provider goes down and your app dies. Helix is the abstraction layer that should have shipped two years ago.
Helix isn't a wrapper. It's the production-grade infrastructure layer between your code and every model on the market.
Change one base URL. Keep your code. Helix speaks the OpenAI API spec natively, so existing clients in any language work unchanged. No rewrite required.
Let Helix pick the right model per request. Optimize for cheapest, fastest, or highest-rated — configurable per endpoint, per customer, per call. Auto mode available.
When OpenAI 503s, Claude takes the request. When Claude rate-limits, Gemini does. Your users never know the difference. Sub-second failover.
Helix caches by meaning, not just exact match. Two prompts that mean the same thing? Same answer, no second bill (see the sketch after these cards). Cuts inference cost up to 60%. Avg 38% cost reduction.
Per-model cost, latency, error rate, token usage, customer attribution. Everything your finance and SRE teams have been asking for. Datadog + Grafana exports.
Deployed across 280+ Cloudflare PoPs. Routing decisions happen closer to your user than the model itself. Less than 50ms overhead, globally. Runs on Cloudflare Workers.
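Here is a concrete sketch of the semantic caching card above: two differently phrased prompts with the same meaning, one bill. It reuses the @helix/sdk client from the opening snippet; the cached field on the response is an assumption about the SDK's shape, not confirmed surface.

// Sketch: semantic caching, using the client from the opening snippet.
// The `cached` response field is an assumption, not confirmed SDK surface.
import { Helix } from "@helix/sdk";

const helix = new Helix({ apiKey: process.env.HELIX_KEY });

const first = await helix.chat.complete({
  model: "auto",
  cache: true,
  messages: [{ role: "user", content: "Summarize our Q3 revenue numbers." }],
});

// Same meaning, different words: served from cache, no second bill.
const second = await helix.chat.complete({
  model: "auto",
  cache: true,
  messages: [{ role: "user", content: "Give me a quick summary of Q3 revenue." }],
});

console.log(second.cached); // → true (assumed field)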
Helix is designed to disappear into your existing codebase. If your LLM client supports a base URL, you're done.
Replace one URL. Done. Helix is wire-compatible with the OpenAI Chat Completions API.
Choose models. Set fallback chains. Define caching strategy. All optional — defaults just work. A sketch of this step follows the quickstart code below.
Cost, latency, errors, savings — all live. Export to Datadog, Grafana, or your warehouse.
// Step 1 — Point your existing OpenAI client at Helix.
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.helix.dev/v1",
  apiKey: process.env.HELIX_KEY,
});

// That's it. Your existing code now routes through Helix.
const r = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});
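Step 2 might look like the following, sketched with the @helix/sdk client from the top of the page. The routing option and the object form of cache are illustrative assumptions about what per-call configuration could look like, not final API.

// Step 2 (sketch): routing, fallback chain, and cache strategy per call.
// `routing` and the object form of `cache` are hypothetical option shapes.
import { Helix } from "@helix/sdk";

const helix = new Helix({ apiKey: process.env.HELIX_KEY });

const reply = await helix.chat.complete({
  model: "auto",
  routing: "cheapest", // hypothetical: "cheapest" | "fastest" | "highest-rated"
  fallback: ["claude-opus-4", "gemini-2.5"], // tried in order on errors or rate limits
  cache: { semantic: true, ttl: 3600 }, // hypothetical: semantic matching, 1h TTL
  messages: [{ role: "user", content: "Summarize Q3." }],
});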
The observability layer your AI bill has been begging for. Drill into cost-per-customer, model performance, cache hit rates — in real time.
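For warehouse exports, a pull could be as simple as one authenticated GET. The /v1/analytics path, query parameters, and response fields below are hypothetical, shown only to illustrate the shape of the data.

// Hypothetical analytics pull: endpoint, params, and fields are assumptions.
const res = await fetch(
  "https://api.helix.dev/v1/analytics?groupBy=customer&window=24h",
  { headers: { Authorization: `Bearer ${process.env.HELIX_KEY}` } },
);
const rows = await res.json();
// e.g. [{ customer: "acme", cost_usd: 12.4, p95_ms: 820, cache_hit_rate: 0.41 }]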
Free to start. Subscription tiers for serious teams. A 1.5% routing fee on enterprise volume. Three independent revenue streams baked into the product from day one.
Point the baseURL of your existing client at Helix and everything keeps working. No new SDK to learn unless you want the advanced routing features.
Define a fallback chain (e.g., [claude-opus-4, gemini-2.5]). The whole failover happens in under 800ms. Your end users never see a 500.
Spin up Helix in five minutes — or, if you're an acquirer, take the whole thing.