Model Gateway
Your app can call a built-in AI model — for summaries, search, chat, or classification. Every plan includes a free allowance; top up with card or crypto when you need more, or bring your own provider key.
Built-in model (shipped)
The default model is DeepSeek (deepseek-chat, deepseek-reasoner). The endpoint is OpenAI-compatible, so libraries that already speak the OpenAI API work against it once you point them at the gateway's base URL and key:
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://tellwang.com/v1/llm/v1",
  apiKey: process.env.TELLWANG_LLM_KEY, // minted via POST /v1/llm-keys
});

const reply = await client.chat.completions.create({
  model: "deepseek-chat",
  messages: [{ role: "user", content: "Summarise this ticket…" }],
});

Free grants are sized by your plan; top-ups go through the same Stripe credit ledger the cp uses for everything else. See Account, plans & model gateway for the full API.
Model discovery (shipped)
SDKs that call client.models.list() work out of the box. The endpoint returns the OpenAI-spec shape {object:"list", data:[{id, object:"model", created, owned_by}]}. Operator-default orgs see deepseek-chat and deepseek-reasoner; BYOK orgs see a single byok:<provider> pseudo-entry that names their upstream (the actual model name passes through verbatim — any model your upstream accepts works).
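Because the list shape differs between operator-default and BYOK orgs, a client can tell from the model list alone which upstream it is talking to. The following is a minimal sketch of that check, not part of the gateway or any SDK: describeRouting and the Routing type are hypothetical names, and the only assumptions taken from the text above are the response shape and the byok:<provider> id format.

```typescript
// One entry of the OpenAI-spec model list described above.
interface ModelEntry {
  id: string;
  object: "model";
  created: number;
  owned_by: string;
}

interface ModelList {
  object: "list";
  data: ModelEntry[];
}

// Hypothetical result type: either the org is on BYOK, or it sees built-in models.
type Routing =
  | { mode: "byok"; provider: string }
  | { mode: "built-in"; models: string[] };

const BYOK_PREFIX = "byok:";

// Hypothetical helper: inspect the parsed /models response body and report
// whether the org is routed to its own upstream or to the built-in models.
function describeRouting(list: ModelList): Routing {
  const byok = list.data.find((m) => m.id.startsWith(BYOK_PREFIX));
  if (byok) {
    // "byok:openai" -> "openai"
    return { mode: "byok", provider: byok.id.slice(BYOK_PREFIX.length) };
  }
  return { mode: "built-in", models: list.data.map((m) => m.id) };
}
```

Pass it the parsed JSON body of the models request. A BYOK result means the model name in your chat calls is forwarded verbatim to your upstream, so it should be one that upstream accepts.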
curl https://tellwang.com/v1/llm/v1/models \
  -H "Authorization: Bearer $SLLM_KEY"

Streaming responses (shipped)
For chat UIs that need a live typing effect — or long completions where waiting for the full response is bad UX — pass stream: true. The endpoint emits Server-Sent Events (SSE) data: {…} chunks per OpenAI's wire format. Token metering happens after the stream closes, off the final usage chunk; the gateway automatically forces stream_options.include_usage = true so accurate billing is preserved.
const stream = await client.chat.completions.create({
  model: "deepseek-chat",
  messages: [{ role: "user", content: "Write a 200-word brief…" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

The gateway pipes the upstream's SSE chunks through verbatim (Content-Type: text/event-stream, with X-Accel-Buffering: no to suppress edge buffering). If a client disconnects mid-stream, the tokens already delivered are still debited from the meter, which matches the convention the upstream providers themselves use.
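If you read the stream without an SDK, the final usage chunk is the one to watch for your own accounting. The sketch below folds a complete SSE body into the assembled text plus that usage object. It assumes OpenAI's wire format (one JSON payload per data: line, a data: [DONE] sentinel at the end, and a last chunk with an empty choices array carrying usage); summariseSse and the two interfaces are hypothetical names, and a real reader would also need to handle payloads split across network reads.

```typescript
// Token counts as reported in the final chunk when include_usage is on.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface StreamSummary {
  text: string;
  usage: Usage | null;
}

// Hypothetical helper: parse a fully buffered SSE body.
// Assumes each event's JSON fits on a single "data:" line.
function summariseSse(raw: string): StreamSummary {
  let text = "";
  let usage: Usage | null = null;

  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank separators and other fields

    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel

    const chunk = JSON.parse(payload);
    // Content chunks carry a delta; the usage chunk has an empty choices array.
    text += chunk.choices?.[0]?.delta?.content ?? "";
    if (chunk.usage) usage = chunk.usage as Usage;
  }

  return { text, usage };
}
```

A null usage in the result means the stream ended before the usage chunk arrived, for example because the connection dropped mid-stream.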
BYOK — bring your own key (shipped)
If you already have an OpenAI / Anthropic / Groq / Mistral / Together account, register your key once per org. The gateway forwards your /chat/completions calls to your upstream with your key — your account is billed directly, and TellWang skips token metering. The gateway-key surface stays the same, so your app code doesn't change.
curl -X PUT https://tellwang.com/v1/orgs/$ORG_SLUG/llm-byok \
  -H "Authorization: Bearer $TELLWANG_KEY" \
  -H "Content-Type: application/json" \
  -d '{"provider":"openai","api_key":"sk-..."}'

Supported providers (auto-default base_url): openai, anthropic, groq, mistral, together, deepseek. For anything else, pass your own base_url (must be https://). Your key is encrypted at rest under the same envelope as your Wok secrets; GET returns a fingerprint, never the plaintext. DELETE reverts you to the metered built-in. Responses from BYOK-routed calls carry an X-LLM-Source: byok:<provider> header so you can tell which upstream answered.
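The registration rules above can be checked on the client before the PUT is sent. This is a rough sketch under stated assumptions: buildByokBody and byokProviderFromHeader are hypothetical helpers, and it assumes that a custom upstream is registered by sending base_url in the same JSON body alongside provider and api_key, which the text implies but does not spell out.

```typescript
// Providers listed above as having an automatic default base_url.
const KNOWN_PROVIDERS = ["openai", "anthropic", "groq", "mistral", "together", "deepseek"];

interface ByokBody {
  provider: string;
  api_key: string;
  base_url?: string; // assumed field placement for custom upstreams
}

// Hypothetical helper: build the JSON body for PUT /v1/orgs/$ORG_SLUG/llm-byok,
// rejecting inputs the gateway is documented to refuse.
function buildByokBody(provider: string, apiKey: string, baseUrl?: string): ByokBody {
  if (baseUrl === undefined) {
    if (!KNOWN_PROVIDERS.includes(provider)) {
      throw new Error(`provider "${provider}" has no default base_url; pass one explicitly`);
    }
    return { provider, api_key: apiKey };
  }
  if (!baseUrl.startsWith("https://")) {
    throw new Error("base_url must start with https://");
  }
  return { provider, api_key: apiKey, base_url: baseUrl };
}

// Hypothetical helper: extract the provider from an X-LLM-Source header value.
// Returns null when the header is absent or does not name a BYOK upstream.
function byokProviderFromHeader(value: string | null): string | null {
  const prefix = "byok:";
  return value !== null && value.startsWith(prefix) ? value.slice(prefix.length) : null;
}
```

With fetch, the second helper would be called as byokProviderFromHeader(response.headers.get("X-LLM-Source")) to confirm that a given reply came from your own upstream.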
Roadmap
- Per-key BYOK — assign different upstreams to different sllm_ keys within one org (today one BYOK per org).