Model Gateway

Your app can call TellWang's private vision-language model for summaries, search, chat, classification, and image understanding. Every plan includes a model allowance; generated images and SVG assets use prepaid credits.

Built-in model (shipped)

The gateway exposes one chat model: private. It accepts text and OpenAI-compatible image_url content blocks, so existing OpenAI-compatible libraries can call it:

ai.ts

import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://tellwang.com/v1/llm/v1",
  apiKey: process.env.SLLM_KEY, // minted via POST /v1/llm-keys
});
const reply = await client.chat.completions.create({
  model: "private",
  messages: [{ role: "user", content: "Summarise this ticket…" }],
});

Free grants are sized by your plan; top-ups go through the same Stripe credit ledger the cp uses for everything else. See Account, plans & model gateway for the full API.
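Since the model accepts image_url content blocks, an image-understanding call only differs from the text call in how the message content is built. The following is a minimal sketch, assuming the standard OpenAI content-part shapes; the helper name imageQuestion and the example URL are hypothetical and not part of the gateway API.

```typescript
// Content-part shapes as used by OpenAI-compatible chat completions.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Hypothetical helper: pairs one question with one image reference.
function imageQuestion(question: string, imageUrl: string): UserMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

// Usage with the client from ai.ts (shown as a comment, not executed here):
//   const reply = await client.chat.completions.create({
//     model: "private",
//     messages: [imageQuestion("What is in this screenshot?", "https://example.com/shot.png")],
//   });
const msg = imageQuestion("What is in this screenshot?", "https://example.com/shot.png");
console.log(JSON.stringify(msg, null, 2));
```

Keeping the text part first and the image part second mirrors the common OpenAI examples; whether the gateway imposes limits on image size or count is not stated here.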

Model discovery (shipped)

SDKs that call client.models.list() work out of the box. The endpoint returns the OpenAI-spec shape {object:"list", data:[{id, object:"model", created, owned_by}]} with one entry: private.

models.sh

curl https://tellwang.com/v1/llm/v1/models \
  -H "Authorization: Bearer $SLLM_KEY"
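
For code that calls the endpoint without an SDK, the list shape described above can be checked before use. This is a sketch only: modelIds is a hypothetical helper, and the created and owned_by values in the sample are placeholders, not values the gateway is known to return.

```typescript
// The OpenAI-spec list shape described above.
type ModelEntry = { id: string; object: "model"; created: number; owned_by: string };
type ModelList = { object: "list"; data: ModelEntry[] };

// Hypothetical helper: validates the envelope and returns the model ids.
function modelIds(body: unknown): string[] {
  const list = body as Partial<ModelList> | null;
  if (!list || list.object !== "list" || !Array.isArray(list.data)) {
    throw new Error("unexpected /models response shape");
  }
  return list.data.map((m) => m.id);
}

// Sample payload; `created` and `owned_by` are placeholder values.
const sample = {
  object: "list",
  data: [{ id: "private", object: "model", created: 0, owned_by: "example" }],
};
console.log(modelIds(sample));

// With a live key (shown as a comment, not executed here):
//   const res = await fetch("https://tellwang.com/v1/llm/v1/models", {
//     headers: { Authorization: `Bearer ${apiKey}` },
//   });
//   console.log(modelIds(await res.json()));
```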

Streaming responses (shipped)

For chat UIs that need a live typing effect, or for long completions where waiting for the full response makes for poor UX, pass stream: true. The endpoint emits Server-Sent Events (SSE) as data: {…} chunks in OpenAI's wire format. Tokens are metered after the stream closes, from the final usage chunk; the gateway forces stream_options.include_usage = true so that chunk is always present and billing stays accurate.

stream.ts

const stream = await client.chat.completions.create({
  model: "private",
  messages: [{ role: "user", content: "Write a 200-word brief…" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

The gateway pipes the upstream's SSE chunks through verbatim (Content-Type: text/event-stream, with X-Accel-Buffering: no to suppress edge buffering). If a client disconnects mid-stream, the tokens already delivered are still debited from the meter, which matches the convention used by the upstream providers themselves.
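
For clients that read the raw SSE body instead of using an SDK, the wire format can be handled roughly as follows. This is a sketch under assumptions: it presumes the OpenAI chunk layout (choices[0].delta.content, a trailing chunk carrying usage, and a closing data: [DONE] line) reaches the client unchanged, and readSse is a hypothetical helper that works on an already-buffered body rather than an incremental stream.

```typescript
type Usage = { prompt_tokens: number; completion_tokens: number; total_tokens: number };
type Chunk = {
  choices?: Array<{ delta?: { content?: string | null } }>;
  usage?: Usage | null;
};

// Hypothetical helper: concatenates delta text and keeps the last usage object seen.
function readSse(raw: string): { text: string; usage: Usage | null } {
  let text = "";
  let usage: Usage | null = null;
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank separators and comments
    const payload = line.slice(5).trim();
    if (payload === "" || payload === "[DONE]") continue; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as Chunk;
    text += chunk.choices?.[0]?.delta?.content ?? "";
    if (chunk.usage) usage = chunk.usage;
  }
  return { text, usage };
}

// Made-up sample body; the token counts are illustrative only.
const sampleStream = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "",
  'data: {"choices":[],"usage":{"prompt_tokens":5,"completion_tokens":2,"total_tokens":7}}',
  "",
  "data: [DONE]",
  "",
].join("\n");

const result = readSse(sampleStream);
console.log(result.text, result.usage?.total_tokens); // prints: Hello 7
```

A production reader would parse incrementally as bytes arrive and handle a data: line split across network reads; the buffered version above only illustrates the chunk layout.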

Visual generation (shipped)

The same gateway also generates build-time assets for a Wok: raster images through Flux, and website-ready SVG/vector assets through Recraft. Wang uses this for logos, icons, favicons, hero art, and brand-consistent illustrations; apps can call the same endpoint from an edge function when they need a user-facing workflow.

svg-asset.sh

curl https://tellwang.com/v1/llm/v1/images/generations \
  -H "Authorization: Bearer $SLLM_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model":"recraft-svg",
    "prompt":"brand-consistent SVG icon set for a premium dental clinic, navy linework, warm gold accent",
    "style_id":"optional-recraft-style-uuid",
    "num_images":3,
    "image_size":"1024x1024"
  }'

recraft-svg is backed by Recraft's vector model and returns SVG assets when the Recraft upstream key is configured. Pass a Recraft style_id for a learned brand style, or put palette, line style, typography, and motif constraints directly in the prompt. The response includes cost_cents and image URLs; when a Wok bucket is supplied, outputs are stored under that Wok so the site can reference stable URLs.
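
When the call is made from an edge function, the request body from svg-asset.sh can be assembled with a small typed helper. This is a sketch, not the gateway's own client: svgRequest and its option names are hypothetical, the defaults (one image, 1024x1024) are assumptions rather than documented values, and only the fields shown in the curl example are used.

```typescript
// Request fields taken from the svg-asset.sh example above.
type SvgRequest = {
  model: "recraft-svg";
  prompt: string;
  num_images: number;
  image_size: string;
  style_id?: string;
};

// Hypothetical helper; defaults are assumptions, not documented gateway defaults.
function svgRequest(
  prompt: string,
  opts: { styleId?: string; count?: number; size?: string } = {},
): SvgRequest {
  const body: SvgRequest = {
    model: "recraft-svg",
    prompt,
    num_images: opts.count ?? 1,
    image_size: opts.size ?? "1024x1024",
  };
  // Only send style_id when a learned brand style is supplied.
  if (opts.styleId) body.style_id = opts.styleId;
  return body;
}

const body = svgRequest("navy linework dental icon set, warm gold accent", { count: 3 });
console.log(JSON.stringify(body));

// Sending it (shown as a comment, not executed here):
//   const res = await fetch("https://tellwang.com/v1/llm/v1/images/generations", {
//     method: "POST",
//     headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
//     body: JSON.stringify(body),
//   });
//   const out = await res.json(); // includes cost_cents and image URLs, per the text above
```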