Vibe-Coding an AI Agents Usage Dashboard

None of these AI coding agents — ChatGPT/Codex, Claude, Kimi Code, Z-AI — publish their usage limits via a documented API. If you want to know how close you are to a rate limit, you open each dashboard separately and squint at progress bars. I wanted one view for all of them.

So I vibe-coded AgentsUsageDashboard: a single web dashboard plus a Stream Deck+ plugin that shows real-time usage, reset countdowns, and plan info for all four providers at a glance.

The web dashboard showing all four agents with session/weekly usage bars, status indicators, and a 14-day Codex chart.

The hard part wasn’t building the dashboard — it was reverse-engineering the endpoints. Figuring out that Kimi’s scope must be an array, not a string; that Z-AI timestamps are in milliseconds; that Claude’s org list can contain non-dict entries — each of these cost hours of debugging. This article compresses those hours into a few lines so you don’t repeat them.

⚠️ Important: this setup depends on unofficial, reverse-engineered endpoints. Assume they can change at any time: paths, headers, auth flow, and response fields.

Quick architecture

Docker Compose

The whole infra is two containers. The Firefox container runs a browser you reach through noVNC for manual logins; the dashboard container mounts the same profile volume read-only and scrapes cookies from it.

services:
  firefox:
    image: jlesage/firefox:latest
    ports:
      - "5800:5800"
    volumes:
      - firefox_data:/config
    environment:
      - TZ=Europe/Warsaw
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "wget -q --spider http://localhost:5800 || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 5
      start_period: 30s

  dashboard:
    image: agent-stats-dashboard:latest
    ports:
      - "8777:8777"
    volumes:
      - firefox_data:/firefox:ro
    environment:
      - TZ=Europe/Warsaw
      - REFRESH_INTERVAL=300
      - ZAI_API_KEY=${ZAI_API_KEY:-}
    depends_on:
      firefox:
        condition: service_healthy
    restart: unless-stopped

volumes:
  firefox_data:

Open localhost:5800, log into ChatGPT, Kimi, and Claude in Firefox, and the dashboard picks up the sessions automatically. Z-AI uses an API key via env var instead.

Reading Firefox cookies safely

Firefox locks its SQLite databases while running. The trick: copy the DB (plus WAL and SHM files) to /tmp, then query the copy.

import shutil
import sqlite3
from pathlib import Path

def _copy_sqlite(src_path, tmp_name):
    """Copy SQLite DB + WAL + SHM to /tmp for safe reading."""
    tmp_dir = Path(f"/tmp/{tmp_name}")
    tmp_dir.mkdir(exist_ok=True)
    tmp_db = tmp_dir / src_path.name
    shutil.copy2(src_path, tmp_db)
    for suffix in ["-wal", "-shm"]:
        sidecar = src_path.parent / f"{src_path.name}{suffix}"
        if sidecar.exists():
            shutil.copy2(sidecar, tmp_dir / f"{src_path.name}{suffix}")
    return tmp_db

def _read_cookies(domain):
    profile = _find_profile()       # auto-detect jlesage or standard profile layout
    tmp_db = _copy_sqlite(profile / "cookies.sqlite", "cookie_read")
    conn = sqlite3.connect(str(tmp_db))
    cur = conn.execute(
        "SELECT name, value FROM moz_cookies WHERE host LIKE ?",
        (f"%{domain}%",),
    )
    cookies = cur.fetchall()
    conn.close()
    return cookies

For localStorage (needed by the Z-AI fallback), Firefox 79+ stores per-origin SQLite databases in a different path:

{profile}/storage/default/https+++chat.z.ai/ls/data.sqlite

The table is data, with columns key and value (the value is a UTF-8 blob). The same copy-before-read pattern applies.
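
A minimal sketch of pulling one value out of that file, reusing _copy_sqlite from above. It assumes the value is stored as a plain UTF-8 blob (Firefox can compress larger entries), and the key name in the usage example is purely illustrative:

def _read_local_storage(profile, origin_dir, key):
    """Read one localStorage value from a per-origin ls/data.sqlite (LSNG) file."""
    ls_db = profile / "storage" / "default" / origin_dir / "ls" / "data.sqlite"
    tmp_db = _copy_sqlite(ls_db, "ls_read")
    conn = sqlite3.connect(str(tmp_db))
    row = conn.execute("SELECT value FROM data WHERE key = ?", (key,)).fetchone()
    conn.close()
    # assumes an uncompressed UTF-8 blob; larger values may be stored compressed
    return row[0].decode("utf-8", errors="replace") if row else None

# e.g. _read_local_storage(profile, "https+++chat.z.ai", "token")   # key name illustrative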

Auth: four providers, four strategies

Each provider needs different auth. I wrapped each in a fetch_*() adapter that returns the same normalized shape.

Codex — cookie to bearer exchange:

GET https://chatgpt.com/api/auth/session
Cookie: __Secure-next-auth.session-token=...

→ { "accessToken": "eyJhb..." }

Kimi — the kimi-auth cookie is already the bearer token, but the endpoint uses Connect protocol:

POST https://www.kimi.com/apiv2/kimi.gateway.billing.v1.BillingService/GetUsages
Headers: connect-protocol-version: 1, x-msh-platform: web
Body: { "scope": ["FEATURE_CODING"] }   ← must be an array, not string

Claude — full cookie string, with an org lookup step:

GET https://claude.ai/api/organizations         → find org with "chat" capability
GET https://claude.ai/api/organizations/{org_id}/usage
Headers: anthropic-client-platform: web_claude.ai

Z-AI — API key in id.secret format, wrapped in JWT:

import time
import jwt   # PyJWT

def _zai_jwt(api_key: str) -> str:
    """Build the short-lived JWT Z-AI expects from an id.secret API key."""
    kid, secret = api_key.split(".", 1)
    now_ms = int(time.time() * 1000)
    payload = {
        "api_key": kid,
        "exp": now_ms + 3600 * 1000,
        "timestamp": now_ms,
    }
    return jwt.encode(payload, secret, algorithm="HS256",
                      headers={"alg": "HS256", "sign_type": "SIGN"})

All HTTP calls go through curl_cffi with impersonate="ff120" for TLS fingerprint matching — without it, Cloudflare blocks you.
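
To make the pattern concrete, here is a hedged sketch of the Codex exchange and the Kimi call through curl_cffi's requests-style API. It assumes the kimi-auth cookie value is sent as a regular Authorization bearer header (per the note above) and skips error handling and retries:

from curl_cffi import requests

IMPERSONATE = "ff120"   # TLS fingerprint setting used throughout the project

def fetch_codex_access_token(session_cookie: str) -> str:
    """Exchange the ChatGPT session cookie for a short-lived bearer token."""
    resp = requests.get(
        "https://chatgpt.com/api/auth/session",
        cookies={"__Secure-next-auth.session-token": session_cookie},
        impersonate=IMPERSONATE,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["accessToken"]

def fetch_kimi_usage(kimi_auth_token: str) -> dict:
    """Call Kimi's Connect-protocol billing endpoint; scope must be an array."""
    resp = requests.post(
        "https://www.kimi.com/apiv2/kimi.gateway.billing.v1.BillingService/GetUsages",
        headers={
            "authorization": f"Bearer {kimi_auth_token}",
            "connect-protocol-version": "1",
            "x-msh-platform": "web",
        },
        json={"scope": ["FEATURE_CODING"]},
        impersonate=IMPERSONATE,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()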

Response mapping: raw to normalized

Each provider returns a different shape. The adapter’s job is to map it into the internal schema. Here’s Codex as an example — the raw response from /backend-api/wham/usage:

{
  "plan_type": "plus",
  "rate_limit": {
    "primary_window": {
      "used_percent": 42.0,
      "reset_after_seconds": 14400,
      "reset_at": 1772143992
    },
    "secondary_window": {
      "used_percent": 88.0,
      "reset_after_seconds": 68976,
      "reset_at": 1772206105
    }
  }
}

Gets normalized to:

{
  "status": "ok",
  "plan": "plus",
  "session": { "usage_pct": 42.0, "remaining_seconds": 14400 },
  "weekly":  { "usage_pct": 88.0, "remaining_seconds": 68976 },
  "error": null
}

primary_window → session, secondary_window → weekly. The field used_percent maps to usage_pct (Codex also sometimes returns usage_percent — support both). Timestamps are converted to remaining_seconds in the adapter so the frontend never thinks about time math.
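
As a sketch of what that adapter step can look like (function and helper names are mine; the field fallbacks follow the notes above):

def _normalize_codex(raw: dict) -> dict:
    """Map the raw wham/usage payload into the internal schema."""
    def window(win) -> dict:
        win = win if isinstance(win, dict) else {}
        # Codex usually reports used_percent, occasionally usage_percent; accept both
        pct = win.get("used_percent", win.get("usage_percent"))
        return {"usage_pct": pct, "remaining_seconds": win.get("reset_after_seconds")}

    rl = raw.get("rate_limit") or {}
    return {
        "status": "ok",
        "plan": raw.get("plan_type"),
        "session": window(rl.get("primary_window")),
        "weekly": window(rl.get("secondary_window")),
        "error": None,
    }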

Kimi — GetUsages returns a nested structure where all numeric values are strings:

{
  "usages": [{
    "detail": {
      "limit": "500",
      "used": "123",
      "remaining": "377",
      "resetTime": "2026-03-06T00:00:00Z"
    },
    "limits": [{
      "detail": {
        "limit": "30",
        "remaining": "28",
        "resetTime": "2026-02-27T14:35:00Z"
      }
    }]
  }]
}

usages[0].detail is the weekly quota (requests used out of plan limit). usages[0].limits[0].detail is the rate limit — a 5-minute sliding window. All values are strings, so cast with int(). The percentage is calculated: used / limit * 100. Plan name comes from a separate call to GetSubscription → subscription.goods.title (e.g. “Allegretto”).
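
A sketch of that parsing, with the string casts and the derived percentage (names are mine; the resetTime handling is an assumption, and the GetSubscription call is omitted):

from datetime import datetime, timezone

def _kimi_window(detail: dict) -> dict:
    """Turn one Kimi detail block (string-typed numbers) into the internal window shape."""
    limit = int(detail.get("limit", "0"))
    remaining = int(detail.get("remaining", "0"))
    used = int(detail.get("used", str(limit - remaining)))
    usage_pct = round(used / limit * 100, 1) if limit else None

    remaining_seconds = None
    reset_raw = detail.get("resetTime")
    if reset_raw:
        reset_at = datetime.fromisoformat(reset_raw.replace("Z", "+00:00"))
        remaining_seconds = max(0, int((reset_at - datetime.now(timezone.utc)).total_seconds()))
    return {"usage_pct": usage_pct, "remaining_seconds": remaining_seconds}

def _normalize_kimi(raw: dict) -> dict:
    usage = (raw.get("usages") or [{}])[0]
    return {
        "status": "ok",
        "plan": None,   # filled in separately from GetSubscription -> subscription.goods.title
        "weekly": _kimi_window(usage.get("detail") or {}),
        "session": _kimi_window((usage.get("limits") or [{}])[0].get("detail") or {}),
        "error": None,
    }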

Claude — the usage endpoint returns windows keyed by time span, plus per-model breakdowns:

{
  "five_hour": {
    "utilization": 35.2,
    "resets_at": "2026-02-27T19:30:00Z"
  },
  "seven_day": {
    "utilization": 62.1,
    "resets_at": "2026-03-06T14:30:00Z"
  },
  "seven_day_sonnet": { "utilization": 45.0, "resets_at": "..." },
  "seven_day_opus":   { "utilization": 12.0, "resets_at": "..." }
}

five_hour → session, seven_day → weekly. The field utilization maps to usage_pct, resets_at is ISO-8601. The catch: any of these fields can be None instead of a dict. Not missing — present but None. So usage.get("five_hour", {}) still blows up because you get None, not a missing key. You need an explicit guard: val if isinstance(val, dict) else {}.
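
A sketch of that guard in the Claude adapter (the timestamp-to-seconds conversion here is my assumption, not copied from the project):

from datetime import datetime, timezone

def _claude_window(raw: dict, key: str) -> dict:
    """Read one Claude usage window, tolerating keys that are present but None."""
    win = raw.get(key)
    win = win if isinstance(win, dict) else {}   # .get(key, {}) alone is not enough
    remaining = None
    resets_at = win.get("resets_at")
    if resets_at:
        reset_dt = datetime.fromisoformat(resets_at.replace("Z", "+00:00"))
        remaining = max(0, int((reset_dt - datetime.now(timezone.utc)).total_seconds()))
    return {"usage_pct": win.get("utilization"), "remaining_seconds": remaining}

# session = _claude_window(usage, "five_hour"); weekly = _claude_window(usage, "seven_day")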

Z-AI — returns an array of limit objects with different type/unit combinations:

{
  "success": true,
  "data": {
    "level": "premium",
    "limits": [
      {
        "type": "TOKENS_LIMIT",
        "unit": 3,
        "percentage": 42.0,
        "nextResetTime": 1772143992000
      },
      {
        "type": "TOKENS_LIMIT",
        "unit": 6,
        "percentage": 88.0,
        "nextResetTime": 1772606105000
      },
      {
        "type": "TIME_LIMIT",
        "percentage": 15.0,
        "nextResetTime": 1772137200000
      }
    ]
  }
}

Decode by type + unit: TOKENS_LIMIT with unit=3 is the 5-hour session window, unit=6 is weekly. TIME_LIMIT is an hourly request cap (use as session fallback). The percentage field is the usage percent directly (0–100) — don’t calculate it from usage/limit. And nextResetTime is Unix milliseconds, not seconds — divide by 1000 before converting.
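
A sketch of that decode, following the type/unit rules above (everything else, including the helper names, is assumed):

import time

def _normalize_zai(data: dict) -> dict:
    """Pick the session and weekly windows out of Z-AI's limits array."""
    now = time.time()

    def window(lim: dict) -> dict:
        return {
            "usage_pct": lim.get("percentage"),   # already a 0-100 percentage
            "remaining_seconds": max(0, int(lim["nextResetTime"] / 1000 - now)),   # ms -> s
        }

    session = weekly = None
    for lim in data.get("limits", []):
        if lim.get("type") == "TOKENS_LIMIT" and lim.get("unit") == 3:
            session = window(lim)
        elif lim.get("type") == "TOKENS_LIMIT" and lim.get("unit") == 6:
            weekly = window(lim)
        elif lim.get("type") == "TIME_LIMIT" and session is None:
            session = window(lim)   # hourly request cap as a session fallback
    return {"status": "ok", "plan": data.get("level"),
            "session": session, "weekly": weekly, "error": None}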

Every provider maps into the same { status, plan, session, weekly, error } shape — different source fields, same output.

The /api/data contract

The backend caches all normalized results and exposes them at GET /api/data:

{
  "codex": {
    "status": "ok",
    "plan": "plus",
    "session": { "usage_pct": 42.0, "remaining_seconds": 14400 },
    "weekly":  { "usage_pct": 88.0, "remaining_seconds": 68976 },
    "error": null,
    "last_success": "2026-02-27T14:30:00+00:00"
  },
  "kimi":   { "status": "ok", "session": { "..." }, "weekly": { "..." } },
  "claude": { "status": "ok", "session": { "..." }, "weekly": { "..." } },
  "zai":    { "status": "stale", "error": "timeout", "..." },
  "last_fetch": "2026-02-27T14:30:00+00:00",
  "next_refresh_at": "2026-02-27T14:35:00+00:00"
}

Status can be ok, error, offline, or stale (previous data available but last fetch failed). The frontend uses next_refresh_at to schedule its polling — no fixed interval, it syncs with the backend cycle.

Polling and cache

The backend runs a daemon thread that fetches all providers sequentially every 5 minutes. Thread-safe cache with two locks: one for read/write (_lock), one to prevent overlapping fetches (_fetch_lock).

import os
import threading
from datetime import datetime, timedelta, timezone

REFRESH_INTERVAL = int(os.environ.get("REFRESH_INTERVAL", "300"))

_cache: dict = {}
_lock = threading.Lock()         # guards reads and writes of _cache
_fetch_lock = threading.Lock()   # prevents overlapping fetch cycles
# FETCHERS is a list of (name, fetch_*) adapter pairs built elsewhere

def _do_fetch():
    if not _fetch_lock.acquire(blocking=False):
        return   # skip if already running
    try:
        results = {}
        for name, fetcher in FETCHERS:
            try:
                results[name] = fetcher()
            except Exception as e:
                # graceful degradation: keep stale data
                prev = _cache.get(name)
                if prev and prev.get("last_success"):
                    results[name] = {**prev, "status": "stale", "error": str(e)}
                else:
                    results[name] = {"status": "error", "error": str(e)}

        results["last_fetch"] = datetime.now(timezone.utc).isoformat()
        results["next_refresh_at"] = (
            datetime.now(timezone.utc) + timedelta(seconds=REFRESH_INTERVAL)
        ).isoformat()

        with _lock:
            _cache.update(results)
    finally:
        _fetch_lock.release()

On error, if there’s previous successful data it degrades to stale instead of disappearing. The frontend shows a status dot so you always know what’s live and what’s cached.
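
The read side is tiny. A sketch, assuming a Flask app and reusing _lock and _cache from the fetch loop above (the article doesn't show which web framework actually serves the dashboard):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/data")
def api_data():
    # serve a snapshot of the cache; the refresh thread is the only writer
    with _lock:
        return jsonify(dict(_cache))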

Stream Deck+ plugin

I also built a dedicated Stream Deck+ plugin so I can glance at usage without switching windows. Each of the four encoders (touch dials) shows one agent with two progress bars and color-coded status.

Stream Deck+ with all four AI agent usage monitors on the touch strip — Z.AI, Kimi, Codex, and Claude showing session and weekly bars.

Architecture: Node.js plugin using @elgato/streamdeck SDK. It polls the same /api/data endpoint, so the dashboard backend is the single source of truth.

Rotating a dial cycles through three views per encoder. Each view is a custom JSON layout definition for the touch strip; the default one looks like this:

{
  "id": "agent-default",
  "items": [
    { "key": "agent-icon",    "type": "pixmap", "rect": [4, 2, 20, 20] },
    { "key": "agent-name",    "type": "text",   "rect": [28, 0, 120, 22] },
    { "key": "status-dot",    "type": "pixmap", "rect": [180, 4, 16, 16] },
    { "key": "session-label", "type": "text",   "rect": [4, 26, 58, 20] },
    { "key": "session-bar",   "type": "bar",    "rect": [66, 30, 130, 10] },
    { "key": "weekly-label",  "type": "text",   "rect": [4, 52, 58, 20] },
    { "key": "weekly-bar",    "type": "bar",    "rect": [66, 56, 130, 10] }
  ]
}

Color thresholds match the web dashboard: green below 40%, amber 40-70%, red above 70%. Push to refresh, touch to open the web dashboard.
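
The threshold rule is trivial to share between the two UIs; a sketch (the color names are placeholders, not the project's actual palette):

def usage_color(pct):
    """Map a usage percentage to the shared green/amber/red thresholds."""
    if pct is None:
        return "gray"    # unknown or stale
    if pct < 40:
        return "green"
    if pct <= 70:
        return "amber"
    return "red"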

Advice so you don’t repeat my mistakes

  1. Normalize provider data early. Define one internal schema (usage_pct, remaining_seconds, plan) and map each provider into it. Don’t let provider-specific shapes leak into the UI.

  2. Treat auth as provider-specific adapters. Each provider has a completely different auth flow (see above). Keep them isolated — when one breaks, the others keep working.

  3. Parse defensively, always. Some fields are missing or None (especially Claude). Guard dict access and add sane fallbacks.

  4. Expect timestamp/unit mismatches. Some resets are in seconds, some in milliseconds (Z-AI). Convert once, in the adapter layer.

  5. Respect browser storage reality. Firefox 79+ uses LSNG per-origin SQLite files, not only the legacy webappsstore.sqlite.

  6. Avoid SQLite lock pain. If Firefox writes while you read, WAL locks happen. Copy the DB files to /tmp first, then query the copies.

  7. Keep the vibe, add guardrails. Vibe coding is great for momentum, but for unstable APIs you still need adapter boundaries, retries, and structured logs. Fast iteration + defensive engineering is the sweet spot.

Endpoint gotchas worth knowing

The gotchas above (the scope array, the millisecond timestamps, the None windows, the string-typed numbers) all exist because these endpoints are unofficial and undocumented. If you build this, design for drift: version adapters per provider, keep raw response logs, and assume tomorrow’s payload won’t be today’s payload.


The whole project took a day of vibe-coding, but most of that time was spent on reverse-engineering — intercepting requests in DevTools, guessing header combinations, decoding undocumented error formats. The actual dashboard code was fast once I knew what to call and how.

That’s the real value of this article: not the code (you can write your own), but the map. The endpoints, the auth quirks, the field name inconsistencies, the millisecond-vs-second traps — that’s what costs hours. Now you have it in one place.

The full source code — backend, frontend, Docker setup, and Stream Deck+ plugin — is on GitHub: AgentsUsageDashboard.