Product 2026-05-23 14 min read

We shipped the first observability API designed for AI agents — here is what changed

Most observability APIs were designed for a human clicking through a dashboard, then handed to AI agents as a courtesy. The retries got wrapped in an SDK. The auth got buried under an OAuth dance. The errors arrived as 500-page release notes. None of that ports to an agent. So we rebuilt the surface for the case where the caller is a Claude or GPT tool call that has 300 ms to make a decision.

By 24observe team · 2026-05-23

The reframe — what an agent needs that a dashboard does not

For about fifteen years, the standard observability vendor pitch was: "Look at our dashboard." That was the product. The API existed, sure, but it was a side door: useful for a one-off CSV export, an internal Grafana panel, maybe a Slack bot that paged the on-call. Everyone understood the API was a partial mirror of the UI, and the real product was the screen.

Then 2024 happened. By mid-2025, every infrastructure team we talked to had at least one running experiment with an LLM agent doing operational work — triage, runbook execution, on-call deflection, "is this incident worth waking someone up at 3am" classification. The serious teams had production pipelines. The very serious teams were starting to question whether they still needed all five of the observability dashboards they were paying for.

The honest thing we noticed: when you point an agent at most observability APIs, the agent gets confused. Not because the endpoints are wrong — they work. Because the API was designed for a caller who reads release notes, watches a screen for the loading spinner to finish, and intuits that "401" means their token expired. None of that is true for an agent.

Below is what we changed, concretely, when we sat down to design for the agent caller instead of the human one. None of it is about machine learning. All of it is about contracts that are honest with non-human readers.

1. Rate-limit headers your agent can actually parse

Every major API documents rate limits somewhere, usually three layers down in the developer docs, described in prose like "100 requests per minute per IP". A human reads that sentence, understands it, and writes code that backs off exponentially on a 429. An agent doesn't read the docs. It reads response headers.

We expose three rate-limit buckets directly in headers so an agent never has to guess what budget it has left:

Per-IP limit — the standard X-RateLimit-Limit / Remaining / Reset trio. This is the DoS layer; an agent calling from a stable IP rarely brushes against it.
Per-PAT mutation cap — X-PAT-Mut-Limit / Remaining / Reset. When you mint a token you can attach an optional daily mutation budget. Every mutating response shows the agent how many mutations it has left that day, plus a unix timestamp for when the counter resets.
Per-PAT log-bytes cap — X-PAT-LogBytes-Limit / Used / Remaining / Reset. Same idea, but for the /logs/ingest endpoint, in bytes per day.

The reset value is always a unix timestamp, never a humanized "in 4 hours". An agent reads it, compares it to the current time, decides whether to slow down or to keep pushing. Nothing to parse.

On a 429 response we additionally send the standard Retry-After header, giving the number of seconds until the next reset. Any HTTP client library that natively understands Retry-After (and most do) will back off correctly with zero extra code on your side.

The why: agents are far more aggressive than humans about retrying. Without proactive budget visibility, an agent stuck in a retry loop can burn through a day's budget in twenty seconds. The headers turn that silent failure mode into a contract.
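A minimal sketch of how an agent might turn those headers into a wait time. It assumes the header values are plain integers and that the caller passes a mapping with the exact header casing shown in this post; the function name is ours, for illustration only.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how many seconds to wait before sending the next mutation."""
    now = time.time() if now is None else now

    # On a 429, Retry-After (in seconds) is the most direct signal.
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)

    # Otherwise look at the per-PAT mutation budget.
    # Assumption: a missing header means "no cap attached to this token".
    remaining = int(headers.get("X-PAT-Mut-Remaining", "1"))
    if remaining > 0:
        return 0.0

    # Budget exhausted: the reset value is a unix timestamp, so subtract.
    reset_at = float(headers.get("X-PAT-Mut-Reset", now))
    return max(0.0, reset_at - now)
```

Most HTTP clients expose headers through a case-insensitive mapping, so a real integration would usually pass the response's header object straight in.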

2. Event webhooks — the end of polling loops

The most common shape of an early agent integration is a while True loop: poll the incidents endpoint, check whether anything new opened, take action, repeat. This works. It also burns API budget, adds delay proportional to your polling interval, and leaves you unable to say whether the agent missed an event: "nothing has fired yet" looks identical to "the agent last polled 30 seconds ago and the event arrived 5 seconds ago."

We replaced that pattern with a Stripe-style event webhook system. You register a URL with one POST to /api/v1/webhook-subscriptions, listing the events you care about (incident.opened, incident.acknowledged, incident.resolved, with more coming). The moment an event fires anywhere in our system — a monitor flipping to down, a user acknowledging from the dashboard, a recovery auto-resolving — we POST a signed JSON envelope to your URL.
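As a rough sketch, the registration call could be built like this. The body fields url and eventTypes and the api.24observe.com host are taken from the quick-start steps in this post; the function name and the bearer-token header layout are our assumptions. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.24observe.com"  # host used in the quick-start curl example


def build_subscription_request(
    token: str, receiver_url: str, event_types: list[str]
) -> urllib.request.Request:
    """Build (but do not send) the POST that registers a webhook subscription."""
    body = json.dumps({"url": receiver_url, "eventTypes": event_types}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/v1/webhook-subscriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumption: standard bearer scheme
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen would perform the call; the shape of the response is not described here, so the sketch stops at the request.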

Three properties matter for the agent case:

Signed payloads. Every delivery carries X-24Observe-Timestamp and X-24Observe-Signature, the latter a Stripe-format HMAC-SHA256 over <timestamp>.<raw-body> using your org's webhook secret. Your endpoint verifies the signature in constant time, checks that the timestamp is recent, and rejects anything that fails either test. No replays, no spoofs, no caller IP allow-listing required.
Automatic retries with bounded blast radius. If your endpoint returns non-2xx or times out, we retry five times with exponential backoff and 30% jitter. After ten consecutive failures across events, we auto-disable the subscription and tell you why — so a deployment that breaks your webhook endpoint doesn't drain our queue or wake the on-call at 4 a.m.
A delivery log you can read. GET /api/v1/webhook-subscriptions/:id/deliveries returns the last fifty attempts with the status code we got, the first kilobyte of the response body we received, and the error message on failure. When your agent says "I never got the event," you can prove or disprove that in one HTTP call.
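Receiver-side verification might look roughly like the sketch below. It assumes the signature header carries a bare hex digest and the timestamp header carries unix seconds; the five-minute tolerance and the function name are hypothetical. The copy-paste function in the webhooks docs is the authoritative version.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(
    secret: str,
    timestamp: str,       # value of X-24Observe-Timestamp
    raw_body: bytes,      # request body exactly as received, before JSON parsing
    signature: str,       # value of X-24Observe-Signature (assumed hex digest)
    tolerance_s: int = 300,  # hypothetical replay window
    now: Optional[float] = None,
) -> bool:
    """Return True only if the signature matches and the timestamp is recent."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    # Reject stale deliveries so a captured request cannot be replayed later.
    if abs(now - sent_at) > tolerance_s:
        return False
    # The signed message is "<timestamp>.<raw-body>".
    signed = timestamp.encode("utf-8") + b"." + raw_body
    expected = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing leaks.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Verification has to run on the raw bytes: re-serializing parsed JSON can reorder keys or change whitespace and break the digest.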

The result: the agent no longer has to poll. It registers a webhook at startup, sits waiting for HTTP POSTs, and reacts in the milliseconds between an event firing and the receiver returning 200. Polling still works — we kept all the read endpoints intact — but the polling pattern stops being the only way to know something happened.

3. Tool definitions for every agent framework

Every modern agent framework consumes function or tool definitions in a slightly different shape. OpenAI wants { type: "function", function: { name, description, parameters } }. Anthropic wants { name, description, input_schema }. LangChain wants the same name, description, and schema tuple plus the explicit HTTP method and path. Every integrator we talked to had hand-rolled a converter from our OpenAPI spec. Half of them had bugs, and all of them had to be re-run every time we added an endpoint.

We pre-converted the spec and publish it at three stable URLs:

GET /openapi/openai-tools.json — OpenAI function-calling shape, ready to pass to chat.completions.create({ tools: ... }).
GET /openapi/anthropic-tools.json — Anthropic tool-use shape, ready for messages.create({ tools: ... }).
GET /openapi/langchain-tools.json — LangChain StructuredTool plus method + path, ready for any framework that takes the flat tuple.

There is also a GET /openapi/tools-index.json discovery doc listing all formats and the API base URL, so an agent integration can ask "what formats do you publish?" first and pick the right one without hard-coding the path.

The OpenAPI spec stays the source of truth: we generate the tool formats from it on every deploy. When we add an endpoint, it shows up in the spec and all three tool documents automatically, with the same schema. No documentation drift, no manual sync.
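To make the difference between the shapes concrete, here is a small sketch that maps one OpenAI-style tool definition to the Anthropic shape. It is not our generator, only an illustration of the field mapping; the example tool name list_incidents is hypothetical.

```python
from typing import Any


def openai_to_anthropic(tool: dict[str, Any]) -> dict[str, Any]:
    """Map {type, function: {name, description, parameters}} to
    {name, description, input_schema}."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Both formats carry a JSON Schema object; only the key name differs.
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


openai_tool = {
    "type": "function",
    "function": {
        "name": "list_incidents",  # hypothetical tool name
        "description": "List incidents for the organisation.",
        "parameters": {
            "type": "object",
            "properties": {"status": {"type": "string"}},
        },
    },
}

anthropic_tool = openai_to_anthropic(openai_tool)
```

With the published files there is no need to run anything like this yourself; the point is that the schema is identical and only the envelope changes.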

And, for the record — we don't publish an SDK in any language, and we don't plan to. The spec is the SDK. Anything an SDK would wrap is one HTTP call, one bearer token, and a JSON body.

4. PAT scopes that name the action, not the resource

The original Personal Access Token shape was the one almost every B2B SaaS ships: one token, all permissions, lives until you revoke it. That works for humans because a human is unlikely to leak the token, and a human-held token is even less likely to be picked up by another agent in a coordinated attack. For agent integrations, the default of "all powers all the time" is unacceptable. A single token exfiltration takes out the whole org.

We split the token into a list of scopes, named by action rather than resource:

monitors:read, monitors:write — list and create monitors.
incidents:read, incidents:write — read incident history, acknowledge, resolve.
status-pages:read, status-pages:write — public-page management.
logs:read, logs:write — search and ingest log events.
audit:read — read the audit log (useful for the agent to know what the agent did).
tokens:write — mint and revoke further tokens.
maintenance:write — create and clear maintenance windows.
webhooks:read, webhooks:write — manage event subscriptions.
secrets:read — fetch the encrypted alert URLs back out (owner/admin only).
* — the original behaviour, for backward compatibility with existing tokens.

Pair scopes with the daily mutation cap from §1 and the blast radius of a leaked token shrinks to "what this particular agent was supposed to do, for the rest of today, until UTC midnight." A bot that only reads incidents can do nothing else. A bot that ingests logs cannot search them. The agent describes its job; the token enforces it.
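The enforcement rule is simple enough to sketch. The check below is an illustration of the semantics described above (exact scope match, with * as the backward-compatible wildcard), not our server code; an agent could use something like it as a preflight before attempting a call.

```python
def is_allowed(token_scopes: set[str], required: str) -> bool:
    """True if the token carries the required scope or the legacy wildcard."""
    return "*" in token_scopes or required in token_scopes


# A log-ingest bot: it can write logs but cannot search them.
ingest_bot = {"logs:write"}
can_ingest = is_allowed(ingest_bot, "logs:write")
can_search = is_allowed(ingest_bot, "logs:read")

# A pre-scopes token keeps its original all-powers behaviour.
legacy_token = {"*"}
legacy_can_mint = is_allowed(legacy_token, "tokens:write")
```

Note that read and write are independent: holding logs:write does not imply logs:read.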

One more agent-relevant detail: every PAT is attributed in the audit log. When a human asks "what did the agent actually do?" the answer is a single filtered request: GET /api/v1/audit-logs?actor_pat_id=<id>. One call to a complete history.

5. Idempotency-Key on every POST

The first time an agent retried a mutation because of a 504 and accidentally created two monitors, we knew the standard request shape wasn't enough. Humans retry conservatively. Agents retry aggressively, often before the response of the prior attempt has finished serializing across the wire.

Every mutating endpoint accepts the standard Idempotency-Key header. Send the same request twice with the same key, get the same response twice — never a duplicate row, never a double-page, never a duplicated incident open. The window is 24 hours; pick any opaque string up to 255 bytes as your key.

The implementation: we hash the request body together with the key and store the canonical response for replay. A retry that arrives with the same key and a different body returns 409 IDEMPOTENCY_KEY_REPLAY_CONFLICT, which means the retry changed the payload under an existing key. A retry with the same key and the same body returns the cached response, byte-for-byte, with the same status code and the same JSON envelope.
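The replay behaviour can be modelled in a few lines. This is an in-memory toy under our own assumptions (the class name, the stored fields, and the {"code": ...} error envelope are illustrative), not the production implementation.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StoredResponse:
    body_hash: str
    status: int
    payload: dict
    stored_at: float


class IdempotencyStore:
    WINDOW_S = 24 * 60 * 60  # the 24-hour replay window

    def __init__(self) -> None:
        self._entries: dict[str, StoredResponse] = {}

    def handle(
        self,
        key: str,
        body: bytes,
        run: Callable[[], tuple[int, dict]],
        now: Optional[float] = None,
    ) -> tuple[int, dict]:
        """Execute run() at most once per key within the window."""
        now = time.time() if now is None else now
        body_hash = hashlib.sha256(body).hexdigest()
        entry = self._entries.get(key)
        if entry is not None and now - entry.stored_at < self.WINDOW_S:
            if entry.body_hash != body_hash:
                # Same key, different body: refuse instead of guessing.
                return 409, {"code": "IDEMPOTENCY_KEY_REPLAY_CONFLICT"}
            # Same key, same body: replay the stored response unchanged.
            return entry.status, entry.payload
        status, payload = run()
        self._entries[key] = StoredResponse(body_hash, status, payload, now)
        return status, payload
```

The mutation itself runs only on the first request; every later request with that key is answered from the store.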

Agent-side, the pattern that works: generate a UUID at the start of the operation, include it as the Idempotency-Key on every retry of that operation, regardless of which network layer failed. The agent's retry loop on a 5xx never costs you a duplicated record.
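A sketch of that client-side pattern follows. The send callable is a stand-in for whatever HTTP client the agent uses (it receives the extra headers and returns a status code), and the attempt count and backoff values are illustrative assumptions.

```python
import time
import uuid
from typing import Callable


def post_with_retries(
    send: Callable[[dict[str, str]], int],
    max_attempts: int = 4,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry a mutation on 5xx, reusing one Idempotency-Key for every attempt."""
    # One key per logical operation, generated before the first attempt.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    status = 0
    for attempt in range(max_attempts):
        status = send(headers)
        if status < 500:
            return status  # success or a 4xx the caller must handle
        if attempt < max_attempts - 1:
            sleep(base_delay_s * (2 ** attempt))  # simple exponential backoff
    return status
```

The important detail is where the UUID is generated: outside the loop. Generating it inside the loop would give every retry a fresh key and defeat the deduplication.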

6. Errors with a code, not just a message

HTTP status codes are too coarse for branching. 403 Forbidden could mean five different things to your agent, and the message is for humans. We attach a stable, machine-readable code string to every 4xx response so the agent's error-handling logic can branch on the specific failure, not parse English.

The codes you'll see most often:

PAT_SCOPE_INSUFFICIENT — the token doesn't have the right scope. Mint a new one with the right scope; don't retry.
PAT_DAILY_LIMIT_EXCEEDED — the per-token daily cap is exhausted. Wait until reset (the header tells you when) or page a human.
PLAN_LIMIT_LOGS_VOLUME — the org's plan-level monthly log byte cap is exhausted. This isn't a per-PAT problem; you need to either upgrade or stop ingesting.
IDEMPOTENCY_KEY_REPLAY_CONFLICT — same key, different body. Your retry loop is buggy.
WEBHOOK_URL_UNSAFE — the URL you tried to register resolves to a private network, loopback, or link-local address. SSRF blocked at write time.
MONITOR_TARGET_UNSAFE — same SSRF guard, applied to monitor URLs.

The full list is documented in the API reference. The contract: codes are stable across deploys; we will never repurpose an existing code, only add new ones. An agent that branches on PAT_DAILY_LIMIT_EXCEEDED today will still match it next year.
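Branching on the code might look like the sketch below. It assumes the error JSON exposes the string under a top-level "code" field, which is our guess at the envelope; the action names are hypothetical. Unknown codes fall through to a human, since new codes can be added over time.

```python
from enum import Enum


class Action(Enum):
    WAIT_FOR_RESET = "wait_for_reset"
    FIX_REQUEST = "fix_request"
    ESCALATE = "escalate_to_human"


_ACTIONS = {
    "PAT_SCOPE_INSUFFICIENT": Action.ESCALATE,       # needs a new token; retrying is pointless
    "PAT_DAILY_LIMIT_EXCEEDED": Action.WAIT_FOR_RESET,
    "PLAN_LIMIT_LOGS_VOLUME": Action.ESCALATE,       # plan-level cap, not a per-PAT problem
    "IDEMPOTENCY_KEY_REPLAY_CONFLICT": Action.FIX_REQUEST,
    "WEBHOOK_URL_UNSAFE": Action.FIX_REQUEST,
    "MONITOR_TARGET_UNSAFE": Action.FIX_REQUEST,
}


def decide(error_body: dict) -> Action:
    """Pick a next step from the machine-readable code, never from the message."""
    return _ACTIONS.get(error_body.get("code", ""), Action.ESCALATE)
```

Because existing codes are never repurposed, a table like this only ever needs new rows.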

What we still don't ship

We promised honesty about gaps. Here is the list, current as of this post:

APM / application performance traces. If your agent needs span-level latency data per request, you need a tracer like Datadog APM or Honeycomb. We index logs and emit metrics on uptime and response time, not full traces.
RUM / real-user monitoring. Same story — we're a synthetic + log shop today, not a browser SDK shop. Sentry, LogRocket, or Datadog RUM remain the right call.
Infrastructure agents. No cgroup-level CPU/memory/disk metrics. Prometheus + node_exporter are still the standard for that, and they're free.
ML-based anomaly detection on metrics. Our alerting is rule-based — threshold, consecutive failures, SLO breach. If you want "this morning's request rate looks unusual," you want Datadog Watchdog or New Relic Applied Intelligence, neither of which we replace.
SIEM-grade log retention. We retain 30 days by default; longer retention is on the roadmap for the enterprise plan. Splunk and Datadog Cloud SIEM are still the right call for compliance workloads.

None of these are roadmap items we're embarrassed about. They're product positions we took on purpose: we cover the observability problems that are simple enough to bill predictably, and we leave the problems that require a sales conversation to the vendors who do that well. If your agent needs anything in this list, the tools named above remain the right choice and we'd genuinely rather you use them.

How to start in the first hour

Concrete sequence for getting your agent on the API in under an hour, assuming you already have an account:

Mint a scoped PAT: POST /api/v1/me/tokens with the scopes your agent will use and an appropriate dailyMutationLimit. Save the token somewhere your agent can read; we don't store the plaintext.
Pull the tool-format file for your framework: curl https://api.24observe.com/openapi/anthropic-tools.json (or openai-tools / langchain-tools). Pass the array directly into your agent's tool configuration.
Register a webhook subscription if your agent should react to events: POST /api/v1/webhook-subscriptions { url, eventTypes }. Fetch the signing secret from GET /api/v1/me/webhook-secret and store it for signature verification.
Wire up signature verification on your receiver. The webhooks docs have a copy-paste verification function in Node, Python, and Go.
Test the end-to-end loop: POST /api/v1/monitors/:id/test-alert to fire an artificial incident, watch it land on your webhook receiver, watch your agent react. You can clean up by deleting the test monitor.
Wire up rate-limit-header parsing: read X-PAT-Mut-Remaining from every mutating response and slow down at 10% remaining. This is the difference between an agent that runs all day and an agent that runs for ninety seconds.
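For the last step, one possible pacing rule is sketched here: do nothing while more than 10% of the budget is left, then spread the remaining mutations evenly over the time until reset. The even-spread policy and the function name are our assumptions; the inputs correspond to X-PAT-Mut-Limit, X-PAT-Mut-Remaining, and X-PAT-Mut-Reset.

```python
def pacing_delay(
    limit: int,
    remaining: int,
    reset_at: float,   # unix timestamp from X-PAT-Mut-Reset
    now: float,        # current unix time
    threshold: float = 0.10,
) -> float:
    """Seconds to pause before the next mutation."""
    # No cap, or comfortably above the threshold: full speed.
    if limit <= 0 or remaining / limit > threshold:
        return 0.0
    seconds_left = max(0.0, reset_at - now)
    # Budget gone: nothing to do but wait for the reset.
    if remaining <= 0:
        return seconds_left
    # Otherwise ration what is left evenly across the remaining window.
    return seconds_left / remaining
```

For example, with 100 of 1000 mutations left and 1000 seconds until reset, the agent would pause 10 seconds between mutations instead of running dry immediately.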

The whole loop fits in one terminal window. If anything in this list is slower than it sounds, tell us: /contact/ goes to a real person.

We don't think this is the final shape of an observability API for AI agents. Event types will grow as our system grows; rate-limit headers will probably expand to cover per-org quotas; the tool formats will track upstream changes in OpenAI, Anthropic, and LangChain conventions. What we're confident in is the principle: the human-clicking-a-dashboard caller is no longer the default. Designing for that assumption produces APIs that an agent has to fight against.

If you're building an agent against an observability surface and find a pattern that doesn't translate to ours, send us the failure mode. The point of building this in the open is to fix things faster than a roadmap meeting can.

All the surfaces described above are live today. Full docs at /docs/api-for-agents/; interactive reference at /docs/openapi/; account creation at login.24observe.com/register.