Unless an item says otherwise, nothing below is on a roadmap. Nothing is gated behind a "talk to sales" tier. Every check type, every alert channel, every integration: open the docs, mint a token, ship it.
Verify status codes, measure response time, and flag slowdowns before they become full outages. Pass custom request headers (encrypted at rest) for endpoints that require auth, expect a specific status, and treat slow responses as degraded, so you are paged before the endpoint goes fully down.
Reachable-or-not checks for databases, message queues, anything that speaks TCP. Configurable timeouts. Fast.
Get warned 7 days before a certificate expires. Validates the full chain, not just "did the handshake complete." SNI-correct.
Classic reachability. Cross-platform. Fast.
Does the page still say "Order placed" — or is the 200 just a generic landing page? Catches silent content regressions that status codes miss.
Like TCP, but for targets whose URL format does not include a port. Same anti-abuse defenses, same timeouts.
Inverted check — your job pings us when it succeeds. If a heartbeat does not arrive within your configured interval plus grace, we open an incident. Auto-resolved when the next heartbeat lands.
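The deadline rule described above (interval plus grace) can be sketched in a few lines. This is an illustration only, not the service's implementation; the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_overdue(last_seen: datetime, interval: timedelta,
                      grace: timedelta, now: datetime) -> bool:
    """Return True once `now` is later than last_seen + interval + grace."""
    deadline = last_seen + interval + grace
    return now > deadline


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
every_hour = timedelta(hours=1)
five_min = timedelta(minutes=5)

# 64 minutes after the last heartbeat: still inside interval + grace.
print(heartbeat_overdue(last, every_hour, five_min, last + timedelta(minutes=64)))  # False
# 66 minutes after: past the deadline, so an incident would open.
print(heartbeat_overdue(last, every_hour, five_min, last + timedelta(minutes=66)))  # True
```

When the next heartbeat arrives, `last_seen` moves forward and the same check returns False again, which matches the auto-resolve behavior described above.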
Checks are the easy part. What separates a tool from a service is what happens in the next sixty seconds — who gets paged, what your customers see, and whether your Sunday morning stays quiet.
Subscribe to incident.opened, incident.acknowledged, incident.resolved, monitor.status_changed, and log_alert.fired. We POST a signed JSON envelope to your URL — HMAC-SHA256 with your org's shared secret. Auto-retry up to 5 times with exponential backoff. Auto-disable after 10 consecutive failures so a broken receiver never wakes our on-call. Full per-subscription delivery log: status, HTTP code, response excerpt, payload bytes — visible via API and dashboard.
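A receiver can check the HMAC-SHA256 signature before trusting a delivery. The sketch below assumes the signature is a hex digest computed over the raw request body; the text above does not state the header name or the encoding, so treat both as assumptions and confirm them against the API docs.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body (encoding is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare in constant time so timing does not leak the expected value."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"),
                               received_signature.encode("utf-8"))


secret = b"example-shared-secret"            # hypothetical org secret
body = b'{"type":"incident.opened"}'         # hypothetical envelope
signature = sign(secret, body)

print(verify(secret, body, signature))               # True
print(verify(secret, body + b" ", signature))        # False: body was altered
```

Verify against the bytes exactly as received. Re-serializing parsed JSON can change whitespace or key order and break the comparison.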
No SDK to keep in step. Pre-converted tool definitions at /openapi/openai-tools.json, /openapi/anthropic-tools.json, /openapi/langchain-tools.json — drop into your agent framework. Every authenticated response carries X-PAT-Mut-Limit / Remaining / Reset so an agent can read its budget without burning a mutation. Every mutating endpoint accepts Idempotency-Key; same key + different body returns 409 IDEMPOTENCY_KEY_REPLAY_CONFLICT so a buggy retry loop surfaces fast.
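An agent can read its mutation budget from response headers and reuse one Idempotency-Key across retries of the same request. In this sketch the full header names (`X-PAT-Mut-Limit`, `X-PAT-Mut-Remaining`, `X-PAT-Mut-Reset`) are inferred from the shorthand above, and the format of the reset value is not stated in the text, so it is kept as a raw string.

```python
import uuid
from typing import Mapping, NamedTuple


class MutationBudget(NamedTuple):
    limit: int
    remaining: int
    reset: str  # raw value; its format is not specified in the text above


def read_budget(headers: Mapping[str, str]) -> MutationBudget:
    """Parse the budget headers (names assumed) case-insensitively."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return MutationBudget(
        limit=int(lowered["x-pat-mut-limit"]),
        remaining=int(lowered["x-pat-mut-remaining"]),
        reset=lowered["x-pat-mut-reset"],
    )


def new_idempotency_key() -> str:
    """Generate one key per logical mutation; reuse it on every retry."""
    return str(uuid.uuid4())


# Hypothetical response headers from any authenticated call.
budget = read_budget({
    "X-PAT-Mut-Limit": "500",
    "X-PAT-Mut-Remaining": "497",
    "X-PAT-Mut-Reset": "1735689600",
})
print(budget.remaining)  # 497

retry_headers = {"Idempotency-Key": new_idempotency_key()}
```

Because the same key with a different body returns 409, the key should be generated once per intended change and never shared between distinct payloads.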
Mint a token with a narrow scope list: monitors:read, monitors:write, incidents:write, logs:write, webhooks:write, audit:read, and 8 others. Pair with dailyMutationLimit + dailyLogBytesLimit to bound the blast radius of a leaked token to a day. Audit log records every PAT-attributed mutation, so a human can answer "what did the agent actually do?" with one SQL-style query.
Public, slugged, optionally on your custom domain (status.yourco.com). Group monitors into components. Custom logo + accent color. Atom feed for incident subscribers. Optional password protection for internal pages.
Auto-opened when a monitor crosses its threshold. Acknowledge to stop paging without resolving. Post updates ("we have identified the cause") that show on your public status page. Close it when fixed.
Schedule downtime by monitor or org-wide. Checks still run; alerts pause. Uptime math respects the window.
Email, generic webhook (signed), Slack, Discord, Telegram, Microsoft Teams, PagerDuty, Opsgenie. Per-channel test. Per-monitor consecutive-failure threshold so a single blip never pages you.
Set an uptime target (default 99.9%) and a rolling window (default 30 days). Get a green/red badge in your dashboard. Get notified when you drift below target.
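To make the defaults concrete, the arithmetic below converts an uptime target and window into a downtime budget. It is plain arithmetic, not the product's uptime calculation, and the function names are hypothetical.

```python
def allowed_downtime_minutes(target_percent: float, window_days: int) -> float:
    """Minutes of downtime a target permits over the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_percent / 100)


def uptime_percent(down_minutes: float, window_days: int) -> float:
    """Observed uptime as a percentage of the window."""
    window_minutes = window_days * 24 * 60
    return 100 * (1 - down_minutes / window_minutes)


# Default target and window from the text: 99.9% over 30 days.
budget = allowed_downtime_minutes(99.9, 30)
print(round(budget, 1))  # 43.2

# One hour of downtime in the window drops below the 99.9% target.
print(uptime_percent(60, 30) < 99.9)  # True
```

So the default target leaves roughly 43 minutes of downtime per 30-day window before the badge would turn red.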
Download all your monitor configs at any time. Migrate in or out without filing a support ticket.
A status badge image you can drop into your README. It updates live from your monitor's actual state.
Logs are easy. Knowing which line out of a million is the one that woke up your customer is the hard part. Eight things we do once your events land — every one of them on every plan, none of them gated behind an "intelligence" SKU.
Lambda via CloudWatch + Kinesis Firehose. Heroku in one `heroku drains:add`. Vercel projects in one paste. Docker daemons via the syslog driver. systemd via journald. OpenTelemetry SDKs in any language. Vector and Fluent Bit pipelines. Or just curl JSON straight to a URL. Twelve sources, one ingest pipe, one set of quotas.
Set Content-Encoding: gzip, deflate, or br and ship a compressed body — the API decompresses natively. Measured 22× smaller for gzip and 38× for brotli on a 400-event pino batch. For a customer pushing 50 GB/day of logs, that is roughly $130/month off the AWS egress line item. Vector, Fluent Bit, and the OpenTelemetry collector each enable it with one config line; the bundled observe24-collector ships with gzip on by default.
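Preparing a compressed batch takes only the standard library. The sketch below assumes the payload is a JSON array sent as `application/json`; the text above names the `Content-Encoding` values but not the body shape, so check the ingest docs for the exact format. The sample events are made up.

```python
import gzip
import json


def compress_batch(events: list) -> tuple:
    """Serialize events as a JSON array and gzip the result."""
    raw = json.dumps(events).encode("utf-8")
    body = gzip.compress(raw)
    headers = {
        "Content-Type": "application/json",  # assumed payload type
        "Content-Encoding": "gzip",          # tells the API to decompress
    }
    return body, headers


# A repetitive batch, similar in spirit to structured application logs.
events = [
    {"level": "info", "service": "checkout", "msg": "order placed", "order_id": i}
    for i in range(400)
]
body, headers = compress_batch(events)
raw_size = len(json.dumps(events).encode("utf-8"))
print(f"raw={raw_size} bytes, gzip={len(body)} bytes")
```

The ratio you see depends on how repetitive your own events are, so measure on a real batch rather than relying on this sample.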
No SPL to learn. No proprietary DSL. Plain substring works for the 80% case. KQL-lite (`level:error AND service:checkout`) covers the rest. Auto-extracted facets appear in the side rail — every primitive JSON field becomes a click-to-filter chip without you declaring a schema. Cursor pagination. 5-second hard timeout on every query.
Server-Sent Events. New matching events stream the moment they land. Same filters as search. 30-minute connection cap so a forgotten browser tab cannot leak forever. Works from curl (header auth) and from any modern browser (bind-cookie auth, no PAT in the URL).
Click "Patterns" on the Logs page. Each row is a normalized template ("ERROR DB query took <NUM>ms for user_id=<NUM> trace=<HEX>") with a count, a real sample, and the dominant level. Numbers, UUIDs, IPs, hex blobs, timestamps, and quoted strings all collapse. Compute-on-read against ClickHouse — no background job to fall behind, no extra table to bloat the bill.
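The collapsing step can be approximated with a handful of regular expressions applied in a fixed order. This is a rough sketch of the idea, not the product's normalizer: the patterns, their order, and every placeholder other than `<NUM>` and `<HEX>` are assumptions.

```python
import re
from collections import Counter

# Order matters: specific shapes are replaced before bare numbers.
RULES = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"), "<TS>"),
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r'"[^"]*"'), "<STR>"),
    # At least 8 hex characters including one letter, so long plain
    # numbers fall through to <NUM> instead.
    (re.compile(r"\b(?=[0-9a-fA-F]*[a-fA-F])[0-9a-fA-F]{8,}\b"), "<HEX>"),
    (re.compile(r"\d+(?:\.\d+)?"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable tokens with placeholders so similar lines match."""
    for pattern, placeholder in RULES:
        line = pattern.sub(placeholder, line)
    return line


lines = [
    "ERROR DB query took 153ms for user_id=42 trace=9f3ab2c41d",
    "ERROR DB query took 2210ms for user_id=7 trace=00ab12cd34ef",
]
counts = Counter(to_template(line) for line in lines)
for template, count in counts.items():
    print(count, template)
# 2 ERROR DB query took <NUM>ms for user_id=<NUM> trace=<HEX>
```

Both sample lines collapse to one template, which is what lets a pattern row carry a count and a single representative sample.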
Recurring stack traces auto-group by signature: normalized header + 1-3 frames, FNV-1a hashed. JavaScript, Python tracebacks, Java/JVM, Go panic, generic ERROR/FATAL. Same logical error across different deploys, paths, line numbers — same row. First-seen, last-seen, total count. Resolve when fixed. Ignore known noise. The thing Sentry charges a separate seat for, bundled.
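A signature of this kind can be sketched as a normalized header plus the top frames, hashed with FNV-1a. The 64-bit variant, the normalization rules, and the frame format below are assumptions for illustration; the text above only states that the header is normalized, that 1-3 frames are used, and that the hash is FNV-1a.

```python
import re

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def signature(header: str, frames: list, max_frames: int = 3) -> str:
    """Hash a normalized header and the first few frames (rules assumed)."""
    norm_header = re.sub(r"\d+", "<NUM>", header.strip())
    # Drop ":line" and ":line:col" suffixes so redeploys keep the same key.
    norm_frames = [re.sub(r":\d+", "", f.strip()) for f in frames[:max_frames]]
    key = "\n".join([norm_header, *norm_frames])
    return format(fnv1a_64(key.encode("utf-8")), "016x")


a = signature("TypeError: cannot read property of order 1041",
              ["at checkout (app.js:42:7)", "at handle (server.js:10:3)"])
b = signature("TypeError: cannot read property of order 9",
              ["at checkout (app.js:57:3)", "at handle (server.js:12:9)"])
print(a == b)  # True: same logical error, different ids and line numbers
```

A non-cryptographic hash is a reasonable fit here because the goal is fast, stable grouping rather than tamper resistance.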
Threshold alerts work when you know the magic number. Anomaly alerts work when you do not. Each anomaly rule compares the current window against a rolling baseline (default 7 days, 3× spike) and fires through email, Slack, Discord, Teams, Telegram, or any HMAC-signed webhook. No ML configuration, no model tuning, no false fires on tiny baselines (built-in event-count floor). PagerDuty, Opsgenie, SMS, and voice routing for log alerts is on the roadmap; today those channels route only from monitor checks.
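The spike test reduces to a comparison against a baseline average with a minimum-volume guard. In this sketch the baseline is the mean of earlier comparable windows and the floor is applied to the current count; both choices, and the floor value of 20, are assumptions rather than documented behavior.

```python
def is_anomalous(current_count: int, baseline_counts: list,
                 spike_factor: float = 3.0, min_events: int = 20) -> bool:
    """Fire when the current window is at least spike_factor x the baseline."""
    if not baseline_counts:
        return False  # nothing to compare against yet
    if current_count < min_events:
        return False  # event-count floor: ignore tiny volumes
    baseline = sum(baseline_counts) / len(baseline_counts)
    return current_count >= spike_factor * baseline


week = [100] * 7  # hypothetical counts for the same window on prior days

print(is_anomalous(300, week))     # True: 3x the baseline
print(is_anomalous(250, week))     # False: below the 3x threshold
print(is_anomalous(3, [1] * 7))    # False: 3x, but under the floor
```

The third case shows why the floor matters: going from one event to three is technically a 3× spike, yet it is not worth an alert.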
Turn any log query into a chartable metric: count of matches per bucket (60s / 5m / 15m / 30m / 1h). Auto-refreshes every 60 seconds on the dashboard. Sparkline + last bucket + peak + window total. Alertable like a regular monitor. Datadog charges 10¢ per metric per month for this; here it is free.
Monitor alerts route through ten channels: email, Slack, Discord, Microsoft Teams, Telegram, PagerDuty, Opsgenie, SMS, voice, and any HTTPS endpoint with HMAC-SHA256 signing. Log alerts route through six of them today (email, Slack, Discord, Microsoft Teams, Telegram, and the signed HTTPS endpoint); PagerDuty, Opsgenie, SMS, and voice are on the roadmap. Per-channel test. One incident per fire, never a hundred. Latch logic prevents re-paging on the same condition. No "Enterprise" tier locks anything behind a quote.
A SIEM where a detection is just another rule — same incidents, same cases, same API as your uptime checks. Threat-intel and enrichment run inline at ingest, so the signal is already on the event before a rule looks at it. Every item below ships today; every one is on every plan.
Nine packs — access, exfiltration, secret exposure, web-attacks, reliability, threat-intel, AI-agent security, MCP traffic, and platform self-monitoring — each rule carrying its MITRE ATT&CK technique (T1110, T1048, T1552 …). Brute-force, password spray, MFA-disabled, SQLi / XSS / traversal, secret-in-logs, prompt injection, runaway tool loops, MCP resource and protocol abuse. Enable a pack with one call; tune the queries + thresholds to your own log shapes.
Same query language as log search: `message:"failed login"` over a 5-minute window with a threshold, or an anomaly rule that fires at 3× a rolling baseline. A detection is just a log alert with a severity + ATT&CK tag — it opens an incident through the exact same pipeline, alert routing, and webhooks as a failed health check.
The signals a single stream can't express. Sequence rules (ClickHouse windowFunnel) catch "many failed logins THEN a success" for one user. Cardinality rules (uniqExact) catch "one source IP, ten distinct accounts" — password spray, account enumeration. One query per rule, every minute, joined on any column or attr.
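The logic behind the two rule types can be shown in memory with plain Python. This is not how the rules run in production (the text above names ClickHouse `windowFunnel` and `uniqExact` for that); the event field names and thresholds are hypothetical, and the time window is omitted for brevity.

```python
from collections import defaultdict


def spray_sources(events: list, min_accounts: int = 10) -> list:
    """Cardinality rule: source IPs seen with many distinct accounts."""
    accounts_by_ip = defaultdict(set)
    for event in events:
        accounts_by_ip[event["source_ip"]].add(event["user"])
    return sorted(ip for ip, users in accounts_by_ip.items()
                  if len(users) >= min_accounts)


def failures_then_success(events: list, min_failures: int = 5) -> list:
    """Sequence rule: users with a run of failures followed by a success.

    Events are expected in time order.
    """
    streak = defaultdict(int)
    hits = set()
    for event in events:
        user = event["user"]
        if event["outcome"] == "failure":
            streak[user] += 1
        else:
            if streak[user] >= min_failures:
                hits.add(user)
            streak[user] = 0  # a success resets the run
    return sorted(hits)


spray = [{"source_ip": "203.0.113.9", "user": f"user{i}", "outcome": "failure"}
         for i in range(10)]
brute = [{"source_ip": "198.51.100.4", "user": "alice", "outcome": "failure"}
         for _ in range(5)]
brute.append({"source_ip": "198.51.100.4", "user": "alice", "outcome": "success"})

print(spray_sources(spray + brute))          # ['203.0.113.9']
print(failures_then_success(spray + brute))  # ['alice']
```

Neither signal is visible from a single event, which is why these rules need state across events rather than a per-line match.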
Every public source IP is checked the moment an event lands — known-bad indicators, Tor exit nodes, DNSBL listings, VPN and datacenter ranges. Bring your own IOCs (IP, domain, hash), per-org and private, or ride the built-in feeds. A DNSBL or Tor verdict stamps `ioc_match`, so a one-line rule (`attrs.ioc_match:true`) fires.
Source IPs resolve to country, region, city, ASN, and org inline at ingest. A directory of identities (risk) and assets (criticality) you maintain stamps `identity_risk` and `asset_criticality` onto matching events — so "an error on a critical asset from a high-risk country" is a plain search (`asset_criticality:critical AND geo_country:RU`), not a join across three tools.
Common shapes — OpenSSH auth, nginx / apache access — are parsed into Elastic-Common-Schema-style fields (`ecs_event_outcome`, `ecs_source_ip`, `ecs_http_status` …) at ingest, so detections and search behave the same regardless of how the upstream log was formatted.
Group incidents into one investigation: status (open → investigating → contained → closed), severity, assignee, a notes timeline, and a true-/false-positive / benign disposition. Attach incidents across detections and correlations. The analyst workspace — org-scoped and audited like everything else.
A forward-cursor NDJSON endpoint streams your audit trail gap-free and dup-free, so Splunk, Datadog, Loki, or your own SIEM can poll it on a cron. Signed event webhooks (`log_alert.fired`, `correlation_rule.fired`) route every detection to SOAR. The whole security layer is on every plan: no "security tier", no per-GB intelligence meter.
No "pro tier" gating what should be default.