title: "Synthetic uptime checks"
description: "Run HTTP, TCP, and TLS-expiry probes from your own servers: multi-region monitoring without paying for someone else's probe network."
last_updated: "2026-05-24"
Synthetic uptime checks Pro+
Synthetic uptime checks are user-defined probes — HTTP requests, TCP connections, TLS certificate expiry — that BoxWatch runs from your servers. Most uptime services probe from their own cloud-hosted network. BoxWatch dispatches checks to the agents you've already installed.
That choice has two practical consequences:
- You can probe services that aren't reachable from the public internet. Verify Redis on 10.0.0.5:6379 from web-01. Verify a private status endpoint behind your VPC's firewall. No tunneling, no exposing the service.
- You're already paying for the servers. There's no per-check billing. Your fleet's geographic distribution becomes your monitoring topology for free.
The trade-off: probe locations are wherever your servers happen to be. If you want a probe from "Tokyo," you need a server in Tokyo.
Check types
Three kinds in v1:
- HTTP: curl a URL, check the status code and (optionally) the response body.
- TCP: Open a TCP connection to host:port and verify it succeeds within a timeout.
- TLS expiry: Connect via TLS to host:port and report when the cert expires.
There's no DNS resolution check in v1.
Adding a check
In the dashboard, go to Uptime → New check. Pick a type, target, and the servers it should run on. Save.
Or use the API:
curl -X POST https://api.boxwatch.app/uptime-checks \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Main site",
    "check_type": "http",
    "target": "https://example.com/health",
    "expected_status_codes": "200-299",
    "max_latency_ms": 2000,
    "body_contains": "OK",
    "follow_redirects": 1,
    "timeout_seconds": 10,
    "probe_server_ids": [42, 43, 44]
  }'

A check must have at least one probe server. The request body's probe_server_ids is the list of servers (by ID) that will run the probe. Up to 100 entries per check.
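As a rough illustration of the 1-100 limit, the sketch below builds the probe_server_ids array from a list of server IDs before it goes into the request body. The helper name probe_ids_json is hypothetical and not part of BoxWatch. The API presumably validates the list itself, so this only catches mistakes earlier, on the client side.

```shell
# Hypothetical client-side helper, not part of BoxWatch.
# Prints a JSON array of the given server IDs, or fails when the count
# falls outside the documented 1-100 range or an ID is not numeric.
probe_ids_json() {
  local out="" id
  if [ "$#" -lt 1 ] || [ "$#" -gt 100 ]; then
    echo "probe_server_ids needs 1-100 entries, got $#" >&2
    return 1
  fi
  for id in "$@"; do
    case "$id" in
      ''|*[!0-9]*)
        echo "not a numeric server ID: $id" >&2
        return 1
        ;;
    esac
    out="${out:+$out, }$id"   # join the IDs with ", "
  done
  printf '[%s]\n' "$out"
}
```

Calling probe_ids_json 42 43 44 prints [42, 43, 44], which can be substituted into the -d payload. Calling it with no arguments, or with a hostname instead of an ID, returns a non-zero status.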
Probe servers and multi-vantage
When you assign a check to N servers, it runs N times per cycle — once on each. Each server reports its own result independently. The API aggregates the results into a single check status.
Aggregation logic:
- up: every probe is OK.
- degraded: at least one probe is down, but not a strict majority. The dashboard shows yellow; no alert fires.
- down: a strict majority of probes are down (more than half). After two consecutive down aggregations, an alert fires.
The two-tick flap guard is intentional. A single failed probe doesn't wake you up — a sustained majority outage does.
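The aggregation rules above can be sketched as a small shell function. This is an illustration of the documented rules, not BoxWatch's actual implementation. The function name and the 1/0 flag format are assumptions, loosely modelled on the ok field in the API responses.

```shell
# Sketch of the documented aggregation rules; not BoxWatch's actual code.
# Arguments: one flag per probe's latest result (1 = OK, 0 = down).
aggregate() {
  local total="$#" down=0 flag
  for flag in "$@"; do
    if [ "$flag" -eq 0 ]; then
      down=$((down + 1))
    fi
  done
  if [ "$down" -eq 0 ]; then
    echo up
  elif [ $((down * 2)) -gt "$total" ]; then
    echo down        # strict majority: more than half of the probes
  else
    echo degraded    # some probes down, but not more than half
  fi
}
```

With three probes, aggregate 1 1 0 prints degraded and aggregate 0 0 1 prints down. With two probes, a single failure is a tie rather than a strict majority, so aggregate 1 0 prints degraded.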
Schedule
Checks run on the agent's heartbeat cadence. That means the effective probe interval is your plan's push interval:
- Hobby — every 60 minutes
- Pro / Team — every 5 minutes
- Scale — every minute
There's no sub-tick scheduling in v1. If you assign a check to a Scale server and three Pro servers, the Scale server probes once a minute and the Pro servers probe every five minutes. The aggregate uses whatever the latest result from each probe is.
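To make the mixed-cadence case concrete, the sketch below picks the newest result per probe from a list of reports. The input format (server_id, epoch seconds, ok flag, one report per line) is invented for this illustration and is not a BoxWatch format.

```shell
# Sketch only: reads "server_id epoch_seconds ok" lines on stdin (a made-up
# format for illustration) and prints the newest line for each server,
# sorted by server ID.
latest_per_probe() {
  awk '
    !($1 in newest) || $2 + 0 > newest[$1] + 0 { newest[$1] = $2; row[$1] = $0 }
    END { for (id in row) print row[id] }
  ' | sort -n
}

# Server 42 (one-minute cadence) reported three times while
# server 43 (five-minute cadence) reported once.
printf '42 1000 1\n43 1000 1\n42 1060 0\n42 1120 1\n' | latest_per_probe
```

Here the aggregate would see the 1120 result from server 42 and the 1000 result from server 43. The failed probe at 1060 has already been superseded by a newer result.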
Alert types
Each toggle is per-check.
Down
Fires after two consecutive aggregated down observations. Default: on. Clears silently on the first non-down aggregate. (Set alert_on_recovery to also send a recovery notification.)
Recovery
Sends a "back up" notification on the first non-down aggregate after a down alert was sent. Default: off.
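The down and recovery rules can be read as a small state machine over the sequence of aggregates. The sketch below is one interpretation of those rules, not BoxWatch's code. The function name is hypothetical, and it assumes the down alert is not repeated while the outage continues, which the text above does not state.

```shell
# Sketch of the alert lifecycle; names and the no-repeat rule are assumptions.
# Usage: alert_events ALERT_ON_RECOVERY AGGREGATE...
# Walks the aggregates oldest first and prints one event per line.
alert_events() {
  local recovery="$1" streak=0 alerted=0 agg
  shift
  for agg in "$@"; do
    if [ "$agg" = "down" ]; then
      streak=$((streak + 1))
      # Two consecutive down aggregates trigger the alert once.
      if [ "$streak" -ge 2 ] && [ "$alerted" -eq 0 ]; then
        echo down_alert
        alerted=1
      fi
    else
      # First non-down aggregate clears the alert, with an optional notice.
      if [ "$alerted" -eq 1 ] && [ "$recovery" -eq 1 ]; then
        echo recovery
      fi
      streak=0
      alerted=0
    fi
  done
}
```

For example, alert_events 1 up down up down down down degraded up prints down_alert followed by recovery. The isolated down at the start never alerts, and the move to degraded already counts as non-down.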
TLS cert expiring
For tls_expiry checks and http checks against HTTPS targets, fires when the cert is within tls_warn_days_before_expiry days of expiring (default 14). Fires once when the threshold is first crossed, and doesn't re-alert until the cert has been renewed and later crosses the threshold again.
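A minimal sketch of the fire-once behaviour follows. The function name, the already_alerted flag, and the inclusive comparison (alerting when days left equals the threshold) are assumptions made for illustration. Only the default of 14 days comes from the text above.

```shell
# Sketch of the fire-once behaviour; names and the inclusive comparison
# are assumptions.
# Arguments: days_left, already_alerted (0 or 1), warn_days (optional, default 14).
# Prints: fire  (send the alert and remember that it was sent),
#         quiet (inside the warning window, alert already sent),
#         rearm (outside the window, forget the earlier alert).
tls_expiry_action() {
  local days_left="$1" already_alerted="$2" warn_days="${3:-14}"
  if [ "$days_left" -gt "$warn_days" ]; then
    echo rearm
  elif [ "$already_alerted" -eq 0 ]; then
    echo fire
  else
    echo quiet
  fi
}
```

A cert at 14 days with no earlier alert yields fire, the same cert at 10 days yields quiet, and a renewed cert at 90 days yields rearm, so the next threshold crossing alerts again.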
HTTP options
The richest check type. Optional fields, all validated:
| Field | Type | Default | Notes |
|---|---|---|---|
| expected_status_codes | string | "200-299" | Comma-separated list or ranges. e.g. "200,201" or "200-299,301". Codes must be 100-599. |
| max_latency_ms | int | unset | If set, response slower than this counts as a failed probe (error_kind: latency_high). 1-60000. |
| body_contains | string | unset | Literal-string match (grep -F), no regex. 1-500 chars. Failed match → error_kind: body_mismatch. |
| follow_redirects | 0/1 | 1 | When 0, a 301/302 counts according to the status-code rule. |
| timeout_seconds | int | 10 | 1-60. |
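To show how the expected_status_codes and body_contains rules might be evaluated, here is a hedged shell sketch. Both helper names are hypothetical and this is not the agent's code. It assumes the spec contains no spaces and that ranges are inclusive.

```shell
# Sketch only; helper names are hypothetical. Assumes the spec has no spaces.
# status_matches CODE SPEC succeeds when CODE appears in SPEC,
# where SPEC looks like "200,201" or "200-299,301".
status_matches() {
  local code="$1" rest="$2" part lo hi
  while [ -n "$rest" ]; do
    part="${rest%%,*}"                 # first comma-separated item
    case "$rest" in
      *,*) rest="${rest#*,}" ;;        # drop it and keep the remainder
      *)   rest="" ;;
    esac
    case "$part" in
      *-*)
        lo="${part%-*}"
        hi="${part#*-}"
        if [ "$code" -ge "$lo" ] && [ "$code" -le "$hi" ]; then return 0; fi
        ;;
      *)
        if [ "$code" -eq "$part" ]; then return 0; fi
        ;;
    esac
  done
  return 1
}

# body_matches BODY NEEDLE succeeds when NEEDLE occurs literally in BODY.
# grep -F treats the needle as a fixed string, so "." is not a wildcard.
body_matches() {
  printf '%s\n' "$1" | grep -Fq -e "$2"
}
```

With the spec "200-299,301", status_matches accepts 204 and 301 and rejects 404. Because the body match is literal, the needle 1.2 does not match a body containing 1x2, as it would under a regex.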
Custom request headers and request bodies for POST checks are on the roadmap. v1 uses GET with default headers.
TCP and TLS-expiry examples
TCP
curl -X POST https://api.boxwatch.app/uptime-checks \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Redis from web tier",
    "check_type": "tcp",
    "target": "10.0.0.5:6379",
    "timeout_seconds": 5,
    "probe_server_ids": [42]
  }'

A TCP check is purely "did the connect succeed within the timeout?" No payload exchange.
TLS expiry
curl -X POST https://api.boxwatch.app/uptime-checks \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "example.com cert",
    "check_type": "tls_expiry",
    "target": "example.com:443",
    "tls_warn_days_before_expiry": 21,
    "probe_server_ids": [42]
  }'

The agent connects via TLS, reads the certificate, and reports cert_days_left. The "probe failed" state is reserved for connection failures. A cert that's still valid but close to expiring raises a separate alert.
Plan limits
- Hobby: 20 uptime checks
- Pro: 100 uptime checks
- Team: unlimited
- Scale: unlimited
The cap is account-wide (not per-server). A single check that probes from 20 servers counts as one check against the cap.
API reference
List all checks with denormalized current status, probe count, and plan cap.
Create a check. probe_server_ids is required (array, 1-100 entries).
Detail view. Returns the check, the list of probe servers, the latest result from each probe, the last 100 results combined, and a 24h uptime percentage.
{
  "check": { "id": 5, "name": "Main site", "last_status": "up", "..." },
  "probe_servers": [
    { "id": 42, "hostname": "web-01" },
    { "id": 43, "hostname": "web-02" }
  ],
  "per_probe_latest": [
    { "server_id": 42, "hostname": "web-01", "ok": 1, "status_code": 200, "latency_ms": 142 },
    { "server_id": 43, "hostname": "web-02", "ok": 1, "status_code": 200, "latency_ms": 156 }
  ],
  "uptime_pct_24h": 99.83
}

Update mutable fields and/or probe_server_ids (which fully replaces the assignment list).
Cascade-deletes the check, its probe assignments, and all stored results.
Why agent-side probing?
Three honest reasons:
- No probe-traffic tax. Cloud-provider monitoring services generate egress with every probe, then bill you for it. Your agents already exist and already make HTTPS calls home; an extra probe is cheap.
- Internal-network reach. A SaaS probe network can't see your private subnets. Your agents already live inside them. A tcp check against 10.0.0.5:6379 is trivial from web-01 and impossible from outside your network.
- Honest geography. "Probed from 14 regions" is mostly theater if the regions don't match where your users are. Probes from your actual production servers are the closest possible proxy for what your actual users experience.
Troubleshooting
"Check is failing but the URL loads fine in my browser"
First, check whether the probe server has jq installed. Uptime checks need jq to parse the agent's config cache; without it, the agent skips them entirely.
which jq || sudo apt-get install -y jq

Then look at the per-probe error_kind in the dashboard. The most common reasons:
- timeout: the server can't reach the URL at all. DNS, firewall, or the service is genuinely down.
- http_status: the URL returned a code not in expected_status_codes. Adjust the codes or fix the endpoint.
- body_mismatch: your body_contains string isn't in the response. Common when health endpoints change format.
"TLS expiry says expired but the cert is valid"
Clock skew on the probe server. Run timedatectl status and ensure NTP is on:
sudo timedatectl set-ntp true

The TLS check compares the cert's notAfter to the local clock. A server whose clock runs 48 hours ahead of real time will flag a cert that expires tomorrow as already expired.
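The effect of a fast clock can be worked through with plain epoch arithmetic. The sketch below is illustrative only: the function name is hypothetical, the timestamps are made-up numbers, and the agent's real rounding of cert_days_left may differ.

```shell
# Sketch of the notAfter comparison, using epoch seconds.
# cert_days_left NOT_AFTER NOW prints whole days until expiry
# (shell integer division truncates toward zero).
cert_days_left() {
  echo $(( ($1 - $2) / 86400 ))
}

# Illustrative numbers, not real data.
real_now=1800000000                      # the actual time
not_after=$((real_now + 86400))          # cert really expires in one day
skewed_now=$((real_now + 2 * 86400))     # probe server clock runs 48 hours fast

cert_days_left "$not_after" "$real_now"     # prints 1
cert_days_left "$not_after" "$skewed_now"   # prints -1, so the cert looks expired
```

A clock that runs slow has the opposite effect: it makes a cert look healthier than it is, which can delay the expiry warning.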
"I want different schedules for different checks"
Sub-tick scheduling isn't in v1. The workaround: put time-critical checks on a Scale server (1-minute cadence) and less-urgent checks on Pro/Team servers (5-minute cadence). The aggregate of a check uses whichever results have actually arrived.
"I deleted a server but its old probe results are still showing"
Rows in uptime_probe_results are cascade-deleted when their server is deleted, so removing a server also removes its probe history. If you're still seeing stale data, refresh the dashboard.