Multi-region URL monitoring

You run a public API at https://api.yourapp.com. You want to know not just "is it up?" but "is it up from where my users actually are?" Geographic vantage matters: an outage in us-east-1 looks fine from your eu-west-1 monitoring box, and vice versa.

This recipe sets up one HTTP uptime check assigned to one server per region. BoxWatch aggregates the probe results; you get a single check that reflects regional reality.

What you'll end up with

One HTTP check called api.yourapp.com health.
Three probe servers, one each in US-East, US-West, and EU.
Aggregated status that goes down only when a strict majority of regions can't reach you.
Per-region latency charts as a free side-effect.

Why agents in your regions, not a SaaS probe network

The honest answer: because you already have servers there. If your fleet spans us-east-1, us-west-2, and eu-west-1, the geographic diversity of your monitoring topology comes for free. No per-region billing, no extra vendor.

The trade-off is that "probe regions" are wherever your boxes happen to live. If you want a probe in Tokyo and you don't have a server in Tokyo, you can't have one. Our advice: spin up one cheap VPS per region you care about and install the agent there. That is typically still cheaper than a per-check billing tier on a SaaS probe service.

Prerequisites

One BoxWatch agent installed per region you want to probe from. We'll assume three: us-east-monitor, us-west-monitor, eu-monitor. Install instructions at Installing the agent.
The public URL you want to probe. We'll use https://api.yourapp.com/health — assumed to return 200 OK with a body containing "OK".

Step 1: add an HTTP uptime check

Dashboard

Dashboard → Uptime → New check.
Name: api.yourapp.com health.
Check type: HTTP.
Target URL: https://api.yourapp.com/health.
Expected status codes: 200 (or 200-299 if your health endpoint sometimes returns 204).
Body contains (optional): OK. Catches the case where your health endpoint returns 200 but the body says "DEGRADED."
Max latency (optional): 2000 ms. Triggers a fail if the response takes longer than 2 seconds.
Follow redirects: usually on.
Timeout: 10 seconds.
Probe servers: pick us-east-monitor, us-west-monitor, eu-monitor.
Save.

API

POST /uptime-checks

Auth: bearer

export TOKEN="bw_..."
 
curl -fsS -X POST https://api.boxwatch.app/uptime-checks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "api.yourapp.com health",
    "check_type": "http",
    "target": "https://api.yourapp.com/health",
    "expected_status_codes": "200",
    "max_latency_ms": 2000,
    "body_contains": "OK",
    "follow_redirects": 1,
    "timeout_seconds": 10,
    "probe_server_ids": [11, 22, 33],
    "alert_on_down": 1,
    "alert_on_recovery": 1
  }'

Field notes:

target is the full URL (with scheme), not separate host/port fields. The validator requires http:// or https://.
expected_status_codes accepts a single code (200), a comma-separated list (200,204,301), or a range (200-299).
body_contains is a substring match, not a regex. Up to 500 characters. Case-sensitive.
probe_server_ids is the list of agents that will run the probe — typically one per region.
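To make the matching rules in these notes concrete, here is a minimal Python sketch of how a spec like 200,204,301 or 200-299 and a body_contains value could be evaluated. It is an illustration only, not BoxWatch's validator: the function names are hypothetical, and it assumes ranges are inclusive and that a comma-separated list may mix single codes and ranges.

```python
def status_matches(spec: str, code: int) -> bool:
    """Return True if `code` satisfies a spec such as "200", "200,204,301", or "200-299".

    Assumption: ranges are inclusive on both ends, and list entries may be
    single codes or ranges. The real validator may be stricter.
    """
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas in this sketch
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= code <= int(high):
                return True
        elif int(part) == code:
            return True
    return False


def body_matches(body: str, needle: str) -> bool:
    """Case-sensitive substring test, as described for body_contains (no regex)."""
    return needle in body
```

One consequence of plain substring matching: a body of "NOT OK" still contains "OK", so pick a needle that cannot appear inside your failure messages.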

See the full Uptime Checks API reference.

Step 2: what aggregation looks like in practice

You have three regions. Each one probes the URL on its heartbeat (every 5 minutes). BoxWatch aggregates the three results:

3 of 3 OK → up.
1 of 3 down → degraded. Visible on the dashboard. No alert.
2 of 3 down → down (strict majority). An alert fires after two consecutive down aggregations.
3 of 3 down → down. An alert fires after the second consecutive down aggregation.

The case that makes regional monitoring interesting is "1 of 3 down." That's where you see a problem isolated to one region. The dashboard shows yellow and tells you exactly which region — the per-probe table has one row per probe server with its last result. That data is often what you actually want during an incident, even when the global aggregate hasn't tipped over.
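The aggregation rules above can be sketched in a few lines of Python. This is a model of the behavior described in this recipe, not BoxWatch's implementation; the function names and the shape of the inputs (one boolean per probe, a list of past aggregate states) are assumptions made for illustration.

```python
from typing import Sequence


def aggregate(results: Sequence[bool]) -> str:
    """Combine per-probe results (True = probe succeeded) into one status.

    "down" requires a strict majority of failing probes; any smaller
    number of failures is "degraded".
    """
    if not results:
        raise ValueError("at least one probe result is required")
    failures = sum(1 for ok in results if not ok)
    if failures == 0:
        return "up"
    if failures * 2 > len(results):  # strict majority, so exactly half does not count
        return "down"
    return "degraded"


def should_alert(history: Sequence[str], threshold: int = 2) -> bool:
    """True when the most recent `threshold` aggregate states are all "down"."""
    if len(history) < threshold:
        return False
    return all(state == "down" for state in history[-threshold:])
```

With three probes, aggregate([True, True, False]) returns "degraded" and aggregate([True, False, False]) returns "down". With two probes, aggregate([True, False]) is only "degraded", because one failure out of two is not a strict majority.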

⚠

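With only two vantage points, "1 of 2 down" is exactly half, not a strict majority, so the check stays degraded rather than going down. Practical upshot: with two probe servers, an alert fires only when both fail. If one of them is in a region having an outage, you'll see it on the dashboard but you won't get alerted. With three, any two failing probes form a strict majority, so the check can go down and alert while one probe still succeeds (a single failing probe out of three still reads as degraded). Two regions is enough to detect a regional issue, but you need at least three for majority-based alerting. Three is the magic number.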

Step 3: route alerts

Email is on by default. Add Slack at Dashboard → Account → Notifications with your incoming-webhook URL. See Slack alerts.

When the check goes down, the alert message names the check, the time, and the failing probes — so the on-call engineer can immediately see "EU and US-West are failing, US-East is fine" and start triaging accordingly.

What you'll see in the dashboard

A combined latency chart with one line per probe server. Regional latency variance becomes visible at a glance: if your EU box is suddenly showing 800 ms when it usually shows 80 ms, that's a signal even when nothing is technically down.
Per-probe status pills: each region shows up / down / pending with a timestamp.
Recent probe results: status code, latency, error kind for each attempt.
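If you export the per-probe latencies and want to flag the "800 ms when it usually shows 80 ms" case automatically, a rough Python sketch follows. Everything here is an assumption for illustration: the function name, the input shape (latency samples keyed by probe server name), and the factor of 3 over the historical median are not BoxWatch features.

```python
from statistics import median
from typing import List, Mapping, Sequence


def latency_outliers(
    history_ms: Mapping[str, Sequence[float]],
    latest_ms: Mapping[str, float],
    factor: float = 3.0,
) -> List[str]:
    """Return probe servers whose latest latency exceeds `factor` times their median.

    Probe servers with no history are skipped, since there is no baseline
    to compare against.
    """
    flagged = []
    for server, latest in latest_ms.items():
        past = history_ms.get(server)
        if not past:
            continue
        if latest > factor * median(past):
            flagged.append(server)
    return sorted(flagged)
```

A median baseline is used rather than a mean so that one earlier spike does not inflate what counts as normal for that region.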

Layer this with other signals

Multi-region URL monitoring is most useful when you treat it as one input among several:

Cloud-provider region status pages tell you about provider outages directly. BoxWatch tells you whether your specific app is reachable through them.
Server-side process monitoring on your API hosts tells you the API process is alive. See Monitor nginx across multiple hosts for that pattern.
Synthetic uptime checks like this one catch things process monitoring misses: bad config, a broken upstream, 500s under load.

Together, they answer different questions. Use them all.

Common gotchas

Probes from inside your own VPC. If your probe server is in the same VPC as the API, you're measuring intra-VPC reachability, not user reachability. Put at least one probe server somewhere outside your private network.
Health endpoint that lies. A /health endpoint that returns 200 regardless of dependency state is useless here. The body_contains check catches the simplest version of this ("OK" vs "DEGRADED"), but a well-designed health endpoint that actually exercises its dependencies is worth the effort.
Aggressive max_latency_ms. A 200 ms threshold sounds reasonable until your EU-to-US-East probe starts hitting 250 ms during evening traffic and you wake up at 2 AM for a non-incident. Pick a number above your worst legitimate p95.
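For the last gotcha, one way to pick max_latency_ms is to compute each region's p95 from recent latency samples and add headroom to the worst one. The Python sketch below does that with a nearest-rank percentile. The function names, the input shape, and the 1.5x headroom factor are assumptions for illustration, not BoxWatch recommendations.

```python
import math
from typing import Mapping, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_max_latency_ms(
    samples_by_region: Mapping[str, Sequence[float]],
    headroom: float = 1.5,
) -> int:
    """Worst per-region p95 multiplied by `headroom`, rounded up to whole milliseconds."""
    worst_p95 = max(percentile(s, 95) for s in samples_by_region.values())
    return math.ceil(worst_p95 * headroom)
```

Feed it samples that include your busiest hours, so the evening-traffic case is already reflected in the p95 before the headroom is applied.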