Process monitoring

Process monitoring lets you name specific processes that should be running on each server. The agent collects per-process CPU and memory data on every heartbeat, and BoxWatch fires an alert when a process disappears, restarts, or crosses a CPU or memory threshold.

This is host-side, not application-side. There's no SDK to import, no callback to register. You configure "watch nginx on web-01" in the dashboard, and the agent on web-01 does the rest.

How it works

1. You tell BoxWatch a process name to watch on a server (e.g., nginx, postgres, redis-server).
2. On the next heartbeat, the API tells the agent which processes to look at. The agent caches the list at /opt/boxwatch/processes.cache.
3. Each tick, the agent runs pgrep -x <name> for each watched process. It sums CPU and RSS across all matching PIDs and records the oldest start time.
4. The aggregated snapshot is sent in the heartbeat's processes array.
5. The API compares the new sample against the previous one and fires an alert on any of the four state transitions described below.

Matching is by exact name (pgrep -x). There are no wildcards and no command-line substring search. This is deliberate: it keeps the agent fast and the security surface small. If you want to watch a Python process, watch the interpreter's actual process name (python or python3, depending on the host); if you have multiple Python processes, BoxWatch will aggregate them all under that name.

Multi-instance processes (a worker pool, an Nginx with N workers) are summed into one row. Count is the number of running PIDs; CPU and RSS are totals.
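
The aggregation rule can be sketched in a few lines. This is an illustrative model only, not the agent's code (the agent is bash): PidSample and its field names are hypothetical, while count and oldest_start_unix mirror names used on this page.

```python
from dataclasses import dataclass


@dataclass
class PidSample:
    """One PID that matched the watched name. Field names are hypothetical."""
    pid: int
    cpu_pct: float
    rss_kb: int
    start_unix: int


def aggregate(process_name: str, pids: list[PidSample]) -> dict:
    """Collapse every PID sharing a name into a single row."""
    return {
        "process_name": process_name,
        "count": len(pids),                        # number of running PIDs
        "cpu_pct": sum(p.cpu_pct for p in pids),   # total, not average
        "rss_kb": sum(p.rss_kb for p in pids),     # total resident memory
        # Oldest start time is the smallest timestamp; None if nothing matched.
        "oldest_start_unix": min((p.start_unix for p in pids), default=None),
    }


workers = [
    PidSample(pid=901, cpu_pct=1.5, rss_kb=12_000, start_unix=1_700_000_000),
    PidSample(pid=902, cpu_pct=4.0, rss_kb=30_000, start_unix=1_700_000_050),
    PidSample(pid=903, cpu_pct=2.5, rss_kb=28_000, start_unix=1_700_000_050),
]
row = aggregate("nginx", workers)
```

Here three nginx PIDs become one row with count 3, 8.0% CPU, and 70,000 KB RSS, which is how a worker pool would appear as a single entry.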

Adding a watched process

In the dashboard, open the server detail page and find the Processes panel. Click + Watch new, type the process name, and pick which alerts to enable.

A new watched process starts in pending state. On the next heartbeat (every 5 minutes on the standard push interval), the agent picks up the new config and starts reporting.

Over the API:

POST /servers/:id/processes

Auth: bearer

curl -X POST https://api.boxwatch.app/servers/42/processes \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "process_name": "nginx",
    "alert_on_down": 1,
    "alert_on_restart": 0,
    "alert_on_cpu": 1,
    "cpu_threshold_pct": 75,
    "alert_on_memory": 0
  }'

{
  "process": {
    "id": 17,
    "server_id": 42,
    "process_name": "nginx",
    "enabled": 1,
    "alert_on_down": 1,
    "alert_on_restart": 0,
    "alert_on_cpu": 1,
    "alert_on_memory": 0,
    "cpu_threshold_pct": 75.00,
    "memory_threshold_pct": 25.00,
    "last_sample_at": null,
    "last_count": null,
    "alerted_state": null,
    "status": "pending"
  }
}

Process name validation rejects whitespace, newlines, and shell metacharacters. Specifically, any of the following characters causes a 400 validation error:

;  |  &  $  `  \  "  '  <  >  (  )  {  }

Names that are 1-100 characters and look like a normal pgrep -x argument are accepted.
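
If you want to pre-check names client-side before calling the API, the rules above can be approximated as follows. This is a sketch based only on the rules listed on this page; validate_process_name is a hypothetical helper, and the server's validator may reject more than this does.

```python
# The 14 characters listed above: ; | & $ ` \ " ' < > ( ) { }
FORBIDDEN = set(';|&$`\\"\'<>(){}')


def validate_process_name(name: str) -> bool:
    """Return True if the name passes the documented rules (client-side guess)."""
    if not 1 <= len(name) <= 100:
        return False
    if any(ch.isspace() for ch in name):   # covers spaces, tabs, and newlines
        return False
    return not any(ch in FORBIDDEN for ch in name)
```

Names such as nginx or redis-server pass; something like "nginx; rm -rf /" fails on both the whitespace and the metacharacter checks.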

Alert types

There are four alert types, all toggleable per watched process.

Process down

Fires when the previous heartbeat had count > 0 and the new heartbeat has count = 0. Once alerted, the state is sticky: subsequent zero-count heartbeats do not re-alert. Only state transitions trigger an alert.
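
To see the transition rule in action, here is a small simulation over a series of heartbeat counts. It models only the rule stated above; down_alerts is a hypothetical name, not part of BoxWatch.

```python
def down_alerts(counts: list[int]) -> list[int]:
    """Return the indexes of heartbeats that would fire a 'down' alert."""
    fired = []
    for i in range(1, len(counts)):
        # Only the >0 to 0 transition fires; repeated zeros stay silent.
        if counts[i - 1] > 0 and counts[i] == 0:
            fired.append(i)
    return fired


history = [2, 2, 0, 0, 0, 1, 0]
print(down_alerts(history))  # [2, 6]
```

The three consecutive zero-count heartbeats produce a single alert, and a second one fires only after the process has come back and gone down again.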

Process restarted

Fires when count > 0, the oldest start time has moved forward since the previous heartbeat, and the restart alert (alert_on_restart) is enabled. Detection is automatic from oldest_start_unix, so no agent-side state is required.

There's one suppression rule: if a process was already in the down state and recovers with a new PID, the recovery looks like a restart by definition, so the restart alert is suppressed. The earlier down alert was the real signal.
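
The restart rule and its suppression can be modeled like this. It is a sketch of the logic as described, with a hypothetical function name and two assumptions: a previous count of 0 stands in for "already in the down state", and a first sample with no previous start time never counts as a restart.

```python
from typing import Optional


def is_restart(
    prev_count: Optional[int],
    prev_start: Optional[int],
    count: int,
    start: Optional[int],
    alert_on_restart: bool = True,
) -> bool:
    """Decide whether this heartbeat should raise a 'restarted' alert."""
    if not alert_on_restart or count <= 0:
        return False
    if prev_count == 0:
        # Recovery from down looks like a restart; the down alert already covered it.
        return False
    if prev_start is None or start is None:
        # Assumption: no baseline to compare against on the first sample.
        return False
    return start > prev_start   # oldest start time moved forward
```

Because only the oldest start time is compared, restarting a worker that is not the oldest PID would not change the value in this model.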

Process CPU high

Fires when aggregate CPU (sum across all PIDs with this name) crosses cpu_threshold_pct. Default threshold is 50%. The alert clears on the next sample that drops below the threshold.

Process memory high

Fires when aggregate RSS as a percent of host RAM crosses memory_threshold_pct. Default threshold is 25%. Clears the same way.
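
Both threshold alerts follow the same fire-then-clear pattern, which can be sketched as below. The helper names are hypothetical, RSS and host RAM are assumed to be in the same unit, and the boundary behavior (strictly above fires, strictly below clears) is an assumption, since this page only says "crosses" and "drops below".

```python
from typing import Optional


def memory_pct(rss_bytes: int, host_ram_bytes: int) -> float:
    """Aggregate RSS as a percentage of host RAM."""
    return 100.0 * rss_bytes / host_ram_bytes


def threshold_event(was_alerted: bool, value_pct: float, threshold_pct: float) -> Optional[str]:
    """Return 'fire', 'clear', or None for a CPU or memory sample."""
    if value_pct > threshold_pct and not was_alerted:
        return "fire"    # crossed the threshold since the last sample
    if value_pct < threshold_pct and was_alerted:
        return "clear"   # dropped back below the threshold
    return None          # no transition, so nothing to do


GIB = 1024 ** 3
usage = memory_pct(2 * GIB, 8 * GIB)   # 25.0
```

With an 8 GiB host, processes totaling 2 GiB RSS sit exactly at the default 25% threshold, so a slightly larger sample would fire under this model.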

Priority order: a single heartbeat can match more than one condition, but only one alert fires. Priority is down > restarted > cpu_high > memory_high.
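
In code terms, the priority rule is a first-match lookup over an ordered list. The function name below is hypothetical; the four alert names are the ones given above.

```python
from typing import Optional

PRIORITY = ("down", "restarted", "cpu_high", "memory_high")


def pick_alert(matched: set[str]) -> Optional[str]:
    """Return the single highest-priority alert among the matched conditions."""
    for kind in PRIORITY:
        if kind in matched:
            return kind
    return None
```

For example, a heartbeat that matches both restarted and memory_high would produce only the restarted alert.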

Anti-storm protections

State is sticky. A process that's been down for 12 hours doesn't generate 144 alerts on the 5-minute tick. It generates one. The state machine only fires on transitions.
Per-heartbeat cap. The heartbeat handler dispatches at most 10 process alerts per server per heartbeat. If you have 50 watched processes and they all die at once (host outage), you get the first 10. A summary log entry is written for the rest.
Down→restart suppression (described above) keeps a single incident from producing two alerts.
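
The per-heartbeat cap amounts to slicing the list of pending alerts. The sketch below assumes the alerts are already in dispatch order (this page does not say how "the first 10" are chosen), and the summary wording is invented for illustration.

```python
from typing import Optional

MAX_ALERTS_PER_HEARTBEAT = 10


def apply_cap(pending: list[str], cap: int = MAX_ALERTS_PER_HEARTBEAT) -> tuple[list[str], Optional[str]]:
    """Split pending alerts into those dispatched and a summary of the rest."""
    dispatched = pending[:cap]
    suppressed = pending[cap:]
    summary = None
    if suppressed:
        # Hypothetical summary text; the real log entry format is not documented here.
        summary = f"{len(suppressed)} process alerts suppressed by per-heartbeat cap"
    return dispatched, summary


all_down = [f"proc-{i} down" for i in range(50)]
sent, note = apply_cap(all_down)
```

With 50 simultaneous failures, 10 alerts go out and the summary accounts for the remaining 40.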

Account limits

Watched processes are capped per server, not per account. Every account gets up to 100 watched processes per server, free.

Hitting the cap returns HTTP 402 with scope: "per_server" and the current count/limit.

API reference

GET /servers/:id/processes

Auth: bearer

Lists all watched processes for a server, with denormalized latest sample values and the plan cap.

POST /servers/:id/processes

Auth: bearer

Add a watched process. Fields:

| Field | Type | Default | Notes |
|---|---|---|---|
| `process_name` | string | required | 1-100 chars. Matched via `pgrep -x`. |
| `enabled` | 0/1 | 1 | Set to 0 to pause without deleting. |
| `alert_on_down` | 0/1 | 1 | |
| `alert_on_restart` | 0/1 | 0 | |
| `alert_on_cpu` | 0/1 | 0 | |
| `cpu_threshold_pct` | decimal | 50.00 | 0.01-100. |
| `alert_on_memory` | 0/1 | 0 | |
| `memory_threshold_pct` | decimal | 25.00 | 0.01-100. Of host RAM. |
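
If you build request bodies in a script, a small helper can mirror the defaults and ranges in the table. This is a client-side convenience sketch: build_payload is hypothetical, and the API applies its own defaults, so sending every field explicitly is optional.

```python
DEFAULTS = {
    "enabled": 1,
    "alert_on_down": 1,
    "alert_on_restart": 0,
    "alert_on_cpu": 0,
    "cpu_threshold_pct": 50.00,
    "alert_on_memory": 0,
    "memory_threshold_pct": 25.00,
}


def build_payload(process_name: str, **overrides) -> dict:
    """Merge overrides onto the documented defaults and range-check thresholds."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"process_name": process_name, **DEFAULTS, **overrides}
    for key in ("cpu_threshold_pct", "memory_threshold_pct"):
        if not 0.01 <= payload[key] <= 100:
            raise ValueError(f"{key} must be between 0.01 and 100")
    return payload


payload = build_payload("nginx", alert_on_cpu=1, cpu_threshold_pct=75)
```

The resulting dict can be serialized with json.dumps and used as the body of the POST request.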

GET /servers/:id/processes/:procId

Auth: bearer

Detail view. Returns the watched-process row plus up to the 100 most recent samples and recent_restarts (count of unique restart events in the last 24 hours).

PUT /servers/:id/processes/:procId

Auth: bearer

Update mutable fields. Same field list as POST, minus process_name, which is fixed (delete and re-add to rename).

DELETE /servers/:id/processes/:procId

Auth: bearer

Remove a watched process. Cascade-deletes its samples.
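
For scripting against these endpoints without curl, the requests can be assembled with Python's standard library. This is a minimal sketch: build_request is a hypothetical helper, the server and process IDs are the example values used earlier on this page, and sending only the changed field in a PUT is an assumption about how updates are accepted.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.boxwatch.app"


def build_request(method: str, path: str, api_key: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for the process API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(API_BASE + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


list_req = build_request("GET", "/servers/42/processes", "YOUR_API_KEY")
update_req = build_request(
    "PUT", "/servers/42/processes/17", "YOUR_API_KEY", {"cpu_threshold_pct": 90}
)
delete_req = build_request("DELETE", "/servers/42/processes/17", "YOUR_API_KEY")

# To send one: urllib.request.urlopen(list_req)
```

urlopen raises urllib.error.HTTPError for 4xx responses, so a 400 validation error or a 402 cap response would surface as an exception whose code attribute holds the status.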

Why not full process-tree introspection?

BoxWatch isn't an APM. We deliberately watch only what you say to watch, by name. No tree walking, no command-line parsing, no auto-discovery. The result is:

The agent stays under 1000 lines of bash.
A new watched process appears on the next heartbeat without any code change.
The data volume is bounded by your configured list, not the entire host's process table.

If you want full process-tree introspection, BoxWatch can run alongside an APM tool — they don't conflict.

Troubleshooting

"I added `nginx` but the dashboard says `pending` for ages"

The agent picks up new watched processes on its next heartbeat (every 5 minutes) after the API tells it about them.

Force a manual run if you don't want to wait:

sudo /opt/boxwatch/agent.sh

"Process is running but BoxWatch says it's missing"

Check the exact name pgrep -x is matching against:

pgrep -x nginx

If that returns nothing but pgrep nginx (without -x) does, the binary's basename isn't nginx. Watch the actual name. Common gotchas: python3 vs. python, node vs. the wrapper script that execs it.
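
To find the actual name quickly, you can list running process names that contain the string you typed but are not an exact match. The sketch below is a diagnostic aid, not part of BoxWatch: both function names are hypothetical, and it assumes a Linux host where each process name is readable from /proc/<pid>/comm.

```python
from pathlib import Path


def running_process_names() -> list[str]:
    """Read process names from /proc/<pid>/comm (empty list if /proc is absent)."""
    names = []
    for comm in Path("/proc").glob("[0-9]*/comm"):
        try:
            names.append(comm.read_text().strip())
        except OSError:
            continue   # the process exited while we were scanning
    return names


def near_misses(watched: str, running_names: list[str]) -> list[str]:
    """Names that contain the watched string but would fail an exact match."""
    return sorted({n for n in running_names if watched in n and n != watched})


candidates = near_misses("python", ["python3", "python3", "nginx", "bash"])
```

On a live host, near_misses("python", running_process_names()) would surface names such as python3 that you could watch instead.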

"Spurious restart alerts every hour"

The process is restarting on its own. Check systemctl status <service> and look at the restart count. The agent isn't generating the events — your service manager is.

Note: process monitoring requires agent v2.0+. If the agent is older, re-run the install command to upgrade:

curl -sL https://boxwatch.app/install.sh | bash -s YOUR_AGENT_KEY

"macOS server isn't reporting process data"

The collection block uses /proc/<pid>/stat, which is Linux-only. macOS and BSD agents don't report process samples in v1.
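
For context on why this is Linux-specific, here is how a /proc/<pid>/stat line is laid out, following the field numbering in proc(5). It is a Python illustration of the file format, not the agent's bash code, and which fields the agent actually reads is an assumption.

```python
def parse_proc_stat(line: str) -> dict:
    """Extract CPU ticks, start time, and RSS from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces or ')',
    # so locate the LAST ')' instead of splitting the whole line on whitespace.
    lparen = line.index("(")
    rparen = line.rindex(")")
    rest = line[rparen + 1:].split()   # rest[0] is field 3 (state)

    def field(n: int) -> str:
        """Look up a value by its 1-based field number from proc(5)."""
        return rest[n - 3]

    return {
        "pid": int(line[:lparen].strip()),
        "comm": line[lparen + 1:rparen],
        "utime_ticks": int(field(14)),       # user-mode CPU time, in clock ticks
        "stime_ticks": int(field(15)),       # kernel-mode CPU time, in clock ticks
        "starttime_ticks": int(field(22)),   # start time after boot, in clock ticks
        "rss_pages": int(field(24)),         # resident set size, in pages
    }


sample = (
    "1234 (nginx) S 1 1234 1234 0 -1 4194560 100 0 0 0 "
    "50 25 0 0 20 0 1 0 98765 10485760 2560"
)
parsed = parse_proc_stat(sample)
```

macOS and the BSDs do not provide this file, which is why there is nothing for the collection block to read on those hosts.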

Process monitoring is included on every account. Every account can watch up to 100 processes per server, free.