
Alert Runbooks

Alert runbooks are kept in this directory, one file per alert, so each alert's annotation can link directly to a dedicated procedure.

Shared Incident Workflow

  1. Confirm blast radius (single pod, single tenant, or fleet-wide).
  2. Confirm that both health endpoints respond successfully:
    • curl -fsS http://<proxy>:3100/ready
    • curl -fsS http://<victorialogs>:9428/health
  3. Check request failure rates and latency in the metrics.
  4. Review proxy logs for translation, backend, timeout, or auth errors.
  5. Apply a mitigation, then confirm that the alert's recovery criteria are met.
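The health checks in step 2 can be wrapped in two small shell functions so both endpoints are checked in one pass during triage. This is a minimal sketch under stated assumptions: `check_endpoint`, `run_health_checks`, `PROXY_URL` and `VICTORIALOGS_URL` are hypothetical names, not part of the proxy or VictoriaLogs tooling, and the localhost defaults are placeholders for the real `<proxy>` and `<victorialogs>` hosts. Only the ports and paths (`3100/ready`, `9428/health`) come from the workflow.

```shell
#!/usr/bin/env bash
# Sketch of step 2 (health checks) as reusable shell functions.
# check_endpoint, run_health_checks, PROXY_URL and VICTORIALOGS_URL are
# hypothetical names chosen for this example.

# check_endpoint NAME URL
# Prints OK/FAIL for one endpoint and returns curl's exit status.
# -f makes HTTP 4xx/5xx responses count as failures; --max-time keeps a hung
# backend from stalling triage (5 seconds is an arbitrary choice).
check_endpoint() {
  local name="$1" url="$2" rc
  curl -fsS --max-time 5 -o /dev/null "$url"
  rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "OK   ${name} ${url}"
  else
    echo "FAIL ${name} ${url} (curl exit ${rc})"
  fi
  return "$rc"
}

# run_health_checks
# Checks the proxy /ready and VictoriaLogs /health endpoints.
# Base URLs default to localhost (an assumption); override them with
# PROXY_URL and VICTORIALOGS_URL. Returns the number of failed checks.
run_health_checks() {
  local proxy_url="${PROXY_URL:-http://localhost:3100}"
  local vl_url="${VICTORIALOGS_URL:-http://localhost:9428}"
  local failures=0
  check_endpoint "proxy" "${proxy_url}/ready" || failures=$((failures + 1))
  check_endpoint "victorialogs" "${vl_url}/health" || failures=$((failures + 1))
  return "$failures"
}
```

For example, after sourcing the file, `PROXY_URL=http://proxy.example:3100 VICTORIALOGS_URL=http://victorialogs.example:9428 run_health_checks` (placeholder hostnames) prints one line per endpoint; a non-zero return value is the number of endpoints that failed, which helps place the problem on the proxy side, the backend side, or both.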

Runbooks