Alert Runbooks
Runbooks are organized as one document per alert under this directory, so each alert's annotation can link directly to a dedicated procedure.
Shared Incident Workflow
- Confirm the blast radius (single pod, single tenant, or fleet-wide).
- Validate health checks:
  curl -fsS http://<proxy>:3100/ready
  curl -fsS http://<victorialogs>:9428/health
- Check request failures and latency from metrics.
- Review proxy logs for translation, backend, timeout, or auth errors.
- Apply mitigation, then verify alert recovery criteria.
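The two health checks in the workflow can also be scripted, for example from a bastion host or a CI job. The following is a minimal sketch using only the Python standard library. `probe`, `health_targets`, and `check_health` are hypothetical helper names, and the base URLs stand in for the `<proxy>` and `<victorialogs>` placeholders. Like `curl -f`, it treats any non-2xx response as a failure. Unlike plain `curl`, `urlopen` follows redirects.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (healthy, detail) for one endpoint, roughly like `curl -fsS`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # urlopen raises HTTPError for non-2xx responses (after following
            # redirects), so reaching this line means the check passed.
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # Like `curl -f`, any HTTP error status counts as a failed check.
        return False, f"HTTP {err.code}"
    except OSError as err:
        # URLError (refused connection, DNS failure) and timeouts are OSErrors.
        return False, f"unreachable: {err}"


def health_targets(proxy_base: str, victorialogs_base: str) -> dict[str, str]:
    """Build the two URLs from the workflow; base URLs are placeholders."""
    return {
        "proxy /ready": f"{proxy_base.rstrip('/')}/ready",
        "victorialogs /health": f"{victorialogs_base.rstrip('/')}/health",
    }


def check_health(
    proxy_base: str, victorialogs_base: str, timeout: float = 5.0
) -> dict[str, tuple[bool, str]]:
    """Probe both endpoints and return (healthy, detail) per target."""
    targets = health_targets(proxy_base, victorialogs_base)
    return {name: probe(url, timeout) for name, url in targets.items()}
```

For example, `check_health("http://<proxy>:3100", "http://<victorialogs>:9428")` (with real addresses substituted) returns one `(ok, detail)` pair per endpoint. A wrapper script could exit non-zero when any pair is not ok.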
Runbooks
- Deployment And Scaling Best Practices
- LokiVLProxyDown
- LokiVLProxyHighErrorRate
- LokiVLProxyHighLatency
- LokiVLProxyBackendHighLatency
- LokiVLProxyBackendUnreachable
- LokiVLProxyCircuitBreakerOpen
- LokiVLProxyTenantHighErrorRate
- LokiVLProxyRateLimiting
- LokiVLProxyClientBadRequestBurst
- LokiVLProxyGrafanaTupleContract
- LokiVLProxySystemResources
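Each alert's annotation is meant to link to its own procedure in this directory, so a small check can catch missing, broken, or shared links before rule changes ship. The following is a sketch under stated assumptions. The annotation key is `runbook_url`, which is a common Prometheus convention and is not confirmed by this README. Each URL is assumed to end with the runbook's file name. The rules are assumed to be already loaded (for example from YAML) into a mapping of alert name to annotations. `check_runbook_links` is a hypothetical helper.

```python
def check_runbook_links(
    rules: dict[str, dict[str, str]],
    runbook_files: list[str],
    annotation: str = "runbook_url",  # assumed annotation key, adjust as needed
) -> dict[str, str]:
    """Return {alert: problem}; an empty dict means every link is valid and dedicated.

    rules: alert name -> annotations, already loaded from the rule files.
    runbook_files: file names present in this runbooks directory.
    Assumes the annotation URL ends with the runbook's file name.
    """
    present = set(runbook_files)
    problems: dict[str, str] = {}
    owners: dict[str, list[str]] = {}  # runbook file -> alerts linking to it

    for alert, annotations in rules.items():
        url = annotations.get(annotation, "")
        if not url:
            problems[alert] = f"missing {annotation} annotation"
            continue
        # Drop any #fragment and trailing slash, then keep the last path segment.
        target = url.split("#", 1)[0].rstrip("/").rsplit("/", 1)[-1]
        if target not in present:
            problems[alert] = f"{annotation} points at {target!r}, not found here"
            continue
        owners.setdefault(target, []).append(alert)

    # Each alert should have a dedicated procedure, so flag shared runbooks.
    for target, alerts in owners.items():
        if len(alerts) > 1:
            for alert in alerts:
                others = ", ".join(a for a in alerts if a != alert)
                problems[alert] = f"shares {target!r} with {others}"
    return problems
```

Returning a mapping instead of raising on the first error lets a single run report every problem.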