Skip to main content

LokiVLProxyBackendHighLatency

Signal: backend p95 latency above 5s for 5m.
Likely causes: VictoriaLogs saturation (CPU/IO), heavy scans, storage contention.

Triage

histogram_quantile(0.95, sum(rate(loki_vl_proxy_backend_duration_seconds_bucket[5m])) by (le, endpoint))
Correlate slow endpoints with request rate:
- sum(rate(loki_vl_proxy_requests_total[5m])) by (endpoint, status)
Check backend resource pressure (CPU, disk IO, memory, queue depth).

Mitigation

Reduce expensive query load and tenant bursts.
Scale backend resources and check storage performance.
Increase metadata cache TTLs if repeated metadata scans are dominant.

Recovery Criteria

backend p95 below threshold for at least 15m.
user-facing query latency and error rates return to baseline.

Prevention

Apply Deployment And Scaling Best Practices for cache tuning and backend-aware autoscaling.

Triage
Mitigation
Recovery Criteria
Prevention