LokiVLProxyBackendHighLatency
- Signal: backend p95 latency above 5s for 5m.
- Likely causes: VictoriaLogs saturation (CPU/IO), heavy scans, storage contention.
Triage
histogram_quantile(0.95, sum(rate(loki_vl_proxy_backend_duration_seconds_bucket[5m])) by (le, endpoint))- Correlate slow endpoints with request rate:
sum(rate(loki_vl_proxy_requests_total[5m])) by (endpoint, status)
- Check backend resource pressure (CPU, disk IO, memory, queue depth).
Mitigation
- Reduce expensive query load and tenant bursts.
- Scale backend resources and check storage performance.
- Increase metadata cache TTLs if repeated metadata scans are dominant.
Recovery Criteria
- backend p95 below threshold for at least 15m.
- user-facing query latency and error rates return to baseline.
Prevention
Apply Deployment And Scaling Best Practices for cache tuning and backend-aware autoscaling.