LokiVLProxyRateLimiting
- Signal: sustained reason="rate_limited" client errors.
- Likely causes: burst traffic, low client limits, retry storms.
Triage
sum(rate(loki_vl_proxy_client_errors_total{reason="rate_limited"}[5m])) by (client, endpoint)
- Check per-client request volume and retry behavior.
- Compare incident traffic with the current built-in limiter defaults (50 req/s per client, burst 100).
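When comparing incident traffic with the defaults, it can help to replay a client's request timestamps through a simple model. The sketch below assumes the built-in limiter behaves like a standard token bucket (refill of 50 tokens per second, capacity 100); the proxy's real implementation may differ, and `count_rejected` is a hypothetical helper, not part of the proxy.

```python
def count_rejected(arrivals, rate=50.0, burst=100.0):
    """Count requests a token bucket would reject.

    arrivals: request timestamps in seconds, sorted ascending.
    rate/burst: assumed to mirror the documented defaults (50 req/s, burst 100).
    """
    tokens = burst                      # bucket starts full
    last = arrivals[0] if arrivals else 0.0
    rejected = 0
    for t in arrivals:
        # Refill for the time elapsed since the previous request, capped at burst.
        tokens = min(burst, tokens + (t - last) * rate)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0               # request admitted
        else:
            rejected += 1               # request would be rate limited
    return rejected


# 150 requests at the same instant: 100 fit in the burst, the rest are rejected.
spike = [0.0] * 150
# A steady 40 req/s for 10 seconds stays under the refill rate.
steady = [i / 40 for i in range(400)]

print(count_rejected(spike), count_rejected(steady))
```

If the model rejects far fewer requests than the metric reports for the same client, retries are a likely amplifier and worth checking before changing limits.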
Mitigation
- Shape traffic at Grafana, ingress, or an outer proxy layer if the built-in limiter is too strict for the workload.
- Add client-side backoff/jitter and reduce retries.
- Isolate abusive client traffic if needed.
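For the client-side backoff and jitter item, a minimal sketch of capped exponential backoff with full jitter follows. It assumes rate-limited responses surface to the client as HTTP 429, which this runbook does not confirm; `send`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the base delay, cap, and retry count are illustrative values to tune per client.

```python
import random
import time


def backoff_delays(max_retries=4, base=0.5, cap=10.0, rng=random.random):
    """Return the sleep (seconds) before each retry, using full jitter.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)),
    so clients that fail together do not retry together.
    """
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(send, max_retries=4, sleep=time.sleep, **backoff_kwargs):
    """Call send() and retry only while it reports 429 (assumed rate-limit status).

    send: hypothetical zero-argument callable returning an HTTP status code.
    The retry budget is bounded so a limited client cannot create a retry storm.
    """
    status = send()
    for delay in backoff_delays(max_retries, **backoff_kwargs):
        if status != 429:
            break
        sleep(delay)
        status = send()
    return status
```

Keeping `max_retries` small matters as much as the jitter: every retry of a rejected request counts against the same per-client limit.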
Recovery Criteria
- Rate-limited errors return to expected baseline.
- Client-facing SLO returns to normal.
Prevention
Apply the guidance in Deployment And Scaling Best Practices on client shaping, limit sizing, and retry control.