LokiVLProxyTenantHighErrorRate
- Signal: tenant-level 5xx ratio above 10% for 5m.
- Likely causes: tenant-specific query patterns, auth/routing drift, per-tenant burst load.
Triage
sum(rate(loki_vl_proxy_tenant_requests_total{tenant="<tenant>",status=~"5.."}[5m])) by (endpoint)
- Inspect tenant query profile and query-length outliers.
- Validate tenant mapping/fanout configuration.
- Check whether incident is isolated to one tenant or multiple.
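To see whether the incident is isolated, it can help to compare the 5xx ratio of every tenant against the 10% threshold in one query. The following is a minimal sketch, not the alert's actual expression (which is not shown in this runbook). It assumes that `loki_vl_proxy_tenant_requests_total` carries the `tenant` and `status` labels on all series, as the triage query above suggests.

```promql
# Per-tenant 5xx ratio over the last 5 minutes.
# Assumption: every series of this counter has `tenant` and `status` labels.
(
  sum by (tenant) (rate(loki_vl_proxy_tenant_requests_total{status=~"5.."}[5m]))
/
  sum by (tenant) (rate(loki_vl_proxy_tenant_requests_total[5m]))
) > 0.10
```

A single tenant in the result points to a tenant-specific cause. Several tenants crossing the threshold at once suggests a shared problem rather than one noisy tenant.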
Mitigation
- Throttle tenant bursts or adjust tenant limits.
- Fix tenant routing/auth mapping.
- Work with tenant to correct expensive or invalid queries.
Recovery Criteria
- Tenant error ratio clears threshold.
- Other tenants remain stable during and after mitigation.
Prevention
Apply Deployment And Scaling Best Practices for tenant isolation, fanout limits, and noisy-tenant controls.