LokiVLProxyDown
- Signal:
up{job="<proxy-job>"} == 0for 1m. - Likely causes: pod crash, readiness failure, service misrouting, network policy.
Triage
kubectl -n <ns> get pods -l app.kubernetes.io/name=loki-vl-proxy -o widekubectl -n <ns> describe pod <pod>kubectl -n <ns> logs <pod> --since=10mkubectl -n <ns> get endpoints <service-name>
Mitigation
- Restart failed pod, roll back last deployment, or scale replicas.
- If readiness fails due to backend dependency, escalate as backend incident.
Recovery Criteria
up == 1/readyreturns200- dashboard query traffic recovers.
Prevention
Apply Deployment And Scaling Best Practices for multi-replica topology, rollout safety, and probe strategy.