Deployment And Scaling Best Practices
Use these defaults to reduce incident frequency and keep recovery time short.
Topology
- Run at least 2 replicas in production.
- Use podDisruptionBudget.minAvailable: 1.
- Keep backend and proxy in low-latency network zones when possible.
- Enable peerCache.enabled=true for multi-replica fleets.
Resource Sizing
- Start with:
  - requests: 200m CPU, 256Mi memory
  - limits: 1000m CPU, 1Gi memory
- Increase memory first when cache miss spikes follow OOM pressure.
- Keep goMemLimitPercent enabled and set explicit goMemLimit in tightly constrained clusters.
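
To make the relationship between the container memory limit and an explicit goMemLimit concrete, here is a minimal sketch. It assumes goMemLimitPercent is a percentage of the container memory limit; the helper names and the 80% figure are hypothetical, not chart defaults.

```python
# Hypothetical helper: derive a Go soft memory limit from a container limit.
# Assumption: the percentage applies to the container memory limit.
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_k8s_memory(quantity: str) -> int:
    """Convert a Kubernetes binary quantity such as '1Gi' to bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as plain bytes


def go_mem_limit(container_limit: str, percent: int) -> str:
    """Return a GOMEMLIMIT-style value in MiB at `percent` of the limit."""
    limit_bytes = parse_k8s_memory(container_limit)
    mib = limit_bytes * percent // 100 // 1024**2
    return f"{mib}MiB"


if __name__ == "__main__":
    # 80 is an illustrative percentage for the 1Gi limit suggested above.
    print(go_mem_limit("1Gi", 80))
```

Leaving headroom below the container limit gives the Go garbage collector a chance to reclaim memory before the kernel OOM killer acts.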
Cache Strategy
- Keep short TTLs for live query paths (query, query_range) where freshness matters.
- Use longer TTLs for metadata paths (labels, detected_fields, detected_field_values).
- For larger working sets, run StatefulSet + persistent disk-cache-path.
- Monitor cache hit ratio and tune cache-max before scaling backend blindly.
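
The "tune the cache before scaling the backend" rule can be expressed as a small check on hit and miss counters. This is a sketch only: the function names, the action labels, and the 0.8 ratio threshold are hypothetical and should be replaced with values derived from your own metrics.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there is no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


def next_action(hits: int, misses: int, min_ratio: float = 0.8) -> str:
    """Suggest cache tuning first when the hit ratio is below `min_ratio`.

    `min_ratio` is an illustrative threshold, not a documented default.
    """
    if cache_hit_ratio(hits, misses) < min_ratio:
        return "tune-cache"      # e.g. revisit cache-max or TTLs
    return "consider-scaling"    # cache is effective; look at the backend


if __name__ == "__main__":
    print(next_action(hits=500, misses=500))
```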
Autoscaling
- Enable HPA with CPU target around 65-75%.
- Scale on request rate or latency if custom metrics are available.
- Avoid aggressive downscale windows that cause cache churn.
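
To see how a CPU target in the 65-75% range translates into replica counts, the sketch below applies the scaling rule described in the Kubernetes HPA documentation: desired = ceil(current * currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The function name and the minimum of 2 replicas (taken from the topology guidance) are assumptions of this example; stabilization windows are not modeled.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_util: float,
    target_util: float,
    tolerance: float = 0.1,
    min_replicas: int = 2,
) -> int:
    """Approximate the HPA replica calculation for a utilization metric."""
    ratio = current_util / target_util
    # Within tolerance the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return max(min_replicas, math.ceil(current_replicas * ratio))


if __name__ == "__main__":
    # 4 replicas at 90% CPU against a 70% target.
    print(desired_replicas(4, 90, 70))
```

Because each scale-down removes warm caches, a longer downscale stabilization window trades some idle capacity for fewer cold starts.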
Health And Probes
- Keep liveness/readiness on /ready.
- Add synthetic e2e probes that execute a real lightweight query path, not only /ready.
- Alert separately on backend p95 latency and backend-unreachable signals.
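
A synthetic probe along these lines could look like the following sketch. The base URL, the query path, and the 2-second latency budget are placeholders you would supply; the `fetch` parameter exists only so the probe can be exercised without a live endpoint.

```python
import time
import urllib.request
from typing import Callable, Dict


def http_get(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request (raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def run_probe(
    base_url: str,
    query_path: str,
    fetch: Callable[[str], int] = http_get,
    max_latency_s: float = 2.0,  # illustrative budget, not a documented value
) -> Dict[str, object]:
    """Execute one lightweight query and classify the outcome."""
    url = base_url.rstrip("/") + query_path
    start = time.monotonic()
    try:
        status = fetch(url)
    except Exception as exc:  # network errors, timeouts, HTTP errors
        return {"ok": False, "reason": f"error: {exc}"}
    elapsed = time.monotonic() - start
    if status != 200:
        return {"ok": False, "reason": f"status {status}"}
    if elapsed > max_latency_s:
        return {"ok": False, "reason": "slow"}
    return {"ok": True, "reason": "ok"}
```

Run it from a scheduler outside the proxy pods so that the probe also covers ingress and service routing.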
Rollout Safety
- Use rolling updates with max unavailable 0 where possible.
- Gate rollout by:
  - error-rate increase
  - p95/p99 latency increase
  - backend 502 burst
- Roll back quickly if all three regress simultaneously.
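
The gating rule above can be sketched as a comparison between a pre-rollout baseline and the current snapshot. Everything here is illustrative: the `Snapshot` fields, the 20% allowed increase, the 502 burst threshold, and the "hold" state for partial regressions are assumptions of this example, not values from any tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    error_rate: float           # fraction of failed requests
    p99_latency_ms: float
    backend_502_per_min: float


def regressions(
    baseline: Snapshot,
    current: Snapshot,
    max_increase: float = 0.2,     # hypothetical: allow up to +20%
    max_502_per_min: float = 5.0,  # hypothetical burst threshold
) -> List[str]:
    """List which of the three gate signals have regressed."""
    found = []
    if current.error_rate > baseline.error_rate * (1 + max_increase):
        found.append("error-rate")
    if current.p99_latency_ms > baseline.p99_latency_ms * (1 + max_increase):
        found.append("latency")
    if current.backend_502_per_min > max_502_per_min:
        found.append("backend-502")
    return found


def rollout_decision(baseline: Snapshot, current: Snapshot) -> str:
    """Roll back when all three regress; pause on any single regression."""
    count = len(regressions(baseline, current))
    if count == 3:
        return "rollback"
    return "hold" if count else "proceed"
```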
Multi-Tenant Hardening
- Keep tenant map explicit and audited.
- Cap multi-tenant fanout and merged payload limits.
- Watch tenant-level error and latency histograms to catch noisy tenants early.
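
As a rough illustration of an explicit tenant map combined with a fanout cap, the sketch below rejects unknown tenants and oversized multi-tenant requests before any backend call is made. The function name, the map shape, and the cap of 4 are hypothetical; merged payload limits would need a similar check on response size.

```python
from typing import Dict, List


def resolve_tenants(
    requested: List[str],
    tenant_map: Dict[str, str],
    max_fanout: int = 4,  # illustrative cap, not a documented default
) -> List[str]:
    """Map requested tenant names to backend IDs, enforcing the fanout cap."""
    unknown = [t for t in requested if t not in tenant_map]
    if unknown:
        raise ValueError(f"unknown tenants: {unknown}")
    if len(requested) > max_fanout:
        raise ValueError(
            f"fanout {len(requested)} exceeds cap {max_fanout}"
        )
    return [tenant_map[t] for t in requested]


if __name__ == "__main__":
    mapping = {"team-a": "1001", "team-b": "1002"}
    print(resolve_tenants(["team-a", "team-b"], mapping))
```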
Recommended SRE Checklist
- replicaCount >= 2, PDB enabled.
- HPA enabled with conservative downscale.
- Cache policy reviewed for query vs metadata endpoints.
- Backend p95 and proxy p99 alerts enabled.
- Synthetic e2e probe running every 1-5 minutes.
- Runbooks linked in alert annotations and tested in game days.