Operations Guide
Deployment​
Minimum Requirements​
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 50m | 200m |
| Memory | 64Mi | 256Mi |
| Replicas | 1 | 2+ (with PDB) |
The proxy is stateless (except optional disk cache). Scale horizontally without coordination.
Key scaling controls (all tunable via CLI flags):
- -max-concurrent 100: global concurrent backend query cap
- -rate-limit-per-second 50 / -rate-limit-burst 100: per-client token bucket
- -cb-fail-threshold 5 / -cb-open-duration 10s: backend circuit breaker
- use Grafana refresh policy, ingress shaping, HPA, and cache tuning as complementary levers
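The per-client token bucket behind the rate-limit flags can be pictured with a minimal sketch. This is illustrative Python, not the proxy's implementation: the TokenBucket class and its allow method are hypothetical names, and only the values 50 and 100 come from the flag defaults listed above.

```python
class TokenBucket:
    """Illustrative token bucket; not the proxy's actual code."""

    def __init__(self, rate_per_second: float, burst: int) -> None:
        self.rate = rate_per_second   # tokens refilled per second
        self.capacity = burst         # maximum tokens the bucket can hold
        self.tokens = float(burst)    # start full so an initial burst is allowed
        self.last = 0.0               # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate_per_second=50, burst=100)
# 150 requests arriving at the same instant: only the burst allowance passes.
admitted_at_once = sum(bucket.allow(now=0.0) for _ in range(150))
# One second later the bucket has refilled 50 tokens.
admitted_after_1s = sum(bucket.allow(now=1.0) for _ in range(150))
```

Under these assumptions a client can spend its burst immediately and is then held to the sustained rate, which is why raising only the burst value does not raise long-run throughput.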
Helm Deployment​
helm install loki-vl-proxy oci://ghcr.io/reliablyobserve/charts/loki-vl-proxy \
--version <release> \
--set extraArgs.backend=http://victorialogs:9428 \
--set extraArgs.label-style=underscores
# Local chart (development)
helm install loki-vl-proxy ./charts/loki-vl-proxy \
--set extraArgs.backend=http://victorialogs:9428 \
--set extraArgs.label-style=underscores
For multi-replica fleets with HPA, prefer peerCache.enabled=true over static peer lists. The chart creates a headless service and the proxy refreshes DNS-discovered peers automatically, so scaling events do not require manual replica or peer updates.
For Grafana Logs Drilldown pattern discovery, keep the default extraArgs.patterns-enabled=true or set it explicitly during rollout if you need to control the surface area:
extraArgs:
  backend: http://victorialogs:9428
  label-style: underscores
  patterns-enabled: "true"
Required Configuration​
| Flag | Required | Description |
|---|---|---|
| -backend | Yes | VictoriaLogs URL |
| -listen | No | Listen address (default :3100) |
| -label-style | No | passthrough (default) or underscores |
Backend Auth Forwarding​
If VictoriaLogs authentication is delegated from upstream clients, you can explicitly forward the client's Authorization header to the backend:
-forward-authorization=true
Equivalent manual mode:
-forward-headers=Authorization
Use this only in trusted topologies (for example Grafana/auth-proxy -> Loki-VL-proxy -> VictoriaLogs).
Operational Assets​
Treat these as one versioned operational package:
| Asset | Canonical source | Purpose |
|---|---|---|
| Grafana operations dashboard | dashboard/loki-vl-proxy.json | Three-section layout: Section 1 — SLO/SLI + Health (8-stat top strip: circuit breaker, active requests, QPS, error %, P99 client latency, P95 backend latency, cache hit ratio, uptime; plus SLI time-series rows). Section 2 — Client → Proxy → VL + Resources (client visibility: request rate by route, errors by reason, query length, per-client inflight, latency by route; proxy internals: coalescing, internal ops, response tuple mode, tenant QPS; VL backend: upstream fanout, window count, backend latency, fetch/merge latency, adaptive parallelism; process resources: CPU, memory, goroutines, GC, network, disk I/O, PSI pressure). Section 3 — Deep Proxy Internals (cache tiers: T0/L1/L2/L3 hit/miss, sizes, stale hits, backend fallthrough; peer cache fleet: cluster members, hit/miss, write-through, hot read-ahead, error breakdown; query-range windowing: window cache, prefilter efficiency, retries, partial responses, prefilter duration, adaptive parallelism trace; patterns engine: in-memory count/bytes, mining rate, source line pipeline, snapshot hits/reuse, persistence; HTTP connection lifecycle: states, rotation reasons, transitions; tenant deep dive: per-tenant QPS/P99/errors) |
| Alert rules | alerting/loki-vl-proxy-prometheusrule.yaml | PrometheusRule/vmalert-oriented alert set with standardized labels and annotations |
| SRE runbooks | docs/runbooks/alerts.md | Index plus per-alert runbook files referenced directly from alert runbook_url |
When using the Helm chart, the runtime templates consume synced copies in charts/loki-vl-proxy/{dashboards,alerting}. Keep canonical and chart copies aligned with:
./scripts/ci/sync_observability_assets.sh sync
./scripts/ci/sync_observability_assets.sh --check
--check is already enforced in CI to prevent drift.
Preventive Scaling And Deployment​
See the dedicated guide for prevention-oriented operations hardening.
Critical defaults to reduce incident frequency:
- run at least 2 replicas with PDB enabled
- enable HPA with conservative downscale
- tune cache TTLs differently for query paths vs metadata paths
- monitor backend p95 and proxy p99 histograms, not averages
- add synthetic in-cluster e2e query probes in addition to /ready
Multi-Tenancy​
Tenant Mapping Strategies​
The proxy maps X-Scope-OrgID headers to VictoriaLogs tenant IDs. Three strategies are available depending on deployment size and dynamism.
1. Inline JSON (-tenant-map)​
Best for small, static tenant maps. The entire map is provided directly as a CLI flag or env var value:
-tenant-map='{"team-a":"vl-tenant-1","team-b":"vl-tenant-2"}'
Updating an inline map requires a proxy restart.
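As a rough illustration of what the inline map expresses, the sketch below resolves an X-Scope-OrgID header against the same JSON value. The resolve_tenant function is hypothetical, and returning None for an unmapped or missing header is an assumption of the sketch; this guide does not describe how the proxy handles that case.

```python
import json

# Same JSON value as the -tenant-map example above.
TENANT_MAP_JSON = '{"team-a":"vl-tenant-1","team-b":"vl-tenant-2"}'


def resolve_tenant(headers: dict, tenant_map: dict):
    """Return the mapped VictoriaLogs tenant, or None when no mapping exists."""
    org_id = headers.get("X-Scope-OrgID")
    if org_id is None:
        return None
    return tenant_map.get(org_id)


tenant_map = json.loads(TENANT_MAP_JSON)
mapped = resolve_tenant({"X-Scope-OrgID": "team-a"}, tenant_map)
unmapped = resolve_tenant({"X-Scope-OrgID": "team-z"}, tenant_map)
```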
2. File-based (-tenant-map-file)​
Best for Kubernetes environments where tenant maps are mounted as ConfigMaps. The proxy hot-reloads the file on SIGHUP and also polls for mtime changes on the configured interval:
-tenant-map-file=/etc/proxy/tenants.yaml
-tenant-map-reload-interval=30s
The default reload interval is 30s. To trigger an immediate reload without restarting the proxy:
kill -HUP <pid>
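In Helm, a lifecycle hook can send SIGHUP to the proxy process. Note that a postStart hook runs once, right after the container starts, not on each ConfigMap update, so later updates rely on mtime polling:
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "kill -HUP 1"]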
Polling every 30s means changes are picked up automatically even without an explicit signal, which suits ConfigMap-mounted files that are updated by an external controller.
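The mtime polling described above can be sketched as follows. This is not the proxy's code: FileWatcher is a hypothetical helper, and the file contents are placeholder text because the real tenant file schema is not shown in this guide.

```python
import os
import tempfile


class FileWatcher:
    """Reloads a file only when its modification time changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.last_mtime = None
        self.content = None

    def poll(self) -> bool:
        """Check mtime once; return True if the file was (re)loaded."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self.last_mtime:
            return False
        with open(self.path, encoding="utf-8") as f:
            self.content = f.read()
        self.last_mtime = mtime
        return True


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tenants.yaml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("placeholder: v1\n")
    watcher = FileWatcher(path)
    first = watcher.poll()    # initial load
    second = watcher.poll()   # unchanged file, no reload
    with open(path, "w", encoding="utf-8") as f:
        f.write("placeholder: v2\n")
    # Bump mtime explicitly so the demo does not depend on clock resolution.
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    third = watcher.poll()    # change detected, file reloaded
    reloaded = watcher.content
```

In a long-running process, poll would be called once per reload interval, with a SIGHUP handler calling the same reload path for immediate pickup.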
3. Label-based (-tenant-label)​
Routes per-query based on a label field value in the incoming stream. Useful when a single VictoriaLogs tenant holds multi-tenant data distinguished by a label such as service.name:
-tenant-label=service.name
When set, the proxy extracts the label value from the query or push request and uses it as the VictoriaLogs tenant ID, without requiring the client to set X-Scope-OrgID.
-require-tenant-header Flag​
-require-tenant-header=true enforces that every request carries an X-Scope-OrgID header (returns HTTP 401 if missing) without enabling full auth. This is useful for catching misconfigured clients in multi-tenant setups without a full auth proxy.
This is distinct from -auth.enabled: the latter enables credential validation, while -require-tenant-header only checks for header presence.
Health Check Endpoints​
The proxy exposes three operational endpoints:
| Endpoint | Purpose | Kubernetes probe |
|---|---|---|
| /alive | Liveness: confirms the process is running | livenessProbe |
| /ready | Readiness: confirms the proxy is ready to serve traffic (backend reachable, warm-up complete) | readinessProbe |
| /metrics | Prometheus metrics scrape | ServiceMonitor / scrape config |
If /ready stays non-OK immediately after a restart, check whether patterns or indexed label-values startup warm-up is configured. Those persistence restores can intentionally hold readiness at 503 until warm-up completes.
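For scripts that must wait through a warm-up period, a small polling helper may be useful. The sketch below assumes only that /ready answers 200 when ready and a non-200 status such as 503 otherwise; is_ready and wait_until_ready are hypothetical names, and the demo uses a fake check so it runs without a proxy.

```python
import time
import urllib.error
import urllib.request


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors such as 503 as well as connection failures.
        return False


def wait_until_ready(check, attempts: int, delay_s: float) -> bool:
    """Call check() up to `attempts` times, sleeping between failures."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay_s)
    return False


# Demo with a fake check: not ready twice, then ready.
responses = iter([False, False, True])
became_ready = wait_until_ready(lambda: next(responses), attempts=5, delay_s=0.0)

# Against a real deployment (host name is a placeholder):
# wait_until_ready(lambda: is_ready("http://proxy:3100/ready"), attempts=30, delay_s=2.0)
```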
Translation Modes​
Translation guidance moved to dedicated docs:
- Translation Modes Guide for mode selection and exact underscore vs dotted behavior
- Configuration for flag reference
- Translation Reference for LogQL-to-LogsQL execution mapping
Operational recommendation:
- use label-style=underscores when upstream VL stores dotted OTel fields
- use metadata-field-mode=hybrid for mixed Loki + OTel field workflows
- use metadata-field-mode=translated for strict Loki-style field surfaces
- use metadata-field-mode=native for OTel-native field-only surfaces
Capacity Planning​
Memory​
| Component | Memory per Unit |
|---|---|
| L1 cache | ~50MB per 10k entries |
| L2 disk cache (bbolt) | ~10MB mmap overhead |
| Per active query | ~1-5MB (depends on result size) |
| Singleflight coalescing buffer | Up to 256MB per unique query |
| Base process | ~20MB |
Formula: base(20MB) + cache(entries × 5KB) + concurrent_queries × 3MB
Default -cache-max is 10000 (binary default). The Helm chart ships 50000 to suit light-to-moderate production use. For 50k cache entries and 100 concurrent queries: ~570MB recommended limit.
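The formula above can be checked with a few lines of arithmetic. The estimate_memory_mb function name is illustrative; the constants are the ones from the table (20MB base, about 5KB per cache entry, about 3MB per concurrent query), and the estimate leaves out singleflight buffers and disk cache mmap overhead.

```python
def estimate_memory_mb(cache_entries: int, concurrent_queries: int) -> float:
    """Rough memory estimate in MB using the guide's sizing formula."""
    base_mb = 20                           # base process
    cache_mb = cache_entries * 5 / 1000    # ~5KB per entry (50MB per 10k entries)
    query_mb = concurrent_queries * 3      # ~3MB per active query
    return base_mb + cache_mb + query_mb


helm_default = estimate_memory_mb(50_000, 100)     # 20 + 250 + 300 = 570.0
binary_default = estimate_memory_mb(10_000, 100)   # 20 + 50 + 300 = 370.0
```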
CPU​
The proxy is CPU-light. Main costs:
- JSON marshaling/unmarshaling (~70% of CPU)
- LogQL→LogsQL translation (~10%)
- Label translation (~5%)
- HTTP overhead (~15%)
Guideline: 1 CPU core handles ~2000 req/s.
Disk Cache​
L2 disk cache with bbolt:
- 1 million entries ≈ 2-5GB on disk (gzip compressed)
- Write amplification: ~2x with bbolt
- Use fast SSD (NVMe) for the cache volume
- Set disk-cache-flush-size=500 and disk-cache-flush-interval=10s for batched writes
Performance Tuning​
Cache TTLs​
Default TTLs are conservative. Adjust for your query patterns:
-cache-ttl=120s # Increase for stable label sets
-cache-max=50000 # Increase for high-cardinality environments
| Endpoint | Default TTL | Recommendation |
|---|---|---|
| labels | 60s | 120-300s if label set is stable |
| label_values | 60s | 60-120s |
| series | 30s | 30-60s |
| detected_fields | 30s | 30-60s |
| query_range | 10s | 5-30s depending on freshness needs |
| query | 10s | 5-30s |
Concurrency Limits​
-http-max-header-bytes=1048576 # 1MB default
-http-max-body-bytes=10485760 # 10MB default
The proxy uses singleflight to coalesce identical concurrent queries. N identical requests → 1 backend request.
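To make the coalescing behavior concrete, here is a minimal singleflight sketch using threads. It illustrates the general technique only and is not the proxy's implementation; SingleFlight, its methods, and the query key are hypothetical.

```python
import threading
import time


class _Call:
    """State shared by the leader and followers of one in-flight key."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result = None
        self.error = None
        self.followers = 0


class SingleFlight:
    """Runs fn once per key at a time; concurrent callers share the result."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight = {}

    def followers(self, key) -> int:
        with self._lock:
            call = self._inflight.get(key)
            return call.followers if call else 0

    def do(self, key, fn):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
            else:
                call.followers += 1
        if leader:
            try:
                call.result = fn()
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result


group = SingleFlight()
release = threading.Event()
backend_calls = []
results = []
KEY = '{app="web"}'


def slow_backend_query():
    backend_calls.append("hit")   # count real backend executions
    release.wait(timeout=5)       # hold the request open so others pile up
    return "rows"


threads = [
    threading.Thread(target=lambda: results.append(group.do(KEY, slow_backend_query)))
    for _ in range(5)
]
for t in threads:
    t.start()
deadline = time.monotonic() + 5
while group.followers(KEY) < 4 and time.monotonic() < deadline:
    time.sleep(0.01)
release.set()
for t in threads:
    t.join()
```

Five callers receive the same result while the backend function runs once. Coalescing only helps when keys are identical, so many distinct queries still produce one backend request each.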
Built-In Traffic Guards​
All traffic guard controls are tunable via CLI flags (or extraArgs in the Helm chart):
| Flag | Default | Description |
|---|---|---|
-rate-limit-per-second | 50 | Per-client request rate (req/s) |
-rate-limit-burst | 100 | Per-client burst allowance |
-max-concurrent | 100 | Global concurrent backend query cap |
-cb-fail-threshold | 5 | Failures within window to open circuit breaker |
-cb-open-duration | 10s | How long circuit breaker stays open |
-cb-window-duration | 30s | Failure counting window |
If defaults are too strict or too loose for your workload, tune at the proxy first, then complement with:
- reduced Grafana auto-refresh and retry pressure
- ingress or service-mesh shaping in front of the proxy
- scaling out replicas and raising cache effectiveness before pushing more uncached load
Monitoring​
See the dedicated Observability Guide for the full metrics catalog, JSON log schema, OTLP push configuration, and collector/agent integration examples.
Metrics​
The proxy exposes Prometheus metrics at /metrics:
Use the Observability Guide as the canonical catalog for:
- every documented loki_vl_proxy_* metric family
- cardinality level (Low, Medium, High (capped)) for each family
- scrape versus OTLP field/label mapping
- the new fanout and proxy-internal operation metrics/log fields
| Metric | Type | Primary dimensions | Description |
|---|---|---|---|
| loki_vl_proxy_requests_total | counter | system, direction, endpoint, route, status | Total requests by downstream Loki route or upstream backend route |
| loki_vl_proxy_request_duration_seconds | histogram | system, direction, endpoint, route | End-to-end request latency |
| loki_vl_proxy_backend_duration_seconds | histogram | system, direction, endpoint, route | Upstream-only latency for VictoriaLogs and rules/alerts backends |
| loki_vl_proxy_cache_hits_by_endpoint / loki_vl_proxy_cache_misses_by_endpoint | counter | system, direction, endpoint, route | Cache efficiency by normalized route |
| loki_vl_proxy_tenant_requests_total / loki_vl_proxy_client_requests_total | counter | tenant/client plus route dimensions | Hot tenants and clients per route |
| loki_vl_proxy_process_* | gauges/counters | metric family specific | Runtime, CPU, memory, disk, network, and PSI health |
Key Ratios to Monitor​
- Route cache hit ratio: cache_hits_by_endpoint / (cache_hits_by_endpoint + cache_misses_by_endpoint) by endpoint, route; target >80% on stable metadata paths
- Downstream error rate: requests_total{system="loki",direction="downstream",status=~"5.."} over total downstream requests; target <1%
- Upstream latency: backend_duration_seconds by endpoint, route; use this to separate VictoriaLogs slowness from proxy-side work
- End-to-end latency: request_duration_seconds{system="loki",direction="downstream"} by endpoint, route; compare with upstream latency and request logs
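The first two ratios reduce to simple arithmetic over counter increases. The sketch below uses invented sample counts purely for illustration; the function names are hypothetical, and in practice the inputs would be rate or increase values taken over the same time window.

```python
def cache_hit_ratio(hits: float, misses: float) -> float:
    """hits / (hits + misses), or 0.0 when there was no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


def error_rate(errors_5xx: float, total_requests: float) -> float:
    """Share of downstream requests that ended in a 5xx status."""
    return errors_5xx / total_requests if total_requests else 0.0


# Invented sample window: 900 hits, 100 misses, 4 errors in 1000 requests.
hit_ratio = cache_hit_ratio(hits=900, misses=100)
err_rate = error_rate(errors_5xx=4, total_requests=1000)
meets_targets = hit_ratio > 0.80 and err_rate < 0.01
```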
OTLP Push​
Push metrics to an OTLP collector:
-otlp-endpoint=http://otel-collector:4318/v1/metrics
-otlp-interval=30s
-otlp-compression=gzip
The OTLP exporter reuses the same core proxy metric names that /metrics exposes, so dashboards and alert logic can stay aligned across scrape and push modes.
For exact proxy-only overhead on translated paths, use structured request logs with proxy.overhead_ms, proxy.duration_ms, and upstream.duration_ms. The metrics intentionally keep route-aware end-to-end and upstream histograms, while logs carry the per-request decomposition.
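As a sketch of how the per-request fields could be aggregated, the snippet below computes the share of request time spent in the proxy. It assumes the three fields appear as flat dotted keys in each JSON log line, which this guide does not confirm, and the sample lines are invented; check the Observability Guide for the actual log schema.

```python
import json

# Invented sample lines; the flat dotted-key layout is an assumption.
SAMPLE_LOG_LINES = [
    '{"proxy.duration_ms": 120.0, "upstream.duration_ms": 100.0, "proxy.overhead_ms": 20.0}',
    '{"proxy.duration_ms": 50.0, "upstream.duration_ms": 45.0, "proxy.overhead_ms": 5.0}',
]


def overhead_share(lines) -> float:
    """Total proxy overhead divided by total request duration."""
    total_duration = 0.0
    total_overhead = 0.0
    for line in lines:
        record = json.loads(line)
        total_duration += record["proxy.duration_ms"]
        total_overhead += record["proxy.overhead_ms"]
    return total_overhead / total_duration if total_duration else 0.0


share = overhead_share(SAMPLE_LOG_LINES)   # 25 ms of 170 ms
```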
Troubleshooting​
No Data in Grafana​
- Check proxy health: curl http://proxy:3100/ready
- Check VL backend: curl http://vl:9428/health
- Check proxy logs for translation errors
- Verify label-style matches your VL ingestion format
- Check /loki/api/v1/labels for available labels
Label Names Don't Match​
| Symptom | Cause | Fix |
|---|---|---|
| Dots in Grafana labels | label-style=passthrough with dotted VL data | Set label-style=underscores |
| Empty label_values for service_name | VL stores service.name, query asks service_name | Set label-style=underscores |
| Grafana Drilldown "failed to fetch" | Volume/stats endpoint issue | Check proxy logs, ensure VL v1.49+ |
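The first two rows of the table come down to a name mismatch between dotted and underscore forms. The sketch below shows the idea with a plain dot-to-underscore replacement and a reverse index; it is an illustration only, the function names are hypothetical, and the proxy's real translation rules may handle more cases, such as collisions.

```python
def to_underscores(label: str) -> str:
    """Convert a dotted field name to its underscore form."""
    return label.replace(".", "_")


def build_reverse_index(stored_fields):
    """Map each underscore name back to the dotted field stored in VL."""
    return {to_underscores(field): field for field in stored_fields}


stored = ["service.name", "k8s.pod.name", "level"]
reverse = build_reverse_index(stored)
# A query for service_name can now be routed to the stored field.
backend_field = reverse["service_name"]
```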
High Memory Usage​
- Reduce -cache-max (default 10000)
- Reduce -http-max-body-bytes
- Add memory limits in Kubernetes
- Check for singleflight amplification (many unique queries)
High Latency​
- Keep -response-compression=gzip for broad Loki/Grafana compatibility; auto now behaves the same on the frontend for legacy configs
- Set -response-compression-min-bytes around 1024 to avoid wasting CPU on small metadata/control responses
- Increase cache TTLs
- Check VL backend latency via metrics
- Rely on built-in singleflight coalescing for identical concurrent reads
Circuit Breaker Tripping​
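The circuit breaker opens after repeated backend failures (5xx responses): -cb-fail-threshold failures (default 5) within -cb-window-duration (default 30s). It then stays open for -cb-open-duration (default 10s). Check: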
- VL backend health and logs
- Network connectivity between proxy and VL
- VL resource usage (CPU/memory/disk)
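When reasoning about why the breaker tripped, it can help to model how -cb-fail-threshold, -cb-window-duration, and -cb-open-duration interact. The sketch below is a simplified model under the defaults from the traffic guards table; the CircuitBreaker class is hypothetical, it omits any half-open probing, and the proxy's exact failure accounting may differ.

```python
from collections import deque


class CircuitBreaker:
    """Simplified windowed breaker; not the proxy's actual code."""

    def __init__(self, fail_threshold=5, window_s=30.0, open_s=10.0) -> None:
        self.fail_threshold = fail_threshold
        self.window_s = window_s
        self.open_s = open_s
        self.failures = deque()   # timestamps of recent failures
        self.open_until = 0.0     # requests are rejected before this time

    def allow(self, now: float) -> bool:
        """Return True when the breaker lets a request through."""
        return now >= self.open_until

    def record_failure(self, now: float) -> None:
        self.failures.append(now)
        # Drop failures that fell out of the counting window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        if len(self.failures) >= self.fail_threshold:
            self.open_until = now + self.open_s
            self.failures.clear()


# Five failures within a few seconds open the breaker for 10s.
burst = CircuitBreaker()
for t in [0.0, 1.0, 2.0, 3.0, 4.0]:
    burst.record_failure(t)
rejected_at_5s = not burst.allow(5.0)
allowed_at_14s = burst.allow(14.0)

# Four early failures plus one much later never reach the threshold.
sparse = CircuitBreaker()
for t in [0.0, 1.0, 2.0, 3.0, 40.0]:
    sparse.record_failure(t)
sparse_still_closed = sparse.allow(40.0)
```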
Backup & Recovery​
The proxy is stateless. The only on-disk state is the optional disk cache, which is safe to lose:
- L1 cache: In-memory, rebuilds on restart
- L2 disk cache: bbolt file at -disk-cache-path. Can be deleted safely; it will be repopulated.
- Configuration: All config is CLI flags / env vars. Store in Helm values or ConfigMap.
Scaling​
Horizontal Scaling​
horizontalPodAutoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
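To estimate how these values play out, the standard Kubernetes HPA calculation (desired = ceil(current replicas × current metric / target metric), clamped to the min and max) can be worked through directly. The sketch below is a simplification that ignores the HPA tolerance band and stabilization windows; the function name is illustrative.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 70,
    min_replicas: int = 2,
    max_replicas: int = 10,
) -> int:
    """Apply the HPA ratio formula, then clamp to the configured bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


scale_up = desired_replicas(2, 140)     # ceil(2 * 140 / 70) = 4
scale_down = desired_replicas(4, 35)    # ceil(4 * 35 / 70) = 2
capped = desired_replicas(8, 140)       # 16, clamped to maxReplicas = 10
```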
Pod Disruption Budget​
podDisruptionBudget:
  enabled: true
  minAvailable: 1
Multi-Zone Deployment​
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: loki-vl-proxy