Operations Guide

Deployment

Minimum Requirements

| Resource | Minimum | Recommended |
| --- | --- | --- |
| CPU | 50m | 200m |
| Memory | 64Mi | 256Mi |
| Replicas | 1 | 2+ (with PDB) |

The proxy is stateless (except optional disk cache). Scale horizontally without coordination.

Key scaling controls (all tunable via CLI flags):

  • -max-concurrent 100 — global concurrent backend query cap
  • -rate-limit-per-second 50 / -rate-limit-burst 100 — per-client token bucket
  • -cb-fail-threshold 5 / -cb-open-duration 10s — backend circuit breaker
  • use Grafana refresh policy, ingress shaping, HPA, and cache tuning as complementary levers
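
To make the per-client token bucket concrete, here is a minimal sketch of the technique, assuming the defaults listed above (50 tokens per second, burst of 100). The class name and the injectable clock are illustrative and not taken from the proxy's source; a per-client limiter would keep one such bucket per client key.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, capped at `burst`."""

    def __init__(self, rate, burst, now=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self._now = now              # injectable clock (assumption, for testing)
        self._tokens = float(burst)  # start with a full bucket
        self._last = now()

    def allow(self):
        """Consume one token if available; return False when the caller is over the limit."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Refill proportionally to elapsed time, never exceeding the burst size.
        self._tokens = min(self.burst, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

With a full bucket, a client can send 100 requests at once; after that it is held to the refill rate of 50 requests per second.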

Helm Deployment

helm install loki-vl-proxy oci://ghcr.io/reliablyobserve/charts/loki-vl-proxy \
  --version <release> \
  --set extraArgs.backend=http://victorialogs:9428 \
  --set extraArgs.label-style=underscores

# Local chart (development)
helm install loki-vl-proxy ./charts/loki-vl-proxy \
  --set extraArgs.backend=http://victorialogs:9428 \
  --set extraArgs.label-style=underscores

For multi-replica fleets with HPA, prefer peerCache.enabled=true over static peer lists. The chart creates a headless service and the proxy refreshes DNS-discovered peers automatically, so scaling events do not require manual replica or peer updates.

For Grafana Logs Drilldown pattern discovery, keep the default extraArgs.patterns-enabled=true or set it explicitly during rollout if you need to control the surface area:

extraArgs:
  backend: http://victorialogs:9428
  label-style: underscores
  patterns-enabled: "true"

Required Configuration

| Flag | Required | Description |
| --- | --- | --- |
| -backend | Yes | VictoriaLogs URL |
| -listen | No | Listen address (default :3100) |
| -label-style | No | passthrough (default) or underscores |

Backend Auth Forwarding

If VictoriaLogs authentication is delegated to upstream clients, you can explicitly forward the client's Authorization header to the backend:

-forward-authorization=true

Equivalent manual mode:

-forward-headers=Authorization

Use this only in trusted topologies (for example Grafana/auth-proxy -> Loki-VL-proxy -> VictoriaLogs).


Operational Assets

Treat these as one versioned operational package:

| Asset | Canonical source | Purpose |
| --- | --- | --- |
| Grafana operations dashboard | dashboard/loki-vl-proxy.json | Three-section layout (sections listed below the table) |
| Alert rules | alerting/loki-vl-proxy-prometheusrule.yaml | PrometheusRule/vmalert-oriented alert set with standardized labels and annotations |
| SRE runbooks | docs/runbooks/alerts.md | Index plus per-alert runbook files referenced directly from alert runbook_url |

Dashboard sections:

  • Section 1, SLO/SLI + Health: 8-stat top strip (circuit breaker, active requests, QPS, error %, P99 client latency, P95 backend latency, cache hit ratio, uptime) plus SLI time-series rows.
  • Section 2, Client → Proxy → VL + Resources: client visibility (request rate by route, errors by reason, query length, per-client inflight, latency by route); proxy internals (coalescing, internal ops, response tuple mode, tenant QPS); VL backend (upstream fanout, window count, backend latency, fetch/merge latency, adaptive parallelism); process resources (CPU, memory, goroutines, GC, network, disk I/O, PSI pressure).
  • Section 3, Deep Proxy Internals: cache tiers (T0/L1/L2/L3 hit/miss, sizes, stale hits, backend fallthrough); peer cache fleet (cluster members, hit/miss, write-through, hot read-ahead, error breakdown); query-range windowing (window cache, prefilter efficiency, retries, partial responses, prefilter duration, adaptive parallelism trace); patterns engine (in-memory count/bytes, mining rate, source line pipeline, snapshot hits/reuse, persistence); HTTP connection lifecycle (states, rotation reasons, transitions); tenant deep dive (per-tenant QPS/P99/errors).

When using the Helm chart, the runtime templates consume synced copies in charts/loki-vl-proxy/{dashboards,alerting}. Keep canonical and chart copies aligned with:

./scripts/ci/sync_observability_assets.sh sync
./scripts/ci/sync_observability_assets.sh --check

--check is already enforced in CI to prevent drift.


Preventive Scaling And Deployment

Use the dedicated guide for prevention-oriented operations hardening:

Critical defaults to reduce incident frequency:

  • run at least 2 replicas with PDB enabled
  • enable HPA with conservative downscale
  • tune cache TTLs differently for query paths vs metadata paths
  • monitor backend p95 and proxy p99 histograms, not averages
  • add synthetic in-cluster e2e query probes in addition to /ready

Multi-Tenancy

Tenant Mapping Strategies

The proxy maps X-Scope-OrgID headers to VictoriaLogs tenant IDs. Three strategies are available depending on deployment size and dynamism.

1. Inline JSON (-tenant-map)

Best for small, static tenant maps. The entire map is provided directly as a CLI flag or env var value:

-tenant-map='{"team-a":"vl-tenant-1","team-b":"vl-tenant-2"}'

This requires a proxy restart to update.
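
The mapping step itself is small. The sketch below shows one way to parse an inline map and resolve an X-Scope-OrgID header against it. The function names are hypothetical, and returning None for an unmapped or missing org ID is an assumption: the text above does not say how the proxy treats those cases.

```python
import json
from typing import Dict, Optional


def parse_tenant_map(raw: str) -> Dict[str, str]:
    """Parse the JSON object passed via -tenant-map into an org-ID -> tenant-ID dict."""
    mapping = json.loads(raw)
    if not isinstance(mapping, dict):
        raise ValueError("tenant map must be a JSON object")
    return {str(org): str(tenant) for org, tenant in mapping.items()}


def resolve_tenant(headers: Dict[str, str], tenant_map: Dict[str, str]) -> Optional[str]:
    """Return the VictoriaLogs tenant for the request, or None if it cannot be resolved."""
    org_id = headers.get("X-Scope-OrgID")
    if org_id is None:
        return None  # assumption: the caller decides how to handle a missing header
    return tenant_map.get(org_id)


TENANT_MAP = parse_tenant_map('{"team-a":"vl-tenant-1","team-b":"vl-tenant-2"}')
```

For example, `resolve_tenant({"X-Scope-OrgID": "team-a"}, TENANT_MAP)` yields `"vl-tenant-1"`.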

2. File-based (-tenant-map-file)

Best for Kubernetes environments where tenant maps are mounted as ConfigMaps. The proxy hot-reloads the file on SIGHUP and also polls for mtime changes on the configured interval:

-tenant-map-file=/etc/proxy/tenants.yaml
-tenant-map-reload-interval=30s

The default reload interval is 30s. To trigger an immediate reload without restarting the proxy:

kill -HUP <pid>

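In Helm, a lifecycle hook can send SIGHUP to the proxy process (PID 1). Note that a postStart hook runs once when the container starts, not on later ConfigMap updates, so subsequent changes still depend on the polling interval or an explicit signal:

lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "kill -HUP 1"]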

Polling every 30s means changes are picked up automatically even without an explicit signal, which suits ConfigMap-mounted files that are updated by an external controller.
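
The reload behavior described in this section (an unconditional reload on SIGHUP plus mtime polling) can be sketched as follows. This illustrates the technique and is not the proxy's implementation: `TenantMapFile`, `load_json`, and `demo` are hypothetical names, and the demo uses a JSON file so it runs with the standard library only, where a real deployment would parse the YAML file.

```python
import json
import os
import tempfile


def load_json(path):
    """Hypothetical loader; swap in a YAML parser for tenants.yaml."""
    with open(path) as fh:
        return json.load(fh)


class TenantMapFile:
    def __init__(self, path, load):
        self.path = path
        self._load = load
        self._mtime = None
        self.mapping = {}
        self.reload()

    def reload(self):
        """Unconditional reload: what a SIGHUP handler would call."""
        # Record mtime before reading so a concurrent write is caught by the next poll.
        self._mtime = os.stat(self.path).st_mtime_ns
        self.mapping = self._load(self.path)

    def poll(self):
        """Reload only if the file's mtime changed; return True when a reload happened."""
        if os.stat(self.path).st_mtime_ns == self._mtime:
            return False
        self.reload()
        return True


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tenants.json")
        with open(path, "w") as fh:
            json.dump({"team-a": "vl-tenant-1"}, fh)
        watcher = TenantMapFile(path, load_json)
        unchanged = watcher.poll()
        with open(path, "w") as fh:
            json.dump({"team-a": "vl-tenant-9"}, fh)
        st = os.stat(path)
        # Force a visible mtime change regardless of filesystem timestamp resolution.
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 2_000_000_000))
        changed = watcher.poll()
        return unchanged, changed, watcher.mapping
```

In a long-running process, `poll()` would be called once per reload interval, and `reload()` would be registered as the SIGHUP handler.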

3. Label-based (-tenant-label)

Routes per-query based on a label field value in the incoming stream. Useful when a single VictoriaLogs tenant holds multi-tenant data distinguished by a label such as service.name:

-tenant-label=service.name

When set, the proxy extracts the label value from the query or push request and uses it as the VictoriaLogs tenant ID, without requiring the client to set X-Scope-OrgID.


-require-tenant-header Flag

-require-tenant-header=true enforces that every request carries an X-Scope-OrgID header (returns HTTP 401 if missing) without enabling full auth. This is useful for catching misconfigured clients in multi-tenant setups without a full auth proxy.

This is distinct from -auth.enabled: the latter enables credential validation, while -require-tenant-header only checks for header presence.


Health Check Endpoints

The proxy exposes three operational endpoints:

| Endpoint | Purpose | Kubernetes probe |
| --- | --- | --- |
| /alive | Liveness: confirms the process is running | livenessProbe |
| /ready | Readiness: confirms the proxy is ready to serve traffic (backend reachable, warm-up complete) | readinessProbe |
| /metrics | Prometheus metrics scrape | ServiceMonitor / scrape config |

If /ready keeps returning a non-OK status immediately after a restart, check whether startup warm-up for patterns or indexed label values is configured. Those persistence restores can intentionally hold readiness at 503 until warm-up completes.


Translation Modes

Translation guidance moved to dedicated docs:

Operational recommendation:

  • use label-style=underscores when upstream VL stores dotted OTel fields
  • use metadata-field-mode=hybrid for mixed Loki + OTel field workflows
  • use metadata-field-mode=translated for strict Loki-style field surfaces
  • use metadata-field-mode=native for OTel-native field-only surfaces

Capacity Planning

Memory

| Component | Memory per unit |
| --- | --- |
| L1 cache | ~50MB per 10k entries |
| L2 disk cache (bbolt) | ~10MB mmap overhead |
| Per active query | ~1-5MB (depends on result size) |
| Singleflight coalescing buffer | Up to 256MB per unique query |
| Base process | ~20MB |

Formula: base(20MB) + cache(entries × 5KB) + concurrent_queries × 3MB

Default -cache-max is 10000 (binary default). The Helm chart ships 50000 to suit light-to-moderate production use. For 50k cache entries and 100 concurrent queries: ~570MB recommended limit.
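
The formula is easy to turn into a quick sizing helper. The sketch below simply encodes the constants given above (20MB base, 5KB per cache entry, 3MB per concurrent query). The function name is illustrative, and the result is a rough estimate rather than a guarantee.

```python
def estimate_memory_mb(cache_entries, concurrent_queries,
                       base_mb=20.0, per_entry_kb=5.0, per_query_mb=3.0):
    """Rough memory estimate: base + cache + concurrent query buffers, in MB."""
    # 5KB per entry with 1MB = 1000KB, matching "~50MB per 10k entries" in the table.
    cache_mb = cache_entries * per_entry_kb / 1000.0
    return base_mb + cache_mb + concurrent_queries * per_query_mb


helm_default = estimate_memory_mb(50_000, 100)    # 20 + 250 + 300
binary_default = estimate_memory_mb(10_000, 100)  # 20 + 50 + 300
```

The Helm default of 50k entries gives 570MB, matching the figure above, while the binary default of 10k entries comes to 370MB under the same load.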

CPU

The proxy is CPU-light. Main costs:

  • JSON marshaling/unmarshaling (~70% of CPU)
  • LogQL→LogsQL translation (~10%)
  • Label translation (~5%)
  • HTTP overhead (~15%)

Guideline: 1 CPU core handles ~2000 req/s.

Disk Cache

L2 disk cache with bbolt:

  • 1 million entries ≈ 2-5GB on disk (gzip compressed)
  • Write amplification: ~2x with bbolt
  • Use fast SSD (NVMe) for the cache volume
  • Set disk-cache-flush-size=500 and disk-cache-flush-interval=10s for batched writes

Performance Tuning

Cache TTLs

Default TTLs are conservative. Adjust for your query patterns:

-cache-ttl=120s # Increase for stable label sets
-cache-max=50000 # Increase for high-cardinality environments

| Endpoint | Default TTL | Recommendation |
| --- | --- | --- |
| labels | 60s | 120-300s if label set is stable |
| label_values | 60s | 60-120s |
| series | 30s | 30-60s |
| detected_fields | 30s | 30-60s |
| query_range | 10s | 5-30s depending on freshness needs |
| query | 10s | 5-30s |
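
To illustrate why metadata and query paths get different TTLs, here is a minimal per-endpoint TTL cache using the default values from the table. It is a sketch of the general technique: `TTLCache` and the injectable clock are assumptions, and the proxy's real cache tiers are more involved.

```python
import time

# Default TTLs in seconds, taken from the table above.
DEFAULT_TTLS = {
    "labels": 60,
    "label_values": 60,
    "series": 30,
    "detected_fields": 30,
    "query_range": 10,
    "query": 10,
}


class TTLCache:
    def __init__(self, ttls, now=time.monotonic):
        self._ttls = ttls
        self._now = now
        self._items = {}  # (endpoint, key) -> (expires_at, value)

    def put(self, endpoint, key, value):
        """Store a value that expires after the endpoint's TTL."""
        self._items[(endpoint, key)] = (self._now() + self._ttls[endpoint], value)

    def get(self, endpoint, key):
        """Return the cached value, or None if it is absent or expired."""
        entry = self._items.get((endpoint, key))
        if entry is None:
            return None
        expires_at, value = entry
        if self._now() >= expires_at:
            del self._items[(endpoint, key)]  # lazy eviction on read
            return None
        return value
```

A label listing cached at t=0 is still served at t=15s, while a query result cached at the same moment has already expired.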

Concurrency Limits

-http-max-header-bytes=1048576 # 1MB default
-http-max-body-bytes=10485760 # 10MB default

The proxy uses singleflight to coalesce identical concurrent queries. N identical requests → 1 backend request.
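
The coalescing idea can be sketched in a few lines. This is a simplified illustration of the singleflight pattern and not the proxy's code: `SingleFlight`, `_Call`, and `demo` are hypothetical names. The first caller for a key runs the backend function, and concurrent callers with the same key wait for it and share its result.

```python
import threading
import time


class _Call:
    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> in-flight _Call

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.result = fn()
            except BaseException as exc:  # propagate the failure to every waiter
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]  # later arrivals start a fresh call
                call.done.set()
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result


def demo(n=5):
    group = SingleFlight()
    backend_calls, results = [], []

    def backend():
        backend_calls.append(1)
        time.sleep(0.3)  # simulate a slow backend query
        return "rows"

    def worker():
        results.append(group.do('query_range:{app="web"}', backend))

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(backend_calls), results
```

Calling `demo()` starts five identical requests while the first is still in flight; the backend function runs once and all five callers receive the same result.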

Built-In Traffic Guards

All traffic guard controls are tunable via CLI flags (or extraArgs in the Helm chart):

| Flag | Default | Description |
| --- | --- | --- |
| -rate-limit-per-second | 50 | Per-client request rate (req/s) |
| -rate-limit-burst | 100 | Per-client burst allowance |
| -max-concurrent | 100 | Global concurrent backend query cap |
| -cb-fail-threshold | 5 | Failures within window to open circuit breaker |
| -cb-open-duration | 10s | How long circuit breaker stays open |
| -cb-window-duration | 30s | Failure counting window |

If defaults are too strict or too loose for your workload, tune at the proxy first, then complement with:

  • reduced Grafana auto-refresh and retry pressure
  • ingress or service-mesh shaping in front of the proxy
  • scale out replicas and raise cache effectiveness before pushing more uncached load

Monitoring

See the dedicated Observability Guide for the full metrics catalog, JSON log schema, OTLP push configuration, and collector/agent integration examples.

Metrics

The proxy exposes Prometheus metrics at /metrics.

Use the Observability Guide as the canonical catalog for:

  • every documented loki_vl_proxy_* metric family
  • cardinality level (Low, Medium, High (capped)) for each family
  • scrape versus OTLP field/label mapping
  • the new fanout and proxy-internal operation metrics/log fields

| Metric | Type | Primary dimensions | Description |
| --- | --- | --- | --- |
| loki_vl_proxy_requests_total | counter | system, direction, endpoint, route, status | Total requests by downstream Loki route or upstream backend route |
| loki_vl_proxy_request_duration_seconds | histogram | system, direction, endpoint, route | End-to-end request latency |
| loki_vl_proxy_backend_duration_seconds | histogram | system, direction, endpoint, route | Upstream-only latency for VictoriaLogs and rules/alerts backends |
| loki_vl_proxy_cache_hits_by_endpoint / loki_vl_proxy_cache_misses_by_endpoint | counter | system, direction, endpoint, route | Cache efficiency by normalized route |
| loki_vl_proxy_tenant_requests_total / loki_vl_proxy_client_requests_total | counter | tenant/client plus route dimensions | Hot tenants and clients per route |
| loki_vl_proxy_process_* | gauges/counters | metric family specific | Runtime, CPU, memory, disk, network, and PSI health |

Key Ratios to Monitor

  • Route cache hit ratio: cache_hits_by_endpoint / (cache_hits_by_endpoint + cache_misses_by_endpoint) by endpoint,route — target >80% on stable metadata paths
  • Downstream error rate: requests_total{system="loki",direction="downstream",status=~"5.."} over total downstream requests — target <1%
  • Upstream latency: backend_duration_seconds by endpoint,route — use this to separate VictoriaLogs slowness from proxy-side work
  • End-to-end latency: request_duration_seconds{system="loki",direction="downstream"} by endpoint,route — compare with upstream latency and request logs
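
The first two ratios reduce to simple arithmetic on counter values. The sketch below shows that arithmetic on already-collected counts; the function names and sample numbers are illustrative, and in practice the inputs would be rate-based counter deltas over a time window.

```python
import re


def cache_hit_ratio(hits, misses):
    """hits / (hits + misses); 0.0 when there was no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


def downstream_error_rate(requests_by_status):
    """Share of requests whose status matches 5.., given a status -> count mapping."""
    total = sum(requests_by_status.values())
    errors = sum(count for status, count in requests_by_status.items()
                 if re.fullmatch(r"5..", status))
    return errors / total if total else 0.0


labels_ratio = cache_hit_ratio(hits=850, misses=150)
error_rate = downstream_error_rate({"200": 990, "502": 7, "503": 3})
```

Here the hit ratio of 0.85 meets the >80% target for stable metadata paths, while the error rate of 0.01 sits right at the 1% threshold.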

OTLP Push

Push metrics to an OTLP collector:

-otlp-endpoint=http://otel-collector:4318/v1/metrics
-otlp-interval=30s
-otlp-compression=gzip

The OTLP exporter reuses the same core proxy metric names that /metrics exposes, so dashboards and alert logic can stay aligned across scrape and push modes.

For exact proxy-only overhead on translated paths, use structured request logs with proxy.overhead_ms, proxy.duration_ms, and upstream.duration_ms. The metrics intentionally keep route-aware end-to-end and upstream histograms, while logs carry the per-request decomposition.


Troubleshooting

No Data in Grafana

  1. Check proxy health: curl http://proxy:3100/ready
  2. Check VL backend: curl http://vl:9428/health
  3. Check proxy logs for translation errors
  4. Verify label-style matches your VL ingestion format
  5. Check /loki/api/v1/labels for available labels
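
The HTTP checks in this list (steps 1, 2, and 5) can be scripted. The sketch below runs them in order and reports the first one that does not return 200. The hostnames are the placeholders used in the steps above, and `http_status` and `run_checks` are hypothetical helper names; reading logs and verifying label-style remain manual steps.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# URLs from the steps above; adjust hostnames to your environment.
CHECKS = [
    ("proxy ready", "http://proxy:3100/ready"),
    ("VL health", "http://vl:9428/health"),
    ("proxy labels", "http://proxy:3100/loki/api/v1/labels"),
]


def http_status(url, timeout=5.0):
    """Return the HTTP status code, or None if the endpoint is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:
        return exc.code  # non-2xx responses still carry a status code
    except (URLError, OSError):
        return None


def run_checks(fetch=http_status):
    """Return the name of the first failing check, or None if all return 200."""
    for name, url in CHECKS:
        if fetch(url) != 200:
            return name
    return None
```

Passing a custom `fetch` makes the sequence easy to dry-run without network access.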

Label Names Don't Match

| Symptom | Cause | Fix |
| --- | --- | --- |
| Dots in Grafana labels | label-style=passthrough with dotted VL data | Set label-style=underscores |
| Empty label_values for service_name | VL stores service.name, query asks service_name | Set label-style=underscores |
| Grafana Drilldown "failed to fetch" | Volume/stats endpoint issue | Check proxy logs, ensure VL v1.49+ |
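
The first two rows come down to a naming mismatch between dotted VL fields and underscore-style Loki labels. The sketch below shows the idea of translating in both directions, assuming a plain dot-to-underscore replacement. The proxy's actual rules may cover more cases (for example, name collisions), and all function names here are hypothetical.

```python
def to_underscores(field):
    """Dotted VL field name -> Loki-style label name (assumed rule: '.' becomes '_')."""
    return field.replace(".", "_")


def build_reverse_index(vl_fields):
    """Map each translated label back to the field name VL actually stores."""
    return {to_underscores(field): field for field in vl_fields}


def to_vl_field(label, index):
    """Resolve a label from a Grafana query to the stored VL field, if known."""
    return index.get(label, label)


index = build_reverse_index(["service.name", "k8s.pod.name", "level"])
```

With this index, a query for `service_name` resolves to the stored `service.name` field, which is the lookup that is missing when label_values comes back empty.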

High Memory Usage

  • Reduce -cache-max (default 10000)
  • Reduce -http-max-body-bytes
  • Add memory limits in Kubernetes
  • Check for singleflight amplification (many unique queries)

High Latency

  • Keep -response-compression=gzip for broad Loki/Grafana compatibility; for legacy configs, auto now behaves the same as gzip on the frontend
  • Set -response-compression-min-bytes around 1024 to avoid wasting CPU on small metadata/control responses
  • Increase cache TTLs
  • Check VL backend latency via metrics
  • Rely on built-in singleflight coalescing for identical concurrent reads

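Circuit Breaker Tripping

The circuit breaker opens when backend failures (5xx responses) reach -cb-fail-threshold (default 5) within -cb-window-duration (default 30s), and it stays open for -cb-open-duration (default 10s). Check: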

  • VL backend health and logs
  • Network connectivity between proxy and VL
  • VL resource usage (CPU/memory/disk)
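
To reason about when the breaker trips, it can help to model it. The sketch below assumes failures are counted in a sliding window, as the traffic-guard flag descriptions suggest, using the documented defaults (threshold 5, window 30s, open for 10s). It omits any half-open probing the real breaker may do, and the class name is illustrative.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, fail_threshold=5, window=30.0, open_duration=10.0,
                 now=time.monotonic):
        self.fail_threshold = fail_threshold
        self.window = window
        self.open_duration = open_duration
        self._now = now
        self._failures = deque()  # timestamps of recent failures
        self._open_until = None

    def allow(self):
        """True if requests may be sent to the backend."""
        return self._open_until is None or self._now() >= self._open_until

    def record_failure(self):
        """Record a backend failure and open the breaker if the threshold is reached."""
        t = self._now()
        self._failures.append(t)
        # Drop failures that fell out of the counting window.
        while self._failures and t - self._failures[0] > self.window:
            self._failures.popleft()
        if len(self._failures) >= self.fail_threshold:
            self._open_until = t + self.open_duration
            self._failures.clear()
```

Under this model, five failures spread over more than 30 seconds never open the breaker, whereas a burst of five opens it for 10 seconds.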

Backup & Recovery

The proxy is stateless, so there is little to back up. The optional disk cache is the only on-disk state, and it is disposable:

  • L1 cache: In-memory, rebuilds on restart
  • L2 disk cache: bbolt file at -disk-cache-path. Can be deleted safely — will be repopulated.
  • Configuration: All config is CLI flags / env vars. Store in Helm values or ConfigMap.

Scaling

Horizontal Scaling

horizontalPodAutoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
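
For intuition about how this configuration reacts to load, the sketch below applies the standard Kubernetes HPA scaling rule, desired = ceil(current replicas × current utilization / target utilization), clamped to the configured bounds. It ignores the HPA's tolerance band and stabilization windows, so treat it as an approximation; the function name is illustrative.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization=70, min_replicas=2, max_replicas=10):
    """Approximate HPA decision for a CPU utilization target (percent)."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    # Clamp to the minReplicas/maxReplicas bounds from the values above.
    return max(min_replicas, min(max_replicas, desired))
```

For example, two replicas averaging 140% CPU utilization scale to four, and no amount of load pushes the fleet beyond ten.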

Pod Disruption Budget

podDisruptionBudget:
  enabled: true
  minAvailable: 1

Multi-Zone Deployment

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: loki-vl-proxy