Operations Guide

Deployment

Minimum Requirements

Resource	Minimum	Recommended
CPU	50m	200m
Memory	64Mi	256Mi
Replicas	1	2+ (with PDB)

The proxy is stateless (except optional disk cache). Scale horizontally without coordination.

Key scaling controls (all tunable via CLI flags):

-max-concurrent 100 — global concurrent backend query cap
-rate-limit-per-second 50 / -rate-limit-burst 100 — per-client token bucket
-cb-fail-threshold 5 / -cb-open-duration 10s — backend circuit breaker
Complementary levers: Grafana refresh policy, ingress shaping, HPA, and cache tuning

Helm Deployment

helm install loki-vl-proxy oci://ghcr.io/reliablyobserve/charts/loki-vl-proxy \
  --version <release> \
  --set extraArgs.backend=http://victorialogs:9428 \
  --set extraArgs.label-style=underscores

# Local chart (development)
helm install loki-vl-proxy ./charts/loki-vl-proxy \
  --set extraArgs.backend=http://victorialogs:9428 \
  --set extraArgs.label-style=underscores

For multi-replica fleets with HPA, prefer peerCache.enabled=true over static peer lists. The chart creates a headless service and the proxy refreshes DNS-discovered peers automatically, so scaling events do not require manual replica or peer updates.

For Grafana Logs Drilldown pattern discovery, keep the default extraArgs.patterns-enabled=true or set it explicitly during rollout if you need to control the surface area:

extraArgs:
  backend: http://victorialogs:9428
  label-style: underscores
  patterns-enabled: "true"

Required Configuration

Flag	Required	Description
`-backend`	Yes	VictoriaLogs URL
`-listen`	No	Listen address (default `:3100`)
`-label-style`	No	`passthrough` (default) or `underscores`

Backend Auth Forwarding

If VictoriaLogs authentication is delegated to upstream clients, you can explicitly forward the client Authorization header to the backend:

-forward-authorization=true

Equivalent manual mode:

-forward-headers=Authorization

Use this only in trusted topologies (for example Grafana/auth-proxy -> Loki-VL-proxy -> VictoriaLogs).

Operational Assets

Treat these as one versioned operational package:

Asset	Canonical source	Purpose
Grafana operations dashboard	`dashboard/loki-vl-proxy.json`	Three-section layout: Section 1 — SLO/SLI + Health (8-stat top strip: circuit breaker, active requests, QPS, error %, P99 client latency, P95 backend latency, cache hit ratio, uptime; plus SLI time-series rows). Section 2 — Client → Proxy → VL + Resources (client visibility: request rate by route, errors by reason, query length, per-client inflight, latency by route; proxy internals: coalescing, internal ops, response tuple mode, tenant QPS; VL backend: upstream fanout, window count, backend latency, fetch/merge latency, adaptive parallelism; process resources: CPU, memory, goroutines, GC, network, disk I/O, PSI pressure). Section 3 — Deep Proxy Internals (cache tiers: T0/L1/L2/L3 hit/miss, sizes, stale hits, backend fallthrough; peer cache fleet: cluster members, hit/miss, write-through, hot read-ahead, error breakdown; query-range windowing: window cache, prefilter efficiency, retries, partial responses, prefilter duration, adaptive parallelism trace; patterns engine: in-memory count/bytes, mining rate, source line pipeline, snapshot hits/reuse, persistence; HTTP connection lifecycle: states, rotation reasons, transitions; tenant deep dive: per-tenant QPS/P99/errors)
Alert rules	`alerting/loki-vl-proxy-prometheusrule.yaml`	PrometheusRule/vmalert-oriented alert set with standardized labels and annotations
SRE runbooks	`docs/runbooks/alerts.md`	Index plus per-alert runbook files referenced directly from alert `runbook_url`

When using the Helm chart, the runtime templates consume synced copies in charts/loki-vl-proxy/{dashboards,alerting}. Keep canonical and chart copies aligned with:

./scripts/ci/sync_observability_assets.sh sync
./scripts/ci/sync_observability_assets.sh --check

--check is already enforced in CI to prevent drift.

Preventive Scaling And Deployment

Use the dedicated guide for prevention-oriented operations hardening:

docs/runbooks/deployment-best-practices.md

Critical defaults to reduce incident frequency:

run at least 2 replicas with PDB enabled
enable HPA with conservative downscale
tune cache TTLs differently for query paths vs metadata paths
monitor backend p95 and proxy p99 histograms, not averages
add synthetic in-cluster e2e query probes in addition to /ready

Multi-Tenancy

Tenant Mapping Strategies

The proxy maps X-Scope-OrgID headers to VictoriaLogs tenant IDs. Three strategies are available depending on deployment size and dynamism.

1. Inline JSON (`-tenant-map`)

Best for small, static tenant maps. The entire map is provided directly as a CLI flag or env var value:

-tenant-map='{"team-a":"vl-tenant-1","team-b":"vl-tenant-2"}'

This requires a proxy restart to update.
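As an illustration of what the inline map expresses, the following sketch parses the same JSON value and resolves an X-Scope-OrgID to a tenant ID. The function names and the behavior for unknown org IDs (returning None) are assumptions made for the example, not the proxy's actual implementation.

```python
import json


def parse_tenant_map(raw):
    """Parse an inline JSON tenant map into an org-ID -> tenant-ID dict."""
    mapping = json.loads(raw)
    if not isinstance(mapping, dict):
        raise ValueError("tenant map must be a JSON object")
    for org_id, tenant in mapping.items():
        if not isinstance(tenant, str):
            raise ValueError(f"tenant for {org_id!r} must be a string")
    return mapping


def resolve_tenant(mapping, org_id):
    """Return the mapped tenant ID, or None for an unknown org ID (assumed fallback)."""
    return mapping.get(org_id)


# Same value as the -tenant-map example above.
TENANT_MAP = parse_tenant_map('{"team-a":"vl-tenant-1","team-b":"vl-tenant-2"}')
```

Validating the JSON this way before a rollout can catch a malformed map early, which matters because a bad inline value can only be corrected with another restart.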

2. File-based (`-tenant-map-file`)

Best for Kubernetes environments where tenant maps are mounted as ConfigMaps. The proxy hot-reloads the file on SIGHUP and also polls for mtime changes on the configured interval:

-tenant-map-file=/etc/proxy/tenants.yaml
-tenant-map-reload-interval=30s

The default reload interval is 30s. To trigger an immediate reload without restarting the proxy:

kill -HUP <pid>

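In Helm, a lifecycle hook can send SIGHUP to the proxy process. A postStart hook runs once, right after the container starts; it does not fire on later ConfigMap updates, which are covered by the mtime polling: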

lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "kill -HUP 1"]

Polling every 30s means changes are picked up automatically even without an explicit signal, which suits ConfigMap-mounted files that are updated by an external controller.
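The mtime polling idea can be sketched as follows. This is a minimal illustration, not the proxy's code: the class name is hypothetical, and a real reload loop would call changed() once per -tenant-map-reload-interval and re-read the file when it returns True.

```python
import os


class MtimeWatcher:
    """Reports whether a file's modification time changed since the last check."""

    def __init__(self, path):
        self.path = path
        self._last_mtime_ns = None  # nothing observed yet

    def changed(self):
        """Return True on the first call and whenever the mtime differs afterwards."""
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._last_mtime_ns:
            self._last_mtime_ns = mtime_ns
            return True
        return False
```

Comparing for inequality rather than "newer than" also catches files whose timestamp moves backwards, for example when a mounted file is swapped for an older revision.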

3. Label-based (`-tenant-label`)

Routes per-query based on a label field value in the incoming stream. Useful when a single VictoriaLogs tenant holds multi-tenant data distinguished by a label such as service.name:

-tenant-label=service.name

When set, the proxy extracts the label value from the query or push request and uses it as the VictoriaLogs tenant ID, without requiring the client to set X-Scope-OrgID.

`-require-tenant-header` Flag

-require-tenant-header=true enforces that every request carries an X-Scope-OrgID header (returns HTTP 401 if missing) without enabling full auth. This is useful for catching misconfigured clients in multi-tenant setups without a full auth proxy.

This is distinct from -auth.enabled: the latter enables credential validation, while -require-tenant-header only checks for header presence.
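The difference between a presence check and credential validation can be shown with a small sketch. The function below is hypothetical and only mirrors the behavior described above; treating an empty header value as missing is an assumption.

```python
def check_tenant_header(headers, require_tenant_header):
    """Return the HTTP status a header-presence check would produce.

    Only presence is checked; the value is never validated as a credential.
    """
    # HTTP header names are case-insensitive, so compare in lower case.
    present = any(
        name.lower() == "x-scope-orgid" and value.strip()
        for name, value in headers.items()
    )
    if require_tenant_header and not present:
        return 401
    return 200
```

With the flag off, a request without the header passes this check unchanged, which matches the default behavior of not requiring a tenant header.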

Health Check Endpoints

The proxy exposes three operational endpoints:

Endpoint	Purpose	Kubernetes probe
`/alive`	Liveness — confirms the process is running	`livenessProbe`
`/ready`	Readiness — confirms the proxy is ready to serve traffic (backend reachable, warm-up complete)	`readinessProbe`
`/metrics`	Prometheus metrics scrape	ServiceMonitor / scrape config

If /ready stays non-ok immediately after a restart, check whether startup warm-up for patterns or indexed label values is configured. Those persistence restores can intentionally hold readiness at 503 until warm-up completes.

Translation Modes

Translation guidance moved to dedicated docs:

Translation Modes Guide for mode selection and exact underscore vs dotted behavior
Configuration for flag reference
Translation Reference for LogQL-to-LogsQL execution mapping

Operational recommendation:

use label-style=underscores when upstream VL stores dotted OTel fields
use metadata-field-mode=hybrid for mixed Loki + OTel field workflows
use metadata-field-mode=translated for strict Loki-style field surfaces
use metadata-field-mode=native for OTel-native field-only surfaces

Capacity Planning

Memory

Component	Memory per Unit
L1 cache	~50MB per 10k entries
L2 disk cache (bbolt)	~10MB mmap overhead
Per active query	~1-5MB (depends on result size)
Singleflight coalescing buffer	Up to 256MB per unique query
Base process	~20MB

Formula: base(20MB) + cache(entries × 5KB) + concurrent_queries × 3MB

Default -cache-max is 10000 (binary default). The Helm chart ships 50000 to suit light-to-moderate production use. For 50k cache entries and 100 concurrent queries: ~570MB recommended limit.
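The formula can be turned into a small sizing helper. This sketch uses only the figures from the table and formula above (20MB base, 5KB per cache entry, 3MB per concurrent query); the function name is illustrative, and the result is a rough planning number rather than a measured limit.

```python
BASE_MB = 20        # base process
PER_ENTRY_KB = 5    # ~50MB per 10k L1 entries
PER_QUERY_MB = 3    # midpoint of the ~1-5MB per active query range


def estimate_memory_mb(cache_entries, concurrent_queries):
    """Estimate proxy memory in MB from cache size and query concurrency."""
    cache_mb = cache_entries * PER_ENTRY_KB / 1000  # decimal units, as in the table
    return BASE_MB + cache_mb + concurrent_queries * PER_QUERY_MB


# Helm chart default cache size with the default -max-concurrent:
# estimate_memory_mb(50_000, 100) gives 20 + 250 + 300 = 570
```

The estimate does not include the singleflight coalescing buffer, so leave headroom above it if large result sets are common.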

CPU

The proxy is CPU-light. Main costs:

JSON marshaling/unmarshaling (~70% of CPU)
LogQL→LogsQL translation (~10%)
Label translation (~5%)
HTTP overhead (~15%)

Guideline: 1 CPU core handles ~2000 req/s.

Disk Cache

L2 disk cache with bbolt:

1 million entries ≈ 2-5GB on disk (gzip compressed)
Write amplification: ~2x with bbolt
Use fast SSD (NVMe) for the cache volume
Set disk-cache-flush-size=500 and disk-cache-flush-interval=10s for batched writes

Performance Tuning

Cache TTLs

Default TTLs are conservative. Adjust for your query patterns:

-cache-ttl=120s          # Increase for stable label sets
-cache-max=50000         # Increase for high-cardinality environments

Endpoint	Default TTL	Recommendation
labels	60s	120-300s if label set is stable
label_values	60s	60-120s
series	30s	30-60s
detected_fields	30s	30-60s
query_range	10s	5-30s depending on freshness needs
query	10s	5-30s

Concurrency Limits

-http-max-header-bytes=1048576   # 1MB default
-http-max-body-bytes=10485760    # 10MB default

The proxy uses singleflight to coalesce identical concurrent queries. N identical requests → 1 backend request.
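For readers unfamiliar with the pattern, here is a minimal singleflight sketch using threads. It illustrates the general technique only and is not the proxy's implementation; the class and method names are chosen for the example.

```python
import threading


class _Call:
    """State shared by the leader and all followers of one in-flight key."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None
        self.waiters = 0


class SingleFlight:
    """Coalesces concurrent calls that share a key into one execution."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> _Call currently in flight

    def waiters(self, key):
        """Number of followers currently waiting on an in-flight key."""
        with self._lock:
            call = self._calls.get(key)
            return call.waiters if call else 0

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
            else:
                call.waiters += 1
        if leader:
            try:
                call.result = fn()  # the single backend request
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]  # later requests start a fresh call
                call.done.set()
        else:
            call.done.wait()  # followers block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.result
```

Only requests that arrive while a call is in flight are coalesced; once the leader returns, the next request for the same key goes to the cache or the backend again.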

Built-In Traffic Guards

All traffic guard controls are tunable via CLI flags (or extraArgs in the Helm chart):

Flag	Default	Description
`-rate-limit-per-second`	`50`	Per-client request rate (req/s)
`-rate-limit-burst`	`100`	Per-client burst allowance
`-max-concurrent`	`100`	Global concurrent backend query cap
`-cb-fail-threshold`	`5`	Failures within window to open circuit breaker
`-cb-open-duration`	`10s`	How long circuit breaker stays open
`-cb-window-duration`	`30s`	Failure counting window

If defaults are too strict or too loose for your workload, tune at the proxy first, then complement with:

reduced Grafana auto-refresh and retry pressure
ingress or service-mesh shaping in front of the proxy
scaling out replicas and raising cache effectiveness before pushing more uncached load
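To reason about how -rate-limit-per-second and -rate-limit-burst interact, a generic token bucket can be sketched as below. This is the textbook algorithm with the default values from the table, not the proxy's code; time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/s up to `burst` tokens."""

    def __init__(self, rate=50.0, burst=100.0, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # a new client starts with a full bucket
        self.last = now

    def allow(self, now):
        """Consume one token if available; `now` is a timestamp in seconds."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under this model a client that has been idle can send up to 100 requests at once, after which it is held to a sustained 50 requests per second.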

Monitoring

See the dedicated Observability Guide for the full metrics catalog, JSON log schema, OTLP push configuration, and collector/agent integration examples.

Metrics

The proxy exposes Prometheus metrics at /metrics.

Use the Observability Guide as the canonical catalog for:

every documented loki_vl_proxy_* metric family
cardinality level (Low, Medium, High (capped)) for each family
scrape versus OTLP field/label mapping
the new fanout and proxy-internal operation metrics/log fields

Metric	Type	Primary dimensions	Description
`loki_vl_proxy_requests_total`	counter	`system`, `direction`, `endpoint`, `route`, `status`	Total requests by downstream Loki route or upstream backend route
`loki_vl_proxy_request_duration_seconds`	histogram	`system`, `direction`, `endpoint`, `route`	End-to-end request latency
`loki_vl_proxy_backend_duration_seconds`	histogram	`system`, `direction`, `endpoint`, `route`	Upstream-only latency for VictoriaLogs and rules/alerts backends
`loki_vl_proxy_cache_hits_by_endpoint` / `loki_vl_proxy_cache_misses_by_endpoint`	counter	`system`, `direction`, `endpoint`, `route`	Cache efficiency by normalized route
`loki_vl_proxy_tenant_requests_total` / `loki_vl_proxy_client_requests_total`	counter	tenant/client plus route dimensions	Hot tenants and clients per route
`loki_vl_proxy_process_*`	gauges/counters	metric family specific	Runtime, CPU, memory, disk, network, and PSI health

Key Ratios to Monitor

Route cache hit ratio: cache_hits_by_endpoint / (cache_hits_by_endpoint + cache_misses_by_endpoint) by endpoint,route — target >80% on stable metadata paths
Downstream error rate: requests_total{system="loki",direction="downstream",status=~"5.."} over total downstream requests — target <1%
Upstream latency: backend_duration_seconds by endpoint,route — use this to separate VictoriaLogs slowness from proxy-side work
End-to-end latency: request_duration_seconds{system="loki",direction="downstream"} by endpoint,route — compare with upstream latency and request logs
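The first two ratios reduce to simple arithmetic over counter increases. The helpers below are a sketch that assumes the inputs are increases over the same time window (not raw counter values) and that status codes arrive as strings; the function names are illustrative.

```python
def cache_hit_ratio(hits, misses):
    """hits / (hits + misses); 0.0 when there was no cache traffic."""
    total = hits + misses
    return hits / total if total else 0.0


def downstream_error_rate(requests_by_status):
    """Share of requests with a 5xx status, given {status_code: count}."""
    total = sum(requests_by_status.values())
    errors = sum(
        count for status, count in requests_by_status.items()
        if status.startswith("5")
    )
    return errors / total if total else 0.0
```

For example, 850 hits and 150 misses give a ratio of 0.85, which meets the >80% target, while 10 errors in 1000 requests sit exactly at the 1% boundary.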

OTLP Push

Push metrics to an OTLP collector:

-otlp-endpoint=http://otel-collector:4318/v1/metrics
-otlp-interval=30s
-otlp-compression=gzip

The OTLP exporter reuses the same core proxy metric names that /metrics exposes, so dashboards and alert logic can stay aligned across scrape and push modes.

For exact proxy-only overhead on translated paths, use structured request logs with proxy.overhead_ms, proxy.duration_ms, and upstream.duration_ms. The metrics intentionally keep route-aware end-to-end and upstream histograms, while logs carry the per-request decomposition.

Troubleshooting

No Data in Grafana

Check proxy health: curl http://proxy:3100/ready
Check VL backend: curl http://vl:9428/health
Check proxy logs for translation errors
Verify label-style matches your VL ingestion format
Check /loki/api/v1/labels for available labels

Label Names Don't Match

Symptom	Cause	Fix
Dots in Grafana labels	`label-style=passthrough` with dotted VL data	Set `label-style=underscores`
Empty label_values for service_name	VL stores `service.name`, query asks `service_name`	Set `label-style=underscores`
Grafana Drilldown "failed to fetch"	Volume/stats endpoint issue	Check proxy logs, ensure VL v1.49+
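The first two rows come down to a name mismatch between dotted stored fields and underscore-style label names. The sketch below illustrates that mismatch with a plain dot-to-underscore replacement; this is an assumption made for the example, and the exact translation rules are described in the Translation Modes Guide.

```python
def to_underscores(field):
    """Illustrative mapping: replace dots with underscores."""
    return field.replace(".", "_")


def index_labels(stored_fields):
    """Map each underscore-style label name back to the stored field name."""
    return {to_underscores(field): field for field in stored_fields}


# A lookup for "service_name" only succeeds once the dotted field is translated.
LABELS = index_labels(["service.name", "level"])
```

Without translation, a query for service_name has no matching stored field, which is why label_values comes back empty in passthrough mode.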

High Memory Usage

Reduce -cache-max (default 10000)
Reduce -http-max-body-bytes
Add memory limits in Kubernetes
Check for singleflight amplification (many unique queries)

High Latency

Keep -response-compression=gzip for broad Loki/Grafana compatibility; auto now behaves the same on the frontend for legacy configs
Set -response-compression-min-bytes around 1024 to avoid wasting CPU on small metadata/control responses
Increase cache TTLs
Check VL backend latency via metrics
Rely on built-in singleflight coalescing for identical concurrent reads

Circuit Breaker Tripping

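The circuit breaker opens when backend 5xx responses reach -cb-fail-threshold (default 5) within -cb-window-duration (default 30s), and stays open for -cb-open-duration (default 10s). Check: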

VL backend health and logs
Network connectivity between proxy and VL
VL resource usage (CPU/memory/disk)
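When judging whether the breaker settings fit your backend, it can help to model them. The sketch below is a simplified breaker that assumes failures are counted inside a sliding window, as the -cb-window-duration flag suggests, and uses the documented defaults. It has no half-open state and is not the proxy's implementation.

```python
from collections import deque


class CircuitBreaker:
    """Opens after `fail_threshold` failures within `window` seconds."""

    def __init__(self, fail_threshold=5, window=30.0, open_duration=10.0):
        self.fail_threshold = fail_threshold
        self.window = window
        self.open_duration = open_duration
        self._failures = deque()  # timestamps of recent failures
        self._opened_at = None    # None means the breaker is closed

    def allow(self, now):
        """Return True if a backend request may be sent at time `now`."""
        if self._opened_at is None:
            return True
        if now - self._opened_at >= self.open_duration:
            # Open period elapsed: close and start counting afresh.
            self._opened_at = None
            self._failures.clear()
            return True
        return False

    def record_failure(self, now):
        """Record a backend failure and open the breaker at the threshold."""
        self._failures.append(now)
        while self._failures and now - self._failures[0] > self.window:
            self._failures.popleft()  # drop failures outside the window
        if len(self._failures) >= self.fail_threshold:
            self._opened_at = now
```

In this model, five failures spread more than 30 seconds apart never open the breaker, whereas a short burst of five does, so sporadic backend errors should not trip it on their own.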

Backup & Recovery

The proxy is stateless, so nothing has to be backed up. The optional disk cache is the only on-disk state, and it is disposable:

L1 cache: In-memory, rebuilds on restart
L2 disk cache: bbolt file at -disk-cache-path. Can be deleted safely — will be repopulated.
Configuration: All config is CLI flags / env vars. Store in Helm values or ConfigMap.

Scaling

Horizontal Scaling

horizontalPodAutoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
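To anticipate how this policy reacts, the core Kubernetes HPA rule (desired = ceil(current replicas × current utilization / target utilization)) can be evaluated by hand. The helper below is a simplified sketch that uses the values from the example above and ignores the HPA tolerance band and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization=70, min_replicas=2, max_replicas=10):
    """Apply the basic HPA scaling rule, clamped to the configured bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))
```

For instance, 2 replicas at 140% average CPU utilization scale to 4, and 8 replicas at the same utilization are capped at the maxReplicas value of 10.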

Pod Disruption Budget

podDisruptionBudget:
  enabled: true
  minAvailable: 1

Multi-Zone Deployment

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: loki-vl-proxy
