
Scaling & Capacity Planning

Resource Model

Loki-VL-proxy is a stateless HTTP proxy with an optional disk cache. Resource consumption scales with:

  • CPU: proportional to request rate + LogQL translation complexity
  • Memory: proportional to L1 cache size + concurrent request response buffers
  • Disk: proportional to L2 cache size (compressed, ~3.4x reduction with gzip)
  • Network: proportional to request rate × response size

Resource Projections

Based on benchmarks (Go 1.26.2, single core, typical Grafana dashboard workload with 60% cache hit rate):

Single Replica

| Metric | 10 req/s | 100 req/s | 1,000 req/s | 10,000 req/s |
|---|---|---|---|---|
| CPU (cores) | 0.01 | 0.1 | 0.5-1.0 | 4-8 |
| Memory (MB) | 50-100 | 100-256 | 256-512 | 512-2048 |
| L1 Cache (entries) | 1,000 | 5,000 | 10,000 | 50,000 |
| L1 Cache (MB) | 10-50 | 50-100 | 100-256 | 256-512 |
| Network in (Mbps) | 0.1 | 1 | 10 | 50-100 |
| Network out (Mbps) | 0.5 | 5 | 50 | 200-500 |
| VL queries/s | 4 (60% cached) | 40 | 400 | 2,000-4,000 |
| Goroutines | 20-50 | 50-200 | 200-1,000 | 1,000-5,000 |

Disk Cache (L2)

| Disk Size | Entries (compressed) | Hit Rate Boost | Use Case |
|---|---|---|---|
| 1 GB | ~18k | +10-15% | Small team, development |
| 5 GB | ~94k | +15-25% | Medium org, shared dashboards |
| 10 GB | ~189k | +20-30% | Large org, heavy dashboard usage |
| 50 GB | ~948k | +25-35% | Enterprise, long TTL requirements |

Compression ratio: ~29% (gzip), meaning 1 GB on disk holds ~3.4 GB of raw cache data.

Fleet (Multiple Replicas)

| Total req/s | Replicas | CPU (total) | Memory (total) | Disk (total) | Cache Hit Rate |
|---|---|---|---|---|---|
| 10 | 1 | 0.1 core | 128 MB | 1 GB | 60-70% |
| 100 | 2 | 0.5 cores | 512 MB | 2 GB | 65-75% |
| 1,000 | 4 | 2-4 cores | 1-2 GB | 10 GB | 70-80% |
| 10,000 | 16 | 8-16 cores | 4-8 GB | 40 GB | 75-85% |

With fleet peer cache, the effective cache size is L1_per_pod × N + L2_per_pod × N, and cache hit rate improves because each key lives on exactly one peer (no duplication across pods).

Workload Profiles

Dashboard-Heavy (Grafana auto-refresh)

  • Characteristics: 70-85% repeat queries, 15-30 panels per dashboard, 5-60s refresh
  • Cache behavior: very high hit rate after first load
  • Bottleneck: memory (large response buffers), network out
| Dashboards | Users | req/s | Recommended |
|---|---|---|---|
| 10 | 5 | 10-50 | 1 pod, 256MB, 1GB disk |
| 50 | 25 | 50-250 | 2 pods, 512MB, 5GB disk |
| 200 | 100 | 200-1000 | 4 pods, 1GB, 10GB disk |
| 1000 | 500 | 1000-5000 | 8-16 pods, 2GB, 10GB disk |

Explore-Heavy (Ad-hoc queries)

  • Characteristics: 20-30% repeat queries, unique time ranges, high cardinality
  • Cache behavior: lower hit rate, disk cache catches some repeats
  • Bottleneck: CPU (translation), VL backend throughput
| Active Users | req/s | Recommended |
|---|---|---|
| 5 | 5-20 | 1 pod, 256MB, 1GB disk |
| 25 | 25-100 | 2 pods, 512MB, 5GB disk |
| 100 | 100-500 | 4 pods, 1GB, 10GB disk |
| 500 | 500-2000 | 8 pods, 2GB, 20GB disk |

Mixed (Production Typical)

  • Characteristics: 50% dashboards, 30% alerts, 20% explore
  • Cache behavior: moderate hit rate, steady state after warm-up
  • Bottleneck: balanced CPU/memory/network

Helm Values by Scale

Small (< 100 req/s)

```yaml
replicaCount: 1
workload:
  kind: StatefulSet
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi
extraArgs:
  cache-ttl: "60s"
  cache-max: "5000"
  disk-cache-path: "/cache/proxy.db"
  disk-cache-compress: "true"
persistence:
  enabled: true
  size: 1Gi
```

Medium (100-1,000 req/s)

```yaml
replicaCount: 3
workload:
  kind: StatefulSet
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 512Mi
extraArgs:
  cache-ttl: "60s"
  cache-max: "10000"
  disk-cache-path: "/cache/proxy.db"
  disk-cache-compress: "true"
peerCache:
  enabled: true
  discovery: dns
persistence:
  enabled: true
  size: 5Gi
horizontalPodAutoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Large (1,000-10,000 req/s)

```yaml
replicaCount: 8
workload:
  kind: StatefulSet
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "2"
    memory: 2Gi
extraArgs:
  cache-ttl: "120s"
  cache-max: "50000"
  disk-cache-path: "/cache/proxy.db"
  disk-cache-compress: "true"
  disk-cache-flush-size: "500"
peerCache:
  enabled: true
  discovery: dns
persistence:
  enabled: true
  size: 10Gi
  storageClass: gp3 # high IOPS SSD
horizontalPodAutoscaling:
  enabled: true
  minReplicas: 4
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
podDisruptionBudget:
  minAvailable: 2
```

The current chart does not expose direct rate-limit, max-concurrency, or circuit-breaker tuning flags; the proxy ships with built-in defaults set in code:

  • per-client rate limit: 50 req/s
  • per-client burst: 100
  • global concurrent backend queries: 100
  • circuit breaker: opens after 5 failures, stays open for 10s

Use HPA, cache sizing, Grafana refresh policy, and outer traffic shaping as the main scaling controls.

Monitoring Metrics

Per-Tenant (Rate, Throughput, Latency)

# Request rate per tenant
sum(rate(loki_vl_proxy_tenant_requests_total{system="loki",direction="downstream"}[5m])) by (tenant)

# P99 latency per tenant
histogram_quantile(0.99, sum(rate(loki_vl_proxy_tenant_request_duration_seconds_bucket{system="loki",direction="downstream"}[5m])) by (le, tenant))

# Error rate per tenant
sum(rate(loki_vl_proxy_tenant_requests_total{status=~"4..|5.."}[5m])) by (tenant)
/ sum(rate(loki_vl_proxy_tenant_requests_total[5m])) by (tenant)

Per-Client (User Identity)

Client identity is resolved from (priority order):

  1. Trusted user headers when -metrics.trust-proxy-headers=true (X-Grafana-User, X-Forwarded-User, X-Webauth-User, X-Auth-Request-User)
  2. X-Scope-OrgID header (tenant)
  3. X-Forwarded-For when -metrics.trust-proxy-headers=true
  4. Remote IP (fallback)

Datasource/basic-auth credentials are tracked separately and are not used as client identity.

# Request rate per client
sum(rate(loki_vl_proxy_client_requests_total{system="loki",direction="downstream"}[5m])) by (client)

# Throughput per client (bytes/s)
sum(rate(loki_vl_proxy_client_response_bytes_total[5m])) by (client)

# P95 latency per client
histogram_quantile(0.95, sum(rate(loki_vl_proxy_client_request_duration_seconds_bucket{system="loki",direction="downstream"}[5m])) by (le, client))

# Top 10 clients by request count
topk(10, sum(rate(loki_vl_proxy_client_requests_total[5m])) by (client))

# Clients generating the most 429s
topk(10, sum(rate(loki_vl_proxy_client_status_total{status="429"}[5m])) by (client))

# Clients issuing the largest queries
topk(10, histogram_quantile(0.95, sum(rate(loki_vl_proxy_client_query_length_chars_bucket{system="loki",direction="downstream"}[5m])) by (le, client)))

Fleet Peer-Cache

# Current remote peers per proxy
loki_vl_proxy_peer_cache_peers

# Total fleet members in the hash ring
loki_vl_proxy_peer_cache_cluster_members

# Peer cache effectiveness
rate(loki_vl_proxy_peer_cache_hits_total[5m])
/
(rate(loki_vl_proxy_peer_cache_hits_total[5m]) + rate(loki_vl_proxy_peer_cache_misses_total[5m]))

# Peer fetch failure rate
rate(loki_vl_proxy_peer_cache_errors_total[5m])

Cache Efficiency

# Overall cache hit rate
loki_vl_proxy_cache_hits_total / (loki_vl_proxy_cache_hits_total + loki_vl_proxy_cache_misses_total)

# VL backend load (actual queries reaching VL)
sum(rate(loki_vl_proxy_backend_duration_seconds_count[5m]))

# Backend requests saved by coalescing
rate(loki_vl_proxy_coalesced_saved_total[5m])

Resource Utilization

# Heap allocated vs memory obtained from the OS
go_memstats_alloc_bytes / go_memstats_sys_bytes

# Active goroutines (proxy concurrency)
go_goroutines

# GC pressure
rate(go_gc_cycles_total[5m])

Availability Patterns

Single-Region HA

  • Minimum 3 replicas for HA
  • PDB ensures at least 2 pods during rolling updates
  • Anti-affinity spreads pods across nodes/zones
  • Proxy is stateless — any pod can serve any request

Multi-Region

Disk I/O Characteristics

The L2 disk cache uses bbolt (B+ tree) optimized for sequential I/O:

| Operation | Latency | IOPS | Notes |
|---|---|---|---|
| Read (cache hit) | ~22µs | ~45,000/s | Sequential B+ tree scan |
| Write (batch flush) | ~200µs/batch | ~5,000 entries/s | Write-back buffer, async |
| Compaction | Background | Minimal | bbolt self-manages |

Disk type recommendations:

  • gp3/gp2 SSD: Best for < 1,000 req/s
  • io2/io1 SSD: For > 1,000 req/s with high cache churn
  • st1 HDD: Viable for read-heavy, low-churn workloads (bbolt's sequential I/O is HDD-friendly)

Cost Estimation

Example: AWS EKS, us-east-1, 1,000 req/s workload

| Component | Spec | Monthly Cost |
|---|---|---|
| 4x t3.medium pods | 2 vCPU, 4GB RAM | ~$120 |
| 4x 10GB gp3 volumes | 3,000 IOPS, 125 MB/s | ~$14 |
| VictoriaLogs (backend) | Varies | Varies |
| Total proxy infra | | ~$134/month |

Treat this as proxy-only infrastructure cost, not a full backend comparison. Actual Loki versus VictoriaLogs economics depend on ingestion profile, retention window, storage class, and query mix. Loki's own docs note that high-cardinality labels hurt performance and cost-effectiveness; VictoriaLogs publishes lower-RAM and lower-disk claims, and third-party case studies report materially lower CPU, RAM, and storage usage on search-heavy workloads. Use the route-aware metrics and cache ratios in this project to validate the savings against your own traffic rather than assuming a fixed multiplier.