Benchmarks

Measured on Apple M3 Max (14 cores), Go 1.26.2, -benchmem.

Per-Request Latency

| Operation | Latency | Allocs/op | Bytes/op | Notes |
|---|---|---|---|---|
| Labels (cache hit) | 2.0 us | 25 | 6.6 KB | Serve from in-memory cache |
| QueryRange (cache hit) | 118 us | 600 | 142 KB | Query translation + cache lookup |
| wrapAsLokiResponse | 2.8 us | 58 | 2.6 KB | JSON re-envelope |
| VL NDJSON to Loki streams (100 lines) | 170 us | 3118 | 70 KB | Parse + group + convert (pooled) |
| LogQL translation | ~5 us | ~20 | ~2 KB | String manipulation (no AST) |

Throughput

| Scenario | Requests | Concurrency | Throughput | Cache Hit % | Memory Growth |
|---|---|---|---|---|---|
| Labels (cache hit) | 100,000 | 100 | 175,726 req/s | 98.2% | 0.5 MB |
| QueryRange (cache miss, 1ms backend) | 5,000 | 50 | 12,976 req/s | 0% | - |

Scaling Profile (No Cache — Raw Proxy Overhead)

| Profile | Requests | Concurrency | Throughput | Avg Latency | Total Alloc | Live Heap | Errors |
|---|---|---|---|---|---|---|---|
| low (100 rps) | 1,000 | 10 | 8,062 req/s | 124 us | 136 MB | 0.9 MB | 0 |
| medium (1K rps) | 5,000 | 50 | 12,465 req/s | 80 us | 572 MB | 1.3 MB | 0 |
| high (10K rps) | 20,000 | 200 | 39,057 req/s | 26 us | 1,331 MB | 8.7 MB | 0 |

Key observations:

  • Live heap stays <10 MB even at 20K requests — GC keeps up
  • Total alloc is high (~70 KB/request) due to JSON parse/serialize; this is GC pressure, not a leak
  • No errors at 200 concurrent connections (after connection pool tuning)

Scaling Profile (With Cache)

| Profile | Requests | Concurrency | Throughput | Avg Latency | Live Heap |
|---|---|---|---|---|---|
| low (100 rps) | 1,000 | 10 | 8,207 req/s | 122 us | 1.1 MB |
| medium (1K rps) | 5,000 | 50 | 12,821 req/s | 78 us | 1.1 MB |

The cache provides only a marginal throughput improvement in these runs, but it dramatically reduces backend load (98%+ hit rate).

Resource Usage at Scale

Measured from load tests (proxy overhead only, excludes network I/O):

| Load (req/s) | CPU (single core) | Memory (steady state) | Notes |
|---|---|---|---|
| 100 | <1% | ~10 MB | Idle, mostly cache hits |
| 1,000 | ~8% | ~20 MB | Mix of cache hits/misses |
| 10,000 | ~30% | ~50 MB | Significant cache miss rate, backend-bound |
| 40,000+ | ~100% | ~100 MB | CPU-bound, needs horizontal scaling |

The proxy is CPU-bound at high load. Memory usage is stable — the cache has a fixed maximum size (configurable via -cache-max). Scaling strategy:

  • < 1,000 req/s: Single replica, 100m CPU, 128Mi memory
  • 1,000-10,000 req/s: 2-3 replicas with HPA on CPU
  • > 10,000 req/s: HPA with 5+ replicas, tune -cache-max for hit rate

Connection Pool Tuning

The proxy's HTTP transport is tuned for high-concurrency single-backend proxying:

transport.MaxIdleConns = 256                 // total idle connections
transport.MaxIdleConnsPerHost = 256          // all slots for VL (single backend)
transport.MaxConnsPerHost = 0                // unlimited concurrent connections
transport.IdleConnTimeout = 90 * time.Second // reuse connections

Go's default transport (MaxIdleConnsPerHost=2) causes ephemeral port exhaustion above ~50 concurrent requests. The tuning above eliminates this; load tests run clean at 200 concurrency and 33K req/s.

Known Hot Paths

  1. VL NDJSON to Loki streams (3118 allocs/100 lines, down from 3417): Optimized with byte scanning (no strings.Split), sync.Pool for JSON entry maps, pre-allocated slice estimates. 49% memory reduction from original. Remaining allocs are from json.Unmarshal internals — further gains need a custom tokenizer.

  2. QueryRange cache hit (600 allocs/request): Even on cache hit, response bytes are re-parsed and re-serialized. Serving raw cached bytes would eliminate this overhead.

Running Benchmarks

# All proxy benchmarks
go test ./internal/proxy/ -bench . -benchmem -run "^$" -count=3

# Translator benchmarks
go test ./internal/translator/ -bench . -benchmem -run "^$" -count=3

# Cache benchmarks
go test ./internal/cache/ -bench . -benchmem -run "^$" -count=3

# Load tests (skipped when -short is set)
go test ./internal/proxy/ -run "TestLoad" -v -timeout=60s

# Profile CPU
go test ./internal/proxy/ -bench BenchmarkVLLogsToLokiStreams -cpuprofile=cpu.prof
go tool pprof cpu.prof

# Profile memory
go test ./internal/proxy/ -bench BenchmarkVLLogsToLokiStreams -memprofile=mem.prof
go tool pprof mem.prof