
Cache and cost control

Use cache tiers and fleet cache to suppress repeated backend work

The strongest practical efficiency story in Loki-VL-proxy is not a generic head-to-head marketing claim. It is the concrete read-path work the proxy can eliminate with its 4-tier cache stack: Tier0 (compat), L1 in-memory (256 MB default), L2 disk (bbolt), and L3 peer cache (consistent hash + zstd). A circuit breaker and a request coalescer sit alongside the stack and protect the backend under load.
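
The tier ordering can be sketched as a read-through lookup. This is a minimal illustration, not the proxy's implementation: `Tier`, `read_through`, and `fetch_backend` are hypothetical names, plain dicts stand in for the LRU, bbolt, and peer transports, and the policy of backfilling faster tiers after a slower-tier hit is an assumption.

```python
from typing import Callable, Dict, List, Optional


class Tier:
    """One cache layer. A dict stands in for the real storage (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[bytes]:
        value = self.store.get(key)
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value

    def put(self, key: str, value: bytes) -> None:
        self.store[key] = value


def read_through(
    tiers: List[Tier], key: str, fetch_backend: Callable[[str], bytes]
) -> bytes:
    """Check tiers fastest-first; on a hit, backfill the faster tiers that missed."""
    for depth, tier in enumerate(tiers):
        value = tier.get(key)
        if value is not None:
            # ASSUMPTION: promote the entry so the next read stops earlier.
            for faster in tiers[:depth]:
                faster.put(key, value)
            return value
    # Every tier missed: pay for one backend call, then warm all tiers.
    value = fetch_backend(key)
    for tier in tiers:
        tier.put(key, value)
    return value
```

With tiers ordered `[Tier0, L1, L2, L3]`, a repeated request stops at the first tier, and a request arriving after the in-memory tiers were emptied (for example by a restart) is still answered from the disk tier without touching the backend.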

  • Tier0 compat cache: safe GET Loki-shaped responses can return before most compatibility work runs. Best for hot repeated read paths.
  • L1 (256 MB in-memory): hot in-process LRU cache with near-zero overhead on repeated dashboard hits. Default 256 MB, tunable per deployment.
  • L2 (bbolt disk cache): persistent local cache that survives RAM pressure and pod restarts. 0.45 µs uncompressed read, 3.9 µs compressed read.
  • L3 (peer cache, consistent hash + zstd): fleet-wide reuse across replicas via consistent-hash ownership and zstd-compressed transfer. 52 ns warm shadow-copy hit after first owner fetch.

| Layer | Plain-English role | What it buys you |
| --- | --- | --- |
| Tier0 (compat) | Fast answer cache at the Loki-compatible frontend, keyed on exact request shape. | Repeated Grafana reads can return before most proxy logic runs. `query_range` warm hit: 0.64–0.67 µs. |
| L1 in-memory (256 MB default) | Hot LRU cache inside the local process. | Best-case latency for repeated dashboards and Explore refreshes. Tunable per deployment. |
| L2 disk (bbolt) | Persistent local cache backed by the bbolt B+tree. | Survives RAM pressure and pod restarts. 0.45 µs uncompressed read, 3.9 µs compressed read. |
| L3 peer cache (consistent hash + zstd) | Fleet-wide reuse between replicas using consistent-hash ownership and zstd-compressed transfer. | 52 ns warm shadow-copy hit. One warm pod serves the rest of the fleet without backend round-trips. |

| Path | Slow path | Fast path | Why it matters |
| --- | --- | --- | --- |
| `query_range` | 4.58 ms cold miss with delayed backend | 0.64–0.67 µs warm cache hit | Repeated dashboards stop behaving like backend-bound requests. |
| `detected_field_values` | 2.76 ms without Tier0 | 0.71 µs with Tier0 | Drilldown metadata becomes effectively instant after warm-up. |
| L2 disk cache | backend refill path | 0.45 µs uncompressed read, 3.9 µs compressed read | Persistent cache stays cheap enough to matter on hot paths. |
| L3 peer cache | backend or owner refetch | 52 ns warm shadow-copy hit | A warm fleet can reuse work instead of repeating it. |
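
Consistent-hash ownership for the L3 peer cache can be illustrated with a small hash ring. This is a generic sketch of the technique, not the proxy's code: `HashRing`, the peer names, the SHA-256 placement, and the 64 virtual nodes per peer are all assumptions, and the zstd-compressed transfer is not shown.

```python
import bisect
import hashlib
from typing import List, Tuple


def _point(label: str) -> int:
    # Stable 64-bit ring position. Python's built-in hash() is salted per
    # process, so replicas would disagree if they used it.
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    """Maps a cache key to the peer that owns it (hypothetical helper)."""

    def __init__(self, peers: List[str], vnodes: int = 64) -> None:
        # Each peer appears `vnodes` times on the ring to even out ownership.
        self._ring: List[Tuple[int, str]] = sorted(
            (_point(f"{peer}#{i}"), peer) for peer in peers for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, cache_key: str) -> str:
        # First virtual node clockwise from the key, wrapping past the end.
        idx = bisect.bisect_right(self._points, _point(cache_key)) % len(self._ring)
        return self._ring[idx][1]
```

Every replica that builds the ring from the same peer list picks the same owner for a key, so only the owner needs to go to the backend. When a peer leaves, only the keys it owned move to a new owner; the rest of the fleet's cache placement is unchanged.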

Circuit breaker and request coalescer

  • A circuit breaker with a sliding 30 s window opens after 5 failures, shielding VictoriaLogs from cascading retries.
  • The request coalescer deduplicates identical in-flight requests, so concurrent Grafana panel refreshes become a single backend call.
  • Prefilter eliminates ~81.6% of empty-window backend calls on long-range queries before they reach VictoriaLogs.
  • These mechanisms work alongside the cache tiers: coalescing prevents parallel requests from causing redundant cache misses.
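
A sliding-window breaker of the kind described in the first bullet might look like the sketch below. The 30 s window and the threshold of 5 come from the text; the class name, the injected clock, and the 10 s cooldown before the breaker closes again are assumptions, since the source does not say how the breaker recovers.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class SlidingWindowBreaker:
    """Opens once `max_failures` failures fall inside the last `window_s` seconds."""

    def __init__(
        self,
        window_s: float = 30.0,    # from the text: sliding 30 s window
        max_failures: int = 5,     # from the text: opens after 5 failures
        cooldown_s: float = 10.0,  # ASSUMPTION: recovery delay is not documented
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_s = window_s
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures: Deque[float] = deque()
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return False while the breaker is open, True otherwise."""
        now = self.clock()
        if self.opened_at is not None:
            if now - self.opened_at < self.cooldown_s:
                return False
            # Cooldown elapsed: close the breaker and start counting afresh.
            self.opened_at = None
            self.failures.clear()
        return True

    def record_failure(self) -> None:
        now = self.clock()
        # Drop failures that have slid out of the window.
        while self.failures and now - self.failures[0] > self.window_s:
            self.failures.popleft()
        self.failures.append(now)
        if len(self.failures) >= self.max_failures:
            self.opened_at = now
```

A caller would check `allow()` before each backend request and call `record_failure()` when the request fails. Failures spaced more than 30 s apart never accumulate, so an occasional backend error does not trip the breaker.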

Where the proxy can be cheaper than an uncached read path

  • Repeated dashboards hammering the same `query_range` windows.
  • Explore or Drilldown metadata paths that users refresh over and over.
  • Replica fleets where the same query otherwise fans out into repeated backend calls.
  • Long-range historical reads that benefit from split-window reuse and prefiltering.
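
For the repeated-dashboard and replica-fleet cases, the saving comes from letting one request do the work while identical concurrent requests wait for its result. The sketch below shows that single-flight pattern in general terms; `Coalescer`, `_Call`, and the `waiters` counter are hypothetical names, and the way the cache key is derived from a request is left out.

```python
import threading
from typing import Callable, Dict, Optional


class _Call:
    """State shared by the leader and the followers of one in-flight key."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Optional[bytes] = None
        self.error: Optional[BaseException] = None
        self.waiters = 0  # followers that joined this call


class Coalescer:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fetch: Callable[[], bytes]) -> bytes:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
            else:
                call.waiters += 1
        if leader:
            try:
                call.result = fetch()  # the only backend call for this key
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()  # wake every follower
        else:
            call.done.wait()  # follower: reuse the leader's outcome
        if call.error is not None:
            raise call.error
        return call.result
```

Requests that arrive after the leader finishes start a new call, which is why coalescing complements the cache tiers rather than replacing them: the cache covers repeats over time, the coalescer covers repeats that overlap.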

What the project does not claim

  • It does not publish a blanket native Loki versus VictoriaLogs total-cost benchmark.
  • It does not claim every workload is faster through a compatibility layer.
  • It does claim explicit cache, coalescing, and route-aware tuning levers on the read path.
  • It does publish the benchmark and runtime signals needed to judge those levers honestly.

Metrics that prove cache value

  • `loki_vl_proxy_cache_hits_by_endpoint` and `_misses_by_endpoint` by route.
  • `loki_vl_proxy_window_cache_hit_total` and `_miss_total` for long-range queries.
  • `loki_vl_proxy_window_fetch_seconds` and `_merge_seconds` for range-work cost.
  • `loki_vl_proxy_peer_cache_hits_total`, `_misses_total`, and `_errors_total` for fleet behavior.
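
These hit and miss counters are most useful as a ratio over an interval. The sketch below computes a per-route hit ratio from two counter snapshots; the function name, the dict-per-route input shape, and the sample numbers in the usage note are illustrative assumptions, not output from the proxy.

```python
from typing import Dict, Optional


def hit_ratio_by_route(
    hits_before: Dict[str, float],
    hits_after: Dict[str, float],
    misses_before: Dict[str, float],
    misses_after: Dict[str, float],
) -> Dict[str, Optional[float]]:
    """Hit ratio per route over the interval between two counter snapshots."""
    ratios: Dict[str, Optional[float]] = {}
    for route in set(hits_after) | set(misses_after):
        hits = hits_after.get(route, 0.0) - hits_before.get(route, 0.0)
        misses = misses_after.get(route, 0.0) - misses_before.get(route, 0.0)
        total = hits + misses
        # None means the route saw no traffic, which differs from a 0% ratio.
        ratios[route] = hits / total if total > 0 else None
    return ratios
```

For example, a route whose hit counter grew by 900 and whose miss counter grew by 100 over the interval has a hit ratio of 0.9.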

How to think about cost control

The practical cost story is about suppressing repeated backend work, not about hiding the backend. When cache hit ratio rises on hot routes, VictoriaLogs work per user action goes down and user latency usually follows.
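
The relationship between hit ratio, backend work, and latency is simple weighted arithmetic. The sketch below uses the published `query_range` figures (4.58 ms cold miss with a delayed backend, 0.67 µs warm hit) purely as inputs; the 90% hit ratio is a hypothetical value, and real backend latency will differ from the benchmark's delayed backend.

```python
def expected_latency_us(hit_ratio: float, hit_us: float, miss_us: float) -> float:
    """Mean latency when a fraction `hit_ratio` of requests is served from cache."""
    return hit_ratio * hit_us + (1.0 - hit_ratio) * miss_us


def backend_calls(requests: int, hit_ratio: float) -> float:
    """Requests that still reach the backend at a given hit ratio."""
    return requests * (1.0 - hit_ratio)


# Inputs: 0.67 µs warm hit and 4.58 ms (4580 µs) cold miss from the benchmark
# table; the 0.9 hit ratio is an assumed example value.
mean_us = expected_latency_us(0.9, hit_us=0.67, miss_us=4580.0)
remaining = backend_calls(1000, 0.9)
```

With these inputs the mean comes to roughly 458.6 µs and about 100 of every 1000 requests still reach VictoriaLogs. The miss term dominates, so each further point of hit ratio removes backend calls and latency almost proportionally.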