
Cache and cost control

Use cache tiers and fleet cache to suppress repeated backend work

The strongest practical efficiency story in Loki-VL-proxy is not a generic head-to-head marketing claim. It is the concrete read-path work the proxy can eliminate with Tier0, local cache, disk cache, peer cache, and long-range query window reuse.

  • Tier0 edge cache: safe GET Loki-shaped responses can return before most compatibility work runs. Best for hot repeated read paths.
  • L1 to L3 stack: memory, disk, and peer reuse reduce repeated backend work at different scopes (local pod, persistent pod, or fleet-wide).
  • Window cache for long ranges: long `query_range` requests can reuse split history windows instead of refetching them whole. Useful for 2d and 7d dashboards.
  • Operator-visible levers: the project exports the metrics needed to decide whether the cache stack is paying off. This is not hidden magic.
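As a rough sketch of the Tier0 idea (illustrative only, not the project's actual implementation; the class and parameter names here are invented), an edge cache is a TTL-bounded map keyed on the normalized request, consulted before any Loki-to-VictoriaLogs compatibility work runs:

```python
import time

class Tier0Cache:
    """Illustrative edge cache: answer safe GETs before compatibility work runs."""

    def __init__(self, ttl_seconds=10.0):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (expires_at, cached_response)

    def key(self, method, path, query_string):
        # Only safe, idempotent reads are cacheable at the edge.
        if method != "GET":
            return None
        return (path, query_string)

    def get(self, method, path, query_string):
        k = self.key(method, path, query_string)
        if k is None:
            return None
        hit = self.entries.get(k)
        if hit is None or hit[0] < time.monotonic():
            return None  # miss or expired: fall through to the full proxy path
        return hit[1]

    def put(self, method, path, query_string, response):
        k = self.key(method, path, query_string)
        if k is not None:
            self.entries[k] = (time.monotonic() + self.ttl, response)
```

The first request misses and takes the full translate-and-fetch path; identical GETs arriving within the TTL return immediately, which is why the warm-hit numbers below are microseconds rather than milliseconds.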
| Layer | Plain-English role | What it buys you |
| --- | --- | --- |
| Tier0 | Fast answer cache at the Loki-compatible frontend. | Repeated Grafana reads can return before most proxy logic runs. |
| L1 memory | Hot cache inside the local process. | Best-case latency for repeated dashboards and Explore refreshes. |
| L2 disk | Persistent local cache. | Useful cache survives beyond RAM pressure and restarts. |
| L3 peer cache | Fleet-wide reuse between replicas. | One warm pod can make the rest of the fleet cheaper and faster. |
| Path | Slow path | Fast path | Why it matters |
| --- | --- | --- | --- |
| `query_range` | 4.58 ms cold miss with delayed backend | 0.64-0.67 µs warm cache hit | Repeated dashboards stop behaving like backend-bound requests. |
| `detected_field_values` | 2.76 ms without Tier0 | 0.71 µs with Tier0 | Drilldown metadata becomes effectively instant after warm-up. |
| L2 disk cache | backend refill path | 0.45 µs uncompressed read, 3.9 µs compressed read | Persistent cache stays cheap enough to matter on hot paths. |
| L3 peer cache | backend or owner refetch | 52 ns warm shadow-copy hit | A warm fleet can reuse work instead of repeating it. |
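One way to picture the L1-to-L3 read path (a sketch under assumed interfaces, not the proxy's real code): check the in-process map first, then local disk, then a peer replica, and only fall through to the backend on a full miss, back-filling the faster tiers on the way out:

```python
def tiered_get(key, l1, l2, l3_peers, fetch_backend):
    """Illustrative L1 -> L2 -> L3 -> backend lookup with back-fill."""
    if key in l1:                       # L1: in-process memory, cheapest hit
        return l1[key]
    if key in l2:                       # L2: persistent local disk
        l1[key] = l2[key]               # promote to memory for next time
        return l2[key]
    for peer in l3_peers:               # L3: ask warm replicas in the fleet
        if key in peer:
            value = peer[key]
            l1[key] = l2[key] = value   # back-fill local tiers
            return value
    value = fetch_backend(key)          # full miss: real backend work
    l1[key] = l2[key] = value
    return value
```

The back-fill step is what makes "one warm pod" matter: a peer hit is cheaper than a backend fetch, and it also seeds the local tiers so the next read is an L1 hit.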

Where the proxy can be cheaper than an uncached read path

  • Repeated dashboards hammering the same `query_range` windows.
  • Explore or Drilldown metadata paths that users refresh over and over.
  • Replica fleets where the same query otherwise fans out into repeated backend calls.
  • Long-range historical reads that benefit from split-window reuse and prefiltering.
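Split-window reuse for long ranges can be sketched like this (illustrative only; window size, alignment, and merge rules here are assumptions, not the proxy's documented behavior): the requested range is cut into aligned windows, cached windows are reused, and only the missing ones are fetched before merging:

```python
def fetch_range(start, end, window, cache, fetch_window):
    """Serve [start, end) from aligned windows, fetching only the misses."""
    t = start - (start % window)        # align to window boundaries
    results = []
    while t < end:
        if t not in cache:              # window miss: one backend fetch
            cache[t] = fetch_window(t, t + window)
        results.append(cache[t])
        t += window
    return results                      # caller merges/trims to [start, end)
```

This is why a 7d dashboard that refreshes every minute stops refetching the whole week: only the newest window is ever cold.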

What the project does not claim

  • It does not publish a blanket native Loki versus VictoriaLogs total-cost benchmark.
  • It does not claim every workload is faster through a compatibility layer.
  • It does claim explicit cache, coalescing, and route-aware tuning levers on the read path.
  • It does publish the benchmark and runtime signals needed to judge those levers honestly.

Metrics that prove cache value

  • `loki_vl_proxy_cache_hits_by_endpoint` and `_misses_by_endpoint` by route.
  • `loki_vl_proxy_window_cache_hit_total` and `_miss_total` for long-range queries.
  • `loki_vl_proxy_window_fetch_seconds` and `_merge_seconds` for range-work cost.
  • `loki_vl_proxy_peer_cache_hits_total`, `_misses_total`, and `_errors_total` for fleet behavior.
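Assuming the counters above are exposed in standard Prometheus text format, a per-route hit ratio is just hits / (hits + misses). The minimal parser below is an illustrative sketch, not a full Prometheus client:

```python
def hit_ratio(metrics_text, hits_name, misses_name):
    """Compute hit ratio per label set from Prometheus-style exposition text."""
    hits, misses = {}, {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_labels, value = line.rsplit(" ", 1)
        if name_labels.startswith(hits_name):
            hits[name_labels[len(hits_name):]] = float(value)
        elif name_labels.startswith(misses_name):
            misses[name_labels[len(misses_name):]] = float(value)
    return {
        labels: h / (h + misses.get(labels, 0.0))
        for labels, h in hits.items()
        if h + misses.get(labels, 0.0) > 0
    }
```

In practice you would compute the same ratio in PromQL over rates rather than raw counters, but the decision rule is the same: a rising ratio on a hot route means the cache stack is paying off.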

How to think about cost control

The practical cost story is about suppressing repeated backend work, not about hiding the backend. When cache hit ratio rises on hot routes, VictoriaLogs work per user action goes down and user latency usually follows.
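The back-of-envelope arithmetic is simple (the numbers below are illustrative, not measurements): with hit ratio p on a hot route, backend queries per user action scale by (1 - p):

```python
def backend_queries_per_action(panels_per_dashboard, hit_ratio):
    """Expected backend queries for one dashboard refresh at a given hit ratio."""
    return panels_per_dashboard * (1.0 - hit_ratio)

# Illustrative: a 10-panel dashboard at a 90% warm hit ratio sends
# roughly 1 query per refresh to VictoriaLogs instead of 10.
cold = backend_queries_per_action(10, 0.0)
warm = backend_queries_per_action(10, 0.9)
```

That ratio, per route, is exactly what the hit/miss counters above let you track over time.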