Cache and cost control

Use cache tiers and fleet cache to suppress repeated backend work

The strongest practical efficiency story in Loki-VL-proxy is not a generic head-to-head marketing claim. It is the concrete read-path work the proxy can eliminate with its 4-tier cache stack: Tier0 (compat), L1 in-memory (256 MB default), L2 disk (bbolt), and L3 peer cache (consistent hash + zstd) — plus a circuit breaker and request coalescer that protect the backend under load.


Tier0 compat cache

Safe GET Loki-shaped responses can return before most compatibility work runs

Best for hot repeated read paths.

L1: 256 MB in-memory

Hot in-process LRU cache with near-zero overhead on repeated dashboard hits

Default 256 MB, tunable per deployment.

L2: bbolt disk cache

Persistent local cache survives RAM pressure and pod restarts

0.45 µs uncompressed read, 3.9 µs compressed read.

L3: peer cache (consistent hash + zstd)

Fleet-wide reuse across replicas via consistent-hash ownership and zstd-compressed transfer

52 ns warm shadow-copy hit after first owner fetch.
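
The consistent-hash ownership that the L3 peer cache relies on can be sketched roughly as follows. This is a minimal illustration, not the proxy's implementation: the `HashRing` class, the virtual-node count, the SHA-256 hash, and the pod names are all assumptions made for the example. The point is that every replica computes the same owner for a given cache key, and that removing a peer only remaps the keys that peer owned.

```python
import bisect
import hashlib


class HashRing:
    """Hypothetical consistent-hash ring; each peer gets several virtual nodes."""

    def __init__(self, peers, vnodes=64):
        # peers is assumed to be non-empty; vnodes=64 is an arbitrary choice.
        self._ring = sorted(
            (self._hash(f"{peer}#{i}"), peer)
            for peer in peers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def owner(self, cache_key):
        # The owner is the first ring point clockwise from the key's hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_right(self._points, self._hash(cache_key))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["pod-a", "pod-b", "pod-c"])
owner = ring.owner("query_range:{app=\"api\"}:1h")
```

A non-owner replica would ask `owner` for the entry and could then keep a local copy, which is the kind of path the warm shadow-copy figure above describes.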

Layer	Plain-English role	What it buys you
Tier0 (compat)	Fast answer cache at the Loki-compatible frontend — keyed on exact request shape.	Repeated Grafana reads can return before most proxy logic runs. `query_range` warm hit: 0.64–0.67 µs.
L1 in-memory (256 MB default)	Hot LRU cache inside the local process.	Best-case latency for repeated dashboards and Explore refreshes. Tunable per deployment.
L2 disk (bbolt)	Persistent local cache backed by bbolt, a B+tree key/value store.	Survives RAM pressure and pod restarts. 0.45 µs uncompressed, 3.9 µs compressed read.
L3 peer cache (consistent hash + zstd)	Fleet-wide reuse between replicas using consistent-hash ownership and zstd-compressed transfer.	52 ns warm shadow-copy hit. One warm pod serves the rest of the fleet without backend round-trips.
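
The lookup order across tiers can be sketched as a read-through chain. This is a simplified model under stated assumptions: plain dictionaries stand in for the LRU, bbolt, and peer stores, and the `TieredCache` name, the promotion of hits into faster tiers, and the fill-all-tiers-on-miss policy are illustrative choices rather than documented proxy behavior.

```python
class TieredCache:
    """Hypothetical read-through chain over cache tiers, fastest tier first."""

    def __init__(self, tiers, fetch_backend):
        # tiers: list of (name, dict-like store), ordered fastest to slowest.
        self.tiers = tiers
        self.fetch_backend = fetch_backend

    def get(self, key):
        for depth, (name, store) in enumerate(self.tiers):
            if key in store:
                value = store[key]
                # Assumed policy: copy the hit into every faster tier.
                for _, faster in self.tiers[:depth]:
                    faster[key] = value
                return value, name
        # Every tier missed: one backend call, then fill all tiers.
        value = self.fetch_backend(key)
        for _, store in self.tiers:
            store[key] = value
        return value, "backend"


backend_calls = []

def fake_backend(key):
    # Stand-in for a VictoriaLogs query; records how often it is called.
    backend_calls.append(key)
    return f"response for {key}"

cache = TieredCache([("L1", {}), ("L2", {}), ("L3", {})], fake_backend)
first = cache.get("query_range?window=1h")   # served by the backend
second = cache.get("query_range?window=1h")  # served by L1
```

The second element of each return value names the tier that answered, which is the same information the per-route hit and miss metrics expose in aggregate.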

Path	Slow path	Fast path	Why it matters
query_range	4.58 ms cold miss with delayed backend	0.64–0.67 µs warm cache hit	Repeated dashboards stop behaving like backend-bound requests.
detected_field_values	2.76 ms without Tier0	0.71 µs with Tier0	Drilldown metadata becomes effectively instant after warm-up.
L2 disk cache	backend refill path	0.45 µs uncompressed read, 3.9 µs compressed read	Persistent cache stays cheap enough to matter on hot paths.
L3 peer cache	backend or owner refetch	52 ns warm shadow-copy hit	A warm fleet can reuse work instead of repeating it.

Circuit breaker and request coalescer

The circuit breaker counts failures over a sliding 30 s window and opens after 5 failures, shielding VictoriaLogs from cascading retries.
Request coalescer deduplicates in-flight identical requests — concurrent Grafana panel refreshes become a single backend call.
Prefilter eliminates ~81.6% of empty-window backend calls on long-range queries before they reach VictoriaLogs.
These mechanisms work alongside the cache tiers: coalescing prevents parallel requests from causing redundant cache misses.
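
The two protection mechanisms above can be sketched together. Only the 30 s window and the 5-failure threshold come from this page; the class names, the `cooldown_s` reopen policy, and the asyncio-based coalescer are assumptions chosen to keep the example small, not a description of the proxy's internals.

```python
import asyncio
import time
from collections import deque


class SlidingWindowBreaker:
    """Hypothetical breaker: opens after max_failures within window_s seconds."""

    def __init__(self, window_s=30.0, max_failures=5, cooldown_s=10.0,
                 clock=time.monotonic):
        self.window_s = window_s
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s  # assumed: how long the breaker stays open
        self.clock = clock
        self._failures = deque()
        self._opened_at = None

    def allow(self):
        now = self.clock()
        if self._opened_at is not None:
            if now - self._opened_at < self.cooldown_s:
                return False  # open: reject without touching the backend
            # Cooldown elapsed: close the breaker and start counting afresh.
            self._opened_at = None
            self._failures.clear()
        return True

    def record_failure(self):
        now = self.clock()
        self._failures.append(now)
        # Drop failures that have slid out of the window.
        while now - self._failures[0] > self.window_s:
            self._failures.popleft()
        if len(self._failures) >= self.max_failures:
            self._opened_at = now


class Coalescer:
    """Hypothetical coalescer: identical in-flight requests share one fetch."""

    def __init__(self):
        self._inflight = {}

    async def do(self, key, fetch):
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the backend call.
            task = asyncio.ensure_future(fetch())
            self._inflight[key] = task
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # Later callers wait on the same task instead of issuing a new call.
        return await task
```

In this sketch a caller would check `allow()` before going to the backend and call `record_failure()` when the backend errors, while `Coalescer.do` wraps the backend call itself, so concurrent panel refreshes with the same key resolve from a single fetch.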

Where the proxy can be cheaper than an uncached read path

Repeated dashboards hammering the same `query_range` windows.
Explore or Drilldown metadata paths that users refresh over and over.
Replica fleets where the same query otherwise fans out into repeated backend calls.
Long-range historical reads that benefit from split-window reuse and prefiltering.
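
Split-window reuse, as mentioned in the last item, can be illustrated by cutting a long range into sub-ranges aligned to fixed boundaries, so that overlapping queries produce identical window keys. The function name, the one-hour window, and the alignment rule are assumptions for this sketch; the page does not describe how the proxy actually splits ranges.

```python
def split_windows(start_s, end_s, window_s=3600):
    """Split [start_s, end_s) into sub-ranges aligned to window_s boundaries.

    Timestamps are integer seconds. Aligned interior windows are identical
    across overlapping queries, so their cached results can be reused.
    """
    windows = []
    cursor = start_s
    while cursor < end_s:
        # Next multiple of window_s strictly after the cursor.
        boundary = (cursor // window_s + 1) * window_s
        nxt = min(boundary, end_s)
        windows.append((cursor, nxt))
        cursor = nxt
    return windows


dashboard_a = split_windows(1800, 9000)
dashboard_b = split_windows(0, 7200)
# Windows both queries can share from cache:
shared = set(dashboard_a) & set(dashboard_b)
```

Only the ragged edges of each query differ, so a second long-range read over mostly the same period needs backend work for the edge windows alone.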

What the project does not claim

It does not publish a blanket native Loki versus VictoriaLogs total-cost benchmark.
It does not claim every workload is faster through a compatibility layer.
It does claim explicit cache, coalescing, and route-aware tuning levers on the read path.
It does publish the benchmark and runtime signals needed to judge those levers honestly.

Metrics that prove cache value

`loki_vl_proxy_cache_hits_by_endpoint` and `_misses_by_endpoint` by route.
`loki_vl_proxy_window_cache_hit_total` and `_miss_total` for long-range queries.
`loki_vl_proxy_window_fetch_seconds` and `_merge_seconds` for range-work cost.
`loki_vl_proxy_peer_cache_hits_total`, `_misses_total`, and `_errors_total` for fleet behavior.
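
A per-route hit ratio can be derived from two samples of the hit and miss counters. The helper below is a sketch: it assumes the counters have already been scraped into plain dictionaries keyed by route, and the function name and route keys are hypothetical. It also ignores counter resets.

```python
def hit_ratio_by_route(hits_then, hits_now, misses_then, misses_now):
    """Hit ratio per route over the interval between two counter samples.

    Each argument maps a route name to a cumulative counter value.
    Routes with no traffic in the interval are omitted.
    """
    ratios = {}
    for route in hits_now.keys() | misses_now.keys():
        hits = hits_now.get(route, 0) - hits_then.get(route, 0)
        misses = misses_now.get(route, 0) - misses_then.get(route, 0)
        total = hits + misses
        if total > 0:
            ratios[route] = hits / total
    return ratios


ratios = hit_ratio_by_route(
    hits_then={"query_range": 100},
    hits_now={"query_range": 190, "labels": 5},
    misses_then={"query_range": 20},
    misses_now={"query_range": 30, "labels": 5},
)
```

Working from deltas rather than raw totals keeps a long-warm process from masking a recent drop in hit ratio.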

How to think about cost control

The practical cost story is about suppressing repeated backend work, not about hiding the backend. When cache hit ratio rises on hot routes, VictoriaLogs work per user action goes down and user latency usually follows.
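
To make the relationship concrete, here is a back-of-the-envelope model using the `query_range` figures quoted earlier (4.58 ms cold miss with a delayed backend, 0.64–0.67 µs warm hit). The function names and the midpoint value are assumptions, and the inputs are benchmark numbers rather than production measurements, so treat the output as an illustration of the shape of the curve.

```python
COLD_MISS_S = 4.58e-3   # query_range cold miss with delayed backend
WARM_HIT_S = 0.65e-6    # assumed midpoint of the 0.64–0.67 µs warm-hit range


def expected_latency_s(hit_ratio, hit_s=WARM_HIT_S, miss_s=COLD_MISS_S):
    """Mean latency per request for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_s + (1.0 - hit_ratio) * miss_s


def backend_calls(requests, hit_ratio):
    """Expected number of requests that still reach the backend."""
    return requests * (1.0 - hit_ratio)


latency_at_90 = expected_latency_s(0.9)
calls_at_90 = backend_calls(1000, 0.9)
```

At a 90% hit ratio the model gives a mean of roughly 0.46 ms per request, and about 100 of every 1000 requests still reach the backend. The miss term dominates the mean, which is why the remaining misses matter more than shaving further time off hits.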