
Cache and cost control

Use cache tiers and fleet cache to suppress repeated backend work

The strongest practical efficiency story in Loki-VL-proxy is not a generic head-to-head marketing claim. It is the concrete read-path work the proxy can eliminate with Tier0, local cache, disk cache, peer cache, and long-range query window reuse.

  • Tier0 edge cache: safe GET Loki-shaped responses can return before most compatibility work runs. Best for hot repeated read paths.
  • L1 to L3 stack: memory, disk, and peer reuse reduce repeated backend work at different scopes (local pod, persistent pod, or fleet-wide).
  • Window cache for long ranges: long `query_range` requests can reuse split history windows instead of refetching them whole. Useful for 2d and 7d dashboards.
  • Operator-visible levers: the project exports the metrics needed to decide whether the cache stack is paying off. This is not hidden magic.
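As a rough sketch of the Tier0 idea (illustrative only, not the project's actual implementation; the class and parameter names here are invented), an edge cache is a TTL-bounded map keyed on the normalized request, consulted before any Loki-to-VictoriaLogs compatibility work runs:

```python
import time

class Tier0Cache:
    """Illustrative edge cache: answer safe GETs before compatibility work runs."""

    def __init__(self, ttl_seconds=10.0):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (expires_at, cached_response)

    def key(self, method, path, query_string):
        # Only safe, idempotent reads are cacheable at the edge.
        if method != "GET":
            return None
        return (path, query_string)

    def get(self, method, path, query_string):
        k = self.key(method, path, query_string)
        if k is None:
            return None
        hit = self.entries.get(k)
        if hit is None or hit[0] < time.monotonic():
            return None  # miss or expired: fall through to the full proxy path
        return hit[1]

    def put(self, method, path, query_string, response):
        k = self.key(method, path, query_string)
        if k is not None:
            self.entries[k] = (time.monotonic() + self.ttl, response)
```

The first request misses and takes the full translate-and-fetch path; identical GETs arriving within the TTL return immediately, which is why the warm-hit numbers below are microseconds rather than milliseconds.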
| Layer | Plain-English role | What it buys you |
| --- | --- | --- |
| Tier0 | Fast answer cache at the Loki-compatible frontend. | Repeated Grafana reads can return before most proxy logic runs. |
| L1 memory | Hot cache inside the local process. | Best-case latency for repeated dashboards and Explore refreshes. |
| L2 disk | Persistent local cache. | Useful cache survives beyond RAM pressure and restarts. |
| L3 peer cache | Fleet-wide reuse between replicas. | One warm pod can make the rest of the fleet cheaper and faster. |
| Path | Slow path | Fast path | Why it matters |
| --- | --- | --- | --- |
| `query_range` | 4.58 ms cold miss with delayed backend | 0.64-0.67 µs warm cache hit | Repeated dashboards stop behaving like backend-bound requests. |
| `detected_field_values` | 2.76 ms without Tier0 | 0.71 µs with Tier0 | Drilldown metadata becomes effectively instant after warm-up. |
| L2 disk cache | backend refill path | 0.45 µs uncompressed read, 3.9 µs compressed read | Persistent cache stays cheap enough to matter on hot paths. |
| L3 peer cache | backend or owner refetch | 52 ns warm shadow-copy hit | A warm fleet can reuse work instead of repeating it. |
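One way to picture the L1-to-L3 read path (a sketch under assumed interfaces, not the proxy's real code): check the in-process map first, then local disk, then a peer replica, and only fall through to the backend on a full miss, back-filling the faster tiers on the way out:

```python
def tiered_get(key, l1, l2, l3_peers, fetch_backend):
    """Illustrative L1 -> L2 -> L3 -> backend lookup with back-fill."""
    if key in l1:                       # L1: in-process memory, cheapest hit
        return l1[key]
    if key in l2:                       # L2: persistent local disk
        l1[key] = l2[key]               # promote to memory for next time
        return l2[key]
    for peer in l3_peers:               # L3: ask warm replicas in the fleet
        if key in peer:
            value = peer[key]
            l1[key] = l2[key] = value   # back-fill local tiers
            return value
    value = fetch_backend(key)          # full miss: real backend work
    l1[key] = l2[key] = value
    return value
```

The back-fill step is what makes "one warm pod" matter: a peer hit is cheaper than a backend fetch, and it also seeds the local tiers so the next read is an L1 hit.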

Where the proxy can be cheaper than an uncached read path

  • Repeated dashboards hammering the same `query_range` windows.
  • Explore or Drilldown metadata paths that users refresh over and over.
  • Replica fleets where the same query otherwise fans out into repeated backend calls.
  • Long-range historical reads that benefit from split-window reuse and prefiltering.
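Split-window reuse for long ranges can be sketched like this (illustrative only; window size, alignment, and merge rules here are assumptions, not the proxy's documented behavior): the requested range is cut into aligned windows, cached windows are reused, and only the missing ones are fetched before merging:

```python
def fetch_range(start, end, window, cache, fetch_window):
    """Serve [start, end) from aligned windows, fetching only the misses."""
    t = start - (start % window)        # align to window boundaries
    results = []
    while t < end:
        if t not in cache:              # window miss: one backend fetch
            cache[t] = fetch_window(t, t + window)
        results.append(cache[t])
        t += window
    return results                      # caller merges/trims to [start, end)
```

This is why a 7d dashboard that refreshes every minute stops refetching the whole week: only the newest window is ever cold.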

What the project does not claim

  • It does not publish a blanket native Loki versus VictoriaLogs total-cost benchmark.
  • It does not claim every workload is faster through a compatibility layer.
  • It does claim explicit cache, coalescing, and route-aware tuning levers on the read path.
  • It does publish the benchmark and runtime signals needed to judge those levers honestly.

Metrics that prove cache value

  • `loki_vl_proxy_cache_hits_by_endpoint` and `_misses_by_endpoint` by route.
  • `loki_vl_proxy_window_cache_hit_total` and `_miss_total` for long-range queries.
  • `loki_vl_proxy_window_fetch_seconds` and `_merge_seconds` for range-work cost.
  • `loki_vl_proxy_peer_cache_hits_total`, `_misses_total`, and `_errors_total` for fleet behavior.
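Assuming the counters above are exposed in standard Prometheus text format, a per-route hit ratio is just hits / (hits + misses). The minimal parser below is an illustrative sketch, not a full Prometheus client:

```python
def hit_ratio(metrics_text, hits_name, misses_name):
    """Compute hit ratio per label set from Prometheus-style exposition text."""
    hits, misses = {}, {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_labels, value = line.rsplit(" ", 1)
        if name_labels.startswith(hits_name):
            hits[name_labels[len(hits_name):]] = float(value)
        elif name_labels.startswith(misses_name):
            misses[name_labels[len(misses_name):]] = float(value)
    return {
        labels: h / (h + misses.get(labels, 0.0))
        for labels, h in hits.items()
        if h + misses.get(labels, 0.0) > 0
    }
```

In practice you would compute the same ratio in PromQL over rates rather than raw counters, but the decision rule is the same: a rising ratio on a hot route means the cache stack is paying off.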

How to think about cost control

The practical cost story is about suppressing repeated backend work, not about hiding the backend. When cache hit ratio rises on hot routes, VictoriaLogs work per user action goes down and user latency usually follows.
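The back-of-envelope arithmetic is simple (the numbers below are illustrative, not measurements): with hit ratio p on a hot route, backend queries per user action scale by (1 - p):

```python
def backend_queries_per_action(panels_per_dashboard, hit_ratio):
    """Expected backend queries for one dashboard refresh at a given hit ratio."""
    return panels_per_dashboard * (1.0 - hit_ratio)

# Illustrative: a 10-panel dashboard at a 90% warm hit ratio sends
# roughly 1 query per refresh to VictoriaLogs instead of 10.
cold = backend_queries_per_action(10, 0.0)
warm = backend_queries_per_action(10, 0.9)
```

That ratio, per route, is exactly what the hit/miss counters above let you track over time.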