Cost and performance
Compare VictoriaLogs and Loki with a source-backed cost lens
The cost story is not that a proxy magically makes every logging stack cheap. The defensible argument is narrower: Loki's own docs describe a label-indexed system that is sensitive to high-cardinality labels; VictoriaLogs publishes an all-field index and lower-resource claims; and Loki-VL-proxy adds concrete read-path caches plus observability, so repeated Grafana traffic can cost less.
| Dimension | What official Loki docs say | What VictoriaLogs docs or published reports say | What Loki-VL-proxy adds |
|---|---|---|---|
| Indexing strategy | Loki docs: labels index streams, but the content of each log line is not indexed. | VictoriaLogs docs: all fields are indexed and the query model supports full-text search across fields. | The proxy keeps the Loki read contract in front of that backend so Grafana can stay on the native Loki datasource. |
| High-cardinality behavior | Loki docs recommend low-cardinality labels and warn that high cardinality hurts performance and cost-effectiveness. | VictoriaLogs docs say high-cardinality values such as `trace_id`, `user_id`, and `ip` work fine as fields as long as they are not used as stream fields. | The proxy lets Grafana keep Loki-safe label surfaces while VictoriaLogs keeps the richer field model underneath. |
| Search-heavy workloads | Broad or text-heavy searches can devolve into stream selection plus line filtering because line content is not indexed. | VictoriaLogs publishes fast full-text search as a core capability, and third-party benchmarks report materially faster broad-search latency on large datasets. | Tier0, L1/L2/L3, and window cache can further suppress repeated read work after the first expensive search path. |
| Operational shape | Loki can run single-binary, but its scalable architecture is microservices-based with multiple components. | VictoriaLogs docs position the backend as a simple single executable on the easy path, but they also document cluster mode with `vlinsert`, `vlselect`, `vlstorage`, replication, multi-level cluster setup, and HA patterns across independent availability zones. | The proxy adds one small read-side compatibility layer with route-aware metrics and structured logs instead of hiding translation work inside clients, and can sit in front of either single-node or clustered VictoriaLogs. |
| Published resource claims | Grafana docs do not market a universal fixed savings ratio; they emphasize label strategy, storage, and deployment architecture. | VictoriaLogs docs publish up to `30x` less RAM and up to `15x` less disk than Loki or Elasticsearch, while TrueFoundry reports `≈40%` less storage and much lower CPU and RAM on its workload. | The proxy adds its own small runtime cost, but published project benchmarks show it remains CPU-light and can sharply reduce repeated backend work through caching. |
| Published large-workload sizing | Grafana’s own sizing guide reaches `431 vCPU / 857 Gi` at `3-30 TB/day` and `1221 vCPU / 2235 Gi` around `30 TB/day` before query spikes. | VictoriaLogs docs do not publish an equivalent distributed tier matrix on the same shape; the safer claim is lower-resource posture plus stronger compression and search behavior on published comparisons. | The proxy does not change backend ingest economics by itself, but it keeps the read side small and can cut repeated backend work through tiered caches and route-aware control. |
| Cross-AZ traffic posture | Loki docs say distributors forward writes to a replication factor that is generally `3`, queriers query all ingesters for in-memory data, and the zone-aware replication design explicitly lists minimizing cross-zone traffic costs as a non-goal. | VictoriaLogs cluster docs support independent clusters in separate availability zones plus advanced multi-level cluster setup, which lets operators keep most normal reads local and reserve cross-AZ fanout for HA or global queries. | The proxy can stay AZ-local on the read path and adds `zstd`/`gzip` compression on the hops it controls, but it does not invent backend replication savings that the VictoriaLogs docs do not quantify. |
Why the Loki cost floor matters
- Grafana already publishes large distributed Loki sizing floors by ingest throughput, so the high-end compute side is not a vague anti-Loki argument.
- At `3-30 TB/day`, the published Loki floor is `431 vCPU / 857 Gi` before storage and before the `10x` querier-spike warning in the same docs.
- That is why this project’s cost page converts Loki’s own sizing guide into on-demand EC2 floors before comparing it with a smaller `VictoriaLogs + Loki-VL-proxy` reference pack.
- The proxy layer is intentionally modeled as a small read-path tax, not as the source of backend ingest savings.
Where the savings argument is strongest
- Search-heavy workloads where users often scan broad time ranges for words, phrases, or IDs.
- Data models with many useful fields and high-cardinality values that should stay as fields rather than labels.
- Repeated Grafana dashboard, Explore, or Drilldown reads that can hit Tier0, local cache, disk cache, or peer cache.
- Migrations where you want VictoriaLogs economics without forcing Grafana and Loki API clients to change first.
Where to be precise instead of hype-driven
- Do not present the proxy's numbers as a generic ingestion benchmark; standard Loki push stays blocked, so ingestion claims do not apply.
- Do not treat third-party workload numbers as universal truths for every cluster.
- Do not attribute VictoriaLogs backend savings to the proxy itself; the proxy adds read-path suppression and migration control.
- Do compare end-to-end client latency with upstream latency so you can see whether the proxy or the backend owns the cost.
What the proxy measurably contributes
- `query_range` warm hits in the published project benchmark land at `0.64-0.67 us` versus `4.58 ms` on the cold delayed path.
- `detected_field_values` warm hits land at `0.71 us` versus `2.76 ms` without Tier0.
- Peer-cache warm shadow-copy hits land at `52 ns` after the first owner fetch.
- Long-range prefiltering cut backend query calls by about `81.6%` on the published benchmark shape.
How to verify the savings in another environment
- Track `loki_vl_proxy_requests_total` and `loki_vl_proxy_request_duration_seconds` by `endpoint` and `route`.
- Compare `loki_vl_proxy_backend_duration_seconds` with downstream latency to isolate proxy overhead from VictoriaLogs slowness.
- Watch `loki_vl_proxy_cache_hits_by_endpoint` and `_misses_by_endpoint` to see whether repeated reads are really being suppressed.
- Use structured logs with `proxy.overhead_ms` and `upstream.duration_ms` for exact per-request decomposition.
Loki published sizing converted to EC2
| Loki docs ingest tier | Published base request | Illustrative EC2 floor | Monthly compute floor |
|---|---|---|---|
| <3 TB/day | 38 vCPU / 59 Gi | 3 x c7i.4xlarge | $1,489.20 / month |
| 3-30 TB/day | 431 vCPU / 857 Gi | 27 x c7i.4xlarge | $13,402.80 / month |
| ~30 TB/day | 1221 vCPU / 2235 Gi | 77 x c7i.4xlarge | $38,222.80 / month |
This uses simple `c7i.4xlarge` on-demand packing in `us-east-1` to turn Grafana's published CPU and memory requests into an operator-readable monthly floor. These AWS rows are pure calculations to put `$$` around the comparison, not observed cloud bills.
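The packing arithmetic behind these rows is reproducible. A minimal sketch, assuming the worksheet's `c7i.4xlarge` shape (`16 vCPU / 32 GiB`) and the on-demand rate the table itself implies (`$1,489.20 / 3 instances / 730 h ≈ $0.68/h`); these constants are worksheet assumptions, not quoted AWS prices:

```python
import math

# Instance shape used by the worksheet, plus the on-demand rate implied
# by the table's own rows (ASSUMPTION: $1,489.20 / 3 instances / 730 h).
VCPU, MEM_GIB, USD_PER_HOUR, HOURS = 16, 32, 0.68, 730

def ec2_floor(vcpu_request: float, mem_gib_request: float):
    """Pack a published Loki base request onto identical c7i.4xlarge nodes."""
    count = max(math.ceil(vcpu_request / VCPU),
                math.ceil(mem_gib_request / MEM_GIB))
    return count, round(count * USD_PER_HOUR * HOURS, 2)

print(ec2_floor(38, 59))      # -> (3, 1489.2)    <3 TB/day tier
print(ec2_floor(431, 857))    # -> (27, 13402.8)  3-30 TB/day tier
print(ec2_floor(1221, 2235))  # -> (77, 38222.8)  ~30 TB/day tier
```

Because Grafana's own docs warn unoptimized queries can need `10x` the querier resources, these counts are a floor, not a ceiling.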
Illustrative monthly cost scenarios
| Scenario | Active users | Ingest | Raw ingest/day | Loki total | Proxy + VL total | Monthly delta | Savings |
|---|---|---|---|---|---|---|---|
| Small | 100 | 100k lines/s | 2.16 TB/day | $1,681.20 | $369.16 | $1,312.04 | 78.0% |
| Medium | 1,000 | 500k lines/s | 10.8 TB/day | $14,362.80 | $1,101.20 | $13,261.60 | 92.3% |
| Large | 10,000 | 1M lines/s | 21.6 TB/day | $15,322.80 | $2,388.55 | $12,934.25 | 84.4% |
These scenarios assume `7d` retention, `250 B` average raw line size, and a conservative VictoriaLogs storage factor of `10x`, even though some real deployments observe much higher data-block-only compression ratios.
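The raw-ingest and retained-bytes columns follow mechanically from those three assumptions. A small sketch of the arithmetic, in decimal TB, with hypothetical helper names:

```python
def raw_ingest_tb_per_day(lines_per_sec: int, avg_line_bytes: int = 250) -> float:
    """Decimal TB/day of raw log bytes at a sustained line rate."""
    return lines_per_sec * avg_line_bytes * 86_400 / 1e12

def vl_retained_tb(lines_per_sec: int, retention_days: int = 7,
                   storage_factor: float = 10.0) -> float:
    """Retained VictoriaLogs bytes under the conservative 10x storage factor."""
    return raw_ingest_tb_per_day(lines_per_sec) * retention_days / storage_factor

print(raw_ingest_tb_per_day(100_000))    # -> 2.16  (Small)
print(raw_ingest_tb_per_day(500_000))    # -> 10.8  (Medium)
print(raw_ingest_tb_per_day(1_000_000))  # -> 21.6  (Large)
```

The `10x` storage factor is deliberately pessimistic for VictoriaLogs; the real-life tested baseline below observes a far higher data-block compression ratio.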
Real-life tested VictoriaLogs baseline
- Real snapshot: `800 M` total entries, `112 M` ingested in `24h`, `310 GiB` ingested in `24h`, and `40.5 GiB` on disk.
- The observed compression ratio is `54.9`, which implies about `5.65 GiB/day` of compressed data blocks.
- `800 M / 112 M per day` implies about `7.14d` of retained data, which at `~5.65 GiB/day` of compressed blocks closely matches the `40.5 GiB` disk footprint.
- Average raw event size in this tested setup is about `2.9 KiB`, which is far larger than the earlier generic `250 B` planning model.
- This is a write-heavy calibration point because observed read traffic is `0 rps`, so it is useful for storage and ingest-tier math, not for proving read-path cache savings by itself.
- `available CPU = 43` and `available memory = 43 GiB` are cluster headroom signals, not service consumption, so they are not used as the VictoriaLogs compute baseline.
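The snapshot is internally consistent, which is worth checking before scaling it. A quick sketch re-deriving the bullets above from the raw numbers:

```python
GIB = 2**30

# Snapshot values from the real-life tested bullets above.
total_entries, entries_per_day = 800e6, 112e6
raw_gib_per_day, compression_ratio = 310.0, 54.9

compressed_gib_per_day = raw_gib_per_day / compression_ratio    # data blocks only
retention_days = total_entries / entries_per_day                # retained window
implied_disk_gib = retention_days * compressed_gib_per_day      # vs observed 40.5
avg_event_kib = raw_gib_per_day * GIB / entries_per_day / 1024  # raw event size

print(round(compressed_gib_per_day, 2))  # -> 5.65
print(round(retention_days, 2))          # -> 7.14
print(round(implied_disk_gib, 1))        # -> 40.3
print(round(avg_event_kib, 1))           # -> 2.9
```

The small gap between the implied `40.3 GiB` and the observed `40.5 GiB` is expected, since the `54.9` ratio covers data blocks only and excludes `indexdb`.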
Scaling the real-life tested baseline to Loki floors
| Scale | Raw ingest/day | VictoriaLogs retained `~7.1d` | Estimated Loki retained `~7.1d` | VictoriaLogs gp3 | Loki gp3 | Loki published tier | Loki compute floor |
|---|---|---|---|---|---|---|---|
| 1x | 0.333 TB/day | 40.5 GiB | 64.3 GiB | $3.24 | $5.14 | <3 TB/day | $1,489.20 / month |
| 10x | 3.33 TB/day | 405 GiB | 642.9 GiB | $32.40 | $51.43 | 3-30 TB/day | $13,402.80 / month |
| 30x | 9.99 TB/day | 1,215 GiB | 1,928.6 GiB | $97.20 | $154.29 | 3-30 TB/day | $13,402.80 / month |
| 100x | 33.29 TB/day | 4,050 GiB | 6,428.6 GiB | $324.00 | $514.29 | ~30 TB/day | $38,222.80 / month |
This uses the real-life tested `40.5 GiB` retained VictoriaLogs footprint as the base, then applies the same conservative `VL = 63% of Loki` retained-bytes assumption used in the docs cost model.
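Both storage columns reduce to two constants: the gp3 rate the dollar cells imply (`$3.24 / 40.5 GiB ≈ $0.08/GiB-month`) and the `VL = 63% of Loki` retained-bytes ratio. A sketch under those assumptions:

```python
# ASSUMPTIONS: gp3 at $0.08/GiB-month (implied by the table's own cells)
# and the docs model's conservative "VL = 63% of Loki" retained-bytes ratio.
GP3_USD_PER_GIB_MONTH = 0.08
VL_SHARE_OF_LOKI = 0.63

def scaled_storage(base_vl_gib: float, scale: float):
    """Return (VL GiB, Loki GiB, VL gp3 $, Loki gp3 $) at a given scale."""
    vl_gib = base_vl_gib * scale
    loki_gib = vl_gib / VL_SHARE_OF_LOKI
    return (round(vl_gib, 1), round(loki_gib, 1),
            round(vl_gib * GP3_USD_PER_GIB_MONTH, 2),
            round(loki_gib * GP3_USD_PER_GIB_MONTH, 2))

print(scaled_storage(40.5, 1))   # -> (40.5, 64.3, 3.24, 5.14)
print(scaled_storage(40.5, 10))  # -> (405.0, 642.9, 32.4, 51.43)
```

The dollar figures stay tiny relative to compute either way, which is why the compute floors dominate the comparison.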
Real-life tested compute envelope vs Loki floor
| Scale | Raw ingest/day | Scaled VL envelope | Illustrative VL EC2 floor | VL compute | Loki compute | Loki / VL CPU | Loki / VL memory |
|---|---|---|---|---|---|---|---|
| 1x | 0.333 TB/day | 1.2 cores / 5.85 GiB | 1 x c7i.xlarge | $124.10 / month | $1,489.20 / month | 31.7x | 10.1x |
| 10x | 3.33 TB/day | 12 cores / 58.5 GiB | 4 x c7i.2xlarge | $992.80 / month | $13,402.80 / month | 35.9x | 14.6x |
| 30x | 9.99 TB/day | 36 cores / 175.5 GiB | 6 x c7i.4xlarge | $2,978.40 / month | $13,402.80 / month | 12.0x | 4.9x |
| 100x | 33.29 TB/day | 120 cores / 585 GiB | 19 x c7i.4xlarge | $9,431.60 / month | $38,222.80 / month | 10.2x | 3.8x |
This uses the measured VictoriaLogs process envelope from the same real-life tested setup: about `1.2` cores and `5.85 GiB` total across `vlstorage`, `vlinsert`, and `vlselect`.
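The ratio columns divide Grafana's published Loki base requests by the linearly scaled measured envelope. A sketch (the tier keys are illustrative labels, not official names):

```python
# Measured VictoriaLogs service envelope (vlstorage + vlinsert + vlselect)
# and Grafana's published Loki base requests per ingest tier.
VL_BASE_CORES, VL_BASE_GIB = 1.2, 5.85
LOKI_TIER = {"<3": (38, 59), "3-30": (431, 857), "~30": (1221, 2235)}

def loki_vl_ratios(scale: float, tier: str):
    """Loki/VL CPU and memory ratios at a linear scale of the VL envelope."""
    lk_cpu, lk_mem = LOKI_TIER[tier]
    return (round(lk_cpu / (VL_BASE_CORES * scale), 1),
            round(lk_mem / (VL_BASE_GIB * scale), 1))

print(loki_vl_ratios(1, "<3"))     # -> (31.7, 10.1)
print(loki_vl_ratios(10, "3-30"))  # -> (35.9, 14.6)
print(loki_vl_ratios(100, "~30"))  # -> (10.2, 3.8)
```

The ratios shrink at higher scales only because the 30x row still lands in the same published `3-30 TB/day` Loki tier, not because the VL envelope grows sublinearly.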
What this comparison means
- At the exact real-life tested baseline, the VictoriaLogs service envelope is small enough to fit on a single `c7i.xlarge`, while Loki’s published throughput floor for the same ingest tier is already `3 x c7i.4xlarge`.
- Even when the measured VictoriaLogs envelope is scaled linearly, Loki’s published floor stays materially larger on both CPU and memory.
- This does not prove that VictoriaLogs scales perfectly linearly; it shows that the real-life tested baseline is far below Loki’s published distributed floor at the same ingest tier.
- That is the right way to compare here: a real-life tested VictoriaLogs envelope versus Loki’s own published cluster-sizing floor, not marketing slogans versus marketing slogans.
Real-life tested steady-state high-load envelope
| Scenario | Raw ingest/day | VictoriaLogs retained `~7.1d` | Estimated Loki retained `~7.1d` | Scaled VL envelope | Illustrative VL EC2 floor | Loki published tier | Loki compute floor | Loki cross-AZ write payload/day | Effective inter-AZ monthly cost |
|---|---|---|---|---|---|---|---|---|---|
| Real-life tested steady-state high load | 0.56 TB/day | 68.3 GiB | 108.4 GiB | 2.0 cores / 9.9 GiB | 1 x c7i.2xlarge | <3 TB/day | $1,489.20 / month | 1,046 GiB/day | $627.60 / month |
- This row uses the higher real-life tested envelope of about `2.5k` events per second and about `6.5 MB/s` raw ingest bandwidth.
- It is intentionally separate from the daily average snapshot so the page shows both the average storage baseline and the heavier sustained operating shape.
- Even at this higher steady-state envelope, the tested VictoriaLogs setup remains far below Loki’s first published distributed compute floor.
3-AZ VictoriaLogs topology note
| Topology | Minimum pod shape | Cost-model treatment |
|---|---|---|
| 3 x vlstorage per AZ | 3 x vlinsert, 3 x vlselect, 9 x vlstorage | keep the combined compute envelope used in the main tables |
| 4 x vlstorage per AZ | 3 x vlinsert, 3 x vlselect, 12 x vlstorage | keep the combined compute envelope used in the main tables |
- This captures the normal production pod shape for a 3-AZ cluster with one `vlinsert` and one `vlselect` per AZ plus `3-4` `vlstorage` pods per AZ.
- The cost worksheet still uses combined compute in the main tables so the comparison stays about total service envelope rather than node-placement policy.
- The measured `vlstorage` footprint used elsewhere is for the tested `vlstorage` service envelope as a whole, not per storage pod.
Inter-AZ write replication cost floor
| Scale | Raw ingest/day | Loki cross-AZ write payload/day | Illustrative monthly inter-AZ cost |
|---|---|---|---|
| 1x | 310 GiB/day | 620 GiB/day | $372.00 / month |
| 10x | 3,100 GiB/day | 6,200 GiB/day | $3,720.00 / month |
| 30x | 9,300 GiB/day | 18,600 GiB/day | $11,160.00 / month |
| 100x | 31,000 GiB/day | 62,000 GiB/day | $37,200.00 / month |
This models AWS inter-AZ transfer at an effective `$0.02/GB` for each GB crossed, because EC2 pricing charges `$0.01/GB` in and `$0.01/GB` out across Availability Zones in the same Region. For a 3-AZ Loki cluster with replication factor `3`, the simple write floor is one local replica plus two remote replicas. These network-dollar rows are also worksheet calculations, not observed AWS billing lines.
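The replication floor is a one-line model. A sketch, assuming one local plus two remote replicas, a flat 30-day month, and GiB billed as if GB:

```python
# ASSUMPTIONS from the worksheet: RF=3 means one local + two remote replicas,
# inter-AZ transfer costs an effective $0.02/GB ($0.01 out + $0.01 in),
# and GiB are billed as if GB for simplicity.
USD_PER_GIB_CROSS_AZ = 0.02
REMOTE_REPLICAS = 2
DAYS_PER_MONTH = 30

def inter_az_floor(raw_gib_per_day: float):
    """Return (cross-AZ GiB/day, monthly $) for the Loki write path."""
    cross_az_gib_per_day = raw_gib_per_day * REMOTE_REPLICAS
    monthly = cross_az_gib_per_day * DAYS_PER_MONTH * USD_PER_GIB_CROSS_AZ
    return cross_az_gib_per_day, round(monthly, 2)

print(inter_az_floor(310))     # -> (620, 372.0)      1x baseline
print(inter_az_floor(31_000))  # -> (62000, 37200.0)  100x scale
```

The same function reproduces the steady-state high-load row: `~523 GiB/day` raw yields `1,046 GiB/day` crossed and `$627.60/month`.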
Why the VictoriaLogs shape can differ
- VictoriaLogs cluster docs support independent clusters in separate AZs and advanced multi-level cluster setup.
- That lets operators keep normal reads AZ-local and reserve cross-AZ fanout for explicit global or failover queries.
- The proxy adds `zstd` and `gzip` on the read path it controls, which reduces client and peer-cache transport bytes for repeated reads.
- I did not attach a hard VictoriaLogs inter-AZ dollar figure because the docs do not publish a stable per-hop replication compression ratio, and inventing one would make the model less honest.
- In the tested setup, `0 rps` reads means the measurable network bill is dominated by write replication, not by query fanout.
Published numbers worth citing carefully
- VictoriaLogs docs: up to `30x` less RAM and up to `15x` less disk than Loki or Elasticsearch.
- VictoriaLogs docs: all fields are indexed and high-cardinality values work unless promoted to stream fields.
- Some real deployments observe `50-60x` VictoriaLogs compression ratios on the data-block metric, but that excludes `indexdb` and should be treated as a lower bound, not the full storage bill.
- TrueFoundry `500 GB / 7 day` benchmark: `≈40%` less storage and materially lower CPU and RAM than Loki on its workload.
- TrueFoundry broad-search results: VictoriaLogs was faster on its needle-in-haystack and negative-match tests.
- Grafana’s own Loki sizing guide publishes a `3-30 TB/day` base cluster at `431 vCPU / 857 Gi` and a `~30 TB/day` cluster at `1221 vCPU / 2235 Gi` before query spikes, which makes the compute side of the cost story concrete.
Published Loki behaviors worth keeping in mind
- Loki docs: labels are for low-cardinality values and line content is not indexed.
- Loki docs: high-cardinality labels build a huge index, flush tiny chunks, and reduce performance and cost-effectiveness.
- Loki docs: scalable deployments are multi-component and query-frontend based.
- Loki docs: OTel resource attributes promoted to labels are rewritten from dots to underscores, which the proxy can mirror on the Grafana side.
- Loki docs: unoptimized queries can need `10x` the suggested querier resources, so the published tier tables are a floor, not a worst case.
- Loki costs grow fast when the workload crosses published ingest tiers, because those tiers already assume a sizeable distributed footprint before storage and object-transfer overhead.