
Cost and performance

Compare VictoriaLogs and Loki with a source-backed cost lens

The cost story is not that a proxy magically makes every logging stack cheap. The defensible argument is narrower: Loki's own docs describe a label-indexed system that is sensitive to high-cardinality labels, VictoriaLogs publishes an all-field index and lower-resource claims, and Loki-VL-proxy adds concrete read-path caches plus observability so repeated Grafana traffic can cost less.

  • Label-only vs all-field: Loki indexes labels; VictoriaLogs indexes all fields. That difference matters most on broad or text-heavy searches.
  • High-cardinality posture: Loki warns about high-cardinality labels; VictoriaLogs supports high-cardinality fields. The proxy keeps those richer fields usable without forcing Grafana off Loki semantics.
  • Cache stack on top: Tier0, L1, L2, L3, and window cache can suppress repeated read work. This is where the proxy adds its own efficiency story.
  • Workload dependent: the right answer depends on retention, search mix, and dashboard repetition. This page separates vendor claims from project benchmarks and third-party reports.
  • Loki docs publish big tiers: Grafana's own sizing guide reaches `431 vCPU / 857 Gi` at `3-30 TB/day`. That gives the cost discussion a real published compute floor instead of a hand-wavy cluster sketch.
| Dimension | What official Loki docs say | What VictoriaLogs docs or published reports say | What Loki-VL-proxy adds |
| --- | --- | --- | --- |
| Indexing strategy | Labels index streams, but the content of each log line is not indexed. | All fields are indexed, and the query model supports full-text search across fields. | The proxy keeps the Loki read contract in front of that backend so Grafana can stay on the native Loki datasource. |
| High-cardinality behavior | Docs recommend low-cardinality labels and warn that high cardinality hurts performance and cost-effectiveness. | High-cardinality values such as `trace_id`, `user_id`, and `ip` work fine as fields as long as they are not used as stream fields. | Grafana keeps Loki-safe label surfaces while VictoriaLogs keeps the richer field model underneath. |
| Search-heavy workloads | Broad or text-heavy searches can devolve into stream selection plus line filtering because line content is not indexed. | Fast full-text search is published as a core capability, and third-party benchmarks report materially faster broad-search latency on large datasets. | Tier0, L1/L2/L3, and window cache can further suppress repeated read work after the first expensive search path. |
| Operational shape | Loki can run single-binary, but its scalable architecture is microservices-based with multiple components. | The easy path is a simple single executable, but the docs also cover cluster mode with `vlinsert`, `vlselect`, `vlstorage`, replication, multi-level cluster setup, and HA patterns across independent availability zones. | One small read-side compatibility layer with route-aware metrics and structured logs instead of translation work hidden inside clients; it can sit in front of either single-node or clustered VictoriaLogs. |
| Published resource claims | Grafana docs do not market a universal fixed savings ratio; they emphasize label strategy, storage, and deployment architecture. | Up to `30x` less RAM and up to `15x` less disk than Loki or Elasticsearch; TrueFoundry reports `≈40%` less storage and much lower CPU and RAM on its workload. | A small runtime cost of its own, but published project benchmarks show it remains CPU-light and can sharply reduce repeated backend work through caching. |
| Published large-workload sizing | Grafana's own sizing guide reaches `431 vCPU / 857 Gi` at `3-30 TB/day` and `1221 vCPU / 2235 Gi` around `30 TB/day` before query spikes. | No equivalent distributed tier matrix is published on the same shape; the safer claim is a lower-resource posture plus stronger compression and search behavior on published comparisons. | No change to backend ingest economics by itself, but the read side stays small and tiered caches plus route-aware control can cut repeated backend work. |
| Cross-AZ traffic posture | Distributors forward writes to a replication factor that is generally `3`, queriers query all ingesters for in-memory data, and the zone-aware replication design explicitly lists minimizing cross-zone traffic costs as a non-goal. | Cluster docs support independent clusters in separate availability zones plus advanced multi-level cluster setup, which lets operators keep most normal reads local and reserve cross-AZ fanout for HA or global queries. | The proxy can stay AZ-local on the read path and adds `zstd`/`gzip` compression on the hops it controls, but it does not invent backend replication savings that the VictoriaLogs docs do not quantify. |

Why the Loki cost floor matters

  • Grafana already publishes large distributed Loki sizing floors by ingest throughput, so the high-end compute side is not a vague anti-Loki argument.
  • At `3-30 TB/day`, the published Loki floor is `431 vCPU / 857 Gi` before storage and before the `10x` querier-spike warning in the same docs.
  • That is why this project’s cost page converts Loki’s own sizing guide into on-demand EC2 floors before comparing it with a smaller `VictoriaLogs + Loki-VL-proxy` reference pack.
  • The proxy layer is intentionally modeled as a small read-path tax, not as the source of backend ingest savings.

Where the savings argument is strongest

  • Search-heavy workloads where users often scan broad time ranges for words, phrases, or IDs.
  • Data models with many useful fields and high-cardinality values that should stay as fields rather than labels.
  • Repeated Grafana dashboard, Explore, or Drilldown reads that can hit Tier0, local cache, disk cache, or peer cache.
  • Migrations where you want VictoriaLogs economics without forcing Grafana and Loki API clients to change first.

Where to be precise instead of hype-driven

  • Do not present the proxy as a generic ingestion benchmark; standard Loki push stays blocked.
  • Do not treat third-party workload numbers as universal truths for every cluster.
  • Do not attribute VictoriaLogs backend savings to the proxy itself; the proxy adds read-path suppression and migration control.
  • Do compare end-to-end client latency with upstream latency so you can see whether the proxy or the backend owns the cost.

What the proxy measurably contributes

  • `query_range` warm hits in the published project benchmark land at `0.64-0.67 us` versus `4.58 ms` on the cold delayed path.
  • `detected_field_values` warm hits land at `0.71 us` versus `2.76 ms` without Tier0.
  • Peer-cache warm shadow-copy hits land at `52 ns` after the first owner fetch.
  • Long-range prefiltering cut backend query calls by about `81.6%` on the published benchmark shape.

How to verify the savings in another environment

  • Track `loki_vl_proxy_requests_total` and `loki_vl_proxy_request_duration_seconds` by `endpoint` and `route`.
  • Compare `loki_vl_proxy_backend_duration_seconds` with downstream latency to isolate proxy overhead from VictoriaLogs slowness.
  • Watch `loki_vl_proxy_cache_hits_by_endpoint` and `_misses_by_endpoint` to see whether repeated reads are really being suppressed.
  • Use structured logs with `proxy.overhead_ms` and `upstream.duration_ms` for exact per-request decomposition.
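The checks above reduce to two small calculations. A minimal sketch; the helper names and sample numbers are illustrative, while the metric and log-field names they mirror (`..._cache_hits_by_endpoint`, `proxy.overhead_ms`, `upstream.duration_ms`) are the ones listed above:

```python
# Illustrative worksheet helpers, not proxy code: given sampled counter values
# and per-request timings, compute the two numbers worth watching.

def cache_suppression_ratio(hits: float, misses: float) -> float:
    """Fraction of reads served without touching the backend."""
    total = hits + misses
    return hits / total if total else 0.0

def proxy_overhead_ms(client_latency_ms: float, upstream_duration_ms: float) -> float:
    """Time the proxy itself owns: client-observed latency minus backend time
    (mirrors the proxy.overhead_ms / upstream.duration_ms decomposition)."""
    return max(client_latency_ms - upstream_duration_ms, 0.0)

# Example: 8,100 hits vs 900 misses on one endpoint -> 90% suppression.
print(cache_suppression_ratio(8100, 900))          # 0.9
# Example: 5.0 ms at the client, 4.6 ms in the backend -> 0.4 ms proxy share.
print(round(proxy_overhead_ms(5.0, 4.6), 1))       # 0.4
```

If the suppression ratio is high but client latency is not dropping, the overhead decomposition tells you whether the proxy or the backend owns the remaining cost.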

Loki published sizing converted to EC2

| Loki docs ingest tier | Published base request | Illustrative EC2 floor | Monthly compute floor |
| --- | --- | --- | --- |
| <3 TB/day | 38 vCPU / 59 Gi | 3 x c7i.4xlarge | $1,489.20 / month |
| 3-30 TB/day | 431 vCPU / 857 Gi | 27 x c7i.4xlarge | $13,402.80 / month |
| ~30 TB/day | 1221 vCPU / 2235 Gi | 77 x c7i.4xlarge | $38,222.80 / month |

This uses simple `c7i.4xlarge` on-demand packing in `us-east-1` to turn Grafana's published CPU and memory requests into an operator-readable monthly floor. These AWS rows are pure worksheet calculations that put dollar figures on the comparison, not observed cloud bills.
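The packing behind these rows can be reproduced directly. A worksheet sketch, assuming the `c7i.4xlarge` shape of 16 vCPU / 32 GiB and the per-node monthly rate implied by the table itself ($1,489.20 for 3 nodes):

```python
import math

# c7i.4xlarge shape and the per-node monthly rate implied by the table above.
NODE_VCPU, NODE_GIB = 16, 32
NODE_MONTHLY = 1489.20 / 3          # $496.40 per node per month

def node_floor(vcpu: float, mem_gib: float) -> int:
    """Pack by whichever dimension needs more whole nodes."""
    return max(math.ceil(vcpu / NODE_VCPU), math.ceil(mem_gib / NODE_GIB))

# Grafana's published base requests per ingest tier.
for vcpu, gib in [(38, 59), (431, 857), (1221, 2235)]:
    nodes = node_floor(vcpu, gib)
    print(nodes, round(nodes * NODE_MONTHLY, 2))
# -> 3 1489.2 / 27 13402.8 / 77 38222.8
```

Note the `3-30 TB/day` tier is memory-bound and CPU-bound at almost the same node count, while the `~30 TB/day` tier is CPU-bound.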

Illustrative monthly cost scenarios

| Scenario | Active users | Ingest | Raw ingest/day | Loki total | Proxy + VL total | Monthly delta | Savings |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Small | 100 | 100k lines/s | 2.16 TB/day | $1,681.20 | $369.16 | $1,312.04 | 78.0% |
| Medium | 1,000 | 500k lines/s | 10.8 TB/day | $14,362.80 | $1,101.20 | $13,261.60 | 92.3% |
| Large | 10,000 | 1M lines/s | 21.6 TB/day | $15,322.80 | $2,388.55 | $12,934.25 | 84.4% |

These scenarios assume `7d` retention, `250 B` average raw line size, and a conservative VictoriaLogs storage factor of `10x`, even though some real deployments observe much higher data-block-only compression ratios.
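The raw-ingest column follows directly from the line rate and the `250 B` planning assumption; a quick check:

```python
SECONDS_PER_DAY = 86_400
AVG_LINE_BYTES = 250          # the page's planning assumption, not a measurement

def raw_tb_per_day(lines_per_second: float) -> float:
    """Decimal TB of raw log bytes ingested per day."""
    return lines_per_second * AVG_LINE_BYTES * SECONDS_PER_DAY / 1e12

print(raw_tb_per_day(100_000))    # 2.16  (Small)
print(raw_tb_per_day(500_000))    # 10.8  (Medium)
print(raw_tb_per_day(1_000_000))  # 21.6  (Large)
```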

Real-life tested VictoriaLogs baseline

  • Real snapshot: `800 M` total entries, `112 M` ingested in `24h`, `310 GiB` ingested in `24h`, and `40.5 GiB` on disk.
  • The observed compression ratio is `54.9`, which implies about `5.65 GiB/day` of compressed data blocks.
  • `800 M / 112 M per day` implies about `7.14d` of retained data, which matches the `40.5 GiB` disk footprint closely.
  • Average raw event size in this tested setup is about `2.9 KiB`, which is far larger than the earlier generic `250 B` planning model.
  • This is a write-heavy calibration point because observed read traffic is `0 rps`, so it is useful for storage and ingest-tier math, not for proving read-path cache savings by itself.
  • `available CPU = 43` and `available memory = 43 GiB` are cluster headroom signals, not service consumption, so they are not used as the VictoriaLogs compute baseline.
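The derived figures above can be recomputed from the raw snapshot numbers; a worksheet sketch (the re-derived compression ratio of about `54.7` matches the observed `54.9` closely):

```python
# Re-derive the calibration figures from the raw snapshot numbers above.
total_entries = 800e6            # 800 M retained entries
entries_per_day = 112e6          # 112 M ingested in 24h
raw_gib_per_day = 310.0          # GiB ingested in 24h
disk_gib = 40.5                  # on-disk footprint

retention_days = total_entries / entries_per_day              # ~7.14 d retained
compressed_gib_per_day = disk_gib / retention_days            # ~5.67 GiB/day on disk
compression_ratio = raw_gib_per_day / compressed_gib_per_day  # ~54.7 (observed: 54.9)
avg_event_kib = raw_gib_per_day * 2**30 / entries_per_day / 1024  # ~2.9 KiB per event

print(round(retention_days, 2), round(compressed_gib_per_day, 2),
      round(compression_ratio, 1), round(avg_event_kib, 1))
# -> 7.14 5.67 54.7 2.9
```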

Scaling the real-life tested baseline to Loki floors

| Scale | Raw ingest/day | VictoriaLogs retained `~7.1d` | Estimated Loki retained `~7.1d` | VictoriaLogs gp3 | Loki gp3 | Loki published tier | Loki compute floor |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1x | 0.333 TB/day | 40.5 GiB | 64.3 GiB | $3.24 | $5.14 | <3 TB/day | $1,489.20 / month |
| 10x | 3.33 TB/day | 405 GiB | 642.9 GiB | $32.40 | $51.43 | 3-30 TB/day | $13,402.80 / month |
| 30x | 9.99 TB/day | 1,215 GiB | 1,928.6 GiB | $97.20 | $154.29 | 3-30 TB/day | $13,402.80 / month |
| 100x | 33.29 TB/day | 4,050 GiB | 6,428.6 GiB | $324.00 | $514.29 | ~30 TB/day | $38,222.80 / month |

This uses the real-life tested `40.5 GiB` retained VictoriaLogs footprint as the base, then applies the same conservative `VL = 63% of Loki` retained-bytes assumption used in the docs cost model.
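A sketch of that scaling rule, assuming the standard `$0.08/GiB-month` gp3 baseline price (an assumption of this worksheet, not a quoted bill):

```python
GP3_PER_GIB_MONTH = 0.08     # standard gp3 baseline price assumed by the worksheet
VL_SHARE_OF_LOKI = 0.63      # conservative retained-bytes assumption from the docs model
BASE_VL_RETAINED_GIB = 40.5  # measured VictoriaLogs footprint at 1x

def retained_and_gp3(scale: float):
    """Scale the measured footprint, back out the Loki estimate, price both on gp3."""
    vl_gib = BASE_VL_RETAINED_GIB * scale
    loki_gib = vl_gib / VL_SHARE_OF_LOKI
    return vl_gib, loki_gib, vl_gib * GP3_PER_GIB_MONTH, loki_gib * GP3_PER_GIB_MONTH

vl, loki, vl_usd, loki_usd = retained_and_gp3(10)
print(round(loki, 1), round(vl_usd, 2), round(loki_usd, 2))   # 642.9 32.4 51.43
```

Storage dollars stay tiny next to compute at every scale in the table; the compute floors, not gp3, dominate the comparison.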

Real-life tested compute envelope vs Loki floor

| Scale | Raw ingest/day | Scaled VL envelope | Illustrative VL EC2 floor | VL compute | Loki compute | Loki / VL CPU | Loki / VL memory |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1x | 0.333 TB/day | 1.2 cores / 5.85 GiB | 1 x c7i.xlarge | $124.10 / month | $1,489.20 / month | 31.7x | 10.1x |
| 10x | 3.33 TB/day | 12 cores / 58.5 GiB | 4 x c7i.2xlarge | $992.80 / month | $13,402.80 / month | 35.9x | 14.6x |
| 30x | 9.99 TB/day | 36 cores / 175.5 GiB | 6 x c7i.4xlarge | $2,978.40 / month | $13,402.80 / month | 12.0x | 4.9x |
| 100x | 33.29 TB/day | 120 cores / 585 GiB | 19 x c7i.4xlarge | $9,431.60 / month | $38,222.80 / month | 10.2x | 3.8x |

This uses the measured VictoriaLogs process envelope from the same real-life tested setup: about `1.2` cores and `5.85 GiB` total across `vlstorage`, `vlinsert`, and `vlselect`.
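The ratio columns can be reproduced from the published Loki tier requests and the linearly scaled envelope; note they are resource-request ratios, not dollar ratios:

```python
# Divide Loki's published tier request by the linearly scaled VL envelope.
VL_CORES, VL_GIB = 1.2, 5.85   # measured 1x process envelope

# (scale, Loki tier vCPU request, Loki tier memory request in Gi)
tiers = [(1, 38, 59), (10, 431, 857), (30, 431, 857), (100, 1221, 2235)]
for scale, loki_vcpu, loki_gib in tiers:
    cpu_ratio = loki_vcpu / (VL_CORES * scale)
    mem_ratio = loki_gib / (VL_GIB * scale)
    print(f"{scale}x: {cpu_ratio:.1f}x CPU, {mem_ratio:.1f}x memory")
# -> 1x: 31.7x CPU, 10.1x memory ... 100x: 10.2x CPU, 3.8x memory
```

The ratios shrink at higher scales because the `30x` row reuses the same Loki tier floor as `10x`; the advantage narrows, it does not disappear.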

What this comparison means

  • At the exact real-life tested baseline, the VictoriaLogs service envelope is small enough to fit on a single `c7i.xlarge`, while Loki’s published throughput floor for the same ingest tier is already `3 x c7i.4xlarge`.
  • Even when the measured VictoriaLogs envelope is scaled linearly, Loki’s published floor stays materially larger on both CPU and memory.
  • This does not prove that VictoriaLogs scales perfectly linearly; it shows that the real-life tested baseline is far below Loki’s published distributed floor at the same ingest tier.
  • That is the right way to compare here: a real-life tested VictoriaLogs envelope versus Loki’s own published cluster-sizing floor, not marketing slogans versus marketing slogans.

Real-life tested steady-state high-load envelope

| Scenario | Raw ingest/day | VictoriaLogs retained `~7.1d` | Estimated Loki retained `~7.1d` | Scaled VL envelope | Illustrative VL EC2 floor | Loki published tier | Loki compute floor | Loki cross-AZ write payload/day | Effective inter-AZ monthly cost |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Real-life tested steady-state high load | 0.56 TB/day | 68.3 GiB | 108.4 GiB | 2.0 cores / 9.9 GiB | 1 x c7i.2xlarge | <3 TB/day | $1,489.20 / month | 1,046 GiB/day | $627.60 / month |
  • This row uses the higher real-life tested envelope of about `2.5k` events per second and about `6.5 MB/s` raw ingest bandwidth.
  • It is intentionally separate from the daily average snapshot so the page shows both the average storage baseline and the heavier sustained operating shape.
  • Even at this higher steady-state envelope, the tested VictoriaLogs setup remains far below Loki’s first published distributed compute floor.
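The row's daily figures follow from the measured `6.5 MB/s` bandwidth and the same replication-factor-3 write model used in the inter-AZ section; a worksheet check:

```python
# Convert the measured steady-state bandwidth into the row's daily figures.
MB_PER_S = 6.5                                   # measured raw ingest bandwidth
raw_bytes_per_day = MB_PER_S * 1e6 * 86_400
raw_tb_per_day = raw_bytes_per_day / 1e12        # ~0.56 TB/day
raw_gib_per_day = raw_bytes_per_day / 2**30      # ~523 GiB/day
cross_az_gib = round(2 * raw_gib_per_day)        # RF3: two remote replicas -> 1046
monthly_inter_az = cross_az_gib * 30 * 0.02      # effective $0.02/GB crossing charge

print(round(raw_tb_per_day, 2), cross_az_gib, round(monthly_inter_az, 2))
# -> 0.56 1046 627.6
```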

3-AZ VictoriaLogs topology note

| Topology | Minimum pod shape | Cost-model treatment |
| --- | --- | --- |
| 3 x vlstorage per AZ | 3 x vlinsert, 3 x vlselect, 9 x vlstorage | keep the combined compute envelope used in the main tables |
| 4 x vlstorage per AZ | 3 x vlinsert, 3 x vlselect, 12 x vlstorage | keep the combined compute envelope used in the main tables |
  • This captures the normal production pod shape for a 3-AZ cluster with one `vlinsert` and one `vlselect` per AZ plus `3-4` `vlstorage` pods per AZ.
  • The cost worksheet still uses combined compute in the main tables so the comparison stays about total service envelope rather than node-placement policy.
  • The measured `vlstorage` footprint used elsewhere is for the tested `vlstorage` service envelope as a whole, not per storage pod.

Inter-AZ write replication cost floor

| Scale | Raw ingest/day | Loki cross-AZ write payload/day | Illustrative monthly inter-AZ cost |
| --- | --- | --- | --- |
| 1x | 310 GiB/day | 620 GiB/day | $372.00 / month |
| 10x | 3,100 GiB/day | 6,200 GiB/day | $3,720.00 / month |
| 30x | 9,300 GiB/day | 18,600 GiB/day | $11,160.00 / month |
| 100x | 31,000 GiB/day | 62,000 GiB/day | $37,200.00 / month |

This models AWS inter-AZ transfer at an effective `$0.02/GB` per crossing, because EC2 charges `$0.01/GB` out of the sending AZ and `$0.01/GB` into the receiving AZ within the same Region. For a 3-AZ Loki cluster with replication factor `3`, the simple write floor is one local replica plus two remote replicas. These network-dollar rows are also worksheet calculations, not observed AWS billing lines.
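That model fits in one small function (treating GiB as billable GB, as the table does):

```python
INTER_AZ_PER_GB = 0.02   # $0.01/GB out + $0.01/GB in across AZs, same Region
REMOTE_REPLICAS = 2      # RF3 across 3 AZs: one local copy, two remote copies

def monthly_inter_az_floor(raw_gib_per_day: float, days: int = 30) -> float:
    """Simple write-replication network floor; GiB treated as billable GB."""
    return raw_gib_per_day * REMOTE_REPLICAS * days * INTER_AZ_PER_GB

print(round(monthly_inter_az_floor(310), 2))      # 372.0  -> the 1x row
print(round(monthly_inter_az_floor(31_000), 2))   # 37200.0 -> the 100x row
```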

Why the VictoriaLogs shape can differ

  • VictoriaLogs cluster docs support independent clusters in separate AZs and advanced multi-level cluster setup.
  • That lets operators keep normal reads AZ-local and reserve cross-AZ fanout for explicit global or failover queries.
  • The proxy adds `zstd` and `gzip` on the read path it controls, which reduces client and peer-cache transport bytes for repeated reads.
  • I did not attach a hard VictoriaLogs inter-AZ dollar figure because the docs do not publish a stable per-hop replication compression ratio, and inventing one would make the model less honest.
  • In the tested setup, `0 rps` reads means the measurable network bill is dominated by write replication, not by query fanout.

Published numbers worth citing carefully

  • VictoriaLogs docs: up to `30x` less RAM and up to `15x` less disk than Loki or Elasticsearch.
  • VictoriaLogs docs: all fields are indexed and high-cardinality values work unless promoted to stream fields.
  • Some real deployments observe `50-60x` VictoriaLogs compression ratios on the data-block metric, but that excludes `indexdb` and should be treated as a lower-bound, not the full storage bill.
  • TrueFoundry `500 GB / 7 day` benchmark: `≈40%` less storage and materially lower CPU and RAM than Loki on its workload.
  • TrueFoundry broad-search results: VictoriaLogs was faster on its needle-in-haystack and negative-match tests.
  • Grafana’s own Loki sizing guide publishes a `3-30 TB/day` base cluster at `431 vCPU / 857 Gi` and a `~30 TB/day` cluster at `1221 vCPU / 2235 Gi` before query spikes, which makes the compute side of the cost story concrete.

Published Loki behaviors worth keeping in mind

  • Loki docs: labels are for low-cardinality values and line content is not indexed.
  • Loki docs: high-cardinality labels build a huge index, flush tiny chunks, and reduce performance and cost-effectiveness.
  • Loki docs: scalable deployments are multi-component and query-frontend based.
  • Loki docs: OTel resource attributes promoted to labels are rewritten from dots to underscores, which the proxy can mirror on the Grafana side.
  • Loki docs: unoptimized queries can need `10x` the suggested querier resources, so the published tier tables are a floor, not a worst case.
  • Loki costs grow fast when the workload crosses published ingest tiers, because those tiers already assume a sizeable distributed footprint before storage and object-transfer overhead.