Cost and performance
Compare VictoriaLogs and Loki with a source-backed cost lens
The cost story is not that a proxy magically makes every logging stack cheap. The defensible argument is narrower: Loki's own docs describe a label-indexed system that is sensitive to high-cardinality labels; VictoriaLogs publishes an all-field index and lower-resource claims; and Loki-VL-proxy adds concrete read-path caches plus observability, so repeated Grafana traffic can cost less.
| Dimension | What official Loki docs say | What VictoriaLogs docs or published reports say | What Loki-VL-proxy adds |
|---|---|---|---|
| Indexing strategy | Loki docs: labels index streams, but the content of each log line is not indexed. | VictoriaLogs docs: all fields are indexed and the query model supports full-text search across fields. | The proxy keeps the Loki read contract in front of that backend so Grafana can stay on the native Loki datasource. |
| High-cardinality behavior | Loki docs recommend low-cardinality labels and warn that high cardinality hurts performance and cost-effectiveness. | VictoriaLogs docs say high-cardinality values such as `trace_id`, `user_id`, and `ip` work fine as fields as long as they are not used as stream fields. | The proxy lets Grafana keep Loki-safe label surfaces while VictoriaLogs keeps the richer field model underneath. |
| Search-heavy workloads | Broad or text-heavy searches can devolve into stream selection plus line filtering because line content is not indexed. | VictoriaLogs publishes fast full-text search as a core capability, and third-party benchmarks report materially faster broad-search latency on large datasets. | Tier0, L1/L2/L3, and window cache can further suppress repeated read work after the first expensive search path. |
| Operational shape | Loki can run single-binary, but its scalable architecture is microservices-based with multiple components. | VictoriaLogs docs position the backend as a simple single executable on the easy path, but they also document cluster mode with `vlinsert`, `vlselect`, `vlstorage`, replication, multi-level cluster setup, and HA patterns across independent availability zones. | The proxy adds one small read-side compatibility layer with route-aware metrics and structured logs instead of hiding translation work inside clients, and can sit in front of either single-node or clustered VictoriaLogs. |
| Published resource claims | Grafana docs do not market a universal fixed savings ratio; they emphasize label strategy, storage, and deployment architecture. | VictoriaLogs docs publish up to `30x` less RAM and up to `15x` less disk than Loki or Elasticsearch, while TrueFoundry reports `≈40%` less storage and much lower CPU and RAM on its workload. | The proxy adds its own small runtime cost, but published project benchmarks show it remains CPU-light and can sharply reduce repeated backend work through caching. |
| Published large-workload sizing | Grafana’s own sizing guide reaches `431 vCPU / 857 Gi` at `3-30 TB/day` and `1221 vCPU / 2235 Gi` around `30 TB/day` before query spikes. | VictoriaLogs docs do not publish an equivalent distributed tier matrix on the same shape; the safer claim is lower-resource posture plus stronger compression and search behavior on published comparisons. | The proxy does not change backend ingest economics by itself, but it keeps the read side small and can cut repeated backend work through tiered caches and route-aware control. |
| Cross-AZ traffic posture | Loki docs say distributors forward writes to a replication factor that is generally `3`, queriers query all ingesters for in-memory data, and the zone-aware replication design explicitly lists minimizing cross-zone traffic costs as a non-goal. | VictoriaLogs cluster docs support independent clusters in separate availability zones plus advanced multi-level cluster setup, which lets operators keep most normal reads local and reserve cross-AZ fanout for HA or global queries. | The proxy can stay AZ-local on the read path and adds `zstd`/`gzip` compression on the hops it controls, but it does not invent backend replication savings that the VictoriaLogs docs do not quantify. |
Why the Loki cost floor matters
- Grafana already publishes large distributed Loki sizing floors by ingest throughput, so the high-end compute side is not a vague anti-Loki argument.
- At `3-30 TB/day`, the published Loki floor is `431 vCPU / 857 Gi` before storage and before the `10x` querier-spike warning in the same docs.
- That is why this project’s cost page converts Loki’s own sizing guide into on-demand EC2 floors before comparing it with a smaller `VictoriaLogs + Loki-VL-proxy` reference pack.
- The proxy layer is intentionally modeled as a small read-path tax, not as the source of backend ingest savings.
Where the savings argument is strongest
- Search-heavy workloads where users often scan broad time ranges for words, phrases, or IDs.
- Data models with many useful fields and high-cardinality values that should stay as fields rather than labels.
- Repeated Grafana dashboard, Explore, or Drilldown reads that can hit Tier0, local cache, disk cache, or peer cache.
- Migrations where you want VictoriaLogs economics without forcing Grafana and Loki API clients to change first.
Where to be precise instead of hype-driven
- Do not present the proxy's numbers as a generic ingestion benchmark; standard Loki push stays blocked, so ingestion claims do not apply.
- Do not treat third-party workload numbers as universal truths for every cluster.
- Do not attribute VictoriaLogs backend savings to the proxy itself; the proxy adds read-path suppression and migration control.
- Do compare end-to-end client latency with upstream latency so you can see whether the proxy or the backend owns the cost.
What the proxy measurably contributes
- `query_range` warm hits in the published project benchmark land at `0.64-0.67 us` versus `4.58 ms` on the cold delayed path.
- `detected_field_values` warm hits land at `0.71 us` versus `2.76 ms` without Tier0.
- Peer-cache warm shadow-copy hits land at `52 ns` after the first owner fetch.
- Long-range prefiltering cut backend query calls by about `81.6%` on the published benchmark shape.
How to verify the savings in another environment
- Track `loki_vl_proxy_requests_total` and `loki_vl_proxy_request_duration_seconds` by `endpoint` and `route`.
- Compare `loki_vl_proxy_backend_duration_seconds` with downstream latency to isolate proxy overhead from VictoriaLogs slowness.
- Watch `loki_vl_proxy_cache_hits_by_endpoint` and `_misses_by_endpoint` to see whether repeated reads are really being suppressed.
- Use structured logs with `proxy.overhead_ms` and `upstream.duration_ms` for exact per-request decomposition.
Loki published sizing converted to EC2
| Loki docs ingest tier | Published base request | Illustrative EC2 floor | Monthly compute floor |
|---|---|---|---|
| <3 TB/day | 38 vCPU / 59 Gi | 3 x c7i.4xlarge | $1,489.20 / month |
| 3-30 TB/day | 431 vCPU / 857 Gi | 27 x c7i.4xlarge | $13,402.80 / month |
| ~30 TB/day | 1221 vCPU / 2235 Gi | 77 x c7i.4xlarge | $38,222.80 / month |
This uses simple `c7i.4xlarge` on-demand packing in `us-east-1` to turn Grafana's published CPU and memory requests into an operator-readable monthly floor. These AWS rows are pure calculations to put `$$` around the comparison, not observed cloud bills.
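The packing arithmetic behind these rows is reproducible. A minimal sketch, assuming the worksheet's `c7i.4xlarge` shape (`16 vCPU / 32 GiB`) and the on-demand rate the table itself implies (`$1,489.20 / 3 instances / 730 h ≈ $0.68/h`); these constants are worksheet assumptions, not quoted AWS prices:

```python
import math

# Instance shape used by the worksheet, plus the on-demand rate implied
# by the table's own rows (ASSUMPTION: $1,489.20 / 3 instances / 730 h).
VCPU, MEM_GIB, USD_PER_HOUR, HOURS = 16, 32, 0.68, 730

def ec2_floor(vcpu_request: float, mem_gib_request: float):
    """Pack a published Loki base request onto identical c7i.4xlarge nodes."""
    count = max(math.ceil(vcpu_request / VCPU),
                math.ceil(mem_gib_request / MEM_GIB))
    return count, round(count * USD_PER_HOUR * HOURS, 2)

print(ec2_floor(38, 59))      # -> (3, 1489.2)    <3 TB/day tier
print(ec2_floor(431, 857))    # -> (27, 13402.8)  3-30 TB/day tier
print(ec2_floor(1221, 2235))  # -> (77, 38222.8)  ~30 TB/day tier
```

Because Grafana's own docs warn unoptimized queries can need `10x` the querier resources, these counts are a floor, not a ceiling.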
Illustrative monthly cost scenarios
| Scenario | Active users | Ingest | Raw ingest/day | Loki total | Proxy + VL total | Monthly delta | Savings |
|---|---|---|---|---|---|---|---|
| Small | 100 | 100k lines/s | 2.16 TB/day | $1,681.20 | $369.16 | $1,312.04 | 78.0% |
| Medium | 1,000 | 500k lines/s | 10.8 TB/day | $14,362.80 | $1,101.20 | $13,261.60 | 92.3% |
| Large | 10,000 | 1M lines/s | 21.6 TB/day | $15,322.80 | $2,388.55 | $12,934.25 | 84.4% |
These scenarios assume `7d` retention, `250 B` average raw line size, and a conservative VictoriaLogs storage factor of `10x`, even though some real deployments observe much higher data-block-only compression ratios.
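The raw-ingest and retained-bytes columns follow mechanically from those three assumptions. A small sketch of the arithmetic, in decimal TB, with hypothetical helper names:

```python
def raw_ingest_tb_per_day(lines_per_sec: int, avg_line_bytes: int = 250) -> float:
    """Decimal TB/day of raw log bytes at a sustained line rate."""
    return lines_per_sec * avg_line_bytes * 86_400 / 1e12

def vl_retained_tb(lines_per_sec: int, retention_days: int = 7,
                   storage_factor: float = 10.0) -> float:
    """Retained VictoriaLogs bytes under the conservative 10x storage factor."""
    return raw_ingest_tb_per_day(lines_per_sec) * retention_days / storage_factor

print(raw_ingest_tb_per_day(100_000))    # -> 2.16  (Small)
print(raw_ingest_tb_per_day(500_000))    # -> 10.8  (Medium)
print(raw_ingest_tb_per_day(1_000_000))  # -> 21.6  (Large)
```

The `10x` storage factor is deliberately pessimistic for VictoriaLogs; the real-life tested baseline below observes a far higher data-block compression ratio.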
Real-life tested VictoriaLogs baseline
- Real snapshot: `800 M` total entries, `112 M` ingested in `24h`, `310 GiB` ingested in `24h`, and `40.5 GiB` on disk.
- The observed compression ratio is `54.9`, which implies about `5.65 GiB/day` of compressed data blocks.
- `800 M / 112 M per day` implies about `7.14d` of retained data, which at `~5.65 GiB/day` of compressed blocks closely matches the `40.5 GiB` disk footprint.
- Average raw event size in this tested setup is about `2.9 KiB`, which is far larger than the earlier generic `250 B` planning model.
- This is a write-heavy calibration point because observed read traffic is `0 rps`, so it is useful for storage and ingest-tier math, not for proving read-path cache savings by itself.
- `available CPU = 43` and `available memory = 43 GiB` are cluster headroom signals, not service consumption, so they are not used as the VictoriaLogs compute baseline.
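The snapshot is internally consistent, which is worth checking before scaling it. A quick sketch re-deriving the bullets above from the raw numbers:

```python
GIB = 2**30

# Snapshot values from the real-life tested bullets above.
total_entries, entries_per_day = 800e6, 112e6
raw_gib_per_day, compression_ratio = 310.0, 54.9

compressed_gib_per_day = raw_gib_per_day / compression_ratio    # data blocks only
retention_days = total_entries / entries_per_day                # retained window
implied_disk_gib = retention_days * compressed_gib_per_day      # vs observed 40.5
avg_event_kib = raw_gib_per_day * GIB / entries_per_day / 1024  # raw event size

print(round(compressed_gib_per_day, 2))  # -> 5.65
print(round(retention_days, 2))          # -> 7.14
print(round(implied_disk_gib, 1))        # -> 40.3
print(round(avg_event_kib, 1))           # -> 2.9
```

The small gap between the implied `40.3 GiB` and the observed `40.5 GiB` is expected, since the `54.9` ratio covers data blocks only and excludes `indexdb`.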
Scaling the real-life tested baseline to Loki floors
| Scale | Raw ingest/day | VictoriaLogs retained `~7.1d` | Estimated Loki retained `~7.1d` | VictoriaLogs gp3 | Loki gp3 | Loki published tier | Loki compute floor |
|---|---|---|---|---|---|---|---|
| 1x | 0.333 TB/day | 40.5 GiB | 64.3 GiB | $3.24 | $5.14 | <3 TB/day | $1,489.20 / month |
| 10x | 3.33 TB/day | 405 GiB | 642.9 GiB | $32.40 | $51.43 | 3-30 TB/day | $13,402.80 / month |
| 30x | 9.99 TB/day | 1,215 GiB | 1,928.6 GiB | $97.20 | $154.29 | 3-30 TB/day | $13,402.80 / month |
| 100x | 33.29 TB/day | 4,050 GiB | 6,428.6 GiB | $324.00 | $514.29 | ~30 TB/day | $38,222.80 / month |
This uses the real-life tested `40.5 GiB` retained VictoriaLogs footprint as the base, then applies the same conservative `VL = 63% of Loki` retained-bytes assumption used in the docs cost model.
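Both storage columns reduce to two constants: the gp3 rate the dollar cells imply (`$3.24 / 40.5 GiB ≈ $0.08/GiB-month`) and the `VL = 63% of Loki` retained-bytes ratio. A sketch under those assumptions:

```python
# ASSUMPTIONS: gp3 at $0.08/GiB-month (implied by the table's own cells)
# and the docs model's conservative "VL = 63% of Loki" retained-bytes ratio.
GP3_USD_PER_GIB_MONTH = 0.08
VL_SHARE_OF_LOKI = 0.63

def scaled_storage(base_vl_gib: float, scale: float):
    """Return (VL GiB, Loki GiB, VL gp3 $, Loki gp3 $) at a given scale."""
    vl_gib = base_vl_gib * scale
    loki_gib = vl_gib / VL_SHARE_OF_LOKI
    return (round(vl_gib, 1), round(loki_gib, 1),
            round(vl_gib * GP3_USD_PER_GIB_MONTH, 2),
            round(loki_gib * GP3_USD_PER_GIB_MONTH, 2))

print(scaled_storage(40.5, 1))   # -> (40.5, 64.3, 3.24, 5.14)
print(scaled_storage(40.5, 10))  # -> (405.0, 642.9, 32.4, 51.43)
```

The dollar figures stay tiny relative to compute either way, which is why the compute floors dominate the comparison.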
Real-life tested compute envelope vs Loki floor
| Scale | Raw ingest/day | Scaled VL envelope | Illustrative VL EC2 floor | VL compute | Loki compute | Loki / VL CPU | Loki / VL memory |
|---|---|---|---|---|---|---|---|
| 1x | 0.333 TB/day | 1.2 cores / 5.85 GiB | 1 x c7i.xlarge | $124.10 / month | $1,489.20 / month | 31.7x | 10.1x |
| 10x | 3.33 TB/day | 12 cores / 58.5 GiB | 4 x c7i.2xlarge | $992.80 / month | $13,402.80 / month | 35.9x | 14.6x |
| 30x | 9.99 TB/day | 36 cores / 175.5 GiB | 6 x c7i.4xlarge | $2,978.40 / month | $13,402.80 / month | 12.0x | 4.9x |
| 100x | 33.29 TB/day | 120 cores / 585 GiB | 19 x c7i.4xlarge | $9,431.60 / month | $38,222.80 / month | 10.2x | 3.8x |
This uses the measured VictoriaLogs process envelope from the same real-life tested setup: about `1.2` cores and `5.85 GiB` total across `vlstorage`, `vlinsert`, and `vlselect`.
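The ratio columns divide Grafana's published Loki base requests by the linearly scaled measured envelope. A sketch (the tier keys are illustrative labels, not official names):

```python
# Measured VictoriaLogs service envelope (vlstorage + vlinsert + vlselect)
# and Grafana's published Loki base requests per ingest tier.
VL_BASE_CORES, VL_BASE_GIB = 1.2, 5.85
LOKI_TIER = {"<3": (38, 59), "3-30": (431, 857), "~30": (1221, 2235)}

def loki_vl_ratios(scale: float, tier: str):
    """Loki/VL CPU and memory ratios at a linear scale of the VL envelope."""
    lk_cpu, lk_mem = LOKI_TIER[tier]
    return (round(lk_cpu / (VL_BASE_CORES * scale), 1),
            round(lk_mem / (VL_BASE_GIB * scale), 1))

print(loki_vl_ratios(1, "<3"))     # -> (31.7, 10.1)
print(loki_vl_ratios(10, "3-30"))  # -> (35.9, 14.6)
print(loki_vl_ratios(100, "~30"))  # -> (10.2, 3.8)
```

The ratios shrink at higher scales only because the 30x row still lands in the same published `3-30 TB/day` Loki tier, not because the VL envelope grows sublinearly.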
What this comparison means
- At the exact real-life tested baseline, the VictoriaLogs service envelope is small enough to fit on a single `c7i.xlarge`, while Loki’s published throughput floor for the same ingest tier is already `3 x c7i.4xlarge`.
- Even when the measured VictoriaLogs envelope is scaled linearly, Loki’s published floor stays materially larger on both CPU and memory.
- This does not prove that VictoriaLogs scales perfectly linearly; it shows that the real-life tested baseline is far below Loki’s published distributed floor at the same ingest tier.
- That is the right way to compare here: a real-life tested VictoriaLogs envelope versus Loki’s own published cluster-sizing floor, not marketing slogans versus marketing slogans.
Real-life tested steady-state high-load envelope
| Scenario | Raw ingest/day | VictoriaLogs retained `~7.1d` | Estimated Loki retained `~7.1d` | Scaled VL envelope | Illustrative VL EC2 floor | Loki published tier | Loki compute floor | Loki cross-AZ write payload/day | Effective inter-AZ monthly cost |
|---|---|---|---|---|---|---|---|---|---|
| Real-life tested steady-state high load | 0.56 TB/day | 68.3 GiB | 108.4 GiB | 2.0 cores / 9.9 GiB | 1 x c7i.2xlarge | <3 TB/day | $1,489.20 / month | 1,046 GiB/day | $627.60 / month |
- This row uses the higher real-life tested envelope of about `2.5k` events per second and about `6.5 MB/s` raw ingest bandwidth.
- It is intentionally separate from the daily average snapshot so the page shows both the average storage baseline and the heavier sustained operating shape.
- Even at this higher steady-state envelope, the tested VictoriaLogs setup remains far below Loki’s first published distributed compute floor.
3-AZ VictoriaLogs topology note
| Topology | Minimum pod shape | Cost-model treatment |
|---|---|---|
| 3 x vlstorage per AZ | 3 x vlinsert, 3 x vlselect, 9 x vlstorage | keep the combined compute envelope used in the main tables |
| 4 x vlstorage per AZ | 3 x vlinsert, 3 x vlselect, 12 x vlstorage | keep the combined compute envelope used in the main tables |
- This captures the normal production pod shape for a 3-AZ cluster with one `vlinsert` and one `vlselect` per AZ plus `3-4` `vlstorage` pods per AZ.
- The cost worksheet still uses combined compute in the main tables so the comparison stays about total service envelope rather than node-placement policy.
- The measured `vlstorage` footprint used elsewhere is for the tested `vlstorage` service envelope as a whole, not per storage pod.
Inter-AZ write replication cost floor
| Scale | Raw ingest/day | Loki cross-AZ write payload/day | Illustrative monthly inter-AZ cost |
|---|---|---|---|
| 1x | 310 GiB/day | 620 GiB/day | $372.00 / month |
| 10x | 3,100 GiB/day | 6,200 GiB/day | $3,720.00 / month |
| 30x | 9,300 GiB/day | 18,600 GiB/day | $11,160.00 / month |
| 100x | 31,000 GiB/day | 62,000 GiB/day | $37,200.00 / month |
This models AWS inter-AZ transfer at an effective `$0.02/GB` for each GB crossed, because EC2 pricing charges `$0.01/GB` in and `$0.01/GB` out across Availability Zones in the same Region. For a 3-AZ Loki cluster with replication factor `3`, the simple write floor is one local replica plus two remote replicas. These network-dollar rows are also worksheet calculations, not observed AWS billing lines.
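The replication floor is a one-line model. A sketch, assuming one local plus two remote replicas, a flat 30-day month, and GiB billed as if GB:

```python
# ASSUMPTIONS from the worksheet: RF=3 means one local + two remote replicas,
# inter-AZ transfer costs an effective $0.02/GB ($0.01 out + $0.01 in),
# and GiB are billed as if GB for simplicity.
USD_PER_GIB_CROSS_AZ = 0.02
REMOTE_REPLICAS = 2
DAYS_PER_MONTH = 30

def inter_az_floor(raw_gib_per_day: float):
    """Return (cross-AZ GiB/day, monthly $) for the Loki write path."""
    cross_az_gib_per_day = raw_gib_per_day * REMOTE_REPLICAS
    monthly = cross_az_gib_per_day * DAYS_PER_MONTH * USD_PER_GIB_CROSS_AZ
    return cross_az_gib_per_day, round(monthly, 2)

print(inter_az_floor(310))     # -> (620, 372.0)      1x baseline
print(inter_az_floor(31_000))  # -> (62000, 37200.0)  100x scale
```

The same function reproduces the steady-state high-load row: `~523 GiB/day` raw yields `1,046 GiB/day` crossed and `$627.60/month`.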
Why the VictoriaLogs shape can differ
- VictoriaLogs cluster docs support independent clusters in separate AZs and advanced multi-level cluster setup.
- That lets operators keep normal reads AZ-local and reserve cross-AZ fanout for explicit global or failover queries.
- The proxy adds `zstd` and `gzip` on the read path it controls, which reduces client and peer-cache transport bytes for repeated reads.
- I did not attach a hard VictoriaLogs inter-AZ dollar figure because the docs do not publish a stable per-hop replication compression ratio, and inventing one would make the model less honest.
- In the tested setup, `0 rps` reads means the measurable network bill is dominated by write replication, not by query fanout.
Published numbers worth citing carefully
- VictoriaLogs docs: up to `30x` less RAM and up to `15x` less disk than Loki or Elasticsearch.
- VictoriaLogs docs: all fields are indexed and high-cardinality values work unless promoted to stream fields.
- Some real deployments observe `50-60x` VictoriaLogs compression ratios on the data-block metric, but that excludes `indexdb` and should be treated as a lower bound, not the full storage bill.
- TrueFoundry `500 GB / 7 day` benchmark: `≈40%` less storage and materially lower CPU and RAM than Loki on its workload.
- TrueFoundry broad-search results: VictoriaLogs was faster on its needle-in-haystack and negative-match tests.
- Grafana’s own Loki sizing guide publishes a `3-30 TB/day` base cluster at `431 vCPU / 857 Gi` and a `~30 TB/day` cluster at `1221 vCPU / 2235 Gi` before query spikes, which makes the compute side of the cost story concrete.
Published Loki behaviors worth keeping in mind
- Loki docs: labels are for low-cardinality values and line content is not indexed.
- Loki docs: high-cardinality labels build a huge index, flush tiny chunks, and reduce performance and cost-effectiveness.
- Loki docs: scalable deployments are multi-component and query-frontend based.
- Loki docs: OTel resource attributes promoted to labels are rewritten from dots to underscores, which the proxy can mirror on the Grafana side.
- Loki docs: unoptimized queries can need `10x` the suggested querier resources, so the published tier tables are a floor, not a worst case.
- Loki costs grow fast when the workload crosses published ingest tiers, because those tiers already assume a sizeable distributed footprint before storage and object-transfer overhead.