Roadmap
Completed​
-
LogQL -> LogsQL translation (stream selectors, line filters, parsers, metric queries)
-
Response format conversion (VL NDJSON -> Loki streams, VL stats -> Prometheus matrix/vector)
-
Request coalescing (singleflight: N queries -> 1 backend request)
-
Rate limiting (per-client token bucket + global concurrency)
-
Circuit breaker (closed->open->half-open with built-in defaults)
-
Query normalization (sort matchers, collapse whitespace for cache keys)
-
Tiered cache (per-endpoint TTLs, L1 in-memory, L2 bbolt on-disk)
-
Multitenancy (string->int tenant mapping, numeric passthrough, SIGHUP reload)
-
WebSocket tail (
/loki/api/v1/tail-> VL NDJSON streaming) -
L2 disk cache with gzip compression, write-back buffer (encryption via cloud provider)
-
OTLP telemetry push (gzip/zstd compression, TLS)
-
HTTP hardening (timeouts, body limits, security headers)
-
Index stats, volume, volume_range via VL
/select/logsql/hits -
Query fingerprinting + analytics (
/debug/queries) -
Graceful HTTP server shutdown (SIGTERM/SIGINT)
-
Grafana datasource config (maxLines, basic auth, backend timeout, TLS, header/cookie forwarding, optional listener mTLS)
-
Derived fields (regex extraction for trace linking)
-
Chunked streaming (Transfer-Encoding: chunked for large results)
-
OTel label translation (bidirectional dot<->underscore for 50+ fields)
-
Custom field remapping (
-field-mapping) -
Per-tenant metrics (request rate, latency, error rate by X-Scope-OrgID)
-
Client error breakdown (bad_request, rate_limited, not_found, body_too_large)
-
Grafana dashboard & alerting rules (Helm PrometheusRule CR)
-
Write safeguard (
/pushblocked with 405) -
| decolorizeproxy-side ANSI stripping -
| ip("CIDR")proxy-side IP range filtering -
| line_formatfull Go templates -
pprof, SIGHUP reload, and rate-limit response headers
-
Per-endpoint cache/backend metrics, CB state gauge
-
Fuzz testing (1.2M+ executions, no panics)
-
Nested binary metric queries (
sum(rate(...)) / sum(rate(...))) -
/loki/api/v1/patternsproxy-side Loki-style Drain tokenizer/clustering extraction -
directionparameter (forward/backward sort) -
quantile_over_time()mapped to VL quantile -
label_formatmulti-rename (comma-separated) -
Extended binary ops (
%,^,==,!=,>,<,>=,<=) -
Datasource compatibility handlers (
/rules,/alerts,/config) -
Playwright UI e2e tests
-
/tailbrowser and ops coverage (origin policy, native fallback, ingress recovery, upstream401/403/5xxparity) -
Multi-tenant Explore and Logs Drilldown coverage for
__tenant_id__, labels, series, and detected field/label browser/resource surfaces -
Delete API endpoint with safeguards (confirmation, tenant scoping, audit logging)
-
without()clause detection and clear error message -
IsScalarsupports negative and scientific notation -
Circuit breaker half-open metrics fix
-
Tenant map reload race condition fix
-
group()outer aggregation — inner metric translated normally, proxy normalises all values to1(v1.21.0) -
label_replace()— proxy post-processing via marker suffix; Prometheus no-match semantics (v1.21.0) -
label_join()— proxy post-processing via marker suffix; missing src labels skipped (v1.21.0) -
count_values()— returns descriptive error (not translatable to VL; VL has no group-by-value primitive) (v1.21.0) -
Circuit breaker sliding-window failure counting — 30s window replaces consecutive-failure model; sporadic slow-query resets no longer open the breaker (v1.18.0)
-
Bare label matcher error —
app="value"(missing braces) returns HTTP 400 with descriptive Loki-style parse error instead of silently emitting malformed VL syntax (v1.20.0) -
detected_levelinference from_msgcontent — proxy infers level from JSON/logfmt log body when not present in stream labels; Drilldown volume API gains automatic parser unpacking (v1.20.0) -
Deterministic log stream ordering for multi-window queries — streams sorted by canonical key, per-stream values sorted by timestamp before response emission (v1.21.1)
-
boolmodifier on comparison operators — stripped at translation (applyOp returns 1/0 for all comparisons) -
Field-specific parser
| json field1, field2/| logfmt field1, field2— maps to full unpack (VL extracts all fields) -
Backslash-escaped quotes in stream selectors — findMatchingBrace handles
\" -
Binary expression detection before metric query (fixes
rate(...) > 0being misrouted) -
Peer cache design doc + headless service Helm template
-
Performance-focused optimization pass (buffer pools, sync.Pool, connection-pool tuning, cache hot-path benchmarks)
-
Complete Helm chart: 11 templates, GOMEMLIMIT auto-calc, HTTPRoute
-
Broad test suite, CI bench job, and regression gates
-
Coverage and quality-gate reinforcement for runtime, middleware, cache, and proxy-path tests
-
Tier0 compatibility-edge cache with bounded memory budget, safe GET-only guardrails, and reload invalidation
-
Fleet shadow-copy validation for 3-peer cache reuse plus Tier0/fleet micro-benchmarks
-
Named tenant routing enhancements — YAML/JSON tenant map file with mtime-polling hot-reload and SIGHUP; label-based tenant isolation (
-tenant-labelinjects{field="orgID"}into VL queries);-forward-tenant-headerpassthrough for Lakehouse; uint32 validation at load time; LogsQL injection prevention via backslash-before-quote escaping (v1.31.x) -
Exhaustive LogQL parity machine — 316 parse/translation cases covering stream selectors, line filters, parsers, metric queries, binary ops, subqueries, offset modifier, unwrap unit conversion, field-specific parsers, named regexp groups, vector matching, complex pipelines, and edge cases; LogQL syntax validator for Loki error parity; comprehensive pipeline parity tests; all 14 previously tracked
proxy_bugandproxy_strictKnownGaps resolved (v3.7.1) -
| drop field=valuematcher semantics — proxy applies conditional field removal viaParseDropConditions+applyDropConditionspost-processing; the value predicate is now respected (previously the field was always dropped regardless of value) (v3.7.1) -
structuredMetadata vs parsedFields classification — proxy correctly classifies fields by comparing against
_msgJSON content, preventing structured metadata from being misclassified as parsedFields (v3.7.1) -
Non-OTel structured metadata e2e tests — push tests for plain structured metadata (non-OTel) added to log-generator; default
label-stylechanged tounderscoresandmetadata-field-modetotranslated(v3.7.1) -
Config examples folder (
examples/) — runnable env files, tenant map YAML, Docker Compose stack, Grafana datasource provisioning, systemd unit, Kubernetes ConfigMap with full annotated flag/env reference
Recently Shipped​
-
on()/ignoring()/group_left()/group_right()vector matching (0.22.0) -
@timestamp modifier (0.19.0) -
unwrap duration()/bytes()unit conversion (0.21.0) - Subquery syntax
rate(...)[1h:5m]— proxy-side evaluation (0.23.0) - LRU cache eviction (0.21.0)
- Peer cache Phase 1 implementation (DNS discovery + peer fetch) (0.24.0)
- System metrics in /metrics (CPU, memory, IO, network via /proc) (0.19.0)
- Native VL stream selector optimization for known
_stream_fields(0.23.0) - PR quality report workflow with coverage, compatibility, and performance delta comments (0.26.0)
- Add bounded peer hot-read-ahead (top-N hot keys with per-interval key/byte/concurrency budgets, jitter, and tenant fairness) to improve non-owner local hit rates without causing peer traffic storms (1.0.25)
- Add regression/perf suite for collapse forwarding and hot-read-ahead interactions (owner/non-owner paths, coalescing efficiency, backend offload delta) (1.0.25)
- Promote compose-backed e2e fleet cache smoke coverage into required GitHub Actions for pull requests and post-merge
mainruns (0.27.7) - fastjson + streaming rewrite of stats hot path — eliminate per-entry heap allocations in
collectRangeMetricSamplesand stats response assembly (1.30.0–1.31.2) -
stats_query_rangefast path for grouped and ungrouped metric queries —sum by (...)count/rate/bytes queries bypass raw log scan; heavy workload throughput 4× vs cold proxy baseline (1.30.0) - Cold storage backend routing — split queries across hot VL and cold Victoria Lakehouse by configurable time boundary (1.28.0)
- allocation pool pass — gzip reader pool, gob buffer pool, aggregator scalar eliminations, patternJoinBuilderPool, logfmt single-pass scanning, series/stats buffer pooling (1.31.2)
- Parser-stage guard removal from fast path —
rate/count_over_time/bytes_rate/bytes_over_timewith| json/| logfmtuse VL native stats for tumbling windows (1.31.2)
Committed — decided and in progress​
Everything below is decided work with its direction already proven in the codebase — no exploratory items are listed here.
- Tighten remaining merged-tenant Drilldown metadata accuracy for field and label cardinality surfaces
- Convert more upstream Loki, Logs Drilldown, and VictoriaLogs edge cases into regression tests (ongoing — LogQL parity machine at 555+ cases)
- Malformed drop/keep matcher → HTTP 400 error — enforced by the typed LogQL AST validator in the
query/query_rangehandlers (regression-locked), andrules-migratenow runs the same validator so malformed rule expressions fail conversion instead of silently skipping the broken stage - Migrate remaining string-based translation paths to the typed
logsqlAST builder and rolllogql.Translateinto the proxy handlers — the typed translation path (internal/logql/translate.go) is implemented and tested; the handler call-site migration is the remaining step - Expand browser-level multi-tenant Explore and Drilldown scenarios where API parity already exists but UI combinations still need live regression coverage
-
| keep field=valuestream label mutation — matcher form now applies to stream labels in addition to structured metadata and parsed fields (v1.50.1) - Parallel multi-tenant fanout — goroutine-per-tenant dispatch with
sync.WaitGroup(v1.43.0) - Streaming backward hot+cold merge — ring-buffer reverse with bounded memory (
maxRingSize=5000) and early termination (v1.43.0)