Security
Security Model
loki-vl-proxy is intentionally read-focused. The default posture is:
- read APIs enabled for Loki-compatible querying
- write ingestion API (/loki/api/v1/push) blocked (405)
- admin/debug APIs disabled unless explicitly enabled
The only write-path exception is /loki/api/v1/delete, gated by strict safeguards.
High-Impact Controls
1) Tenant Isolation
- X-Scope-OrgID is mapped to VictoriaLogs tenant IDs via -tenant-map
- optional multi-tenant fanout is explicit (tenant-a|tenant-b)
- wildcard tenant mode (*) is proxy-specific and requires explicit allow config
Lightweight tenant enforcement: -require-tenant-header=true rejects any request missing an X-Scope-OrgID header with HTTP 401. This is a lighter alternative to full auth: it catches misconfigured clients without requiring a token/credential system.
Backend tenant header forwarding: Set FORWARD_TENANT_HEADER=false to prevent the proxy from forwarding X-Scope-OrgID to the backend (useful if the VL backend does not support multi-tenancy).
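The tenant rules above can be sketched as follows. This is a minimal illustration, not the proxy's implementation: the function name resolve_tenants, the dictionary shape of the tenant map, the 403 responses, and the fallback for a missing header are all assumptions.

```python
from typing import Dict, List, Optional


class TenantError(Exception):
    """Raised when a request's tenant header cannot be accepted."""


def resolve_tenants(
    header: Optional[str],
    tenant_map: Dict[str, str],
    require_header: bool = False,
    allow_global: bool = False,
) -> List[str]:
    """Map an X-Scope-OrgID value to backend tenant IDs (illustrative only)."""
    if header is None or header.strip() == "":
        if require_header:
            # Mirrors -require-tenant-header=true: reject with HTTP 401.
            raise TenantError("401: missing X-Scope-OrgID")
        # Assumption: an empty list means "use the backend default tenant".
        return []

    if header.strip() == "*":
        if not allow_global:
            # Assumption: wildcard without explicit allow config is refused.
            raise TenantError("403: wildcard tenant not allowed")
        # Assumption: "*" is passed on as a sentinel for wildcard mode.
        return ["*"]

    resolved = []
    for org_id in header.split("|"):  # explicit multi-tenant fanout
        org_id = org_id.strip()
        if org_id not in tenant_map:
            raise TenantError(f"403: unknown tenant {org_id!r}")
        resolved.append(tenant_map[org_id])
    return resolved
```

With a hypothetical map such as {"tenant-a": "1:0", "tenant-b": "2:0"}, the header value tenant-a|tenant-b resolves to both backend IDs, while a bare * is rejected unless wildcard mode was explicitly allowed.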
2) /tail Browser-Origin Controls
- /loki/api/v1/tail can enforce allowed browser origins
- use -tail.allowed-origins for Grafana/browser clients
- keep restrictive defaults for internet-exposed deployments
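As a rough illustration of what an origin allowlist check involves, the sketch below compares a browser Origin header against a configured list. The function name origin_allowed, the exact-match comparison, and the choice to let requests without an Origin header through are assumptions, not documented proxy behavior.

```python
from typing import Iterable, Optional


def origin_allowed(origin: Optional[str], allowed: Iterable[str]) -> bool:
    """Return True if a request with this Origin header may open a /tail stream."""
    if origin is None:
        # Assumption: non-browser clients send no Origin header and are
        # handled by other controls (auth, tenant checks), so allow them here.
        return True
    normalized = origin.rstrip("/").lower()
    # Exact match only: no suffix or substring matching, which is a common
    # source of origin-bypass bugs.
    return any(normalized == entry.rstrip("/").lower() for entry in allowed)
```

Exact matching is deliberate in this sketch: an allowlist entry of https://grafana.example.com does not admit https://grafana.example.com.evil.test.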
3) Delete Safeguards
/loki/api/v1/delete requires:
- X-Delete-Confirmation: true
- explicit query selector (no broad wildcard delete)
- explicit start and end time bounds
- tenant-scoped execution and audit logging
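A client-side pre-flight check for these requirements might look like the sketch below. It is an illustration under stated assumptions: the helper names are hypothetical, start and end are treated as Unix timestamps in seconds, and the definition of a "broad" selector (empty, or only match-everything regex matchers) is a guess at the intent rather than the proxy's actual rule.

```python
import re
from typing import List, Mapping

# Matches label matchers such as app="nginx" or job=~".*".
_MATCHER = re.compile(r'(\w+)\s*(=~|!~|!=|=)\s*"((?:[^"\\]|\\.)*)"')
_MATCH_ALL = {"", ".*", ".+"}


def is_broad_selector(query: str) -> bool:
    """True for selectors like {} or {job=~".*"} that would match everything."""
    matchers = _MATCHER.findall(query)
    if not matchers:
        return True  # empty selector, or nothing recognizable
    return all(op == "=~" and value in _MATCH_ALL for _, op, value in matchers)


def delete_request_problems(
    headers: Mapping[str, str], params: Mapping[str, str]
) -> List[str]:
    """List the reasons a delete request would be refused (empty list = OK)."""
    problems = []
    # Note: a plain dict lookup is case-sensitive; real HTTP headers are not.
    if headers.get("X-Delete-Confirmation", "").strip().lower() != "true":
        problems.append("X-Delete-Confirmation: true header is required")
    if is_broad_selector(params.get("query", "")):
        problems.append("query must be an explicit, non-wildcard selector")
    try:
        start = float(params["start"])  # assumption: Unix seconds
        end = float(params["end"])
    except (KeyError, ValueError):
        problems.append("start and end must both be set")
    else:
        if start >= end:
            problems.append("start must be before end")
    return problems
```

Running a check like this before sending the request keeps obviously unsafe deletes from ever reaching the proxy, which still enforces its own safeguards.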
4) Request Hardening
- max request body/header limits
- request timeout boundaries
- built-in rate limiting and global concurrency guards
- request coalescing + circuit breaker to reduce backend cascade risk
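To make the rate-limiting item concrete, here is a token-bucket limiter, one common way to bound request rates. It is shown only as an illustration of the technique; the source does not say which algorithm the proxy's built-in limiter uses, and the class and parameter names are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate` tokens/second."""

    def __init__(
        self,
        rate: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so the limiter can be tested
        self.tokens = float(burst)
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False to reject the request."""
        now = self.clock()
        # Refill based on elapsed time, capped at the burst size.
        elapsed = now - self.updated
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A rejected request would typically be answered with HTTP 429 so that well-behaved clients back off instead of retrying immediately.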
5) Transport Security
- frontend TLS and optional mTLS support
- backend TLS controls for VictoriaLogs/OTLP exporters
- controlled forwarding of auth headers/cookies to backend
- optional peer-cache shared-token protection via -peer-auth-token
mTLS / client certificate flags:
| Flag | Default | Description |
|---|---|---|
| -tls-require-client-cert | false | Require client TLS certificate (mTLS) |
| -tls-client-ca-file | (none) | CA certificate for validating client certs |
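For readers less familiar with mTLS, the following Python analogue shows what the two flags correspond to conceptually in a TLS server: a CA bundle for verifying client certificates, and a switch that makes presenting one mandatory. It uses only the standard ssl module and is not the proxy's code; the function and parameter names are hypothetical.

```python
import ssl
from typing import Optional


def build_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    require_client_cert: bool = False,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context, optionally enforcing client certs."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if cert_file:
        # Server's own certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # Analogue of -tls-client-ca-file: CA used to validate client certs.
        ctx.load_verify_locations(cafile=client_ca_file)
    if require_client_cert:
        # Analogue of -tls-require-client-cert: handshake fails without one.
        ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

Requiring a client certificate without also supplying a CA file would reject every client, so in practice the two settings are configured together.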
CI Security Lanes
The repository now treats security validation as its own layered test surface instead of burying it inside generic CI.
Fast PR Blockers
Defined in .github/workflows/security-pr.yaml.
- gitleaks for secret detection in the repository
- gosec for Go-focused SAST on the proxy and related packages
- Trivy filesystem scanning for vulnerabilities, misconfigurations, and secrets
- actionlint for GitHub Actions workflow validation
- hadolint for Dockerfile hygiene and hardening
- OpenSSF Scorecard for repository and supply-chain posture
This lane is designed to fail quickly on issues that should never merge.
Runtime PR Security
Also defined in .github/workflows/security-pr.yaml.
- custom Go security regressions from scripts/ci/run_security_regressions.sh
- OWASP ZAP baseline scan from scripts/ci/run_zap_scan.sh baseline
This lane validates the running stack rather than just the source tree. It is intentionally pointed at a short allowlist in security/zap/targets.txt so the baseline scan exercises the real user and admin/debug surface without wandering into unrelated compose internals.
Heavy Scheduled Security
Defined in .github/workflows/security-heavy.yaml.
- Trivy image scan against the built runtime image
- SBOM generation for downstream review and artifact retention
- longer fuzz runs
- broader Semgrep coverage
- OWASP ZAP active scan
- curated Nuclei templates from security/nuclei/
This lane is intentionally heavier and is meant for scheduled or manual deep validation rather than fast PR feedback.
Repository-Specific Threat Model
Generic scanners are useful here, but the highest-risk bugs for this project are still proxy-specific:
- tenant isolation around X-Scope-OrgID and any tenant-derived cache keys
- cache isolation across memory, disk, and peer cache layers
- metadata, label, and field enumeration leaks between tenants
- auth-boundary confusion across downstream requests, upstream requests, and forwarded headers/cookies
- /tail browser-origin enforcement and websocket handling
- oversized bodies, oversized headers, huge query windows, and malformed LogQL payloads
- debug/admin exposure on non-loopback listeners
The custom regression suite is biased toward these risks rather than only generic scanner output.
Response-Header Baseline
The proxy now applies the same baseline security response headers across normal routes, 404s, and disabled admin/debug endpoints:
- X-Content-Type-Options: nosniff
- X-Frame-Options: DENY
- Cross-Origin-Resource-Policy: same-origin
- Cache-Control: no-store, no-cache, must-revalidate, max-age=0
- Pragma: no-cache
- Expires: 0
That removes the weaker edge-path behavior where scanners could still reach missing or disabled routes without the same browser and cache protections as the main API surface.
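A small checker like the one below can confirm that a given response carries the full baseline, for example when probing a 404 path or a disabled debug endpoint. The header values come from the list above; the function name and the idea of passing headers in as a plain mapping are assumptions for illustration.

```python
from typing import Dict, List, Mapping

BASELINE_HEADERS: Dict[str, str] = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Cross-Origin-Resource-Policy": "same-origin",
    "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
    "Pragma": "no-cache",
    "Expires": "0",
}


def missing_baseline_headers(response_headers: Mapping[str, str]) -> List[str]:
    """Return baseline header names that are absent or have a different value."""
    # HTTP header names are case-insensitive, so compare on lowercased keys.
    lowered = {name.lower(): value.strip() for name, value in response_headers.items()}
    return [
        name
        for name, expected in BASELINE_HEADERS.items()
        if lowered.get(name.lower()) != expected
    ]
```

The mapping can be filled from any HTTP client's response object, and an empty result means the route matches the baseline.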
Container And Chart Posture
- the runtime image now runs as a non-root user
- the runtime image keeps a read-only root filesystem
- Helm drops all capabilities and blocks privilege escalation
- the chart can optionally mount host /proc read-only for richer process/system metrics
The host /proc mount is intentional. Trivy would normally flag this, so CI uses a narrow .trivyignore.yaml exception for the specific chart template path rather than disabling the broader class of checks.
Admin and Debug Endpoints
The following are disabled by default and should stay restricted:
- /debug/queries
- /debug/pprof/*
Enable only for controlled troubleshooting windows. On non-loopback listen addresses the proxy now refuses to start with these enabled unless -server.admin-auth-token is set.
/metrics stays available on the main listener when instrumentation is enabled, but the default export now suppresses per-tenant and per-client identity labels. Opt back in with -metrics.export-sensitive-labels=true only on trusted scrape paths.
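The startup rule for debug/admin endpoints can be expressed as a short guard. This is a sketch of the described behavior, not the proxy's code: the function names, the host:port parsing, and the decision to treat unresolved hostnames as non-loopback are assumptions.

```python
import ipaddress


def is_loopback(listen_addr: str) -> bool:
    """Best-effort check that a host:port listen address is loopback-only."""
    host, _, _port = listen_addr.rpartition(":")
    host = host.strip("[]")  # IPv6 literals are written as [::1]:3100
    if host == "":
        return False  # ":3100" binds every interface
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Assumption: other hostnames are not resolved and count as exposed.
        return False


def check_admin_exposure(
    listen_addr: str, debug_enabled: bool, admin_auth_token: str
) -> None:
    """Raise if debug/admin endpoints would be exposed without a token."""
    if debug_enabled and not is_loopback(listen_addr) and not admin_auth_token:
        raise ValueError(
            "debug/admin endpoints enabled on a non-loopback listener; "
            "set -server.admin-auth-token or bind to loopback"
        )
```

Failing at startup rather than at request time means a misconfigured deployment never serves traffic with an unauthenticated debug surface.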
Recommended Production Baseline
- explicit -tenant-map (avoid implicit defaults for multi-tenant production)
- keep -tenant.allow-global=false unless you intentionally need wildcard backend-default access
- strict /tail origin allowlist
- conservative request-size and timeout limits
- explicit -http-read-header-timeout and bounded /metrics concurrency
- ServiceMonitor + alerting on 5xx, circuit breaker open state, and backend latency
- -server.admin-auth-token for debug/admin surfaces
- -peer-auth-token when peer cache crosses network trust boundaries
- avoid exposing debug/admin endpoints publicly
Local Security Validation
Useful local commands while working on hardening or CI changes:
# repo secret scan
docker run --rm -v "$PWD:/repo" -w /repo \
  ghcr.io/gitleaks/gitleaks:v8.28.0 \
  detect --source . --report-format sarif --report-path gitleaks.sarif --exit-code 1

# Go SAST
go install github.com/securego/gosec/v2/cmd/gosec@v2.22.7
"$(go env GOPATH)/bin/gosec" \
  -exclude=G104,G108,G115,G301,G302,G304,G306,G402,G404 \
  -exclude-generated \
  ./...

# filesystem scan with the same allowlist CI uses
docker run --rm -v "$PWD:/repo" -w /repo \
  aquasec/trivy:0.69.3 \
  fs . \
  --ignorefile .trivyignore.yaml \
  --scanners vuln,misconfig,secret \
  --severity HIGH,CRITICAL \
  --ignore-unfixed \
  --exit-code 1 \
  --skip-version-check

# workflow and Dockerfile linting
docker run --rm -v "$PWD:/repo" -w /repo rhysd/actionlint:1.7.7 -color
docker run --rm -i -v "$PWD/.hadolint.yaml:/root/.config/hadolint.yaml:ro" \
  hadolint/hadolint:v2.12.0 < Dockerfile

# supply-chain posture gate
docker run --rm \
  -e GITHUB_AUTH_TOKEN="${GITHUB_TOKEN}" \
  gcr.io/openssf/scorecard:stable \
  --repo="github.com/ReliablyObserve/Loki-VL-proxy" \
  --format json \
  --show-details > scorecard.json
python3 scripts/ci/check_scorecard.py scorecard.json \
  --min-overall 5.0 \
  --require-check Dangerous-Workflow=10 \
  --require-check Binary-Artifacts=10 \
  --require-check CI-Tests=8 \
  --require-check SAST=7

# repo-specific runtime checks
./scripts/ci/run_security_regressions.sh
./scripts/ci/run_zap_scan.sh baseline
./scripts/ci/run_nuclei_scan.sh
When reproducing ZAP locally, expect occasional 10049 Non-Storable Content warnings on deliberate 404 discovery paths such as / or disabled /debug/* endpoints. Those reports are useful for visibility but are not currently treated as exploitable proxy issues.