Security
Security Model
loki-vl-proxy is intentionally read-focused. The default posture is:
- read APIs enabled for Loki-compatible querying
- write ingestion API (
/loki/api/v1/push) blocked (405) - admin/debug APIs disabled unless explicitly enabled
The only write-path exception is /loki/api/v1/delete, gated by strict safeguards.
High-Impact Controls
1) Tenant Isolation
X-Scope-OrgIDis mapped to VictoriaLogs tenant IDs via-tenant-map- optional multi-tenant fanout is explicit (
tenant-a|tenant-b) - wildcard tenant mode (
*) is proxy-specific and requires explicit allow config
2) /tail Browser-Origin Controls
/loki/api/v1/tailcan enforce allowed browser origins- use
-tail.allowed-originsfor Grafana/browser clients - keep restrictive defaults for internet-exposed deployments
3) Delete Safeguards
/loki/api/v1/delete requires:
X-Delete-Confirmation: true- explicit query selector (no broad wildcard delete)
- explicit
startandendtime bounds - tenant-scoped execution and audit logging
4) Request Hardening
- max request body/header limits
- request timeout boundaries
- built-in rate limiting and global concurrency guards
- request coalescing + circuit breaker to reduce backend cascade risk
5) Transport Security
- frontend TLS and optional mTLS support
- backend TLS controls for VictoriaLogs/OTLP exporters
- controlled forwarding of auth headers/cookies to backend
- optional peer-cache shared-token protection via
-peer-auth-token
CI Security Lanes
The repository now treats security validation as its own layered test surface instead of burying it inside generic CI.
Fast PR Blockers
Defined in .github/workflows/security-pr.yaml.
gitleaksfor secret detection in the repositorygosecfor Go-focused SAST on the proxy and related packagesTrivyfilesystem scanning for vulnerabilities, misconfigurations, and secretsactionlintfor GitHub Actions workflow validationhadolintfor Dockerfile hygiene and hardeningOpenSSF Scorecardfor repository and supply-chain posture
This lane is supposed to fail quickly on issues that should never merge.
Runtime PR Security
Also defined in .github/workflows/security-pr.yaml.
- custom Go security regressions from
scripts/ci/run_security_regressions.sh - OWASP ZAP baseline scan from
scripts/ci/run_zap_scan.sh baseline
This lane validates the running stack rather than just the source tree. It is intentionally pointed at a short allowlist in security/zap/targets.txt so the baseline scan exercises the real user and admin/debug surface without wandering into unrelated compose internals.
Heavy Scheduled Security
Defined in .github/workflows/security-heavy.yaml.
- Trivy image scan against the built runtime image
- SBOM generation for downstream review and artifact retention
- longer fuzz runs
- broader
Semgrepcoverage - OWASP ZAP active scan
- curated
Nucleitemplates fromsecurity/nuclei/
This lane is intentionally heavier and is meant for scheduled or manual deep validation rather than fast PR feedback.
Repository-Specific Threat Model
Generic scanners are useful here, but the highest-risk bugs for this project are still proxy-specific:
- tenant isolation around
X-Scope-OrgIDand any tenant-derived cache keys - cache isolation across memory, disk, and peer cache layers
- metadata, label, and field enumeration leaks between tenants
- auth-boundary confusion across downstream requests, upstream requests, and forwarded headers/cookies
/tailbrowser-origin enforcement and websocket handling- oversized bodies, oversized headers, huge query windows, and malformed LogQL payloads
- debug/admin exposure on non-loopback listeners
The custom regression suite is biased toward these risks rather than only generic scanner output.
Response-Header Baseline
The proxy now applies the same baseline security response headers across normal routes, 404s, and disabled admin/debug endpoints:
X-Content-Type-Options: nosniffX-Frame-Options: DENYCross-Origin-Resource-Policy: same-originCache-Control: no-store, no-cache, must-revalidate, max-age=0Pragma: no-cacheExpires: 0
That removes the weaker edge-path behavior where scanners could still reach missing or disabled routes without the same browser and cache protections as the main API surface.
Container And Chart Posture
- the runtime image now runs as a non-root user
- the runtime image keeps a read-only root filesystem
- Helm drops all capabilities and blocks privilege escalation
- the chart can optionally mount host
/procread-only for richer process/system metrics
The host /proc mount is intentional. Trivy would normally flag this, so CI uses a narrow .trivyignore.yaml exception for the specific chart template path rather than disabling the broader class of checks.
Admin and Debug Endpoints
The following are disabled by default and should stay restricted:
/debug/queries/debug/pprof/*
Enable only for controlled troubleshooting windows. On non-loopback listen addresses the proxy now refuses to start with these enabled unless -server.admin-auth-token is set.
/metrics stays available on the main listener when instrumentation is enabled, but the default export now suppresses per-tenant and per-client identity labels. Opt back in with -metrics.export-sensitive-labels=true only on trusted scrape paths.
Recommended Production Baseline
- explicit
-tenant-map(avoid implicit defaults for multi-tenant production) - keep
-tenant.allow-global=falseunless you intentionally need wildcard backend-default access - strict
/tailorigin allowlist - conservative request-size and timeout limits
- explicit
-http-read-header-timeoutand bounded/metricsconcurrency ServiceMonitor+ alerting on5xx, circuit breaker open state, and backend latency-server.admin-auth-tokenfor debug/admin surfaces-peer-auth-tokenwhen peer cache crosses network trust boundaries- avoid exposing debug/admin endpoints publicly
Local Security Validation
Useful local commands while working on hardening or CI changes:
# repo secret scan
docker run --rm -v "$PWD:/repo" -w /repo \
ghcr.io/gitleaks/gitleaks:v8.28.0 \
detect --source . --report-format sarif --report-path gitleaks.sarif --exit-code 1
# Go SAST
go install github.com/securego/gosec/v2/cmd/gosec@v2.22.7
"$(go env GOPATH)/bin/gosec" \
-exclude=G104,G108,G115,G301,G302,G304,G306,G402,G404 \
-exclude-generated \
./...
# filesystem scan with the same allowlist CI uses
docker run --rm -v "$PWD:/repo" -w /repo \
aquasec/trivy:0.69.3 \
fs . \
--ignorefile .trivyignore.yaml \
--scanners vuln,misconfig,secret \
--severity HIGH,CRITICAL \
--ignore-unfixed \
--exit-code 1 \
--skip-version-check
# workflow and Dockerfile linting
docker run --rm -v "$PWD:/repo" -w /repo rhysd/actionlint:1.7.7 -color
docker run --rm -i -v "$PWD/.hadolint.yaml:/root/.config/hadolint.yaml:ro" \
hadolint/hadolint:v2.12.0 < Dockerfile
# supply-chain posture gate
docker run --rm \
-e GITHUB_AUTH_TOKEN="${GITHUB_TOKEN}" \
gcr.io/openssf/scorecard:stable \
--repo="github.com/ReliablyObserve/Loki-VL-proxy" \
--format json \
--show-details > scorecard.json
python3 scripts/ci/check_scorecard.py scorecard.json \
--min-overall 5.0 \
--require-check Dangerous-Workflow=10 \
--require-check Binary-Artifacts=10 \
--require-check CI-Tests=8 \
--require-check SAST=7
# repo-specific runtime checks
./scripts/ci/run_security_regressions.sh
./scripts/ci/run_zap_scan.sh baseline
./scripts/ci/run_nuclei_scan.sh
When reproducing ZAP locally, expect occasional 10049 Non-Storable Content warnings on deliberate 404 discovery paths such as / or disabled /debug/* endpoints. Those reports are useful for visibility but are not currently treated as exploitable proxy issues.