Skip to content

feat: @actions/cache optional path validation during restore#2414

Open
jasongin wants to merge 8 commits into
mainfrom
jasongin/cache-path-validation
Open

feat: @actions/cache optional path validation during restore#2414
jasongin wants to merge 8 commits into
mainfrom
jasongin/cache-path-validation

Conversation

@jasongin
Copy link
Copy Markdown

@jasongin jasongin commented May 20, 2026

Optional path validation during cache restore

New restoreCache() option

A new opt-in DownloadOptions.pathValidation on @actions/cache that streams a downloaded cache archive through an in-process validator before handing it to system tar for extraction. Each entry's path (and, for symlinks/hardlinks, its target) must resolve under one of the declared cache paths — anything that escapes (../..., absolute, UNC, drive-relative, NUL bytes, unsupported entry types) is caught by the validator.

await cache.restoreCache(paths, key, restoreKeys, {
  pathValidation: 'error' // 'off' (default) | 'warn' | 'error'
})
  • 'off' — legacy behavior, validation is skipped.
  • 'warn' — validate; on violations, log via core.warning / core.debug and extract anyway.
  • 'error' — validate; on violations throw CacheIntegrityError (code: 'PATH_VIOLATION', with violations attached). No files are extracted.

Architecture: node-tar for listing, system tar for extraction

This is the biggest design call in the PR.

Listing/validation uses node-tar v7's Parser in-process.

  • Pure JS, deterministic across platforms, exposes structured entry metadata via onReadEntry. No subprocess, no parsing tar -tv text output that varies by GNU/BSD/Windows.
  • We're inspecting untrusted attacker-controlled bytes; doing it in JS avoids any shell-quoting/argv concern.
  • node-tar is stricter than system tar: it surfaces TAR_BAD_ARCHIVE / TAR_ENTRY_INVALID / TAR_ENTRY_ERROR / TAR_ABORT as discrete codes we can promote to parse errors, which is exactly what a pre-extraction gate wants.
  • Gzip is handled natively by node-tar; for zstd we spawn the system zstd binary so we get the same --long=30 window behavior the extract path uses (Node's built-in zstd is still experimental).

Extraction stays on system tar.

  • The tar -P (--absolute-names) option must be used when extracting cache archives. We cannot use tar to validate that extracted paths are within a target directory, because cache paths frequently live outside $GITHUB_WORKSPACE (~/.npm, ~/.cargo, ~/.gradle/caches, etc.). cacheUtils.resolvePaths rewrites every path relative to the workspace, so anything outside becomes ../../..., which GNU tar refuses without -P. node-tar's extractor refuses these by default — swapping it in wholesale would break essentially every common dependency cache.
  • The existing extract path is heavily tuned for cross-platform quirks (BSD/GNU detection, --force-local, --delay-directory-restore, zstdmt / zstd -T0, the BSD-tar-on-Windows-with-zstd workaround). Re-implementing all of that on node-tar's writer is a much larger, riskier change.
  • Zero behavior change for users who don't opt in.

Soundness of the split: The validator is strictly more conservative than system tar: anything the validator approves, tar accepts; in 'error' mode tar is never invoked when the validator rejects. 'warn' mode exists for the potential rare case where the validator is stricter than system tar (e.g. unusual PAX headers it can't classify) — a deliberate ergonomic concession so opting in doesn't break working caches.

Performance tradeoff: Validation is a full pre-pass over the archive: decompress + parse headers (entry bodies are drained, not read or written to disk), then system tar decompresses a second time to extract. Roughly a 2× decompress cost on the archive stream, no extra disk I/O at the destination. The extra pass is skipped when validation is 'off'.

Other approaches considered but rejected:

  • Validate and extract in one pass - More efficient but less secure since malicious files could be extracted before the validator aborts. And requires either of the following which are both problematic.
  • Validate and extract with node-tar (replace system-tar extraction code) - code would be much simpler for not having to deal with the variety of tar implementations. But there would be major risk of regressions, and node-tar extraction is significantly slower for extracting large archives.
  • Validate and extract with system tar - Avoids dependency on the node-tar package, and could be slightly faster for validation. But the format of tar's list output varies widely and might not accurately display links and unusual path characters which are exactly the kinds of things we need to carefully validate.

Other design notes

  • String enum, not boolean. Three discrete modes map 1:1 onto behavior and leave room for future modes (e.g. 'quarantine').
  • Allowed roots derived from paths. deriveAllowedRoots mirrors cacheUtils.resolvePaths: tilde / env-var expansion, glob-prefix stripping, etc.
  • prepareAllowedRoots() pre-bakes per-root normalize/lowercase/trailing-sep forms once per archive, dropping the per-entry hot loop from O(N · R) string ops to O(N + R).
  • Zstd lifecycle hardening: 64 KiB stderr cap, .catch(() => undefined) on the pipeline / exit promises to suppress late rejections, try/finally SIGTERM if the child is still running on early exit.

Known limitations

  • TOCTOU between validation and extraction. The validator never touches the filesystem. A pre-existing symlink in the workspace that the archive then writes "through", or a concurrent process creating one between passes, won't be caught. Closing this would require an extractor that refuses to traverse existing symlinks (node-tar's does, but see -P above).
  • Case-sensitivity heuristic (caseInsensitiveFs()) is a platform-level guess. On case-sensitive APFS or per-dir-case-sensitive NTFS it loosens (fewer violations) rather than tightens — error direction is "lets something through", not "miss at the comparison layer".

Tests

115 total: 86 pure-string unit tests for pathValidation.ts, 14 real-archive tests for listAndValidate.ts (zstd tests describe.skip when zstd isn't on PATH, scratch space via mkdtempSync), 15 mocked integration tests for tar.ts covering both modes, parse-error handling per mode, and the "no extraction in 'error' mode" assertion. See docs/path-validation-test-plan.md in this PR for a more detailed description of the test coverage.

Out of scope

  • Replacing system tar for extraction - see explanation above.
  • Entry-count / total-bytes caps (cache backend enforces archive size at upload).
  • Save-path validation (restore-only here).

actions/cache update

actions/cache#1761 is a related PR to actually use this path validation when restoring from cache. The path validation mode will be a new input to the action; initially the default will be 'warn', then we can consider changing the default to 'error' in a major version update.

@jasongin jasongin requested a review from a team as a code owner May 20, 2026 19:33
Copilot AI review requested due to automatic review settings May 20, 2026 19:33
Comment thread packages/cache/src/internal/pathValidation.ts Dismissed
Comment thread packages/cache/src/internal/pathValidation.ts Fixed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an opt-in integrity gate to @actions/cache restore that validates tar archive entry paths (and link targets) against the user-declared cache paths prior to extraction, surfacing violations as warnings or as a hard CacheIntegrityError depending on mode.

Changes:

  • Introduces DownloadOptions.pathValidation ('off' | 'warn' | 'error') and plumbs it through restoreCache*() to extractTar().
  • Adds an in-process archive listing/validation pass (tar Parser + custom path/link validation) and a new CacheIntegrityError for parse/violation failures.
  • Updates package metadata (version bump, tar dependency, Node engine) and adds comprehensive unit/integration tests + test plan doc.
Show a summary per file
File Description
packages/cache/src/options.ts Adds pathValidation option with defaults and validation in getDownloadOptions().
packages/cache/src/internal/tar.ts Runs optional pre-extraction validation and reports/throws based on mode.
packages/cache/src/internal/pathValidation.ts Implements allowed-root derivation, entry/link validation, and violation formatting.
packages/cache/src/internal/listAndValidate.ts Streams archives through tar Parser (plus zstd decompression) to collect violations.
packages/cache/src/internal/cacheIntegrityError.ts Defines CacheIntegrityError + error codes for integrity failures.
packages/cache/src/cache.ts Exposes CacheIntegrityError publicly and forwards pathValidation/paths into extraction.
packages/cache/package.json Bumps version, adds tar dep, adds engines.node, adjusts exports.
packages/cache/package-lock.json Updates lockfile for new version and dependency graph.
packages/cache/docs/path-validation-test-plan.md Documents test strategy/coverage for the new feature.
packages/cache/tests/tarPathValidation.test.ts Adds integration tests asserting extract behavior across modes.
packages/cache/tests/restoreCache.test.ts Updates assertions for new extractTar argument plumbing (v1).
packages/cache/tests/restoreCacheV2.test.ts Updates assertions for new extractTar argument plumbing (v2).
packages/cache/tests/pathValidation.test.ts Adds extensive unit tests for root derivation + entry/link validation.
packages/cache/tests/options.test.ts Validates new default/override behavior for pathValidation.
packages/cache/tests/listAndValidate.test.ts Adds real-archive tests for gzip/zstd parsing + validation behavior.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files not reviewed (1)
  • packages/cache/package-lock.json: Language not supported
  • Files reviewed: 14/15 changed files
  • Comments generated: 3

Comment thread packages/cache/src/internal/pathValidation.ts Outdated
Comment thread packages/cache/src/internal/pathValidation.ts
Comment thread packages/cache/docs/path-validation-test-plan.md Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* - 'error': surfaced as `CacheIntegrityError` with `code: 'PARSE_ERROR'`
* and extraction does not run.
*
* @default 'off'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a security bug fix.

I'm wondering if we should default to error. We could major-version the package if we're worried about a breaking change.

Copy link
Copy Markdown
Author

@jasongin jasongin May 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are consumers of this package other than actions/cache (primarily actions/setup-*), so I didn't want to risk impacting them automatically. Even with a major version update, they might not realize the implications of taking the new major version. So I think it's safer to require consumers to opt in explicitly. Or at best 'warn' could be the default with a major version update.

The linked PR actions/cache#1761 defaults to warn and I proposed updating the default to error in a major version update of the action (not the toolkit package). We should consider similar updates to actions/setup-* repos.

Comment thread packages/cache/src/internal/pathValidation.ts Fixed
Comment thread packages/cache/src/internal/pathValidation.ts Fixed
@jasongin jasongin requested review from a team as code owners May 20, 2026 21:44
Comment thread packages/cache/src/internal/pathValidation.ts Fixed
Comment thread packages/cache/src/internal/pathValidation.ts Dismissed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants