refactor(epubnorm): bring god-functions under lint gates, delete exclusions (bookshelf-bbsd.4) #915

Open
zombor wants to merge 1 commit from bd-bookshelf-bbsd.4 into main
Owner

Summary

  • Extracts helpers from NormalizeEPUB, repairAndSerializeOPF, and buildEPUBBytes (test helper) so every function passes funlen (60L/40stmt), gocyclo (15), and nestif (5) gates
  • Deletes all 5 grandfathered .golangci.yml exclusion entries for internal/epubnorm/ (funlen × 2, gocyclo × 2, nestif × 1)
  • Behavior-preserving: all existing epubnorm tests pass unchanged; EPUB normalization output is byte-identical

Key extractions:

  • writeStructuralEntries + copyNonStructuralEntries + skipEntry from NormalizeEPUB
  • opfPathFromContainerXML + firstValidRootfile to flatten deep nesting in resolveOPFPath
  • ensureOPFDefaults, repairManifest, buildManifestByID, repairSpine from repairAndSerializeOPF
  • scaledDimensions from downscaleIfNeeded
  • addZIPEntry, writeMimetypeEntry, writeContainerEntry, writeOPFEntry, writeOptionalEntries from buildEPUBBytes

Test plan

  • make test — all tests pass
  • make lint — 0 issues (including golangci-lint, e2e-policy-check, test-policy-check)
  • make coverage — check-coverage: OK
  • .golangci.yml diff vs main: 5 exclusion entries removed, 0 added

Closes bead bookshelf-bbsd.4 on merge.

## Summary - Extracts helpers from `NormalizeEPUB`, `repairAndSerializeOPF`, and `buildEPUBBytes` (test helper) so every function passes funlen (60L/40stmt), gocyclo (15), and nestif (5) gates - Deletes all 5 grandfathered `.golangci.yml` exclusion entries for `internal/epubnorm/` (funlen × 2, gocyclo × 2, nestif × 1) - Behavior-preserving: all existing epubnorm tests pass unchanged; EPUB normalization output is byte-identical Key extractions: - `writeStructuralEntries` + `copyNonStructuralEntries` + `skipEntry` from `NormalizeEPUB` - `opfPathFromContainerXML` + `firstValidRootfile` to flatten deep nesting in `resolveOPFPath` - `ensureOPFDefaults`, `repairManifest`, `buildManifestByID`, `repairSpine` from `repairAndSerializeOPF` - `scaledDimensions` from `downscaleIfNeeded` - `addZIPEntry`, `writeMimetypeEntry`, `writeContainerEntry`, `writeOPFEntry`, `writeOptionalEntries` from `buildEPUBBytes` ## Test plan - [x] `make test` — all tests pass - [x] `make lint` — 0 issues (including golangci-lint, e2e-policy-check, test-policy-check) - [x] `make coverage` — check-coverage: OK - [x] `.golangci.yml` diff vs main: 5 exclusion entries removed, 0 added Closes bead bookshelf-bbsd.4 on merge.
refactor(epubnorm): bring NormalizeEPUB, repairAndSerializeOPF, buildEPUBBytes under lint gates; delete exclusions (bookshelf-bbsd.4)
All checks were successful
/ JS Unit Tests (pull_request) Successful in 1m7s
/ E2E API (pull_request) Successful in 2m1s
/ Integration (pull_request) Successful in 2m56s
/ Lint (pull_request) Successful in 4m0s
/ E2E Browser (pull_request) Successful in 3m25s
/ Test (pull_request) Successful in 4m43s
b9049110bd
Extract helpers to reduce funlen/gocyclo/nestif in the three grandfathered
functions:

epubnorm.go:
- NormalizeEPUB → delegates to writeStructuralEntries + copyNonStructuralEntries
- resolveOPFPath → opfPathFromContainerXML + firstValidRootfile (flatten deep nesting)
- repairAndSerializeOPF → ensureOPFDefaults + repairManifest + buildManifestByID + repairSpine
- downscaleIfNeeded → scaledDimensions

helpers_test.go (buildEPUBBytes):
- addZIPEntry, writeMimetypeEntry, writeContainerEntry, writeOPFEntry, writeOptionalEntries

.golangci.yml: delete all 5 epubnorm exclusion entries (funlen × 2, gocyclo × 2, nestif × 1).

Behavior-preserving: all existing epubnorm tests pass unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Workflow Detail page screenshot (wf-detail-older-execution)

Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.

wf-detail-older-execution

**Workflow Detail page screenshot** (wf-detail-older-execution) Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent. ![wf-detail-older-execution](/attachments/2d3f546c-e470-4cd4-ae31-de5600e45a8e)

Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)

recompute-match-score-kebab-open

**Recompute Match Score — kebab open screenshot** (recompute-match-score-kebab-open) ![recompute-match-score-kebab-open](/attachments/3cd38962-631d-4bc8-8cc1-63e7e782d7cc)
Author
Owner

Security Review — PR #915 (bookshelf-bbsd.4)

Scope: adversarial review of the epubnorm refactor (NormalizeEPUB + OPF repair + cover downscale helpers). Checks: zip-slip/path traversal, decompression-bomb/size caps, XML external-entity exposure, image-decode OOM guard, architecture boundary.


Guard-by-guard audit

Entry count cap (maxZipEntryCount): preserved unchanged in NormalizeEPUB. No regression.

Per-entry decompression cap (readZipEntry / io.LimitReader / maxZipEntryBytes): preserved. The totalRead *int64 pointer is correctly threaded from NormalizeEPUB through resolveOPFPathopfPathFromContainerXMLreadZipEntry, and through repairAndSerializeOPFreadZipEntry, and through copyNonStructuralEntriesreadZipEntry. All three call-sites pass the same shared pointer. Aggregate cap (maxZipTotalBytes) accumulates correctly across all three phases as before.

Mimetype LimitReader (hasValidMimetype): unchanged — not in diff, still uses io.LimitReader(rc, len(epubMimeType)+1).

Image decompression-bomb guard (downscaleIfNeeded): preserved. image.DecodeConfig (header only) + maxDecodePixels check fire before image.Decode. The extracted scaledDimensions helper is purely arithmetic — no allocation occurs there. No regression.

XML external entity / billion-laughs (OPF, container.xml): Go's encoding/xml.Unmarshal does not process DTDs or external entities by default. Both the original and refactored code use the same decoder. No regression; no new exposure.

Architecture boundary: no go-workflows or workflow-engine symbols appear in the diff or in the branch's epubnorm.go imports. Package comment retains the explicit "intentionally free of workflow-engine imports" note.

Test hygiene: helpers_test.go declares package epubnorm_test (black-box). The new helper funcs (addZIPEntry, writeMimetypeEntry, etc.) are all test-file-local; no unexported production symbols exported to game coverage. Clean.

Lint exclusion removals: the golangci.yml diff removes all epubnorm-specific funlen, gocyclo, and nestif exclusions. This is the intended outcome of the refactor and confirms the extracted helpers are within budget.


Findings

[MINOR] internal/epubnorm/epubnorm.go — pre-existing: zip entry names not sanitized before write
The output ZIP is built by passing f.Name (from the untrusted input archive) directly to writeDeflated. If a malicious EPUB contains entries named ../../etc/passwd the same name appears in the output ZIP. This is identical to the original code — the refactor did not introduce or worsen it (copyNonStructuralEntries uses the same f.Name the original loop used). Flagged for completeness per audit scope. Fix, if desired, would be a separate bead: sanitize f.Name through path.Clean and reject names that escape the archive root (start with .. after cleaning) before writing. Not a regression of this PR.


REVIEW VERDICT: 0 blocker, 0 major, 1 minor

## Security Review — PR #915 (bookshelf-bbsd.4) **Scope:** adversarial review of the epubnorm refactor (NormalizeEPUB + OPF repair + cover downscale helpers). Checks: zip-slip/path traversal, decompression-bomb/size caps, XML external-entity exposure, image-decode OOM guard, architecture boundary. --- ### Guard-by-guard audit **Entry count cap** (`maxZipEntryCount`): preserved unchanged in `NormalizeEPUB`. No regression. **Per-entry decompression cap** (`readZipEntry` / `io.LimitReader` / `maxZipEntryBytes`): preserved. The `totalRead *int64` pointer is correctly threaded from `NormalizeEPUB` through `resolveOPFPath` → `opfPathFromContainerXML` → `readZipEntry`, and through `repairAndSerializeOPF` → `readZipEntry`, and through `copyNonStructuralEntries` → `readZipEntry`. All three call-sites pass the same shared pointer. Aggregate cap (`maxZipTotalBytes`) accumulates correctly across all three phases as before. **Mimetype LimitReader** (`hasValidMimetype`): unchanged — not in diff, still uses `io.LimitReader(rc, len(epubMimeType)+1)`. **Image decompression-bomb guard** (`downscaleIfNeeded`): preserved. `image.DecodeConfig` (header only) + `maxDecodePixels` check fire before `image.Decode`. The extracted `scaledDimensions` helper is purely arithmetic — no allocation occurs there. No regression. **XML external entity / billion-laughs** (OPF, container.xml): Go's `encoding/xml.Unmarshal` does not process DTDs or external entities by default. Both the original and refactored code use the same decoder. No regression; no new exposure. **Architecture boundary**: no `go-workflows` or workflow-engine symbols appear in the diff or in the branch's `epubnorm.go` imports. Package comment retains the explicit "intentionally free of workflow-engine imports" note. **Test hygiene**: `helpers_test.go` declares `package epubnorm_test` (black-box). The new helper funcs (`addZIPEntry`, `writeMimetypeEntry`, etc.) are all test-file-local; no unexported production symbols exported to game coverage. Clean. **Lint exclusion removals**: the `golangci.yml` diff removes all epubnorm-specific `funlen`, `gocyclo`, and `nestif` exclusions. This is the intended outcome of the refactor and confirms the extracted helpers are within budget. --- ### Findings [MINOR] internal/epubnorm/epubnorm.go — pre-existing: zip entry names not sanitized before write The output ZIP is built by passing `f.Name` (from the untrusted input archive) directly to `writeDeflated`. If a malicious EPUB contains entries named `../../etc/passwd` the same name appears in the output ZIP. This is identical to the original code — the refactor did not introduce or worsen it (`copyNonStructuralEntries` uses the same `f.Name` the original loop used). Flagged for completeness per audit scope. Fix, if desired, would be a separate bead: sanitize `f.Name` through `path.Clean` and reject names that escape the archive root (start with `..` after cleaning) before writing. Not a regression of this PR. --- REVIEW VERDICT: 0 blocker, 0 major, 1 minor
Author
Owner

CODE REVIEW: APPROVED

Reviewed diff: 3 Go files changed (.golangci.yml, internal/epubnorm/epubnorm.go, internal/epubnorm/helpers_test.go).


Phase 0: DEMO Verification

No DEMO block required — this is a behavior-preserving refactor (no new user-facing surface). Proceeding to spec compliance.


Phase 1: Spec Compliance

Bead bookshelf-bbsd.4: extract helpers from NormalizeEPUB / repairAndSerializeOPF / buildEPUBBytes to clear funlen/gocyclo/nestif gates; delete all 5 epubnorm exclusions from .golangci.yml.

All five exclusions removed, zero added. Diff confirms clean burn-down.


Phase 2: Code Quality

1. EPUB output byte-identical (mimetype first, STORED)

internal/epubnorm/epubnorm.go:164 — writeStructuralEntries calls writeStored for "mimetype" as the first entry, then writeDeflated for container.xml, then writeDeflated for the OPF. Order and compression method are identical to the original monolithic function. skipEntry (line 179) excludes the same names (mimetype, containerXMLPath, opfPath, encryptionXMLPath, stripPaths) from copyNonStructuralEntries, preventing duplication. VERIFIED ✓

2. OPF repair semantics identical

  • ensureOPFDefaults (line 472): sets Version="2.0" and NS to the IDPF URL if missing — identical to the two inline guards in the original.
  • repairManifest (line 484): strips fonts (adds to stripPaths), strips missing-file items, appends survivors — identical filtering as the original loop. Critically, buildManifestByID (line 505) is called on the already-filtered repairedItems output, so the id→archivePath map contains only surviving items, same as the original where manifestByID was populated inside the same loop.
  • repairSpine (line 514): removes idrefs with no manifestByID entry — identical.
  • opfPathFromContainerXML / firstValidRootfile (lines 367/382): De Morgan inversion of the original nested if/continue logic; semantically equivalent. readZipEntry bytes still flow into the shared totalRead pointer on success and on error (error returns "" so falls through to OPF fallback scan, same as the original readErr == nil guard). VERIFIED ✓

3. Cover downscale identical

scaledDimensions (line 595) reproduces the original if w >= h branch exactly: dstW = maxImageLongEdge, dstH = h*maxImageLongEdge/w clamped to 1; and the else branch symmetrically. In downscaleIfNeeded, img.Bounds() is called twice (once for Dx/Dy into scaledDimensions, once as the src rect for CatmullRom.Scale) vs the original single bounds variable — image.Image.Bounds() is pure/deterministic, so this is a safe no-op change. VERIFIED ✓

4. Lint exclusion burn-down

.golangci.yml diff: funlen×2 (NormalizeEPUB|repairAndSerializeOPF in epubnorm.go, buildEPUBBytes in helpers_test.go), gocyclo×2 (NormalizeEPUB in epubnorm.go, buildEPUBBytes in helpers_test.go), nestif×1 (epubnorm.go whole-file) — all five entries deleted, zero new exclusions added. CI green confirms all extracted helpers pass the gates without an exemption. VERIFIED ✓

5. Coverage / test quality

helpers_test.go changes are pure test-helper refactoring: addZIPEntry, writeMimetypeEntry, writeContainerEntry, writeOPFEntry, writeOptionalEntries split buildEPUBBytes to bring it under funlen/gocyclo. No Ginkgo Describe/It blocks were added or modified in this file — it contains only builder helpers. All spec assertions remain in epubnorm_test.go (32 It blocks, unchanged by this PR). No vacuous or assert-nothing tests introduced. VERIFIED ✓

6. Convention

  • All 10 extracted production helpers are unexported (lowercase). ✓
  • Package declaration: package epubnorm_test (black-box). ✓
  • No unexported symbols exported to game black-box coverage. ✓
  • Pre-existing two-Expect It at epubnorm_test.go:48 is untouched by this PR (identical on origin/main); not flagged here.

REVIEW VERDICT: 0 blocker, 0 major, 0 minor

CODE REVIEW: APPROVED Reviewed diff: 3 Go files changed (.golangci.yml, internal/epubnorm/epubnorm.go, internal/epubnorm/helpers_test.go). --- ## Phase 0: DEMO Verification No DEMO block required — this is a behavior-preserving refactor (no new user-facing surface). Proceeding to spec compliance. --- ## Phase 1: Spec Compliance Bead bookshelf-bbsd.4: extract helpers from NormalizeEPUB / repairAndSerializeOPF / buildEPUBBytes to clear funlen/gocyclo/nestif gates; delete all 5 epubnorm exclusions from .golangci.yml. All five exclusions removed, zero added. Diff confirms clean burn-down. --- ## Phase 2: Code Quality ### 1. EPUB output byte-identical (mimetype first, STORED) internal/epubnorm/epubnorm.go:164 — writeStructuralEntries calls writeStored for "mimetype" as the first entry, then writeDeflated for container.xml, then writeDeflated for the OPF. Order and compression method are identical to the original monolithic function. skipEntry (line 179) excludes the same names (mimetype, containerXMLPath, opfPath, encryptionXMLPath, stripPaths) from copyNonStructuralEntries, preventing duplication. VERIFIED ✓ ### 2. OPF repair semantics identical - ensureOPFDefaults (line 472): sets Version="2.0" and NS to the IDPF URL if missing — identical to the two inline guards in the original. - repairManifest (line 484): strips fonts (adds to stripPaths), strips missing-file items, appends survivors — identical filtering as the original loop. Critically, buildManifestByID (line 505) is called on the already-filtered repairedItems output, so the id→archivePath map contains only surviving items, same as the original where manifestByID was populated inside the same loop. - repairSpine (line 514): removes idrefs with no manifestByID entry — identical. - opfPathFromContainerXML / firstValidRootfile (lines 367/382): De Morgan inversion of the original nested if/continue logic; semantically equivalent. readZipEntry bytes still flow into the shared totalRead pointer on success and on error (error returns "" so falls through to OPF fallback scan, same as the original readErr == nil guard). VERIFIED ✓ ### 3. Cover downscale identical scaledDimensions (line 595) reproduces the original if w >= h branch exactly: dstW = maxImageLongEdge, dstH = h*maxImageLongEdge/w clamped to 1; and the else branch symmetrically. In downscaleIfNeeded, img.Bounds() is called twice (once for Dx/Dy into scaledDimensions, once as the src rect for CatmullRom.Scale) vs the original single bounds variable — image.Image.Bounds() is pure/deterministic, so this is a safe no-op change. VERIFIED ✓ ### 4. Lint exclusion burn-down .golangci.yml diff: funlen×2 (NormalizeEPUB|repairAndSerializeOPF in epubnorm.go, buildEPUBBytes in helpers_test.go), gocyclo×2 (NormalizeEPUB in epubnorm.go, buildEPUBBytes in helpers_test.go), nestif×1 (epubnorm.go whole-file) — all five entries deleted, zero new exclusions added. CI green confirms all extracted helpers pass the gates without an exemption. VERIFIED ✓ ### 5. Coverage / test quality helpers_test.go changes are pure test-helper refactoring: addZIPEntry, writeMimetypeEntry, writeContainerEntry, writeOPFEntry, writeOptionalEntries split buildEPUBBytes to bring it under funlen/gocyclo. No Ginkgo Describe/It blocks were added or modified in this file — it contains only builder helpers. All spec assertions remain in epubnorm_test.go (32 It blocks, unchanged by this PR). No vacuous or assert-nothing tests introduced. VERIFIED ✓ ### 6. Convention - All 10 extracted production helpers are unexported (lowercase). ✓ - Package declaration: package epubnorm_test (black-box). ✓ - No unexported symbols exported to game black-box coverage. ✓ - Pre-existing two-Expect It at epubnorm_test.go:48 is untouched by this PR (identical on origin/main); not flagged here. --- REVIEW VERDICT: 0 blocker, 0 major, 0 minor
All checks were successful
/ JS Unit Tests (pull_request) Successful in 1m7s
/ E2E API (pull_request) Successful in 2m1s
Required
Details
/ Integration (pull_request) Successful in 2m56s
Required
Details
/ Lint (pull_request) Successful in 4m0s
Required
Details
/ E2E Browser (pull_request) Successful in 3m25s
Required
Details
/ Test (pull_request) Successful in 4m43s
Required
Details
This pull request is blocked because it's outdated.
This branch is out-of-date with the base branch
You are not authorized to merge this pull request.
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin bd-bookshelf-bbsd.4:bd-bookshelf-bbsd.4
git switch bd-bookshelf-bbsd.4
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
zombor/pergamum!915
No description provided.