feat(files): CBR cover extraction via rardecode/v2 (bookshelf-dct5) #284
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "bd-bookshelf-dct5"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
.cbr(RAR-packed comics) support toExtractCoverininternal/files/github.com/nwaples/rardecode/v2(pure Go, BSD, handles RAR3 + RAR5)extractCBZCover: iterate entries → collect images → natural-sort → return firstUnPackedSize > 50MBdoc.goupdated to list CBR among supported formatsTest plan
make testpasses (all 183 specs green)make lintclean (0 issues)make coveragepasses (100% oninternal/files)sample.cbr(2 pages → returns page-001),sample-no-images.cbr→ ErrNoCover,sample-corrupt.cbr→ errorstaticRARIterator: dir-skip, oversize-entry (ErrCoverTooLarge), read-error, next-errorCloses bead bookshelf-dct5 on merge.
@ -0,0 +1,142 @@package filesWe don't do "internal" tests. all tests must use the public interface of the package.
f389caf4d01a053ac8d1CODE REVIEW: APPROVED (minor findings only)
Phase 0: DEMO Verification
No DEMO block found in the bead comments or PR description. The bead workflow requires a DEMO block; this PR has only a test plan checklist. However, CI is green (
success) at head SHA1a053ac8d13a409c332f9a5e432bf8afa1dc3795, which provides behavioral truth. Per review-standard.md, proceeding on CI green.Phase 1: Spec Compliance — PASS
All scope items from the expanded spec are implemented:
extractCBRCoverininternal/files/cover_extract_cbr.go,ExtractCoverswitch updated.hasReadableCBZ→hasReadableComicrename — completed; grep confirms zero remaininghasReadableCBZreferences.persistComicCBR extension — wired ininternal/library/wire.goviacomicPersistCallbackswitch on.cbr/default;isComicExtinpersist.goaccepts.cbzand.cbr.ExtractCoverwith RAR5 binary fixtures.ParseCBRininternal/comic/comicinfo.go— shared XML parser viatoMetadata/decodeComicInfoReader; parser is not duplicated.Phase 2: Code Quality
[MINOR] internal/files/cover_extract_cbr.go:29-43 — Custom
rarIteratorinterface violates project-conventions.mdThe file defines
rarIterator,rarFileHeader, andrardecodeIteratoras a domain interface. project-conventions.md says: “Custom interfaces in domain code when functional arguments work — do NOT use.” The iterator could instead be afunc() (rarFileHeader, io.Reader, error)closure injected intoextractCBRCoverFromIterator, matching the curried-functional pattern used everywhere else in the codebase. Currently testable only indirectly through the publicExtractCoverentry point anyway (the internal test was deleted), so the interface buys nothing tests couldn't get from a func arg. No correctness impact.[MINOR] internal/files/cover_extract.go:349-359 —
collectImageEntriesuses lexicographic sort;extractCBZCoveris inconsistent with CBRThis is pre-existing (not introduced in this PR).
collectImageEntriessorts by plain string compare whilecollectCBZImageEntriesincbz.gousesnaturalLess. The new CBR path correctly usesnaturalLess. The CBZ cover-extraction path (cover_extract.go:336) calls the oldcollectImageEntriesand sorts lexicographically, sopage10.jpg < page2.jpg. Not introduced by this PR; flagged for awareness.[MINOR] internal/comic/comicinfo_test.go:328-346 — ParseCBR “no ComicInfo.xml” test exercises open-error path, not ErrNoComicInfo
The test comment says it exercises
ErrNoComicInfobut the MapFS is empty soParseCBRreturns an open error before reaching the entry-iteration loop.ErrNoComicInfois covered by the internal tests (TestParseCBR_ErrNoComicInfoviano_comicinfo.cbrfixture) and the Ginkgo test misleads the reader. No coverage gap, but the comment and test name are wrong.[MINOR] internal/comic/comicinfo.go:253 and internal/files/cover_extract_cbr.go:16 — duplicate
rarEntrytype nameBoth
internal/comicandinternal/filesdefine a privaterarEntrystruct. They are in different packages so there is no compile issue, but the naming overlap is confusing when reading cross-package diffs. Minor.Positive observations
naturalLessis correctly shared fromcbz.go— not duplicated.imageExtsandmaxCoverBytesare shared fromcover_extract.go— not duplicated.toMetadataXML parser is shared between CBZ and CBR paths ininternal/comic.check-coverage.shhas no new exclusions (verified).hasReadableComic) coverscbx,cbz,cbr, uppercase variants, empty slice, and non-comic types..cbrfile triggers thepersistComiccallback (line 1889-1941 inpersist_test.go).BeforeEach/JustBeforeEach/Itseparation; happy-path error folded into value assertions.REVIEW VERDICT: 0 blocker, 0 major, 4 minor
Security Review — PR #284 (CBR support, branch bd-bookshelf-dct5)
Reviewed:
internal/files/cover_extract_cbr.go,internal/files/cover_extract_cbr_test.go,internal/comic/comicinfo.go,internal/library/scan/persist.go,internal/books/service.go,go.mod, test fixtures.Findings
Verified non-issues
h.unPackedSize > maxCoverBytesguard at line 86 fires from the RAR header before anyio.ReadAll. Theoversized.cbrfixture (54 bytes, reports UnPackedSize > 50 MB) and theLimitReaderdefense-in-depth both confirmed. ✓encoding/xmldoes not resolve external entities (no XXE). The ComicInfo.xml decoder is wrapped in aLimitReader(1 MB), which prevents billion-laughs expansion. ✓corrupt-file-header.cbrfixture exercisesr.Next()returning a header-CRC error; rardecode validates block CRCs during iteration by default. ✓internal/files/, 65–163 bytes forinternal/comic/), handcrafted binary; no LFS or repo-crawler concerns. ✓REVIEW VERDICT: 0 blocker, 2 major, 1 minor
443235bcf00bd62f333bSecurity Review — bd-bookshelf-dct5 @
f1b0d557Scope: CBR cover extraction only (rardecode/v2 dep,
extractCBRCover,ExtractCoverswitch, public-interface tests with binary fixtures).Resolution of previous MAJORs
MAJOR #1 (streaming ParseCBR into 512 MB RAM) — RESOLVED.
git grep ParseCBR origin/bd-bookshelf-dct5finds no ParseCBR symbol in the diff. The entireParseCBR/CBR-ComicInfo-XML work is confirmed absent frominternal/files/in this branch.internal/comic/is not touched by the diff (zero delta vs main). The concern is gone.MAJOR #2 (no entry-count cap) — RESOLVED.
maxImageEntries = 2000is declared atinternal/files/cover_extract_cbr.go:19. The guard fires at line 94:if len(imageEntries) >= maxImageEntries { break }. Theexport_test.gobridge exposesMaxImageEntries = &maxImageEntries(a pointer so tests mutate the package var). Themany-entries.cbr(6-entry fixture) test lowers the cap to 5 and exercises the early-break path. Both the guard and its test coverage are confirmed present.MINOR (rardecode/v2 indirect) — RESOLVED.
go.modline 12:github.com/nwaples/rardecode/v2 v2.2.3appears in the first (direct)requireblock, not the// indirectblock.go mod tidywas run.Security checks — this review
Decompression-bomb / per-entry size cap.
h.unPackedSize > maxCoverBytes(line 85) rejects any entry reporting ≥50 MB before anyio.ReadAllis attempted. TheLimitReader(reader, maxCoverBytes)on line 89 provides defence-in-depth even ifUnPackedSizeis underreported (not possible in RAR5 but is a belt-and-suspenders guard).oversized.cbr(54 bytes on disk, UnPackedSize header set > 50 MB) confirms the pre-read rejection path. PASS.Path traversal.
RAR entry names (
h.name) are used for exactly three things:filepath.Ext(extension filter), error message formatting, and therarEntry.namesort key. Entry names are never concatenated into anos.Open,os.Create,filepath.Join, or any other filesystem path. The extracted bytes are returned in-memory. No path traversal surface exists. PASS.Resource exhaustion — entry count.
maxImageEntries = 2000caps RAM accumulation to at most 2000 entries × 50 MB each (theoretical). In practice theunPackedSizepre-check means any entry larger than 50 MB causes an immediate error return, not silent accumulation. A real-world worst case (2000 entries × ~50 MB each) is ~100 GB of RAM, which is not reachable in normal operation but remains a theoretical upper bound. For a personal library server with operator-supplied CBR files this is an acceptable trade-off; for an untrusted multi-tenant intake path it would warrant lowering the per-entry cap or the max-entries cap further. Given the stated use case (personal library, user-owned files), this is acceptable — no finding raised.CRC / data-integrity validation.
rardecode/v2 validates CRC on decompressed blocks by default.
corrupt-file-header.cbr(valid RAR5 archive header but corrupt file-block CRC) causesr.Next()to returnrardecode: bad header crc, which propagates as a wrapped error.truncated-body.cbrcauses an error duringio.ReadAll. Both error paths are exercised by tests. PASS.New dependency — rardecode/v2 v2.2.3.
Version v2.2.3 confirmed in go.mod (direct require) and go.sum. No known CVE for this library. The project is actively maintained on GitHub under github.com/nwaples/rardecode; v2 is the current major. PASS.
Test fixtures.
All nine
.cbrfixtures are tiny (23–446 bytes). No embedded malicious payloads are possible at these sizes. Fixtures exercise: happy path (natural-sort cover selection), no-image entries, corrupt archive header, corrupt file-block header, directory-entry skip, oversized entry header, truncated body, case-insensitive extension dispatch, and entry-count cap. Coverage is comprehensive. PASS.REVIEW VERDICT: 0 blocker, 0 major, 0 minor
CODE REVIEW: APPROVED
Phase 0: DEMO Verification
No formal DEMO block exists in the bead or PR description. The PR description references "white-box internal tests via staticRARIterator" — a stale description from an earlier iteration that was superseded by the surgical revert. Behavioral truth deferred to CI, which is green at
f1b0d557. Tests re-run locally: 192/192 specs pass.Commands run:
Phase 1: Spec Compliance — PASS
All scope items verified against the final narrowed scope:
internal/files/cover_extract.go:.cbrcase added toExtractCoverswitch (line 72)internal/files/cover_extract_cbr.go: new file,extractCBRCover+extractCBRCoverFromIteratorinternal/files/cover_extract_cbr_test.go: 9 Contexts covering all pathsinternal/files/cover_extract_test.go: portedDecodeXMLEntry+PickFirstPDFImageerror-path tests to Ginkgointernal/files/export_test.go: bridge file, re-exports onlyinternal/files/doc.go: CBR mentionedgo.mod/go.sum:github.com/nwaples/rardecode/v2 v2.2.3added as direct depPhase 2: Security Majors — BOTH FIXED
MAJOR 1 (entry-count cap):
extractCBRCoverFromIteratorincover_extract_cbr.go:94breaks out of the loop after collectingmaxImageEntries(2000) entries. The guard fires afterappend, so at most 2000 entries are collected — correct. Thevar(notconst) declaration with testability comment at line 19 is appropriate.MAJOR 2 (ParseCBR buffering):
ParseCBRis entirely absent from this branch. Confirmed no ParseCBR symbol anywhere ininternal/files/. This work was correctly deferred to bookshelf-b3s1.Phase 2: Convention Checks — ALL PASS
cover_extract_internal_test.gois deleted (confirmed via diff:-/dev/nullin the diff)export_test.godeclarespackage files(correct bridge pattern — test-only by filename, accesses unexported symbols), contains onlyvarre-exports (MaxImageEntries,DecodeXMLEntry,PickFirstPDFImage), zero test functionspackage files_test(external package) incover_extract_cbr_test.goandcover_extract_test.gomaxImageEntriesdeclared asvarwith a comment explaining the testability rationale atcover_extract_cbr.go:15-19rarIteratorinterface remains — replaced by thefunc() (rarFileHeader, io.Reader, error)closure pattern consistent with project conventionsFindings
[MINOR] internal/files/cover_extract.go:349 —
collectImageEntries(used byextractCBZCover) sorts lexicographically whileextractCBRCoverFromIteratorusesnaturalLess. The inconsistency is pre-existing and not introduced by this PR; CBZ cover extraction used lex sort before this change. No correctness impact for the common case of zero-padded filenames (page-001, page-002). Not a blocker.[MINOR] PR description is stale — references "White-box internal tests via staticRARIterator" and "183 specs" from a prior iteration. Actual suite is 192 specs, no rarIterator interface exists. No correctness impact; just misleading for future readers.
REVIEW VERDICT: 0 blocker, 0 major, 2 minor
Code Review — bookshelf-s9vp
Phase 0: DEMO Verification
The bead description frames this as "re-enable two previously-skipped tests." The investigation comment (2026-06-01 02:36) correctly clarifies: the tests were not protected by
Skip/XIt/PItdirectives onmain— they ran and panicked with context deadline exceeded. The diff confirms noSkipremoval anywhere; the branch adds per-op timeouts to tests that were already live but unreliable. DEMO is N/A as a command-output comparison; behavioral correctness verified below.git greponorigin/bd-bookshelf-s9vpacross all three affected files: zero hits forSkip,XIt,PIt,PDescribe,XDescribe— tests are genuinely live.Phase 1: Spec Compliance
Requirements from bead + scope comment:
rodtimeouts on slow ops inbooks_bulk_custom_fetch_test.gobooks_bulk_move_test.go(updates all row target librariestest)rootDB/baseDSNvars frombrowser_suite_test.goAll three delivered. The dead-var fix correctly converts the outer
var (...)block assignments to a short-scoped:=at the declaration site, sorootDBandbaseDSNare consumed immediately and not orphaned.Phase 2: Code Quality
page.Timeout(d)semantics are correct. rod'sPage.Timeout(context.go:71) returns a cloned page with a freshcontext.WithTimeout; it does not mutate the receiver. Sopage.Timeout(30*time.Second).MustNavigate(...).MustWaitStable()runs the navigate+waitStable pair under a 30s deadline and leaves the originalpage(with its 60spageTimeout) untouched for subsequent calls. This is the right pattern — not blanket re-applying a global timeout at a different layer.Per-op timeouts target the correct calls. Navigate calls get 30s (the dominant cost); element-wait calls get 10s (DOM queries). Bare
page.MustWaitStable()andpage.MustWaitNavigation()calls that remain undecorated still run under the ambient 60spageTimeout— correct, since they follow immediately after a completed navigation and are typically sub-second.Timeout values are reasonable. 30s for navigation and 10s for element queries give clear headroom over observed worst-case CI timings while keeping per-op budgets tight enough to surface real failures promptly.
Convention compliance. All three files use
Describe/Context/ItwithBeforeEach/JustBeforeEachseparation per project conventions.AfterEachcleanup is present. No structural regressions.No compile-fail risk from dead-var removal.
rootDBandbaseDSNare declared and consumed on the same line (browser_suite_test.go:60via:=) and passed directly totestutil.NewSuiteDBon line 66. No other references exist in the suite or package.Findings
No blockers, majors, or minors found.
REVIEW VERDICT: 0 blocker, 0 major, 0 minor
CODE REVIEW: APPROVED — per-op timeouts are mechanically correct, target only the slow ops, dead vars cleanly removed, no
Skip/XIt/PIthiding anywhere.CODE REVIEW: APPROVED
Phase 0: DEMO Verification
No formal DEMO block in bead or PR description. PR description references "white-box internal tests via staticRARIterator" - a stale description from an earlier iteration. Behavioral truth deferred to CI (green at
f1b0d557). Tests re-run locally: 192/192 specs pass.Command run:
go test ./internal/files/... -vResult: SUCCESS! -- 192 Passed | 0 Failed | 0 Pending | 0 Skipped
Phase 1: Spec Compliance - PASS
All scope items verified:
internal/files/cover_extract.go:.cbrcase added toExtractCoverswitch (line 72)internal/files/cover_extract_cbr.go: new file,extractCBRCover+extractCBRCoverFromIteratorinternal/files/cover_extract_cbr_test.go: 9 Contexts covering all pathsinternal/files/cover_extract_test.go: portedDecodeXMLEntry+PickFirstPDFImageerror-path tests to Ginkgointernal/files/export_test.go: bridge file, re-exports onlyinternal/files/doc.go: CBR mentionedgo.mod/go.sum:github.com/nwaples/rardecode/v2 v2.2.3added as direct depPhase 2: Security Majors - BOTH FIXED
MAJOR 1 (entry-count cap):
extractCBRCoverFromIteratoratcover_extract_cbr.go:94breaks out of the loop after collectingmaxImageEntries(2000) entries. Guard fires afterappend, so at most 2000 entries are collected. Thevar(notconst) declaration with testability comment at line 19 is appropriate.MAJOR 2 (ParseCBR buffering):
ParseCBRis entirely absent from this branch. No ParseCBR symbol anywhere ininternal/files/. Correctly deferred to bookshelf-b3s1.Phase 2: Convention Checks - ALL PASS
cover_extract_internal_test.gois deleted (confirmed via diff)export_test.godeclarespackage files(correct bridge pattern), contains onlyvarre-exports (MaxImageEntries,DecodeXMLEntry,PickFirstPDFImage), zero test functionspackage files_test(external) incover_extract_cbr_test.goandcover_extract_test.gomaxImageEntriesdeclared asvarwith testability comment atcover_extract_cbr.go:15-19rarIteratorinterface remains - replaced byfunc() (rarFileHeader, io.Reader, error)closure patternFindings
[MINOR] internal/files/cover_extract.go:349 -
collectImageEntries(used byextractCBZCover) sorts lexicographically whileextractCBRCoverFromIteratorusesnaturalLess. Pre-existing inconsistency not introduced by this PR. No correctness impact for zero-padded filenames.[MINOR] PR description is stale - references "white-box internal tests via staticRARIterator" and "183 specs" from a prior iteration. Actual suite is 192 specs, no rarIterator interface exists. No correctness impact.
REVIEW VERDICT: 0 blocker, 0 major, 2 minor
CODE REVIEW: APPROVED
Phase 0: DEMO Verification
No formal DEMO block in bead or PR description. PR description references "white-box internal tests via staticRARIterator" - a stale description from an earlier iteration. Behavioral truth deferred to CI (green at
f1b0d557). Tests re-run locally: 192/192 specs pass.Command run:
go test ./internal/files/... -vResult: SUCCESS! -- 192 Passed | 0 Failed | 0 Pending | 0 Skipped
Phase 1: Spec Compliance - PASS
All scope items verified:
internal/files/cover_extract.go:.cbrcase added toExtractCoverswitch (line 72)internal/files/cover_extract_cbr.go: new file,extractCBRCover+extractCBRCoverFromIteratorinternal/files/cover_extract_cbr_test.go: 9 Contexts covering all pathsinternal/files/cover_extract_test.go: portedDecodeXMLEntry+PickFirstPDFImageerror-path tests to Ginkgointernal/files/export_test.go: bridge file, re-exports onlyinternal/files/doc.go: CBR mentionedgo.mod/go.sum:github.com/nwaples/rardecode/v2 v2.2.3added as direct depPhase 2: Security Majors - BOTH FIXED
MAJOR 1 (entry-count cap):
extractCBRCoverFromIteratoratcover_extract_cbr.go:94breaks out of the loop after collectingmaxImageEntries(2000) entries. Guard fires afterappend, so at most 2000 entries are collected. Thevar(notconst) declaration with testability comment at line 19 is appropriate.MAJOR 2 (ParseCBR buffering):
ParseCBRis entirely absent from this branch. No ParseCBR symbol anywhere ininternal/files/. Correctly deferred to bookshelf-b3s1.Phase 2: Convention Checks - ALL PASS
cover_extract_internal_test.gois deleted (confirmed via diff)export_test.godeclarespackage files(correct bridge pattern), contains onlyvarre-exports (MaxImageEntries,DecodeXMLEntry,PickFirstPDFImage), zero test functionspackage files_test(external) incover_extract_cbr_test.goandcover_extract_test.gomaxImageEntriesdeclared asvarwith testability comment atcover_extract_cbr.go:15-19rarIteratorinterface remains - replaced byfunc() (rarFileHeader, io.Reader, error)closure patternFindings
[MINOR] internal/files/cover_extract.go:349 -
collectImageEntries(used byextractCBZCover) sorts lexicographically whileextractCBRCoverFromIteratorusesnaturalLess. Pre-existing inconsistency not introduced by this PR. No correctness impact for zero-padded filenames.[MINOR] PR description is stale - references "white-box internal tests via staticRARIterator" and "183 specs" from a prior iteration. Actual suite is 192 specs, no rarIterator interface exists. No correctness impact.
REVIEW VERDICT: 0 blocker, 0 major, 2 minor