fix(reader): CBX-aware reader + persistComic + hasReadableComic (bookshelf-b3s1) #285
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "bd-bookshelf-b3s1"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Three layered bugs fixed — all caused by checking file extension instead of the production schema fields (
book_type='CBX',archive_type='ZIP'|'RAR').Bug #1 — hasReadableComic (
internal/books/service.go):hasReadableCBZ→hasReadableComic; now checksbook_type='CBX'(case-insensitive)book_type='CBX'per Grimmory V91 migrationBug #2 — Reader handler (page count + serving):
rardecode/v2(already a dep from dct5)sync.Mapkeyed by archive pathServeComicPageHandlerdispatches onarchive_type:'RAR'→ rardecode, otherwise → ZIPComicPageCounterlikewise dispatches onarchive_typeGetPrimaryCBZFileRowandGetBookFileRownow includearchive_typeBug #3 — persistComic (
internal/library/scan/persist.go):.cbz) tobook_type='CBX'archiveTypeparameterlibrary/wire.godispatchescomic.ParseCBRvscomic.ParseCBZbased onarchive_typecomic.ParseCBRadded: streams RAR archive without buffering full file into RAMArchitecture choice (caching strategy)
For RAR page serving, went with option (c): in-memory cache. On first page request for a given archive path, all image pages are extracted and cached in a process-level
sync.Map. Subsequent requests serve from cache without reopening the archive. This is correct for the sequential-only rardecode/v2 API and avoids O(n²) behaviour for late-page reads.Test plan
internal/files/cbr_pages_test.go— CBRPageCount and ServeCBRPage: 2-page happy path, no-images, corrupt, corrupt-header (Next error), directory entry skipped, oversized entry (ErrCoverTooLarge), truncated body, cache behavior, open error, page out of rangeinternal/books/has_readable_comic_test.go— table test: true for CBX/ZIP, true for CBX/RAR, true for lowercase cbx, false for cbz, false for epub, false for emptyinternal/comic/comicinfo_cbr_test.go— ParseCBR: happy path (parses series/issue/writer/characters), no ComicInfo.xml → ErrNoComicInfo, open failure, corrupt RAR, corrupt header (Next error), directory entry skipped, size cap via internal testinternal/comic/comicinfo_internal_test.go— TestParseCBR_OversizedComicInfo covers the size-cap branchinternal/library/scan/persist_test.go— updated CBX trigger test + captures archiveType passed to callbackinternal/books/reader_handler_test.go— ServeComicPageHandler: ZIP happy path, RAR dispatch, RAR out-of-range, RAR generic error, path traversal, all ID validation pathsinternal/books/reader_service_test.go— ComicPageCounter: ZIP dispatch, RAR dispatch, library path error, zip open error, count error, traversal guardinternal/make testgreenmake lintcleanCloses bead bookshelf-b3s1 on merge.
Security Review — PR #285 (bd-bookshelf-b3s1, HEAD
52390462)Reviewed:
internal/files/cbr_pages.go,internal/comic/comicinfo.go,internal/books/reader_handler.go,internal/books/reader_service.go,internal/db/sqlc/books_extra.go,internal/library/scan/persist.goFindings
[MAJOR] internal/files/cbr_pages.go:60 — No entry-count cap when iterating RAR archive for images
loadCBRPagesiterates every entry in the archive looking for image files. A crafted RAR with millions of tiny non-image entries (e.g. text files, each well belowmaxCBRPageBytes) will loop indefinitely, holding an open file descriptor and consuming CPU per-entry CRC validation for the entire duration of the request. There is no bail-out after N entries. The analogous concern was flagged in the dct5 review forParseCBR(comicinfo scanning). Both paths need the same fix: amaxEntriesToScanconstant (e.g.const maxCBREntriesToScan = 5000) and a counter that returns an error orErrNoCBRImageswhen exceeded.Suggested fix:
The same fix should be applied to
ParseCBRininternal/comic/comicinfo.go:150.[MAJOR] internal/files/cbr_pages.go:82 — RAR entry filename leaked via
http.ServeContentname parameterServeCBRPagecallshttp.ServeContent(w, req, p.name, ...)wherep.nameis the raw RAR entry filename (e.g.../../../../etc/passwdor a crafted string). Whilehttp.ServeContentdoes not setContent-Dispositiondirectly, in Go ≥ 1.20 it usesnameto set aContent-Disposition: inline; filename=...header when the MIME type could be a download or when a redirect is triggered. More concretely:serveContentcallssetLastModifiedand may callhttp.Redirectwith the name embedded in the URL path component — if the client sends a request for?the redirect can include the rawname. This is a latent XSS/header-injection vector if the name contains special characters.The CBZ path (
internal/files/cbz.go:161) has the same pattern.Suggested fix: pass a synthetic name (e.g.
fmt.Sprintf("page%04d%s", n, ext)) derived from the index and extension, not the raw entry name.[MINOR] internal/comic/comicinfo.go:170 — ParseCBR: no entry-count cap for ComicInfo.xml scanning
ParseCBRiterates all RAR entries looking forComicInfo.xml. A crafted archive with 100,000 non-matching entries (all belowmaxComicInfoBytes) causes O(N) CPU work during library scan for every CBR file. Low severity because this runs once per book at scan time (not per HTTP request), but the samemaxEntriesToScanconstant from the MAJOR above should be applied here for consistency and defence in depth.Confirmed Clean
h.UnPackedSize > maxCBRPageBytesguard at line 75 +io.LimitReaderdefence-in-depth at line 78. Matches the dct5 cover cap. ✓rardecode.NewReader(f)receives thefs.Filedirectly — noio.ReadAllof the archive body. ✓decodeComicInfoReaderat line 172, also enforced viaLimitReaderinsidedecodeComicInfoReader. ✓comicinfo_cbr_test.go:167lowers cap to 1 byte viaMaxComicInfoBytesbridge and asserts error. ✓ServeComicPageHandlerandComicPageCounterperform thefilepath.Clean+HasPrefixguard before constructing the full path. ✓parseID(integer parse), not passed as a filename to the RAR reader. ✓GetBookFileandGetPrimaryCBZFileuse?parameterized queries. ✓rardecode.NewReaderuses default options; CRC validation is enabled by default in rardecode v2. ✓cbrPageCache(sync.Map) caches all pages on first load; subsequent page requests for the same archive are served from memory with no re-open. ✓REVIEW VERDICT: 0 blocker, 2 major, 1 minor
CODE REVIEW: NOT APPROVED
Phase 0: DEMO Verification
No DEMO block found in the bead description or PR body. Per review standard: NOT APPROVED — "No DEMO block provided". This is a bug-fix PR so a DEMO showing the fixed behavior is mandatory (e.g. a curl against a CBR book's reader endpoint, or the
make testoutput showing the new CBX-aware test cases passing). Without it, there is no verification the fixes actually apply to the production scenario.Phase 1: Spec Compliance — PASSES (all three bugs addressed)
hasReadableComic):strings.EqualFold(f.BookType, "cbx")correct. Handler call site (handler.go:560) updated.hasReadableis also updated to checkCBX. Export bridge inexport_test.gois re-export only.ComicPageCounterandServeComicPageHandlerboth dispatch onarchive_type;GetPrimaryCBZFileRowandGetBookFileRownow includearchive_type; wiring inbooks/wire.gois complete.persistComic): trigger changed tostrings.EqualFold(nb.sr.BookType, "CBX");archiveTypeparameter threaded through;library/wire.godispatchesParseCBRvsParseCBZ.RAR page-serving strategy (full extraction to process-level
sync.Mapcache) is documented.ParseCBRpassesf(anio.Reader) directly torardecode.NewReader— no full-archive buffer.maxComicInfoBytescap is present.export_test.gobridges contain onlyvar/constre-exports, no test functions.Phase 2: Code Quality
[BLOCKER] internal/comic/comicinfo.go:171 — uint64 cast of int64 UnPackedSize silently rejects valid ComicInfo.xml in unknown-size RAR entries
rardecode/v2'sFileHeader.UnPackedSizeisint64. Per the library source (archive15.go:283–285), whenUnKnownSize == true(solid archives, streaming RAR4),UnPackedSizeis set to-1. The code does:uint64(-1)is18446744073709551615— far above the 1 MB cap — so any RAR archive whose ComicInfo.xml entry hasUnKnownSize=truereturns an error even when the file is tiny and legitimate. In contrastcbr_pages.go:75correctly comparesh.UnPackedSize > maxCBRPageBytesusingint64arithmetic (where -1 is less than 50 MB and falls through safely). The fix: checkh.UnKnownSizefirst and skip the size pre-check when the size is unknown; rely solely ondecodeComicInfoReader'sLimitReadercap, which already handlessize=0correctly by defaulting tomaxComicInfoBytes.[MINOR] internal/books/service.go:862 — stale
strings.EqualFold(f.ArchiveType, "cbz")branch inhasReadable(pre-existing, not introduced by this PR)archive_typein the database is'ZIP'or'RAR'(confirmed ininternal/library/scan/booktype.go);'cbz'can never match. This branch is dead and misleading. It was not introduced by this PR (present inmain) so does not block merge, but should be filed as a follow-up.REVIEW VERDICT: 1 blocker, 0 major, 1 minor
The blocker (
uint64(-1)cast inParseCBR) will causecomic.ParseCBRto silently reject validComicInfo.xmlentries in solid/streaming RAR4 archives. Fix: guard withif !h.UnKnownSizebefore the size pre-check, or compare asint64(matchingcbr_pages.go's pattern).CODE REVIEW: NOT APPROVED
Phase 0: DEMO Verification
No DEMO block found in the bead description or PR body. Per review standard: NOT APPROVED — "No DEMO block provided". This is a bug-fix PR so a DEMO showing the fixed behavior is mandatory (e.g. a curl against a CBR book's reader endpoint, or the make test output showing the new CBX-aware test cases passing). Without it there is no verification the fixes actually apply to the production scenario.
Phase 1: Spec Compliance — PASSES (all three bugs addressed)
RAR page-serving strategy (full extraction to process-level sync.Map cache) is documented. ParseCBR passes f (an io.Reader) directly to rardecode.NewReader — no full-archive buffer. maxComicInfoBytes cap is present. export_test.go bridges contain only var/const re-exports, no test functions.
Phase 2: Code Quality
[BLOCKER] internal/comic/comicinfo.go:171 — uint64 cast of int64 UnPackedSize silently rejects valid ComicInfo.xml in unknown-size RAR entries
rardecode/v2 FileHeader.UnPackedSize is int64. Per the library source (archive15.go:283-285), when UnKnownSize == true (solid archives, streaming RAR4), UnPackedSize is set to -1. The code does:
uint64(-1) is 18446744073709551615 — far above the 1 MB cap — so any RAR archive whose ComicInfo.xml entry has UnKnownSize=true returns an error even when the file is tiny and legitimate. In contrast cbr_pages.go:75 correctly compares h.UnPackedSize > maxCBRPageBytes using int64 arithmetic (where -1 is less than 50 MB and falls through safely). Fix: check h.UnKnownSize first and skip the size pre-check when the size is unknown; rely solely on decodeComicInfoReader's LimitReader cap, which already handles size=0 correctly by defaulting to maxComicInfoBytes.
[MINOR] internal/books/service.go:862 — stale strings.EqualFold(f.ArchiveType, "cbz") branch in hasReadable (pre-existing, not introduced by this PR)
archive_type in the database is 'ZIP' or 'RAR' (confirmed in internal/library/scan/booktype.go); 'cbz' can never match. This branch is dead and misleading. It was not introduced by this PR (present in main) so does not block merge, but should be filed as a follow-up.
REVIEW VERDICT: 1 blocker, 0 major, 1 minor
The blocker (uint64(-1) cast in ParseCBR) will cause comic.ParseCBR to silently reject valid ComicInfo.xml entries in solid/streaming RAR4 archives. Fix: guard with if !h.UnKnownSize before the size pre-check, or compare as int64 (matching cbr_pages.go's pattern). Also: DEMO block is required per review standard.