feat(comicvine): port Grimmory full scoring + special-issue + filter-fallback (bookshelf-23na) #690
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "bd-bookshelf-23na"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Completes the multi-step ComicVine lookup to fully mirror Grimmory's
ComicvineBookParser.javaalgorithm. The core volume→issue structured path was already shipped in bookshelf-wzef (PR #414); this PR adds the remaining gaps identified in the bead spec:"X-Men Annual 5","Spawn Special 1","Batman One-Shot 3"(no#). The keyword is folded into the series name so volume search targets the correct annual/special volume."Batman #1 [Digital Edition]"is pre-cleaned before regex matching so year and series are correctly extracted./api/volumes/?filter=name:<series>fallback — when/search/?resources=volumereturns 0 results, a second/volumes/?filter=request is tried before giving up, mirroring Grimmory's two-step volume discovery.calculateVolumeScoreheuristics — scoring now includes: +50 exact name match, +25 name-contains, +20 count_of_issues ≥ requested issue, +10 major publisher, +5/+5/+5 recency bonuses for start_year ≥ 2000/2010/2020 (in addition to the existing +100 year-match and position bonus). Publisher field added tovolumeFieldList.Test plan
make testpasses (comicvine 100% coverage gate)make coveragepasses withcheck-coverage: OKScoreVolumesheuristics table, filter-fallback happy path, filter-fallback HTTP 5xx error, filter-fallback malformed JSON error, API key forwarding on/volumes/requestCloses bead bookshelf-23na on merge.
Security Review — PR #690 (bd-bookshelf-23na)
Scope: ComicVine multi-step volume lookup + caching —
internal/metadata/comicvine/provider.go,structured_test.go,export_test.go.1. ReDoS — new title-parsing regexes
All three new regexes are safe.
titleRE— pre-existing, unchanged. Anchored^...$with lazy(.*?)and literal#delimiter. RE2 linear.specialIssueRE— new. Anchored^...$, lazy(.*?), fixed keyword alternation(annual|special|one-?shot), simple digit groups[0-9]+(?:\.[0-9]+)?. No nested quantifiers, no overlapping alternations. RE2 linear.stripSuffixRE— new.[\[(][^)\]]*(?:keyword)[^)\]]*[\])]. The[^)\]]*class explicitly excludes the terminators)and], so there is no path for the engine to re-enter the bracket after the keyword. RE2 linear. Verified empirically: a 10,002-char crafted non-matching string completes in <1ms.Go's
regexppackage is RE2 — guaranteed linear-time — so catastrophic backtracking is architecturally impossible regardless of pattern shape.No length cap on
q.Titlebefore regex. The only guard isq.Title == "". For a personal self-hosted server with RE2 this carries no practical exploit risk, but aMaxTitleLenconstant (e.g. 512) would be belt-and-suspenders hygiene.2. URL / Query injection —
filter=name:<series>volumeFilterSearchconstructs:url.Values.Encode()percent-encodes the entire value, including&,?,=,,, CRLF, and the colon itself. A crafted series name likebatman&api_key=evilbecomesfilter=name%3Abatman%26api_key%3Devil— the injected fragment is data, not syntax. CRLF is similarly encoded to%0D%0A. No injection risk.This pattern is consistent with the pre-existing
fetchIssuesByFilteronmain(params.Set("filter", fmt.Sprintf("volume:%d,issue_number:%s", ...))+params.Encode()).3. API key handling
getAPIKey(ctx)— not hardcoded.redactURLErrors()walks the full*url.Errorchain and replacesapi_key=…withREDACTEDbefore any log or error-string emission. This applies to all new paths (volumeSearch,volumeFilterSearch) which both calldoGetWithRetry— the existing redaction covers them automatically.Info/Warncall in the new code."API key is sent on /volumes/ request") confirms the key is forwarded correctly.No API key leak risk in this diff.
4. SSRF
baseURLdefaults to the hardcoded constant"https://comicvine.gamespot.com/api"and is only overridden in tests viahttptest.NewServer. No user-supplied input flows into the host or scheme. No SSRF risk.5. Response parsing / cache DoS
doGetWithRetryusesio.LimitReader(resp.Body, maxBodyBytes)(2 MiB cap) beforeio.ReadAll. All new code paths (volumeSearch,volumeFilterSearch) share the samedoGetWithRetrycall — the cap is inherited automatically.json.Unmarshalon the capped buffer: 2 MiB of JSON cannot cause unbounded allocation with simple flat structs likevolumeSearchResult.volumeCache) is an unboundedmap[string]volumeCacheEntry. This is pre-existing onmain(15 references, TTL-based expiry, no max-entries eviction). This PR does not change the cache implementation. Not flagged as a new finding; tracking as a pre-existing minor.6. Error classification / domain boundary
No
go-workflowsorwfengineimports appear anywhere in the diff. Domain package returns plain errors (fmt.Errorf). Boundary maintained.Verdict
No new security issues introduced by this diff.
REVIEW VERDICT: 0 blocker, 0 major, 0 minor
CODE REVIEW — bookshelf-23na (PR #690)
Phase 0: DEMO Verification
No DEMO block is present in the PR body or bead comments. This is a library-only change (no CLI surface, no HTTP endpoint) so the test run is the operative verification. CI state: success (SHA
0225e89c). Proceeding under the library-only exception.Phase 1: Spec Compliance
All seven requirements from the bead description are implemented:
Annual N,Special N,One-Shot N) —specialIssueREat line 87, keyword folded into series name.stripSuffixREat line 93, applied before both regex paths./api/volumes/?filter=name:fallback —volumeFilterSearchfunction, called fromsearchVolumeswhen primary returns 0 results.calculateVolumeScoreheuristics —scoreVolumeswith +50/+25/+20/+10/+5×3 bonuses.volumeFieldList— correct.sync.MutexinvolumeCache.get/set.Phase 2: Code Quality
Regex correctness
titleRE (
provider.go:82): Anchored^…$, lazy(.*?), no nested quantifiers. Correctly handlesBatman #18 #2(lazy → last#), fractional issues (#1.5), letter suffixes (#18AU). No ReDoS risk.specialIssueRE (
provider.go:87): Anchored, fixed keyword alternation, no nested quantifiers. Correctly handlesX-Men Annual 5,Batman One-Shot 3, year suffix. No ReDoS risk.stripSuffixRE (
provider.go:93): NOT anchored —ReplaceAlltries every character position. For input with an unmatched[or(and no closing bracket, the engine tries O(n) start positions with up to O(n) backtrack each → O(n²). In practice, comic title lengths are bounded (<200 chars) and the input is not adversarial, so the practical risk is negligible. Note for awareness only.Multi-step search correctness
searchStructured(provider.go:390):cachedVolumeSearch→scoreVolumes→ per-volumefetchIssuesByFilter→ early-stop onyearMatch. ThenormalizeIssueNumber(issueNum)call is correctly moved beforescoreVolumessonormis passed to scoring. The fallback tosearchFreeTextwhen structured returns empty is correct.searchVolumessplit intovolumeSearch+volumeFilterSearch: the fallback path is exercised by tests withatomic.Bool. Cache stores the empty-result case correctly (prevents thundering-herd re-queries for known-barren series).Sort determinism
scoreVolumesuses insertion sort with strict>comparison — equal scores preserve original array order (stable). WithvolumeSearchLimit=25max results from primary and 20 from fallback, all volumes have different position bonuses (25−i), so ties only occur if two volumes are at position ≥ 25 (impossible from primary) or if all other scores are also identical. Effectively deterministic. No flake risk.Cache
strings.ToLower(strings.TrimSpace(series))— correct normalization.sync.Mutexinget/set.mainalready); the new code does not make it worse. Not a blocker.structured_test.go:493) usesatomic.Int32to count volume search calls, assertsEqual(int32(1))after two same-series queries. Correctly proves the 2nd call makes no HTTP volume request. ✓Tests
httptest.NewServer— correct, no interface mocks.ScoreVolumestable covers all seven bonus types individually."X-Men Annual 5") covers the full path to issue result.JustBeforeEachresults used correctly).Findings
[MINOR]
internal/metadata/comicvine/provider.go(scoreVolumes) — Year-match bonus uses exact string equality; bead spec says ±1 toleranceBead description (referencing Grimmory): "+100 if start_year matches extracted year (±1)". Implementation uses
v.StartYear == year(exact string match). A title with year 2012 vs a volume withstart_year=2011gets 0 instead of +100. Degrades ranking quality for off-by-one years but does not select the wrong series. Follow-up bead if desired.[MINOR]
internal/metadata/comicvine/provider.go:93(stripSuffixRE) — stripSuffixRE drops the year when it is embedded inside the digital parentheticalInput
"Batman #1 (2016 Digital Edition)"→ strip removes the entire(2016 Digital Edition)group → cleaned"Batman #1"→ year parsed as"". The inline comment says this givesyear="2016"but that is only true when the year appears outside the bracket:"Batman #1 [Digital Edition] (2016)". The behavior is acceptable (falls back to unbosted scoring) but the comment is misleading and no test covers the year-embedded variant.[MINOR] PR #690 description — No DEMO block
Library-only change with no runnable CLI/HTTP surface; CI green is the operative verification. Process note only.
REVIEW VERDICT: 0 blocker, 0 major, 3 minor
0225e89cdfaa8ffec09baa8ffec09b022bfe6561