refactor(metadata): extract shared provider helpers and split comicvine god-file (bookshelf-s7kr) #909
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "bd-bookshelf-s7kr"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
ParseRetryAfter,SleepWithContext,StripHTML,ParseDate,ParseComicIssueNumber,ParseComicTitleComponents) from all 6 metadata providers intointernal/metadata/provider_helpers.go, eliminating per-package behavioral drift (capping, date formats, HTML stripping)metadata.MaxRetryAfterWait = 60sas a single canonical cap applied inParseRetryAfter, covering the previously-uncappedopenlibraryprovidercomicvine/provider.gogod-file into 7 focused files:types.go,cache.go,scoring.go,decode.go,search.go,mapping.go,provider.govolumeCacheaudnexusandcomicvinefrom racy package-level var sleep hooks to constructor injection (nil→metadata.SleepWithContext)parseTitleComponentsonto comicvine's richer version asmetadata.ParseComicTitleComponents(used by both comicvine and metron)Test plan
make test— all unit tests passmake coverage— zero uncovered statement blocksmake lint— zero issues in this worktree's packagesCloses bead bookshelf-s7kr on merge.
Workflow Detail page screenshot (wf-detail-older-execution)
Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.
Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)
Security Review — bookshelf-s7kr
Scope: shared metadata-provider HTTP helpers, ComicVine split, volumeCache bound.
[MINOR] internal/metadata/metron/provider.go:70-76 — stale
Optionsstruct comment references removed exportsThe
Optionsstruct doc says "Defaults to DefaultRetrySleep when nil" and "Defaults to DefaultBackoffSleep when nil", but bothDefaultRetrySleepandDefaultBackoffSleepwere removed by this PR. The actual default is nowmetadata.SleepWithContext. No runtime or security impact, but the comment will mislead the next developer looking for those symbols.Fix: update both comment lines to read "Defaults to metadata.SleepWithContext when nil."
Checked (no findings):
Redaction regression: All 6 providers preserve credential redaction after centralisation.
redactURLErrors+redactAPIKeyin the newdecode.gomutates*url.Error.URLin-place before anylogger.Warnorfmt.Errorfwrapping call, confirmed by the existingredact_test.goend-to-end log/error-string assertions.redactKey(local, unchanged) redacts the?key=param on every log line.Authorizationheader, never in the URL;*url.Error.URLcontains no credential.req.SetBasicAuth(header, not URL); Metron URLs carry no credential in query params;attemptGetlogs"error", errwhose*url.Error.URLis credential-free.parseRetryAfter / body cap:
metadata.ParseRetryAftercaps atMaxRetryAfterWait(60 s) in float space before converting totime.Duration, preventing the pre-existing overflow bypass on very large values (e.g."1e300"). Previously metron and openlibrary had NO cap at all onRetry-Aftersleep; this PR fixes that.io.LimitReaderis present on every body-read path (success, 429, 5xx) for all six providers.volumeCache bound:
set()checkslen(c.entries) >= c.maxEntriesunder the lock and evicts one entry (expired preferred, otherwise arbitrary) before inserting, so the map size is always<= maxEntries(512 default). Cannot be defeated by unique keys: each new unique series name causes one eviction. Thread-safe viasync.Mutex. Memory ceiling ~1 MiB (512 entries × ~1-2 KiB).SSRF: All six provider base URLs are compile-time constants or constructor-injected config values. No user-supplied host or path component reaches
http.NewRequestWithContext. User data (title, series, ISBN) appears only as URL-encoded query parameter values, never as host or path segments.Architecture boundary: No workflow-engine (
go-workflows/wfengine) import in any of the touched domain packages (internal/metadata/*,internal/metadata/comicvine/*). Verified viaimportsections in all changed files.Multi-user scoping: N/A — metadata provider packages hold no per-user state and issue no user-scoped DB queries.
REVIEW VERDICT: 0 blocker, 0 major, 1 minor
CODE REVIEW: bookshelf-s7kr / PR #909
Reviewed diff
origin/main...origin/bd-bookshelf-s7kr. CI is the behavioral source of truth; tests were not re-run locally.Phase 0: DEMO verification
No explicit DEMO block exists (pure internal refactor; CI green + 100% coverage maintained is the behavioral proof). Proceeding to code review.
Phase 1 & 2: Findings
[MAJOR] internal/metadata/comicvine/export_test.go — SetCacheEntryExpired accesses unexported struct fields
SetCacheEntryExpired(added in this PR) directly callsc.mu.Lock()and writes toc.entries— both unexported fields ofvolumeCache. This is white-box access in violation of project policy ("A review MUST flag, as at least a [MAJOR], any test file that references an unexported symbol of the package under test"). The rest of the cache-test exports (type alias, thin wrappers over exported-for-test methods) are acceptable, but this one bypasses encapsulation to force an already-expired entry into the internal map.Fix: inject a
now func() time.Timeparameter intonewVolumeCacheand use it in bothset()andget(). Tests pass a clock that returns an already-past time, makingSetCacheEntryExpiredunnecessary and removing the unexported-field access entirely. The injected clock has no production overhead (nil-guarded totime.Now).[MAJOR] Rebase hazard — bkzy (#906, already merged to main) added 404 handling that s7kr's file-split will conflict with
bkzy(commit3f476380, merged via5fbfd4b1) added tocomicvine/provider.go:var ErrIssueNotFoundandvar errHTTP404sentinelserrHTTP404return insidedoGetWithRetryerrors.Is(err, errHTTP404)→ErrIssueNotFoundmapping insidefetchIssueDetails7kr moves
doGetWithRetrytodecode.goandfetchIssueDetailtosearch.go. The s7kr branch was forked fromdc5210b7(before bkzy landed), so s7kr does NOT contain bkzy's 404 handling. Currentdecode.go:doGetWithRetryreturns a generic"HTTP 404"error — not theerrHTTP404sentinel — sofetchIssueDetailinsearch.gocannot map it toErrIssueNotFound.When s7kr is rebased onto current main, git will produce conflicts at
provider.go(bkzy's additions meet s7kr's wholesale deletion of the same file). If the merge supervisor resolves by accepting s7kr's version of those files, bkzy's 404 handling is silently dropped: stale issue IDs will retry 3× at backoff instead of failing fast (isPermanentEnrichErrin wfengine won't match).The merge supervisor must explicitly:
errHTTP404sentinel intodecode.go.StatusNotFound→errHTTP404branch todecode.go:doGetWithRetry(between the 5xx block and the body-read).ErrIssueNotFoundsentinel +errors.Is(err, errHTTP404)check intosearch.go:fetchIssueDetail.ErrIssueNotFoundexported from thecomicvinepackage sowfenginecan reference it.[MINOR] internal/metadata/comicvine/decode.go:27 — redundant local constant duplicates metadata.MaxRetryAfterWait
defaultOn429 = 60 * time.Secondis declared locally indecode.go.metadata.MaxRetryAfterWaithas the same value and the same semantic (cap/fallback for 429/420 Retry-After). Line 99 (retryAfter = defaultOn429) should useretryAfter = metadata.MaxRetryAfterWaitdirectly;defaultOn429can then be dropped. Not a blocker.Positive findings (no action needed)
parseTitleComponentsomitted both, so the title "Batman #1 (2016 Digital Edition)" would return ("", "", "") in metron (no match). The new sharedParseComicTitleComponentsstrips the suffix and returns ("Batman", "1", ""). Improvement in all cases examined.ParseRetryAftercaps in float space before conversion, eliminating the overflow attack surface.var sleepWithContext/var retryAfterSleepracy package-level vars in comicvine or audnexus. Constructor injection is clean.SetCacheEntryExpired, adds a live entry, then overflows — and asserts the expired key is absent and the live key survives. Not vacuous..golangci.ymlnever enabledfunlen/gocyclo/nestiflinters, so there are no per-function grandfather exclusion entries to remove. Criterion satisfied vacuously.package comicvine; tests inpackage comicvine_test; no import cycles;provider_helpers_test.goinpackage metadata_test. File boundaries (types / cache / scoring / decode / search / mapping / provider) are logical and single-responsibility.REVIEW VERDICT: 0 blocker, 2 major, 1 minor
cf16868ab94ffb80b788Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)
Workflow Detail page screenshot (wf-detail-older-execution)
Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.
Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)
Workflow Detail page screenshot (wf-detail-older-execution)
Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.
d12b123490780fc17ab8