fix(comics): lazy detail-fetch for bulk enrich (bookshelf-bkzy) #906
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "bd-bookshelf-bkzy"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
enrichComicDetail func(ctx, Metadata) (Metadata, error)parameter toRefetchMetadataso the bulk EnrichWorkflow path can fetch rich comic metadata (writers, cover artists, characters, teams) from the provider detail endpoint after selecting the winning search candidate.wfengine.NewEnrichComicDetailFunc(fetchDetail, fetchMetronDetail)inbuildEnrichDepswith a sharedcvHourlyLimiter(prevents double-counting the ComicVine hourly quota). The HTTP interactive path passesnil(it fetches detail inline in the metadata handler).Test plan
Describeblocks inmetadata_service_test.go: comic path calls detail fetch and persists Writers/CoverArtists/Characters; ebook path does NOT call detail fetch; error from detail fetch propagated with book context.make coveragepasses locally).Closes bead bookshelf-bkzy on merge.
Workflow Detail page screenshot (wf-detail-older-execution)
Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.
Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)
Security Review — PR #906 (bookshelf-bkzy)
Scope: detail-fetch URL construction (SSRF), untrusted-response handling, arch boundary, permanent-vs-retryable error classification, secrets in logs.
[MAJOR] internal/wfengine/simple_workflows.go:154-160 — isPermanentEnrichErr omits metron.ErrIssueNotFound, causing 5 retries on a permanent 404
When the new Metron detail-fetch path (added by this PR via
NewEnrichComicDetailFunc) receives HTTP 404 from/issue/{id}/, the Metron package correctly returnsmetron.ErrIssueNotFound— a plain sentinel that signals "this ID does not exist and will never exist." That sentinel propagates throughNewEnrichComicDetailFunc→enrichComicDetail(ctx, winner)inRefetchMetadata→ causesEnrichActivities.Refetchto fail.isPermanentEnrichErris the only mapping point that wraps domain sentinels togowf.NewPermanentError, and it does NOT includemetron.ErrIssueNotFound. The activity is therefore retried up to 5× (enrich5Attempts: 30 s → 10 min backoff), each retry re-running the full provider search AND the detail endpoint, burning Metron quota slots for something that cannot succeed. A similar gap exists for ComicVine 404s —comicvine.fetchIssueDetailreturnsfmt.Errorf("comicvine issue: HTTP 404")on a 404 with no sentinel, so those also retry 5×.Fix: Add
errors.Is(err, metron.ErrIssueNotFound)toisPermanentEnrichErr. For ComicVine, addvar ErrIssueNotFound = errors.New("comicvine: issue not found"), return it fromfetchIssueDetailon HTTP 404, and add it toisPermanentEnrichErr. Both additions need tests in their respective provider packages.Cleared items
SSRF: URL construction is safe. ComicVine detail URL:
fmt.Sprintf("%s/issue/4000-%d/?%s", baseURL, issueID, params.Encode())— host iscfg.MetadataProviderComicVineBaseURL(fixed config), ID isint64from the provider search result, formatted as a plain integer. Metron:fmt.Sprintf("%s/issue/%d/", d.baseURL, issueID)— same pattern. Neither host nor path segment is attacker-controllable.Untrusted response size: Both providers apply
io.LimitReader(resp.Body, maxBodyBytes)wheremaxBodyBytes = 2 MiB. No OOM risk from oversized detail responses.Untrusted response content: Description fields are stripped of HTML tags (
stripHTML/htmlTagRE) before storage, andhtml/templateauto-escapes at render time. No stored-XSS path.Arch boundary:
internal/books/metadata_service.gotakesenrichComicDetailas a plainfunc(context.Context, metadata.Metadata) (metadata.Metadata, error)— no import of go-workflows. The wfengine adapter (NewEnrichComicDetailFunc,EnrichComicDetail) is the only layer that touches go-workflows types. Clean.Secrets/PII in logs: ComicVine API key is passed as a query parameter and is redacted from all
*url.Errorchains viaredactURLErrorsbefore any log write. Response bodies go throughmetadata.RedactBody. Metron credentials are HTTP Basic Auth (Authorization header), never in the URL and never logged. Clean.ExternalID validation:
NewEnrichComicDetailFuncguardsm.ExternalID == 0before making any request. IDs originate from the provider API (not user input). Integer-only formatting in URL construction.REVIEW VERDICT: 0 blocker, 1 major, 0 minor
CODE REVIEW: NOT APPROVED
Phase 0 — DEMO Verification
No runnable DEMO block found in the bead or PR description. The PR test plan lists checkboxes but provides no command with expected output to re-run and verify. Per review policy this alone blocks approval. The code-quality [MAJOR] below also blocks independently.
Phase 1 — Spec Compliance
Phase 2 — Code Quality
Verdict
NOT APPROVED. The [MAJOR] must be fixed before merge: add metron.ErrIssueNotFound to isPermanentEnrichErr in internal/wfengine/simple_workflows.go and update the sentinel inventory comment. The [MINOR] does not block but should be cleaned up in the same commit.
Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)
Workflow Detail page screenshot (wf-detail-older-execution)
Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.
84469273393f47638036Workflow Detail page screenshot (wf-detail-older-execution)
Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.
Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)