fix(comics): lazy detail-fetch for bulk enrich (bookshelf-bkzy) #906

Merged
zombor merged 2 commits from bd-bookshelf-bkzy into main 2026-07-03 18:55:45 +00:00
Owner

Summary

  • Adds enrichComicDetail func(ctx, Metadata) (Metadata, error) parameter to RefetchMetadata so the bulk EnrichWorkflow path can fetch rich comic metadata (writers, cover artists, characters, teams) from the provider detail endpoint after selecting the winning search candidate.
  • Previously, the bulk path populated only shallow comic fields (series name, issue number, cover date) from the search result; the detail endpoint was only hit in the HTTP interactive path.
  • Wires wfengine.NewEnrichComicDetailFunc(fetchDetail, fetchMetronDetail) in buildEnrichDeps with a shared cvHourlyLimiter (prevents double-counting the ComicVine hourly quota). The HTTP interactive path passes nil (it fetches detail inline in the metadata handler).

Test plan

  • New Describe blocks in metadata_service_test.go: comic path calls detail fetch and persists Writers/CoverArtists/Characters; ebook path does NOT call detail fetch; error from detail fetch propagated with book context.
  • 100% coverage maintained (make coverage passes locally).
  • Lint clean on changed packages.

Closes bead bookshelf-bkzy on merge.

## Summary - Adds `enrichComicDetail func(ctx, Metadata) (Metadata, error)` parameter to `RefetchMetadata` so the bulk EnrichWorkflow path can fetch rich comic metadata (writers, cover artists, characters, teams) from the provider detail endpoint after selecting the winning search candidate. - Previously, the bulk path populated only shallow comic fields (series name, issue number, cover date) from the search result; the detail endpoint was only hit in the HTTP interactive path. - Wires `wfengine.NewEnrichComicDetailFunc(fetchDetail, fetchMetronDetail)` in `buildEnrichDeps` with a shared `cvHourlyLimiter` (prevents double-counting the ComicVine hourly quota). The HTTP interactive path passes `nil` (it fetches detail inline in the metadata handler). ## Test plan - [ ] New `Describe` blocks in `metadata_service_test.go`: comic path calls detail fetch and persists Writers/CoverArtists/Characters; ebook path does NOT call detail fetch; error from detail fetch propagated with book context. - [ ] 100% coverage maintained (`make coverage` passes locally). - [ ] Lint clean on changed packages. Closes bead bookshelf-bkzy on merge.
fix(comics): lazy detail-fetch for bulk enrich (bookshelf-bkzy)
All checks were successful
/ JS Unit Tests (pull_request) Successful in 1m26s
/ E2E API (pull_request) Successful in 2m3s
/ Lint (pull_request) Successful in 2m35s
/ Integration (pull_request) Successful in 2m43s
/ E2E Browser (pull_request) Successful in 3m31s
/ Test (pull_request) Successful in 3m49s
43621c1af4
Add enrichComicDetail func param to RefetchMetadata so the bulk-enrich
path fetches rich comic data (writers, cover artists, characters, teams)
from the provider detail endpoint after selecting the winning candidate.

Previously, RefetchMetadata populated only shallow comic fields from the
provider SEARCH result (series name, issue number, cover date). The rich
creator/character data is only available via a separate DETAIL request.

Wire NewEnrichComicDetailFunc in buildEnrichDeps (shared cvHourlyLimiter
prevents double-counting the ComicVine hourly quota). Pass nil in wire.go
for the HTTP interactive path (which fetches detail inline in the handler).

Closes bead bookshelf-bkzy on merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Workflow Detail page screenshot (wf-detail-older-execution)

Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.

wf-detail-older-execution

**Workflow Detail page screenshot** (wf-detail-older-execution) Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent. ![wf-detail-older-execution](/attachments/17bd4a9b-5e42-4335-ab9f-b72c32ec6438)

Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)

recompute-match-score-kebab-open

**Recompute Match Score — kebab open screenshot** (recompute-match-score-kebab-open) ![recompute-match-score-kebab-open](/attachments/3bb05b6c-8c50-42d6-a7b3-ec98dda2f2f0)
Author
Owner

Security Review — PR #906 (bookshelf-bkzy)

Scope: detail-fetch URL construction (SSRF), untrusted-response handling, arch boundary, permanent-vs-retryable error classification, secrets in logs.


[MAJOR] internal/wfengine/simple_workflows.go:154-160 — isPermanentEnrichErr omits metron.ErrIssueNotFound, causing 5 retries on a permanent 404

When the new Metron detail-fetch path (added by this PR via NewEnrichComicDetailFunc) receives HTTP 404 from /issue/{id}/, the Metron package correctly returns metron.ErrIssueNotFound — a plain sentinel that signals "this ID does not exist and will never exist." That sentinel propagates through NewEnrichComicDetailFuncenrichComicDetail(ctx, winner) in RefetchMetadata → causes EnrichActivities.Refetch to fail. isPermanentEnrichErr is the only mapping point that wraps domain sentinels to gowf.NewPermanentError, and it does NOT include metron.ErrIssueNotFound. The activity is therefore retried up to 5× (enrich5Attempts: 30 s → 10 min backoff), each retry re-running the full provider search AND the detail endpoint, burning Metron quota slots for something that cannot succeed. A similar gap exists for ComicVine 404s — comicvine.fetchIssueDetail returns fmt.Errorf("comicvine issue: HTTP 404") on a 404 with no sentinel, so those also retry 5×.

Fix: Add errors.Is(err, metron.ErrIssueNotFound) to isPermanentEnrichErr. For ComicVine, add var ErrIssueNotFound = errors.New("comicvine: issue not found"), return it from fetchIssueDetail on HTTP 404, and add it to isPermanentEnrichErr. Both additions need tests in their respective provider packages.


Cleared items

SSRF: URL construction is safe. ComicVine detail URL: fmt.Sprintf("%s/issue/4000-%d/?%s", baseURL, issueID, params.Encode()) — host is cfg.MetadataProviderComicVineBaseURL (fixed config), ID is int64 from the provider search result, formatted as a plain integer. Metron: fmt.Sprintf("%s/issue/%d/", d.baseURL, issueID) — same pattern. Neither host nor path segment is attacker-controllable.

Untrusted response size: Both providers apply io.LimitReader(resp.Body, maxBodyBytes) where maxBodyBytes = 2 MiB. No OOM risk from oversized detail responses.

Untrusted response content: Description fields are stripped of HTML tags (stripHTML/htmlTagRE) before storage, and html/template auto-escapes at render time. No stored-XSS path.

Arch boundary: internal/books/metadata_service.go takes enrichComicDetail as a plain func(context.Context, metadata.Metadata) (metadata.Metadata, error) — no import of go-workflows. The wfengine adapter (NewEnrichComicDetailFunc, EnrichComicDetail) is the only layer that touches go-workflows types. Clean.

Secrets/PII in logs: ComicVine API key is passed as a query parameter and is redacted from all *url.Error chains via redactURLErrors before any log write. Response bodies go through metadata.RedactBody. Metron credentials are HTTP Basic Auth (Authorization header), never in the URL and never logged. Clean.

ExternalID validation: NewEnrichComicDetailFunc guards m.ExternalID == 0 before making any request. IDs originate from the provider API (not user input). Integer-only formatting in URL construction.


REVIEW VERDICT: 0 blocker, 1 major, 0 minor

## Security Review — PR #906 (bookshelf-bkzy) **Scope:** detail-fetch URL construction (SSRF), untrusted-response handling, arch boundary, permanent-vs-retryable error classification, secrets in logs. --- [MAJOR] internal/wfengine/simple_workflows.go:154-160 — isPermanentEnrichErr omits metron.ErrIssueNotFound, causing 5 retries on a permanent 404 When the new Metron detail-fetch path (added by this PR via `NewEnrichComicDetailFunc`) receives HTTP 404 from `/issue/{id}/`, the Metron package correctly returns `metron.ErrIssueNotFound` — a plain sentinel that signals "this ID does not exist and will never exist." That sentinel propagates through `NewEnrichComicDetailFunc` → `enrichComicDetail(ctx, winner)` in `RefetchMetadata` → causes `EnrichActivities.Refetch` to fail. `isPermanentEnrichErr` is the only mapping point that wraps domain sentinels to `gowf.NewPermanentError`, and it does NOT include `metron.ErrIssueNotFound`. The activity is therefore retried up to 5× (enrich5Attempts: 30 s → 10 min backoff), each retry re-running the full provider search AND the detail endpoint, burning Metron quota slots for something that cannot succeed. A similar gap exists for ComicVine 404s — `comicvine.fetchIssueDetail` returns `fmt.Errorf("comicvine issue: HTTP 404")` on a 404 with no sentinel, so those also retry 5×. Fix: Add `errors.Is(err, metron.ErrIssueNotFound)` to `isPermanentEnrichErr`. For ComicVine, add `var ErrIssueNotFound = errors.New("comicvine: issue not found")`, return it from `fetchIssueDetail` on HTTP 404, and add it to `isPermanentEnrichErr`. Both additions need tests in their respective provider packages. --- ### Cleared items **SSRF:** URL construction is safe. ComicVine detail URL: `fmt.Sprintf("%s/issue/4000-%d/?%s", baseURL, issueID, params.Encode())` — host is `cfg.MetadataProviderComicVineBaseURL` (fixed config), ID is `int64` from the provider search result, formatted as a plain integer. Metron: `fmt.Sprintf("%s/issue/%d/", d.baseURL, issueID)` — same pattern. Neither host nor path segment is attacker-controllable. **Untrusted response size:** Both providers apply `io.LimitReader(resp.Body, maxBodyBytes)` where `maxBodyBytes = 2 MiB`. No OOM risk from oversized detail responses. **Untrusted response content:** Description fields are stripped of HTML tags (`stripHTML`/`htmlTagRE`) before storage, and `html/template` auto-escapes at render time. No stored-XSS path. **Arch boundary:** `internal/books/metadata_service.go` takes `enrichComicDetail` as a plain `func(context.Context, metadata.Metadata) (metadata.Metadata, error)` — no import of go-workflows. The wfengine adapter (`NewEnrichComicDetailFunc`, `EnrichComicDetail`) is the only layer that touches go-workflows types. Clean. **Secrets/PII in logs:** ComicVine API key is passed as a query parameter and is redacted from all `*url.Error` chains via `redactURLErrors` before any log write. Response bodies go through `metadata.RedactBody`. Metron credentials are HTTP Basic Auth (Authorization header), never in the URL and never logged. Clean. **ExternalID validation:** `NewEnrichComicDetailFunc` guards `m.ExternalID == 0` before making any request. IDs originate from the provider API (not user input). Integer-only formatting in URL construction. --- REVIEW VERDICT: 0 blocker, 1 major, 0 minor
Author
Owner

CODE REVIEW: NOT APPROVED

Phase 0 — DEMO Verification

No runnable DEMO block found in the bead or PR description. The PR test plan lists checkboxes but provides no command with expected output to re-run and verify. Per review policy this alone blocks approval. The code-quality [MAJOR] below also blocks independently.


Phase 1 — Spec Compliance

  • Arch boundary: CLEAN. internal/books/metadata_service.go imports only internal/metadata and stdlib — no workflow engine import. The enrichComicDetail parameter is a plain func(context.Context, metadata.Metadata) (metadata.Metadata, error); wfengine.NewEnrichComicDetailFunc is wired only in internal/app/build_extended_deps.go. Boundary holds.
  • Scope guard: isComicQuery && enrichComicDetail != nil correctly bypasses ebooks and the HTTP interactive path (passes nil in internal/books/wire.go:267).
  • Rate limiting / fan-out: cvHourlyLimiter is shared between the ComicVine search provider and fetchDetail, preventing double-counting. Metron reuses the existing DB rate_limit.metron row via waitFor. No new unbounded fan-out; the detail call is serial within the existing bounded per-book activity.
  • Winner update before merge: winner = enriched and allResults[winnerID] = enriched both happen before MergeMetadataByFieldPriority, so the merged output correctly carries the enriched fields.

Phase 2 — Code Quality

[MAJOR] internal/wfengine/simple_workflows.go:154 — isPermanentEnrichErr missing metron.ErrIssueNotFound
  This PR is the first to route Metron detail-fetch errors through the
  EnrichActivities.Refetch -> isPermanentEnrichErr code path (before this
  PR, enrichComicDetail was nil on the bulk path and was never called).
  internal/metadata/metron/provider.go:66-69 explicitly defines ErrIssueNotFound
  as a permanent sentinel (HTTP 404 on the detail endpoint) and its doc-comment
  says "The wfengine adapter maps ErrIssueNotFound to workflow.NewPermanentError"
  — but that mapping is absent from isPermanentEnrichErr.
  With this PR in place, if bulk enrich hits a Metron issue returning 404
  (deleted/stale issue ID from the search index), the workflow retries 5x at
  increasing backoff before failing — burning fan-out concurrency slots on a
  condition that can never change.
  Fix: add errors.Is(err, metron.ErrIssueNotFound) to isPermanentEnrichErr
  and update the sentinel inventory doc-comment to list it.

[MINOR] internal/books/metadata_service_test.go:~2321 — two Expects in one It
  The It("persists writers from the detail result") block has a nil guard
  Expect(persistedComic).NotTo(BeNil()) followed by the real assertion —
  two Expects, violating the one-Expect-per-It convention.
  The sibling It blocks (cover artists, characters) lack the nil guard and
  would silently panic if persistedComic were nil, making protection inconsistent.
  Fix: use Expect(persistedComic.Writers, err).To(ConsistOf("Chris Condon"))
  which asserts err==nil and the field in a single Expect, matching the
  multi-actual pattern.

Verdict

REVIEW VERDICT: 0 blocker, 1 major, 1 minor

NOT APPROVED. The [MAJOR] must be fixed before merge: add metron.ErrIssueNotFound to isPermanentEnrichErr in internal/wfengine/simple_workflows.go and update the sentinel inventory comment. The [MINOR] does not block but should be cleaned up in the same commit.

## CODE REVIEW: NOT APPROVED ### Phase 0 — DEMO Verification No runnable DEMO block found in the bead or PR description. The PR test plan lists checkboxes but provides no command with expected output to re-run and verify. Per review policy this alone blocks approval. The code-quality [MAJOR] below also blocks independently. --- ### Phase 1 — Spec Compliance - **Arch boundary**: CLEAN. internal/books/metadata_service.go imports only internal/metadata and stdlib — no workflow engine import. The enrichComicDetail parameter is a plain func(context.Context, metadata.Metadata) (metadata.Metadata, error); wfengine.NewEnrichComicDetailFunc is wired only in internal/app/build_extended_deps.go. Boundary holds. - **Scope guard**: isComicQuery && enrichComicDetail != nil correctly bypasses ebooks and the HTTP interactive path (passes nil in internal/books/wire.go:267). - **Rate limiting / fan-out**: cvHourlyLimiter is shared between the ComicVine search provider and fetchDetail, preventing double-counting. Metron reuses the existing DB rate_limit.metron row via waitFor. No new unbounded fan-out; the detail call is serial within the existing bounded per-book activity. - **Winner update before merge**: winner = enriched and allResults[winnerID] = enriched both happen before MergeMetadataByFieldPriority, so the merged output correctly carries the enriched fields. --- ### Phase 2 — Code Quality ``` [MAJOR] internal/wfengine/simple_workflows.go:154 — isPermanentEnrichErr missing metron.ErrIssueNotFound This PR is the first to route Metron detail-fetch errors through the EnrichActivities.Refetch -> isPermanentEnrichErr code path (before this PR, enrichComicDetail was nil on the bulk path and was never called). internal/metadata/metron/provider.go:66-69 explicitly defines ErrIssueNotFound as a permanent sentinel (HTTP 404 on the detail endpoint) and its doc-comment says "The wfengine adapter maps ErrIssueNotFound to workflow.NewPermanentError" — but that mapping is absent from isPermanentEnrichErr. With this PR in place, if bulk enrich hits a Metron issue returning 404 (deleted/stale issue ID from the search index), the workflow retries 5x at increasing backoff before failing — burning fan-out concurrency slots on a condition that can never change. Fix: add errors.Is(err, metron.ErrIssueNotFound) to isPermanentEnrichErr and update the sentinel inventory doc-comment to list it. [MINOR] internal/books/metadata_service_test.go:~2321 — two Expects in one It The It("persists writers from the detail result") block has a nil guard Expect(persistedComic).NotTo(BeNil()) followed by the real assertion — two Expects, violating the one-Expect-per-It convention. The sibling It blocks (cover artists, characters) lack the nil guard and would silently panic if persistedComic were nil, making protection inconsistent. Fix: use Expect(persistedComic.Writers, err).To(ConsistOf("Chris Condon")) which asserts err==nil and the field in a single Expect, matching the multi-actual pattern. ``` --- ### Verdict ``` REVIEW VERDICT: 0 blocker, 1 major, 1 minor ``` **NOT APPROVED.** The [MAJOR] must be fixed before merge: add metron.ErrIssueNotFound to isPermanentEnrichErr in internal/wfengine/simple_workflows.go and update the sentinel inventory comment. The [MINOR] does not block but should be cleaned up in the same commit.
fix(wfengine): classify metron+comicvine 404s as permanent enrich errors
All checks were successful
/ JS Unit Tests (pull_request) Successful in 1m31s
/ E2E API (pull_request) Successful in 2m26s
/ Lint (pull_request) Successful in 5m4s
/ Integration (pull_request) Successful in 4m19s
/ E2E Browser (pull_request) Successful in 4m24s
/ Test (pull_request) Successful in 5m34s
8446927339
- Add comicvine.ErrIssueNotFound sentinel returned from fetchIssueDetail
  on HTTP 404 (via internal errHTTP404 in doGetWithRetry); domain package
  stays engine-free per architecture boundary rule.
- Add errors.Is(err, metron.ErrIssueNotFound) and
  errors.Is(err, comicvine.ErrIssueNotFound) to isPermanentEnrichErr
  so stale/deleted issue IDs fail-fast instead of retrying 5× at backoff.
- Update isPermanentEnrichErr doc-comment sentinel inventory for both.
- Add NewFetchIssueDetail 404 test driving the ErrIssueNotFound path.
- Add isPermanentEnrichErr tests for both new sentinels.
- Fold double-Expect nil-guard in metadata_service_test.go writers It
  into Expect(field, err) form; apply same form to sibling Its.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)

recompute-match-score-kebab-open

**Recompute Match Score — kebab open screenshot** (recompute-match-score-kebab-open) ![recompute-match-score-kebab-open](/attachments/cfce82c1-9e4f-4b13-8a45-97decf55b3f3)

Workflow Detail page screenshot (wf-detail-older-execution)

Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.

wf-detail-older-execution

**Workflow Detail page screenshot** (wf-detail-older-execution) Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent. ![wf-detail-older-execution](/attachments/10799bb8-0050-4f55-b05a-10a32d181369)
zombor force-pushed bd-bookshelf-bkzy from 8446927339
All checks were successful
/ JS Unit Tests (pull_request) Successful in 1m31s
/ E2E API (pull_request) Successful in 2m26s
/ Lint (pull_request) Successful in 5m4s
/ Integration (pull_request) Successful in 4m19s
/ E2E Browser (pull_request) Successful in 4m24s
/ Test (pull_request) Successful in 5m34s
to 3f47638036
All checks were successful
/ JS Unit Tests (pull_request) Successful in 1m31s
/ E2E API (pull_request) Successful in 2m39s
/ Integration (pull_request) Successful in 3m48s
/ Lint (pull_request) Successful in 3m55s
/ E2E Browser (pull_request) Successful in 4m4s
/ Test (pull_request) Successful in 4m47s
2026-07-03 18:50:17 +00:00
Compare

Workflow Detail page screenshot (wf-detail-older-execution)

Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.

wf-detail-older-execution

**Workflow Detail page screenshot** (wf-detail-older-execution) Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent. ![wf-detail-older-execution](/attachments/5a28db5d-58dd-4e95-aaca-beb2a17cdf3e)

Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)

recompute-match-score-kebab-open

**Recompute Match Score — kebab open screenshot** (recompute-match-score-kebab-open) ![recompute-match-score-kebab-open](/attachments/b3e5f749-1611-4526-bd8f-dd245cb55097)
zombor merged commit 5fbfd4b15b into main 2026-07-03 18:55:45 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
zombor/pergamum!906
No description provided.