fix(comicvine): structured volume+issue_number lookup for Series #N (bookshelf-wzef) #414

Merged
zombor merged 3 commits from bd-bookshelf-wzef into main 2026-06-08 01:45:13 +00:00
Owner

Summary

  • Implements Grimmory's proven volume→issue structured lookup path: parse 'Series #N [(Year)]' from the title, search volumes by series name (cached, 10min TTL), score by position+year-alignment, fetch /issues/?filter=volume:ID,issue_number:N for the top 3 volumes, return exact matches first
  • Free-text /search/?resources=issue path preserved as fallback for plain titles or when structured path yields nothing
  • Rate budget: ~2-3 req/search typical (vs ~6 before); per-series volume cache avoids repeat volume requests; max 3 volumes checked; year-aligned early-stop; detail fetched only for matched issues (no 5×-per-candidate loop)
  • 100% test coverage; mock ComicVine at httptest boundary; no real API calls in tests

Test plan

  • 'X-O Manowar #18' → exact #18 returned (volume search + filter)
  • Suffix issue (18.1, 0AU) handled correctly
  • Year disambiguation picks the right printing
  • issue_number mismatch logged and skipped; falls back to free-text
  • Title with no #N uses existing free-text path unchanged
  • Volume cache hit avoids second volume request (request count asserted)
  • Structured path empty → falls back to free-text
  • Detail fetch failure → falls back to issueListResult data
  • Full ComicMetadata fields from issueListResult (writers, pencilers, inkers, etc.)
  • All tests pass, 100% coverage, no new exclusions

Closes bead bookshelf-wzef on merge.

🤖 Generated with Claude Code

## Summary - Implements Grimmory's proven volume→issue structured lookup path: parse 'Series #N [(Year)]' from the title, search volumes by series name (cached, 10min TTL), score by position+year-alignment, fetch /issues/?filter=volume:ID,issue_number:N for the top 3 volumes, return exact matches first - Free-text /search/?resources=issue path preserved as fallback for plain titles or when structured path yields nothing - Rate budget: ~2-3 req/search typical (vs ~6 before); per-series volume cache avoids repeat volume requests; max 3 volumes checked; year-aligned early-stop; detail fetched only for matched issues (no 5×-per-candidate loop) - 100% test coverage; mock ComicVine at httptest boundary; no real API calls in tests ## Test plan - [x] 'X-O Manowar #18' → exact #18 returned (volume search + filter) - [x] Suffix issue (18.1, 0AU) handled correctly - [x] Year disambiguation picks the right printing - [x] issue_number mismatch logged and skipped; falls back to free-text - [x] Title with no #N uses existing free-text path unchanged - [x] Volume cache hit avoids second volume request (request count asserted) - [x] Structured path empty → falls back to free-text - [x] Detail fetch failure → falls back to issueListResult data - [x] Full ComicMetadata fields from issueListResult (writers, pencilers, inkers, etc.) - [x] All tests pass, 100% coverage, no new exclusions Closes bead bookshelf-wzef on merge. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
fix(comicvine): structured volume+issue_number lookup for Series #N (bookshelf-wzef)
All checks were successful
/ JS Unit Tests (pull_request) Successful in 51s
/ Lint (pull_request) Successful in 2m24s
/ Test (pull_request) Successful in 4m11s
/ E2E Browser (pull_request) Successful in 3m55s
/ Integration (pull_request) Successful in 6m6s
/ E2E API (pull_request) Successful in 6m57s
a920453adc
Implement the Grimmory-proven volume→issue structured path so 'X-O Manowar #18'
returns the exact #18 issue instead of the top-5 free-text relevance results.

Changes:
- Parse 'Series #N [(Year)]' from title via titleRE; fall back to free-text when
  no issue number is present (existing behaviour preserved).
- searchVolumes: GET /search/?resources=volume&query=<series>&limit=25; results
  cached per-series with a 10-min TTL (concurrency-safe volumeCache).
- scoreVolumes: position-bonus + +100 for start_year match (mirrors Grimmory's
  calculateVolumeScore); top MAX_VOLUMES_TO_CHECK=3 checked.
- fetchIssuesByFilter: GET /issues/?filter=volume:<id>,issue_number:<N> with rich
  field_list; issue_number mismatch guard logs-and-skips.
- Early-stop on year-aligned volume match; detail fetched only for matched issues
  (not a 5×-per-candidate loop as before).
- issueListResultToMetadata + buildComicMetadataFromList map the rich /issues/
  filter response without requiring a separate detail fetch.
- normalizeIssueNumber strips leading zeros, preserves suffixes (18AU, 18.1).
- Rate budget: ~2-3 req/search typical vs ~6 previously; volume cache makes
  repeat-series cost 0 volume requests.
- Tests: 100% coverage, mock httptest boundary (no real API calls).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

Security Review — bookshelf-wzef (PR #414)

Diff reviewed: origin/main...origin/bd-bookshelf-wzef
Files: internal/metadata/comicvine/provider.go, internal/metadata/comicvine/structured_test.go, internal/metadata/comicvine/export_test.go


URL / Query Injection

query= param (series name → volume search): series is passed to params.Set("query", series) and the full params block is serialized via params.Encode() (standard url.Values.Encode()). This percent-encodes the value; any special characters in the series name are safely escaped before the URL is assembled. No injection possible here.

filter= param (volume ID + issue number): The filter string is constructed as:

params.Set("filter", fmt.Sprintf("volume:%d,issue_number:%s", volumeID, issueNumber))

volumeID is an int64 formatted with %d (integer only). issueNumber is the output of normalizeIssueNumber(issueNum), where issueNum is captured by regex group 2:

([0-9]+(?:[.\-][0-9]+|[A-Za-z]+)?)

This constrains the value to digits, optionally followed by ./- + digits, or alpha characters. The characters that could corrupt the ComicVine server-side filter syntax — , (filter clause separator), : (key/value separator), & (query string separator) — are structurally impossible from this regex. The entire filter value is then further URL-encoded by params.Encode(). No injection vector.

SSRF

baseURL is set to the hardcoded constant defaultBaseURL = "https://comicvine.gamespot.com/api" at construction time. The config field MetadataProviderComicVineBaseURL is explicitly documented as test-only and is NOT registered as an ff flag / environment variable, preventing operator or user injection of an attacker-controlled host. No user-request input influences the host or scheme. No SSRF vector.

API Key Leakage

The API key is passed as a URL query parameter (?api_key=KEY). The existing redactAPIKey/redactURLErrors machinery walks the full *url.Error chain before logging and replaces the key value with REDACTED. Log statements at the Info level (volume search completed, issues filter completed, issue detail fetched, etc.) log only structured fields like series, volume_id, issue_number, results — never the raw reqURL. No key leakage.

DoS / Rate Budget

Every new request path (searchVolumes, fetchIssuesByFilter) routes through doGetWithRetry, which calls both hourly.Wait(ctx) (200/hr bucket) and limiter(ctx) (per-second) on every attempt, including retries. The existing limiters gate 100% of new requests. The structured path adds at most 1 + maxVolumesToCheck(3) + maxVolumesToCheck(3) = 7 requests per search in the worst case (1 volume search + 3 issue filter + 3 detail), bounded by constants.

Volume Cache — No Size Cap (MINOR observation, not a blocking finding)

The volumeCache is a map[string]volumeCacheEntry with TTL-based lazy expiry (10 min). Stale entries are only removed when a get() for the same key is called after expiry — there is no background sweep or maximum-entry eviction. In a normal personal library use-case the number of distinct series is bounded (hundreds to low thousands), so the memory impact is negligible. Each entry holds at most 25 volumeSearchResult structs (small structs, ~100 bytes each) → worst case ~2.5 KB per series → thousands of series ≈ a few MB. Not a practical DoS risk for the target deployment scale.

Test Fixtures

No real API keys, passwords, tokens, or credentials in any test fixture. Tests use noAPIKey (nil/empty) or "test-key" literals. Clean.


REVIEW VERDICT: 0 blocker, 0 major, 0 minor

## Security Review — bookshelf-wzef (PR #414) Diff reviewed: `origin/main...origin/bd-bookshelf-wzef` Files: `internal/metadata/comicvine/provider.go`, `internal/metadata/comicvine/structured_test.go`, `internal/metadata/comicvine/export_test.go` --- ### URL / Query Injection **`query=` param (series name → volume search):** `series` is passed to `params.Set("query", series)` and the full params block is serialized via `params.Encode()` (standard `url.Values.Encode()`). This percent-encodes the value; any special characters in the series name are safely escaped before the URL is assembled. No injection possible here. **`filter=` param (volume ID + issue number):** The filter string is constructed as: ```go params.Set("filter", fmt.Sprintf("volume:%d,issue_number:%s", volumeID, issueNumber)) ``` `volumeID` is an `int64` formatted with `%d` (integer only). `issueNumber` is the output of `normalizeIssueNumber(issueNum)`, where `issueNum` is captured by regex group 2: ``` ([0-9]+(?:[.\-][0-9]+|[A-Za-z]+)?) ``` This constrains the value to digits, optionally followed by `.`/`-` + digits, or alpha characters. The characters that could corrupt the ComicVine server-side filter syntax — `,` (filter clause separator), `:` (key/value separator), `&` (query string separator) — are structurally impossible from this regex. The entire `filter` value is then further URL-encoded by `params.Encode()`. **No injection vector.** ### SSRF `baseURL` is set to the hardcoded constant `defaultBaseURL = "https://comicvine.gamespot.com/api"` at construction time. The config field `MetadataProviderComicVineBaseURL` is explicitly documented as test-only and is NOT registered as an ff flag / environment variable, preventing operator or user injection of an attacker-controlled host. No user-request input influences the host or scheme. **No SSRF vector.** ### API Key Leakage The API key is passed as a URL query parameter (`?api_key=KEY`). The existing `redactAPIKey`/`redactURLErrors` machinery walks the full `*url.Error` chain before logging and replaces the key value with `REDACTED`. Log statements at the Info level (`volume search completed`, `issues filter completed`, `issue detail fetched`, etc.) log only structured fields like `series`, `volume_id`, `issue_number`, `results` — never the raw `reqURL`. **No key leakage.** ### DoS / Rate Budget Every new request path (`searchVolumes`, `fetchIssuesByFilter`) routes through `doGetWithRetry`, which calls both `hourly.Wait(ctx)` (200/hr bucket) and `limiter(ctx)` (per-second) on every attempt, including retries. The existing limiters gate 100% of new requests. The structured path adds at most `1 + maxVolumesToCheck(3) + maxVolumesToCheck(3)` = 7 requests per search in the worst case (1 volume search + 3 issue filter + 3 detail), bounded by constants. ### Volume Cache — No Size Cap (MINOR observation, not a blocking finding) The `volumeCache` is a `map[string]volumeCacheEntry` with TTL-based lazy expiry (10 min). Stale entries are only removed when a `get()` for the same key is called after expiry — there is no background sweep or maximum-entry eviction. In a normal personal library use-case the number of distinct series is bounded (hundreds to low thousands), so the memory impact is negligible. Each entry holds at most 25 `volumeSearchResult` structs (small structs, ~100 bytes each) → worst case ~2.5 KB per series → thousands of series ≈ a few MB. Not a practical DoS risk for the target deployment scale. ### Test Fixtures No real API keys, passwords, tokens, or credentials in any test fixture. Tests use `noAPIKey` (nil/empty) or `"test-key"` literals. **Clean.** --- **REVIEW VERDICT: 0 blocker, 0 major, 0 minor**
Author
Owner

Code Review — PR #414 (bookshelf-wzef)

Branch: bd-bookshelf-wzef — ComicVine structured volume→issue lookup


Findings

[MAJOR] internal/metadata/comicvine/structured_test.go:232 — callCount declared and written in handler but never asserted; year-early-stop is not actually tested
The "year disambiguation picks the right printing" spec declares callCount := 0 and increments it in the /issues/ handler, but no Expect(callCount...) assertion exists. The two It blocks only check that results are non-empty and that SeriesNumber ≈ 18. This means the year-aligned early-stop (the key rate-budget guarantee: "stop after the first year-matched volume yields a result, do not check remaining volumes") is completely untested. A regression that removes if sv.yearMatch && len(results) > 0 { break } would pass all tests. Add It("checks only 1 volume when year-aligned volume matches first", func() { Expect(callCount).To(Equal(1)) }) — or switch to atomic.Int32 + assertion. The underlying early-stop logic in production code is correct, but the test gives false confidence on a rate-budget-critical path.

[MINOR] internal/metadata/comicvine/structured_test.go:109–111 — three Expect calls per DescribeTable entry violates one-assertion-per-It convention
Each Entry body contains Expect(s), Expect(i), and Expect(y). Project conventions (project-conventions.md) require exactly one Expect per It block; DescribeTable entries compile to It blocks. Split into three separate DescribeTables (one per return value) or use three tables. Minor: no correctness impact.

[MINOR] internal/metadata/comicvine/structured_test.go:787 — provider invocation inside BeforeEach, not JustBeforeEach
The "volume and filter search both use API key" spec calls provider(ctx, ...) at line 787 inside BeforeEach. Project conventions require BeforeEach for pure setup (fixtures, stubs) and JustBeforeEach for the system-under-test call. Move the provider(ctx, ...) call to a JustBeforeEach block. Minor: no correctness impact.

[MINOR] internal/metadata/comicvine/structured_test.go:790–791 — API-key assertion is underspecified
Expect(gotKeys).To(ContainElement("test-key")) passes if the key is sent on any request (e.g. just the volume search). Both searchVolumes and fetchIssuesByFilter do include the key by code inspection, but the test does not verify the filter request specifically sent it. Strengthen to Expect(gotKeys).To(HaveLen(3)) (volume + filter + detail each send it) or route-tag the key collection and assert both paths.

[MINOR] internal/metadata/comicvine/provider.go:75–78 — regex allows empty series name; bare #18 sends a useless volume search
titleRE matches ^(.*?)#... — group 1 (series) can be empty. A title of "#18" produces series="", issueNum="18", which enters the structured path and calls searchVolumes with query="". ComicVine will return an arbitrary result set. Cache key "" permanently pins those results for 10 minutes. This wastes one rate-limited request. Fix: add if strings.TrimSpace(series) == "" { return "", "", "" } in parseTitleComponents, or guard issueNum != "" && strings.TrimSpace(series) != "" at the call site.

[MINOR] internal/metadata/comicvine/structured_test.go:105–119 — ParseTitleComponents table is missing edge cases
No entry tests: (a) bare "#18"("","18","") empty series; (b) title with two # signs like "Batman #18 #2" which the lazy .*? anchors to the last #, yielding series="Batman #18", issueNum="2" (wrong series name). Neither scenario has catastrophic impact in practice (bare #18 is rare; multi-# titles are rare), but both are silent wrong-answer paths on a function exported for white-box testing.


REVIEW VERDICT: 0 blocker, 1 major, 5 minor

## Code Review — PR #414 (bookshelf-wzef) **Branch:** `bd-bookshelf-wzef` — ComicVine structured volume→issue lookup --- ### Findings [MAJOR] internal/metadata/comicvine/structured_test.go:232 — `callCount` declared and written in handler but never asserted; year-early-stop is not actually tested The "year disambiguation picks the right printing" spec declares `callCount := 0` and increments it in the `/issues/` handler, but no `Expect(callCount...)` assertion exists. The two `It` blocks only check that results are non-empty and that `SeriesNumber ≈ 18`. This means the year-aligned early-stop (the key rate-budget guarantee: "stop after the first year-matched volume yields a result, do not check remaining volumes") is completely untested. A regression that removes `if sv.yearMatch && len(results) > 0 { break }` would pass all tests. Add `It("checks only 1 volume when year-aligned volume matches first", func() { Expect(callCount).To(Equal(1)) })` — or switch to `atomic.Int32` + assertion. The underlying early-stop logic in production code is correct, but the test gives false confidence on a rate-budget-critical path. [MINOR] internal/metadata/comicvine/structured_test.go:109–111 — three `Expect` calls per `DescribeTable` entry violates one-assertion-per-It convention Each `Entry` body contains `Expect(s)`, `Expect(i)`, and `Expect(y)`. Project conventions (project-conventions.md) require exactly one `Expect` per `It` block; DescribeTable entries compile to `It` blocks. Split into three separate `DescribeTable`s (one per return value) or use three tables. Minor: no correctness impact. [MINOR] internal/metadata/comicvine/structured_test.go:787 — provider invocation inside `BeforeEach`, not `JustBeforeEach` The "volume and filter search both use API key" spec calls `provider(ctx, ...)` at line 787 inside `BeforeEach`. Project conventions require `BeforeEach` for pure setup (fixtures, stubs) and `JustBeforeEach` for the system-under-test call. Move the `provider(ctx, ...)` call to a `JustBeforeEach` block. Minor: no correctness impact. [MINOR] internal/metadata/comicvine/structured_test.go:790–791 — API-key assertion is underspecified `Expect(gotKeys).To(ContainElement("test-key"))` passes if the key is sent on *any* request (e.g. just the volume search). Both `searchVolumes` and `fetchIssuesByFilter` do include the key by code inspection, but the test does not verify the filter request specifically sent it. Strengthen to `Expect(gotKeys).To(HaveLen(3))` (volume + filter + detail each send it) or route-tag the key collection and assert both paths. [MINOR] internal/metadata/comicvine/provider.go:75–78 — regex allows empty series name; bare `#18` sends a useless volume search `titleRE` matches `^(.*?)#...` — group 1 (series) can be empty. A title of `"#18"` produces `series=""`, `issueNum="18"`, which enters the structured path and calls `searchVolumes` with `query=""`. ComicVine will return an arbitrary result set. Cache key `""` permanently pins those results for 10 minutes. This wastes one rate-limited request. Fix: add `if strings.TrimSpace(series) == "" { return "", "", "" }` in `parseTitleComponents`, or guard `issueNum != "" && strings.TrimSpace(series) != ""` at the call site. [MINOR] internal/metadata/comicvine/structured_test.go:105–119 — `ParseTitleComponents` table is missing edge cases No entry tests: (a) bare `"#18"` → `("","18","")` empty series; (b) title with two `#` signs like `"Batman #18 #2"` which the lazy `.*?` anchors to the *last* `#`, yielding `series="Batman #18"`, `issueNum="2"` (wrong series name). Neither scenario has catastrophic impact in practice (bare `#18` is rare; multi-`#` titles are rare), but both are silent wrong-answer paths on a function exported for white-box testing. --- REVIEW VERDICT: 0 blocker, 1 major, 5 minor
zombor force-pushed bd-bookshelf-wzef from a920453adc
All checks were successful
/ JS Unit Tests (pull_request) Successful in 51s
/ Lint (pull_request) Successful in 2m24s
/ Test (pull_request) Successful in 4m11s
/ E2E Browser (pull_request) Successful in 3m55s
/ Integration (pull_request) Successful in 6m6s
/ E2E API (pull_request) Successful in 6m57s
to 494d9bab3a
All checks were successful
/ Lint (pull_request) Successful in 2m8s
/ JS Unit Tests (pull_request) Successful in 58s
/ Test (pull_request) Successful in 3m10s
/ Integration (pull_request) Successful in 5m10s
/ E2E Browser (pull_request) Successful in 4m41s
/ E2E API (pull_request) Successful in 7m42s
2026-06-08 01:09:30 +00:00
Compare
Merge branch 'main' into bd-bookshelf-wzef
All checks were successful
/ Lint (pull_request) Successful in 2m43s
/ Test (pull_request) Successful in 3m25s
/ JS Unit Tests (pull_request) Successful in 45s
/ Integration (pull_request) Successful in 5m42s
/ E2E API (pull_request) Successful in 5m6s
/ E2E Browser (pull_request) Successful in 3m39s
31ef47818d
zombor merged commit 41036b6d89 into main 2026-06-08 01:45:13 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
zombor/pergamum!414
No description provided.