fix(comicvine): separator-insensitive volume name match (bookshelf-poza) #879

Merged
zombor merged 2 commits from bd-bookshelf-poza into main 2026-07-02 12:10:26 +00:00
Owner

Summary

  • Adds normalizeSeriesName helper that lowercases and replaces separator punctuation (:, -, , , _, /, .) with a single space, collapses whitespace, and trims
  • Applies normalization to both sides of the scoreVolumes name-match comparison (exact +50 and contains +25) so that series names like Batman - City of Light or Batman City of Light match the canonical ComicVine volume Batman: City of Light
  • When nameMatch=true the structured volume→/issues/?filter=volume:V,issue_number:N path fires, returning only the requested issue instead of the fuzzy free-text fallback that returns all issues

Scope: Only the local scoring comparison is changed. ComicVine query strings are unchanged. No other providers or fallback paths are touched.

Test plan

  • ScoreVolumes unit tests: dash variant, space variant, genuinely-different series (no nameMatch), partial normalized-contains
  • Full httptest.Server journey: "Batman - City of Light #5" → structured path → exactly 1 result (issue #5)
  • Full httptest.Server journey: "Batman City of Light #5" → structured path → exactly 1 result (issue #5)
  • All existing tests still pass
  • Coverage: 100% on internal/metadata/comicvine including normalizeSeriesName

Closes bead bookshelf-poza on merge.

## Summary - Adds `normalizeSeriesName` helper that lowercases and replaces separator punctuation (`:`, `-`, `–`, `—`, `_`, `/`, `.`) with a single space, collapses whitespace, and trims - Applies normalization to **both sides** of the `scoreVolumes` name-match comparison (exact +50 and contains +25) so that series names like `Batman - City of Light` or `Batman City of Light` match the canonical ComicVine volume `Batman: City of Light` - When `nameMatch=true` the structured `volume→/issues/?filter=volume:V,issue_number:N` path fires, returning only the requested issue instead of the fuzzy free-text fallback that returns all issues **Scope:** Only the local scoring comparison is changed. ComicVine query strings are unchanged. No other providers or fallback paths are touched. ## Test plan - [x] `ScoreVolumes` unit tests: dash variant, space variant, genuinely-different series (no nameMatch), partial normalized-contains - [x] Full httptest.Server journey: `"Batman - City of Light #5"` → structured path → exactly 1 result (issue #5) - [x] Full httptest.Server journey: `"Batman City of Light #5"` → structured path → exactly 1 result (issue #5) - [x] All existing tests still pass - [x] Coverage: 100% on `internal/metadata/comicvine` including `normalizeSeriesName` Closes bead bookshelf-poza on merge.
fix(comicvine): separator-insensitive volume name match (bookshelf-poza)
Some checks failed
/ JS Unit Tests (pull_request) Successful in 2m28s
/ E2E API (pull_request) Successful in 3m3s
/ Integration (pull_request) Successful in 4m10s
/ Lint (pull_request) Successful in 4m16s
/ E2E Browser (pull_request) Failing after 4m22s
/ Test (pull_request) Successful in 5m9s
ef8c422eda
Add normalizeSeriesName helper that lowercases and replaces separator
punctuation (colon, dash, en/em dashes, underscore, slash, period) with
a single space, then collapses whitespace and trims. Use it on both sides
of the scoreVolumes name-match comparison so that "Batman - City of Light"
and "Batman City of Light" match the canonical ComicVine volume name
"Batman: City of Light", setting nameMatch=true and taking the structured
issue-#-filtered path instead of the fuzzy free-text fallback.

The query strings sent to ComicVine are unchanged; only the local scoring
comparison is normalized.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Workflow Detail page screenshot (wf-detail-older-execution)

Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.

wf-detail-older-execution

**Workflow Detail page screenshot** (wf-detail-older-execution) Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent. ![wf-detail-older-execution](/attachments/8da3a627-6319-4ac3-8b56-dd4f5e9d48d5)

Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)

recompute-match-score-kebab-open

**Recompute Match Score — kebab open screenshot** (recompute-match-score-kebab-open) ![recompute-match-score-kebab-open](/attachments/99d74b4c-a8b7-4903-8b25-cf48acece012)
refactor(comicvine): compute seriesNorm once outside scoreVolumes loop
All checks were successful
/ JS Unit Tests (pull_request) Successful in 34s
/ E2E API (pull_request) Successful in 2m39s
/ Lint (pull_request) Successful in 3m40s
/ Integration (pull_request) Successful in 3m44s
/ E2E Browser (pull_request) Successful in 4m28s
/ Test (pull_request) Successful in 4m37s
f76ea32155
seriesNorm = normalizeSeriesName(series) is constant across all volume
candidates — hoist it above the loop so we compute it once rather than
once per candidate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Workflow Detail page screenshot (wf-detail-older-execution)

Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.

wf-detail-older-execution

**Workflow Detail page screenshot** (wf-detail-older-execution) Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent. ![wf-detail-older-execution](/attachments/3695b739-efc1-4491-9987-4b4cee54f577)

Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)

recompute-match-score-kebab-open

**Recompute Match Score — kebab open screenshot** (recompute-match-score-kebab-open) ![recompute-match-score-kebab-open](/attachments/5e918cca-4421-4517-b410-be79f9667eaa)
Author
Owner

CODE REVIEW: APPROVED

Phase 0 — DEMO Verification

No DEMO block present in this bead (separator-insensitivity is a scoring-internal change with no interactive CLI demo). I ran the normalization logic manually (Go program against the same regex/collapse logic) to verify correctness.

Phase 1 — Spec Compliance

Diff: internal/metadata/comicvine/provider.go (+32 lines), internal/metadata/comicvine/structured_test.go (+91 lines).

  • normalizeSeriesName added and wired into both sides of scoreVolumes' name-match. ✓
  • seriesNorm hoisted outside the loop (computed once; series is constant). ✓
  • Query strings sent to ComicVine are unchanged (doc-comment confirms, verified by reading call sites). ✓
  • Scoring weights and nameMatch gating logic are unchanged. ✓

Phase 2 — Code Quality

Correctness: normalizeSeriesName

provider.go:289separatorRE = regexp.MustCompile([-:–—_/.])

  • Dash - at position 0 inside [...] is a literal dash in RE2. Correct.
  • Period . inside [...] is a literal period (not any-char). Correct.
  • En-dash (U+2013) and em-dash (U+2014) are multi-byte UTF-8; Go's regexp handles these correctly as Unicode runes.
  • The regex does not touch digits, apostrophes, ampersands, letters, or parentheses. No meaningful distinctions are collapsed.

Verified normalization outputs:

  • "Batman - City of Light""batman city of light"
  • "Batman: City of Light""batman city of light"
  • "Batman City of Light""batman city of light"
  • "Batman & Robin""batman & robin" (ampersand preserved; does NOT collapse to "batman robin") ✓
  • "X-Men""x men", "X Men""x men" (same series, acceptable)
  • "S.H.I.E.L.D.""s h i e l d", "SHIELD""shield" (NOT equal — no false positive across acronym styles) ✓

Over-match analysis

The one new behavior worth naming: a partial query like "Batman City" previously did NOT contains-match "Batman: City of Light" (the colon blocked strings.Contains). After the patch it does, because both normalize to strings where "batman city" is a substring of "batman city of light". This is an acceptable trade-off — the contains branch awards +25 and sets nameMatch, but all other scoring factors (year, issue count, publisher, recency) differentiate further. This is strictly within the existing fuzzy-matching design intent and is not a correctness bug.

Tests

Four new ScoreVolumes unit tests (structured_test.go:264–295):

  1. Dash variant ("Batman - City of Light" vs "Batman: City of Light") → nameMatch=true. Non-tautological: was false before the patch (the old nameLower path had strings.Contains("batman: city of light", "batman - city of light") = false). ✓
  2. Space variant ("Batman City of Light" vs "Batman: City of Light") → nameMatch=true. Non-tautological: was false before the patch (the colon in the volume name blocked the old contains check). ✓
  3. Different series ("Superman" vs "Batman: City of Light") → nameMatch=false. ✓
  4. Partial/contains ("Batman" vs "Batman: City of Light") → nameMatch=true. Valid regression test for the contains branch via the new code path; however this behavior was already true before the patch (old code: strings.Contains("batman: city of light", "batman") = true). It does not specifically exercise the new separator normalization. Minor only.

Two new httptest.Server journey tests (structured_test.go:2022–2077):

  • Both mock a ComicVine volume named "Batman: City of Light" and an issue #5, then query with the dash and space variants. Assert HaveLen(1) (structured path fires, not fuzzy fallback) and SeriesNumber ≈ 5. Non-tautological — these journeys require nameMatch=true to route through the structured issue-#-filtered path; without the patch, the structured path would be skipped and a fuzzy result set (or empty set) would be returned instead. ✓

[MINOR] internal/metadata/comicvine/structured_test.go:288 — partial/contains test does not exercise new normalization
The "Batman" vs "Batman: City of Light" It passes both before and after the patch ("batman" was always a substring of "batman: city of light"). It is a valid regression guard for the contains branch but does not specifically test that separator normalization enabled a new contains match. Consider adding a case where normalization is the enabling factor for the contains path (e.g. "Batman City" vs "Batman: City of Light"). Does not block.

REVIEW VERDICT: 0 blocker, 0 major, 1 minor

CODE REVIEW: APPROVED ## Phase 0 — DEMO Verification No DEMO block present in this bead (separator-insensitivity is a scoring-internal change with no interactive CLI demo). I ran the normalization logic manually (Go program against the same regex/collapse logic) to verify correctness. ## Phase 1 — Spec Compliance Diff: `internal/metadata/comicvine/provider.go` (+32 lines), `internal/metadata/comicvine/structured_test.go` (+91 lines). - `normalizeSeriesName` added and wired into both sides of `scoreVolumes`' name-match. ✓ - `seriesNorm` hoisted outside the loop (computed once; `series` is constant). ✓ - Query strings sent to ComicVine are unchanged (doc-comment confirms, verified by reading call sites). ✓ - Scoring weights and `nameMatch` gating logic are unchanged. ✓ ## Phase 2 — Code Quality ### Correctness: `normalizeSeriesName` `provider.go:289` — `separatorRE = regexp.MustCompile(`[-:–—_/.]`)` - Dash `-` at position 0 inside `[...]` is a literal dash in RE2. Correct. - Period `.` inside `[...]` is a literal period (not any-char). Correct. - En-dash `–` (U+2013) and em-dash `—` (U+2014) are multi-byte UTF-8; Go's `regexp` handles these correctly as Unicode runes. - The regex does **not** touch digits, apostrophes, ampersands, letters, or parentheses. No meaningful distinctions are collapsed. Verified normalization outputs: - `"Batman - City of Light"` → `"batman city of light"` ✓ - `"Batman: City of Light"` → `"batman city of light"` ✓ - `"Batman City of Light"` → `"batman city of light"` ✓ - `"Batman & Robin"` → `"batman & robin"` (ampersand preserved; does NOT collapse to `"batman robin"`) ✓ - `"X-Men"` → `"x men"`, `"X Men"` → `"x men"` (same series, acceptable) - `"S.H.I.E.L.D."` → `"s h i e l d"`, `"SHIELD"` → `"shield"` (NOT equal — no false positive across acronym styles) ✓ ### Over-match analysis The one new behavior worth naming: a partial query like `"Batman City"` previously did NOT contains-match `"Batman: City of Light"` (the colon blocked `strings.Contains`). After the patch it does, because both normalize to strings where `"batman city"` is a substring of `"batman city of light"`. This is an acceptable trade-off — the contains branch awards +25 and sets `nameMatch`, but all other scoring factors (year, issue count, publisher, recency) differentiate further. This is strictly within the existing fuzzy-matching design intent and is not a correctness bug. ### Tests Four new `ScoreVolumes` unit tests (`structured_test.go:264–295`): 1. **Dash variant** (`"Batman - City of Light"` vs `"Batman: City of Light"`) → `nameMatch=true`. Non-tautological: was `false` before the patch (the old `nameLower` path had `strings.Contains("batman: city of light", "batman - city of light")` = false). ✓ 2. **Space variant** (`"Batman City of Light"` vs `"Batman: City of Light"`) → `nameMatch=true`. Non-tautological: was `false` before the patch (the colon in the volume name blocked the old contains check). ✓ 3. **Different series** (`"Superman"` vs `"Batman: City of Light"`) → `nameMatch=false`. ✓ 4. **Partial/contains** (`"Batman"` vs `"Batman: City of Light"`) → `nameMatch=true`. Valid regression test for the contains branch via the new code path; however this behavior was already `true` before the patch (old code: `strings.Contains("batman: city of light", "batman")` = true). It does not specifically exercise the new separator normalization. Minor only. Two new httptest.Server journey tests (`structured_test.go:2022–2077`): - Both mock a ComicVine volume named `"Batman: City of Light"` and an issue #5, then query with the dash and space variants. Assert `HaveLen(1)` (structured path fires, not fuzzy fallback) and `SeriesNumber ≈ 5`. Non-tautological — these journeys require `nameMatch=true` to route through the structured issue-#-filtered path; without the patch, the structured path would be skipped and a fuzzy result set (or empty set) would be returned instead. ✓ --- [MINOR] `internal/metadata/comicvine/structured_test.go:288` — partial/contains test does not exercise new normalization The "Batman" vs "Batman: City of Light" `It` passes both before and after the patch (`"batman"` was always a substring of `"batman: city of light"`). It is a valid regression guard for the contains branch but does not specifically test that separator normalization enabled a new contains match. Consider adding a case where normalization is the enabling factor for the contains path (e.g. `"Batman City"` vs `"Batman: City of Light"`). Does not block. REVIEW VERDICT: 0 blocker, 0 major, 1 minor
zombor merged commit 77e17d5258 into main 2026-07-02 12:10:26 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
zombor/pergamum!879
No description provided.