fix(books): comic-aware metadata_match_score so comics aren't over-scored (bookshelf-v1to) #816

Merged
zombor merged 3 commits from bd-bookshelf-v1to into main 2026-06-29 01:13:52 +00:00
Owner

Summary

  • Root cause: MatchScoreInput + CalculateMatchScore were ebook-only — they scored CBX books on ISBN, page_count, authors, categories, etc. A comic with title/series/description but zero comic_metadata scored ~81% (book 4595: Spectacular Spider-Girl #2).
  • Fix: CalculateMatchScore branches on IsComic. Comic books use a hardcoded comic weight table (104 total weight) that grades comic-specific fields; ebooks continue to use the existing user-configurable weight table unchanged.
  • Wiring: GetMetadataForMatchScoreWithComic replaces GetMetadataForMatchScore in app.go and the RecalcMatchScores workflow — it fetches comic completeness data in one SQL round-trip (12 correlated subqueries) and sets IsComic=true for CBX books.

Comic weight table (total = 104)

Field Weight
Title 8
SeriesName 8
Description 6
Cover 8
HasComicMetadata (row exists) 5
IssueNumber 15
StoryArc 12
Imprint 8
VolumeName 8
Format 5
CoverDate 6
Creators (has 1+) 5
Characters (has 1+) 5
Teams (has 1+) 3
Locations (has 1+) 2

Book 4595 (Spectacular Spider-Girl #2) before: 80.77% | after: ~21% (22/104)

Deviation from Grimmory

Grimmory match_score is generic-book-only (no comic-aware branching). This is an intentional pergamum improvement; CBX scores will differ from Grimmory output.

Recompute note

The backfill migration key (backfill_metadata_match_score_v1) is intentionally unchanged — it was marked done on the live install. A manual trigger of POST /settings/match-weights/recalculate will re-score all books (including comics) with the new logic. The RecalcMatchScores workflow now uses GetMetadataForMatchScoreWithComic, so any future recalc run picks up comic scoring automatically.

Test plan

  • make test — 3 new CalculateMatchScore specs (full comic CBX > 80%, zero comic CBX < 25%, ebook > 70%), + GetComicMatchScoreData + GetMetadataForMatchScoreWithComic store tests (all error paths covered)
  • make coverage — zero uncovered statement blocks (100% gate)
  • go build ./... — compiles clean
  • golangci-lint run ./internal/books/... ./internal/app/... — 0 issues
  • CI green

Closes bead bookshelf-v1to on merge.

## Summary - **Root cause:** MatchScoreInput + CalculateMatchScore were ebook-only — they scored CBX books on ISBN, page_count, authors, categories, etc. A comic with title/series/description but zero comic_metadata scored ~81% (book 4595: Spectacular Spider-Girl #2). - **Fix:** CalculateMatchScore branches on `IsComic`. Comic books use a hardcoded comic weight table (104 total weight) that grades comic-specific fields; ebooks continue to use the existing user-configurable weight table unchanged. - **Wiring:** `GetMetadataForMatchScoreWithComic` replaces `GetMetadataForMatchScore` in app.go and the RecalcMatchScores workflow — it fetches comic completeness data in one SQL round-trip (12 correlated subqueries) and sets `IsComic=true` for CBX books. ## Comic weight table (total = 104) | Field | Weight | |-------|--------| | Title | 8 | | SeriesName | 8 | | Description | 6 | | Cover | 8 | | HasComicMetadata (row exists) | 5 | | IssueNumber | 15 | | StoryArc | 12 | | Imprint | 8 | | VolumeName | 8 | | Format | 5 | | CoverDate | 6 | | Creators (has 1+) | 5 | | Characters (has 1+) | 5 | | Teams (has 1+) | 3 | | Locations (has 1+) | 2 | **Book 4595 (Spectacular Spider-Girl #2) before:** 80.77% | **after:** ~21% (22/104) ## Deviation from Grimmory Grimmory match_score is generic-book-only (no comic-aware branching). This is an intentional pergamum improvement; CBX scores will differ from Grimmory output. ## Recompute note The backfill migration key (`backfill_metadata_match_score_v1`) is intentionally unchanged — it was marked done on the live install. A manual trigger of POST /settings/match-weights/recalculate will re-score all books (including comics) with the new logic. The `RecalcMatchScores` workflow now uses `GetMetadataForMatchScoreWithComic`, so any future recalc run picks up comic scoring automatically. ## Test plan - [x] `make test` — 3 new CalculateMatchScore specs (full comic CBX > 80%, zero comic CBX < 25%, ebook > 70%), + GetComicMatchScoreData + GetMetadataForMatchScoreWithComic store tests (all error paths covered) - [x] `make coverage` — zero uncovered statement blocks (100% gate) - [x] `go build ./...` — compiles clean - [x] `golangci-lint run ./internal/books/... ./internal/app/...` — 0 issues - [x] CI green Closes bead bookshelf-v1to on merge.
fix(books): comic-aware metadata_match_score so comics aren't over-scored (bookshelf-v1to)
All checks were successful
/ JS Unit Tests (pull_request) Successful in 40s
/ Hugo build (pull_request) Successful in 45s
/ E2E API (pull_request) Successful in 2m11s
/ Lint (pull_request) Successful in 2m12s
/ Integration (pull_request) Successful in 2m56s
/ Test (pull_request) Successful in 3m3s
/ E2E Browser (pull_request) Successful in 3m22s
d808b204f8
- Add comic fields to MatchScoreInput (IsComic, HasComicMetadata, IssueNumber,
  StoryArc, Imprint, VolumeName, Format, CoverDate, CreatorCount,
  CharacterCount, TeamCount, LocationCount)
- CalculateMatchScoreWeighted branches on IsComic: comic books use a hardcoded
  comic weight table (104 total); ebooks use the existing user-configurable
  weight table unchanged
- Add ComicMatchScoreRow + GetComicMatchScoreData (hand-written SQL, single
  round-trip with 12 correlated subqueries) + GetMetadataForMatchScoreWithComic
  that wraps the existing function and enriches it with comic data
- Comic weight table: title(8)+series(8)+desc(6)+cover(8)+hasComicMetadata(5)+
  issue(15)+storyArc(12)+imprint(8)+volumeName(8)+format(5)+coverDate(6)+
  creators(5)+characters(5)+teams(3)+locations(2) = 104
- CBX with only title/series/description (book 4595 scenario) scores 22/104
  = ~21% instead of the previous 80.77%
- CBX with full comic metadata scores 100%
- Ebook scoring is UNCHANGED (IsComic=false, existing weights apply)
- Wire GetMetadataForMatchScoreWithComic in app.go and build_extended_deps.go
  so both the startup backfill and the RecalcMatchScores workflow use it
- Backfill migration key is unchanged; next RecalcMatchScores run re-scores all
  books with the new logic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
Owner

[BLOCKER] internal/books/recompute_scores.go:20-28 and internal/books/match_score_comic_store.go:42-57 — N+1 query pattern at scale

The RecomputeScoresBatch function loops through bookIDs sequentially and calls getMatchInput() for each book individually:

for _, bookID := range bookIDs {
    input, err := getMatchInput(ctx, bookID)
    // getMatchInput calls GetBookMetadataForMatchScore (1 query/book)
    // + GetComicMatchScoreData (1 query/book with 12 correlated subqueries)
}

For the startup backfill of 250k books in 100-book batches: 2,500 batches × 2 queries/book = 500,000+ database round trips. This violates the hard rule "No N+1" in project-conventions.md.

Fix: Batch the comic-data fetches. Instead of GetComicMatchScoreData(conn.QueryContext) returning a per-book curried function, create GetComicMatchScoreDataBatch(conn, bookIDs []int64) that fetches all books in one query with WHERE book_id IN (...), returning a map[int64]ComicMatchScoreRow. Then RecomputeScoresBatch can fetch comic data once per 100-book batch instead of 100 times per batch.


[MAJOR] internal/app/build_extended_deps.go:73-80 and bead comment — Stale stored scores until manual recompute

The bead says "Backfill migration key is unchanged; next RecalcMatchScores run re-scores all." This means the startup backfill will NOT re-run (it only runs once per migration-key). Comics currently scored at ~80% will keep that old score until the user manually triggers a RecalcMatchScores workflow.

Is this intentional? If yes, the bead and PR should document this — existing users will see inconsistent scores (stored: old 80%, detail-page-computed: new 21%) until they manually kick off a recompute. If unintentional, bump the backfill migration key so it re-runs and updates all scores at startup.


[MINOR] internal/books/match_score.go:111-115 — Document weight-total rationale

The comment "total=104" lacks rationale. Add a note: "chosen so a comic with only generic fields (title + series + description, but no comic_metadata row) scores below 25%, matching the book-4595 test case."


REVIEW VERDICT: 1 blocker, 1 major, 1 minor

[BLOCKER] internal/books/recompute_scores.go:20-28 and internal/books/match_score_comic_store.go:42-57 — N+1 query pattern at scale The RecomputeScoresBatch function loops through bookIDs sequentially and calls getMatchInput() for each book individually: ```go for _, bookID := range bookIDs { input, err := getMatchInput(ctx, bookID) // getMatchInput calls GetBookMetadataForMatchScore (1 query/book) // + GetComicMatchScoreData (1 query/book with 12 correlated subqueries) } ``` For the startup backfill of 250k books in 100-book batches: 2,500 batches × 2 queries/book = 500,000+ database round trips. This violates the hard rule "No N+1" in project-conventions.md. **Fix:** Batch the comic-data fetches. Instead of GetComicMatchScoreData(conn.QueryContext) returning a per-book curried function, create GetComicMatchScoreDataBatch(conn, bookIDs []int64) that fetches all books in one query with `WHERE book_id IN (...)`, returning a map[int64]ComicMatchScoreRow. Then RecomputeScoresBatch can fetch comic data once per 100-book batch instead of 100 times per batch. --- [MAJOR] internal/app/build_extended_deps.go:73-80 and bead comment — Stale stored scores until manual recompute The bead says "Backfill migration key is unchanged; next RecalcMatchScores run re-scores all." This means the startup backfill will NOT re-run (it only runs once per migration-key). Comics currently scored at ~80% will keep that old score until the user manually triggers a RecalcMatchScores workflow. Is this intentional? If yes, the bead and PR should document this — existing users will see inconsistent scores (stored: old 80%, detail-page-computed: new 21%) until they manually kick off a recompute. If unintentional, bump the backfill migration key so it re-runs and updates all scores at startup. --- [MINOR] internal/books/match_score.go:111-115 — Document weight-total rationale The comment "total=104" lacks rationale. Add a note: "chosen so a comic with only generic fields (title + series + description, but no comic_metadata row) scores below 25%, matching the book-4595 test case." --- REVIEW VERDICT: 1 blocker, 1 major, 1 minor
zombor force-pushed bd-bookshelf-v1to from d808b204f8
All checks were successful
/ JS Unit Tests (pull_request) Successful in 40s
/ Hugo build (pull_request) Successful in 45s
/ E2E API (pull_request) Successful in 2m11s
/ Lint (pull_request) Successful in 2m12s
/ Integration (pull_request) Successful in 2m56s
/ Test (pull_request) Successful in 3m3s
/ E2E Browser (pull_request) Successful in 3m22s
to 774d666047
All checks were successful
/ E2E API (pull_request) Successful in 2m26s
/ JS Unit Tests (pull_request) Successful in 58s
/ Lint (pull_request) Successful in 2m36s
/ Integration (pull_request) Successful in 3m15s
/ Test (pull_request) Successful in 3m21s
/ E2E Browser (pull_request) Successful in 3m38s
2026-06-29 00:51:15 +00:00
Compare

Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)

recompute-match-score-kebab-open

**Recompute Match Score — kebab open screenshot** (recompute-match-score-kebab-open) ![recompute-match-score-kebab-open](/attachments/fefe7107-62f8-4f0d-8988-03161e47620d)

Workflow Detail page screenshot (wf-detail-older-execution)

Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.

wf-detail-older-execution

**Workflow Detail page screenshot** (wf-detail-older-execution) Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent. ![wf-detail-older-execution](/attachments/27e86596-2ebd-4cf2-931d-25c0b41382ad)
zombor force-pushed bd-bookshelf-v1to from 774d666047
All checks were successful
/ E2E API (pull_request) Successful in 2m26s
/ JS Unit Tests (pull_request) Successful in 58s
/ Lint (pull_request) Successful in 2m36s
/ Integration (pull_request) Successful in 3m15s
/ Test (pull_request) Successful in 3m21s
/ E2E Browser (pull_request) Successful in 3m38s
to 7d65258a4a
All checks were successful
/ E2E API (pull_request) Successful in 2m2s
/ Lint (pull_request) Successful in 2m5s
/ JS Unit Tests (pull_request) Successful in 49s
/ Integration (pull_request) Successful in 2m58s
/ Test (pull_request) Successful in 3m8s
/ E2E Browser (pull_request) Successful in 3m8s
2026-06-29 01:07:04 +00:00
Compare

Workflow Detail page screenshot (wf-detail-older-execution)

Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.

wf-detail-older-execution

**Workflow Detail page screenshot** (wf-detail-older-execution) Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent. ![wf-detail-older-execution](/attachments/f4cf1bca-3a5e-4f4c-8328-f1b625e45caa)

Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)

recompute-match-score-kebab-open

**Recompute Match Score — kebab open screenshot** (recompute-match-score-kebab-open) ![recompute-match-score-kebab-open](/attachments/2c0179bf-2be3-4296-b3f4-22fba973d5da)
zombor merged commit 0c896a32f0 into main 2026-06-29 01:13:52 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
zombor/pergamum!816
No description provided.