fix(books): comic-aware metadata_match_score so comics aren't over-scored (bookshelf-v1to)

zombor commented

2026-06-29 00:18:53 +00:00

Owner

Summary

Root cause: MatchScoreInput + CalculateMatchScore were ebook-only — they scored CBX books on ISBN, page_count, authors, categories, etc. A comic with title/series/description but zero comic_metadata scored ~81% (book 4595: Spectacular Spider-Girl #2).
Fix: CalculateMatchScore branches on IsComic. Comic books use a hardcoded comic weight table (104 total weight) that grades comic-specific fields; ebooks continue to use the existing user-configurable weight table unchanged.
Wiring: GetMetadataForMatchScoreWithComic replaces GetMetadataForMatchScore in app.go and the RecalcMatchScores workflow — it fetches comic completeness data in one SQL round-trip (12 correlated subqueries) and sets IsComic=true for CBX books.

Comic weight table (total = 104)

Field	Weight
Title	8
SeriesName	8
Description	6
Cover	8
HasComicMetadata (row exists)	5
IssueNumber	15
StoryArc	12
Imprint	8
VolumeName	8
Format	5
CoverDate	6
Creators (has 1+)	5
Characters (has 1+)	5
Teams (has 1+)	3
Locations (has 1+)	2

Book 4595 (Spectacular Spider-Girl #2) before: 80.77% | after: ~21% (22/104)

Deviation from Grimmory

Grimmory match_score is generic-book-only (no comic-aware branching). This is an intentional pergamum improvement; CBX scores will differ from Grimmory output.

Recompute note

The backfill migration key (backfill_metadata_match_score_v1) is intentionally unchanged — it was marked done on the live install. A manual trigger of POST /settings/match-weights/recalculate will re-score all books (including comics) with the new logic. The RecalcMatchScores workflow now uses GetMetadataForMatchScoreWithComic, so any future recalc run picks up comic scoring automatically.

Test plan

make test — 3 new CalculateMatchScore specs (full comic CBX > 80%, zero comic CBX < 25%, ebook > 70%), + GetComicMatchScoreData + GetMetadataForMatchScoreWithComic store tests (all error paths covered)
make coverage — zero uncovered statement blocks (100% gate)
go build ./... — compiles clean
golangci-lint run ./internal/books/... ./internal/app/... — 0 issues
CI green

Closes bead bookshelf-v1to on merge.

## Summary - **Root cause:** MatchScoreInput + CalculateMatchScore were ebook-only — they scored CBX books on ISBN, page_count, authors, categories, etc. A comic with title/series/description but zero comic_metadata scored ~81% (book 4595: Spectacular Spider-Girl #2). - **Fix:** CalculateMatchScore branches on `IsComic`. Comic books use a hardcoded comic weight table (104 total weight) that grades comic-specific fields; ebooks continue to use the existing user-configurable weight table unchanged. - **Wiring:** `GetMetadataForMatchScoreWithComic` replaces `GetMetadataForMatchScore` in app.go and the RecalcMatchScores workflow — it fetches comic completeness data in one SQL round-trip (12 correlated subqueries) and sets `IsComic=true` for CBX books. ## Comic weight table (total = 104) | Field | Weight | |-------|--------| | Title | 8 | | SeriesName | 8 | | Description | 6 | | Cover | 8 | | HasComicMetadata (row exists) | 5 | | IssueNumber | 15 | | StoryArc | 12 | | Imprint | 8 | | VolumeName | 8 | | Format | 5 | | CoverDate | 6 | | Creators (has 1+) | 5 | | Characters (has 1+) | 5 | | Teams (has 1+) | 3 | | Locations (has 1+) | 2 | **Book 4595 (Spectacular Spider-Girl #2) before:** 80.77% | **after:** ~21% (22/104) ## Deviation from Grimmory Grimmory match_score is generic-book-only (no comic-aware branching). This is an intentional pergamum improvement; CBX scores will differ from Grimmory output. ## Recompute note The backfill migration key (`backfill_metadata_match_score_v1`) is intentionally unchanged — it was marked done on the live install. A manual trigger of POST /settings/match-weights/recalculate will re-score all books (including comics) with the new logic. The `RecalcMatchScores` workflow now uses `GetMetadataForMatchScoreWithComic`, so any future recalc run picks up comic scoring automatically. ## Test plan - [x] `make test` — 3 new CalculateMatchScore specs (full comic CBX > 80%, zero comic CBX < 25%, ebook > 70%), + GetComicMatchScoreData + GetMetadataForMatchScoreWithComic store tests (all error paths covered) - [x] `make coverage` — zero uncovered statement blocks (100% gate) - [x] `go build ./...` — compiles clean - [x] `golangci-lint run ./internal/books/... ./internal/app/...` — 0 issues - [x] CI green Closes bead bookshelf-v1to on merge.

recompute-match-score-kebab-open.png

109 KiB

wf-detail-older-execution.png

56 KiB

wf-detail-older-execution.png

55 KiB

recompute-match-score-kebab-open.png

109 KiB

zombor added 1 commit

2026-06-29 00:18:53 +00:00

fix(books): comic-aware metadata_match_score so comics aren't over-scored (bookshelf-v1to)

/ JS Unit Tests (pull_request) Successful in 40s

Details

/ Hugo build (pull_request) Successful in 45s

Details

/ E2E API (pull_request) Successful in 2m11s

Details

/ Lint (pull_request) Successful in 2m12s

Details

/ Integration (pull_request) Successful in 2m56s

Details

/ Test (pull_request) Successful in 3m3s

Details

/ E2E Browser (pull_request) Successful in 3m22s

Details

d808b204f8

- Add comic fields to MatchScoreInput (IsComic, HasComicMetadata, IssueNumber,
  StoryArc, Imprint, VolumeName, Format, CoverDate, CreatorCount,
  CharacterCount, TeamCount, LocationCount)
- CalculateMatchScoreWeighted branches on IsComic: comic books use a hardcoded
  comic weight table (104 total); ebooks use the existing user-configurable
  weight table unchanged
- Add ComicMatchScoreRow + GetComicMatchScoreData (hand-written SQL, single
  round-trip with 12 correlated subqueries) + GetMetadataForMatchScoreWithComic
  that wraps the existing function and enriches it with comic data
- Comic weight table: title(8)+series(8)+desc(6)+cover(8)+hasComicMetadata(5)+
  issue(15)+storyArc(12)+imprint(8)+volumeName(8)+format(5)+coverDate(6)+
  creators(5)+characters(5)+teams(3)+locations(2) = 104
- CBX with only title/series/description (book 4595 scenario) scores 22/104
  = ~21% instead of the previous 80.77%
- CBX with full comic metadata scores 100%
- Ebook scoring is UNCHANGED (IsComic=false, existing weights apply)
- Wire GetMetadataForMatchScoreWithComic in app.go and build_extended_deps.go
  so both the startup backfill and the RecalcMatchScores workflow use it
- Backfill migration key is unchanged; next RecalcMatchScores run re-scores all
  books with the new logic

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

zombor commented

2026-06-29 00:31:35 +00:00

Author

Owner

[BLOCKER] internal/books/recompute_scores.go:20-28 and internal/books/match_score_comic_store.go:42-57 — N+1 query pattern at scale

The RecomputeScoresBatch function loops through bookIDs sequentially and calls getMatchInput() for each book individually:

for _, bookID := range bookIDs {
    input, err := getMatchInput(ctx, bookID)
    // getMatchInput calls GetBookMetadataForMatchScore (1 query/book)
    // + GetComicMatchScoreData (1 query/book with 12 correlated subqueries)
}

For the startup backfill of 250k books in 100-book batches: 2,500 batches × 2 queries/book = 500,000+ database round trips. This violates the hard rule "No N+1" in project-conventions.md.

Fix: Batch the comic-data fetches. Instead of GetComicMatchScoreData(conn.QueryContext) returning a per-book curried function, create GetComicMatchScoreDataBatch(conn, bookIDs []int64) that fetches all books in one query with WHERE book_id IN (...), returning a map[int64]ComicMatchScoreRow. Then RecomputeScoresBatch can fetch comic data once per 100-book batch instead of 100 times per batch.

[MAJOR] internal/app/build_extended_deps.go:73-80 and bead comment — Stale stored scores until manual recompute

The bead says "Backfill migration key is unchanged; next RecalcMatchScores run re-scores all." This means the startup backfill will NOT re-run (it only runs once per migration-key). Comics currently scored at ~80% will keep that old score until the user manually triggers a RecalcMatchScores workflow.

Is this intentional? If yes, the bead and PR should document this — existing users will see inconsistent scores (stored: old 80%, detail-page-computed: new 21%) until they manually kick off a recompute. If unintentional, bump the backfill migration key so it re-runs and updates all scores at startup.

[MINOR] internal/books/match_score.go:111-115 — Document weight-total rationale

The comment "total=104" lacks rationale. Add a note: "chosen so a comic with only generic fields (title + series + description, but no comic_metadata row) scores below 25%, matching the book-4595 test case."

REVIEW VERDICT: 1 blocker, 1 major, 1 minor

[BLOCKER] internal/books/recompute_scores.go:20-28 and internal/books/match_score_comic_store.go:42-57 — N+1 query pattern at scale The RecomputeScoresBatch function loops through bookIDs sequentially and calls getMatchInput() for each book individually: ```go for _, bookID := range bookIDs { input, err := getMatchInput(ctx, bookID) // getMatchInput calls GetBookMetadataForMatchScore (1 query/book) // + GetComicMatchScoreData (1 query/book with 12 correlated subqueries) } ``` For the startup backfill of 250k books in 100-book batches: 2,500 batches × 2 queries/book = 500,000+ database round trips. This violates the hard rule "No N+1" in project-conventions.md. **Fix:** Batch the comic-data fetches. Instead of GetComicMatchScoreData(conn.QueryContext) returning a per-book curried function, create GetComicMatchScoreDataBatch(conn, bookIDs []int64) that fetches all books in one query with `WHERE book_id IN (...)`, returning a map[int64]ComicMatchScoreRow. Then RecomputeScoresBatch can fetch comic data once per 100-book batch instead of 100 times per batch. --- [MAJOR] internal/app/build_extended_deps.go:73-80 and bead comment — Stale stored scores until manual recompute The bead says "Backfill migration key is unchanged; next RecalcMatchScores run re-scores all." This means the startup backfill will NOT re-run (it only runs once per migration-key). Comics currently scored at ~80% will keep that old score until the user manually triggers a RecalcMatchScores workflow. Is this intentional? If yes, the bead and PR should document this — existing users will see inconsistent scores (stored: old 80%, detail-page-computed: new 21%) until they manually kick off a recompute. If unintentional, bump the backfill migration key so it re-runs and updates all scores at startup. --- [MINOR] internal/books/match_score.go:111-115 — Document weight-total rationale The comment "total=104" lacks rationale. Add a note: "chosen so a comic with only generic fields (title + series + description, but no comic_metadata row) scores below 25%, matching the book-4595 test case." --- REVIEW VERDICT: 1 blocker, 1 major, 1 minor

zombor referenced this pull request from a commit

2026-06-29 00:51:15 +00:00

fix(books): batch match-score N+1 + re-trigger backfill for comic-aware scores (bookshelf-v1to)

zombor force-pushed bd-bookshelf-v1to from d808b204f8

/ JS Unit Tests (pull_request) Successful in 40s

Details

/ Hugo build (pull_request) Successful in 45s

Details

/ E2E API (pull_request) Successful in 2m11s

Details

/ Lint (pull_request) Successful in 2m12s

Details

/ Integration (pull_request) Successful in 2m56s

Details

/ Test (pull_request) Successful in 3m3s

Details

/ E2E Browser (pull_request) Successful in 3m22s

Details

to 774d666047

/ E2E API (pull_request) Successful in 2m26s

Details

/ JS Unit Tests (pull_request) Successful in 58s

Details

/ Lint (pull_request) Successful in 2m36s

Details

/ Integration (pull_request) Successful in 3m15s

Details

/ Test (pull_request) Successful in 3m21s

Details

/ E2E Browser (pull_request) Successful in 3m38s

Details

2026-06-29 00:51:15 +00:00

Compare

forgejo-actions commented

2026-06-29 00:53:34 +00:00

Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)

**Recompute Match Score — kebab open screenshot** (recompute-match-score-kebab-open) ![recompute-match-score-kebab-open](/attachments/fefe7107-62f8-4f0d-8988-03161e47620d)

forgejo-actions commented

2026-06-29 00:54:19 +00:00

Workflow Detail page screenshot (wf-detail-older-execution)

Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.

**Workflow Detail page screenshot** (wf-detail-older-execution) Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent. ![wf-detail-older-execution](/attachments/27e86596-2ebd-4cf2-931d-25c0b41382ad)

zombor referenced this pull request from a commit

2026-06-29 01:07:04 +00:00

fix(books): batch match-score N+1 + re-trigger backfill for comic-aware scores (bookshelf-v1to)

zombor force-pushed bd-bookshelf-v1to from 774d666047

/ E2E API (pull_request) Successful in 2m26s

Details

/ JS Unit Tests (pull_request) Successful in 58s

Details

/ Lint (pull_request) Successful in 2m36s

Details

/ Integration (pull_request) Successful in 3m15s

Details

/ Test (pull_request) Successful in 3m21s

Details

/ E2E Browser (pull_request) Successful in 3m38s

Details

to 7d65258a4a

/ E2E API (pull_request) Successful in 2m2s

Details

/ Lint (pull_request) Successful in 2m5s

Details

/ JS Unit Tests (pull_request) Successful in 49s

Details

/ Integration (pull_request) Successful in 2m58s

Details

/ Test (pull_request) Successful in 3m8s

Details

/ E2E Browser (pull_request) Successful in 3m8s

Details

2026-06-29 01:07:04 +00:00

Compare

forgejo-actions commented

2026-06-29 01:09:35 +00:00

Workflow Detail page screenshot (wf-detail-older-execution)

Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent.

**Workflow Detail page screenshot** (wf-detail-older-execution) Older completed ContinueAsNew epoch detail — execution ID and state visible, Cancel absent. ![wf-detail-older-execution](/attachments/f4cf1bca-3a5e-4f4c-8328-f1b625e45caa)

forgejo-actions commented

2026-06-29 01:09:38 +00:00

Recompute Match Score — kebab open screenshot (recompute-match-score-kebab-open)

**Recompute Match Score — kebab open screenshot** (recompute-match-score-kebab-open) ![recompute-match-score-kebab-open](/attachments/2c0179bf-2be3-4296-b3f4-22fba973d5da)

zombor merged commit 0c896a32f0 into main

2026-06-29 01:13:52 +00:00

zombor referenced this pull request from a commit

2026-06-29 01:13:54 +00:00

Merge pull request 'fix(books): comic-aware metadata_match_score so comics aren't over-scored (bookshelf-v1to)' (#816) from bd-bookshelf-v1to into main

Rows
Columns

fix(books): comic-aware metadata_match_score so comics aren't over-scored (bookshelf-v1to) #816

Summary

Comic weight table (total = 104)

Deviation from Grimmory

Recompute note

Test plan