fix(llm): restore comic verify to comic-processor parity — drop confidence, lenient yes/no (bookshelf-h2af) #713

Merged
zombor merged 1 commit from bd-bookshelf-h2af into main 2026-06-23 00:31:16 +00:00

Summary

Restores the LLM comic-verify step to full parity with the reference algorithm in comic-processor/main.go. Book 4315 ("The Avengers vs. Atlas #4") was silently rejected: the vision model replied with a verbose markdown analysis ending "✅ Yes, the two comic book covers are identical. All key elements match perfectly.", but pergamum parsed Match=false because of the over-engineered VERDICT-format parser.

Changes

  1. Prompts (internal/metadata/llm/llm.go): replaced the strict "same ISSUE / VERDICT: YES HIGH/MEDIUM/LOW/NO" prompt with the reference's lenient wording — same characters in roughly the same scene; colour differences are fine; answer only yes or no.

  2. VerifyResult struct: removed the Confidence field (Match + Raw only). Confidence was part of the broken design.

  3. parseVerifyResponse: replaced VERDICT machinery with robust yes/no parser — strip <think> blocks, accept affirmative conclusions anywhere in verbose responses (yes, identical, same cover, etc.), guarded by negation phrases (not identical, not the same, no match, etc.).

  4. Accept gate (bulk_llm_workflow.go): changed Match && (Confidence == "high" || Confidence == "medium") to simply Match — first yes wins.

  5. Original-resolution fetch (llm_workflow.go): Verify activity now tries /original/ cover URL first, falls back to /scale_medium/ on failure (mirrors comic-processor:582-587).

  6. Debug visibility: per-candidate logging (index, title, has_cover) + per-verify-verdict logging (Match, raw) in structured logs and richer audit-log detail on no-match.

  7. Updated all affected tests to remove Confidence field references and match new behaviour.

Regression test

verify_test.go includes the exact book 4315 response text as a test input asserting Match=true.
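
To show why the book 4315 reply should parse as a match, here is a rough sketch of a lenient yes/no parser in the spirit of change 3. It is not the actual parseVerifyResponse from internal/metadata/llm: the phrase lists are shortened to the examples named in this PR, the regular expressions are assumptions, and only the closing sentence of the model reply (as quoted in the summary) is used as input.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// VerifyResult carries only the verdict and the raw model reply.
type VerifyResult struct {
	Match bool
	Raw   string
}

// thinkBlock matches <think>...</think> reasoning sections, across newlines.
var thinkBlock = regexp.MustCompile(`(?is)<think>.*?</think>`)

// yesWord matches "yes" as a whole word anywhere in the (lowercased) reply.
var yesWord = regexp.MustCompile(`\byes\b`)

// Illustrative phrase lists only; the real parser may use different ones.
var negations = []string{"not identical", "not the same", "no match"}
var affirmations = []string{"identical", "same cover"}

// parseVerifyResponse strips <think> blocks, rejects replies containing a
// negation phrase, and otherwise accepts an affirmative word or phrase
// anywhere in the remaining text.
func parseVerifyResponse(raw string) VerifyResult {
	res := VerifyResult{Raw: raw}
	text := strings.ToLower(thinkBlock.ReplaceAllString(raw, ""))

	// Negation guard runs first so "not identical" never counts as "identical".
	for _, n := range negations {
		if strings.Contains(text, n) {
			return res
		}
	}
	if yesWord.MatchString(text) {
		res.Match = true
		return res
	}
	for _, a := range affirmations {
		if strings.Contains(text, a) {
			res.Match = true
			return res
		}
	}
	return res
}

func main() {
	reply := "✅ Yes, the two comic book covers are identical. All key elements match perfectly."
	fmt.Println(parseVerifyResponse(reply).Match)
	fmt.Println(parseVerifyResponse("<think>yes?</think>The covers are not identical.").Match)
}
```

Checking negations before affirmations is the design choice that matters here: several affirmative phrases are substrings of their negated forms, so the order of the two passes decides the verdict.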

Test plan

  • go build ./... — clean
  • go test ./internal/metadata/llm/... ./internal/wfengine/... — 100% pass
  • make test — all 961 specs pass
  • make coverage — 100% on both changed packages
  • golangci-lint run on changed packages — 0 issues

Closes bead bookshelf-h2af on merge.

fix(llm): restore comic verify to comic-processor parity — drop confidence, lenient yes/no (bookshelf-h2af)
All checks were successful
/ JS Unit Tests (pull_request) Successful in 27s
/ Lint (pull_request) Successful in 2m26s
/ E2E API (pull_request) Successful in 2m26s
/ E2E Browser (pull_request) Successful in 3m26s
/ Integration (pull_request) Successful in 3m28s
/ Test (pull_request) Successful in 4m4s
f4b94ba895
- Replace VERDICT-format verify prompt with reference lenient wording:
  same characters/scene/arrangement; colour differences are fine; answer only yes or no
- Remove Confidence field from VerifyResult (Match + Raw only)
- Rewrite parseVerifyResponse: strip <think> blocks, accept verbose affirmative
  conclusions (not just leading "yes"), guarded by negation phrases
- Accept gate in LLMSweepBookWorkflow: first yes wins (was: high||medium confidence)
- Verify activity: try /original/ cover URL first, fall back to /scale_medium/ (mirrors comic-processor:582-587)
- Add per-candidate debug logging in LLMSweepBookWorkflow + richer audit log detail
- Regression test: book 4315 "The Avengers vs. Atlas #4" verbose response now parses Match=true

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
zombor force-pushed bd-bookshelf-h2af from f4b94ba895
to d4d88929b2
All checks were successful
/ JS Unit Tests (pull_request) Successful in 2m58s
/ Lint (pull_request) Successful in 3m36s
/ E2E API (pull_request) Successful in 3m54s
/ Test (pull_request) Successful in 4m22s
/ Integration (pull_request) Successful in 4m35s
/ E2E Browser (pull_request) Successful in 5m6s
2026-06-23 00:26:03 +00:00
zombor merged commit 1a6c7fb79d into main 2026-06-23 00:31:16 +00:00
Reference
zombor/pergamum!713