zombor/pergamum

fix(llm): restore comic verify to comic-processor parity — drop confidence, lenient yes/no (bookshelf-h2af) #713

Merged

zombor merged 1 commit from bd-bookshelf-h2af into main

2026-06-23 00:31:16 +00:00

zombor commented

2026-06-22 21:50:03 +00:00

Owner

Summary

Restores the LLM comic-verify step to full parity with the reference algorithm in comic-processor/main.go. Book 4315 ("The Avengers vs. Atlas #4") was silently rejected: the vision model replied with a verbose markdown analysis ending "✅ Yes, the two comic book covers are identical. All key elements match perfectly.", but pergamum parsed Match=false because of the over-engineered VERDICT-format parser.

Changes

1. Prompts (internal/metadata/llm/llm.go): replaced the strict "same ISSUE / VERDICT: YES HIGH/MEDIUM/LOW/NO" prompt with the reference's lenient wording: same characters in roughly the same scene; colour differences are fine; answer only yes or no.
2. VerifyResult struct: removed the Confidence field, leaving only Match and Raw. Confidence was part of the broken design.
3. parseVerifyResponse: replaced the VERDICT machinery with a robust yes/no parser that strips <think> blocks and accepts affirmative conclusions anywhere in a verbose response (yes, identical, same cover, etc.), guarded by negation phrases (not identical, not the same, no match, etc.).
4. Accept gate (bulk_llm_workflow.go): changed Match && (Confidence == "high" || Confidence == "medium") to simply Match, so the first yes wins.
5. Original-resolution fetch (llm_workflow.go): the Verify activity now tries the /original/ cover URL first and falls back to /scale_medium/ on failure (mirrors comic-processor:582-587).
6. Debug visibility: per-candidate logging (index, title, has_cover) and per-verify-verdict logging (Match, raw) in structured logs, plus richer audit-log detail on no-match.
7. Updated all affected tests to remove Confidence field references and match the new behaviour.

Regression test

verify_test.go includes the exact book 4315 response text as a test input asserting Match=true.
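
A lenient parser of the kind described could look like the sketch below. It is a guess at the shape, not the merged implementation: the phrase lists contain only the examples named in this description (the real lists end in "etc."), and the regular expressions and the negation-before-affirmation ordering are assumptions. The sample input is just the closing sentence quoted in the summary, not the full book 4315 response.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// VerifyResult carries the two fields the PR says remain after Confidence was removed.
type VerifyResult struct {
	Match bool
	Raw   string
}

var (
	// Matches closed <think>...</think> blocks, case-insensitively and across newlines.
	thinkBlock = regexp.MustCompile(`(?is)<think>.*?</think>`)
	// Matches "yes" as a whole word so that e.g. "yesterday" does not count.
	yesWord = regexp.MustCompile(`\byes\b`)

	// Phrase lists limited to the examples named in the PR description.
	negations    = []string{"not identical", "not the same", "no match"}
	affirmations = []string{"identical", "same cover"}
)

// parseVerifyResponse strips reasoning blocks, rejects on any negation
// phrase, and otherwise accepts an affirmative conclusion anywhere in the text.
func parseVerifyResponse(raw string) VerifyResult {
	res := VerifyResult{Raw: raw}
	text := strings.ToLower(thinkBlock.ReplaceAllString(raw, ""))

	// Negation phrases guard against "not identical" matching "identical".
	for _, n := range negations {
		if strings.Contains(text, n) {
			return res
		}
	}
	if yesWord.MatchString(text) {
		res.Match = true
		return res
	}
	for _, a := range affirmations {
		if strings.Contains(text, a) {
			res.Match = true
			return res
		}
	}
	return res
}

func main() {
	inputs := []string{
		"✅ Yes, the two comic book covers are identical. All key elements match perfectly.",
		"<think>Yes, maybe?</think>No.",
		"The covers are not identical.",
	}
	for _, in := range inputs {
		fmt.Printf("%t  %q\n", parseVerifyResponse(in).Match, in)
	}
}
```

This sketch does not handle an unclosed `<think>` tag, and a negation anywhere in the response vetoes the match, which may be stricter than the merged parser.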

Test plan

go build ./... — clean
go test ./internal/metadata/llm/... ./internal/wfengine/... — 100% pass
make test — all 961 specs pass
make coverage — 100% on both changed packages
golangci-lint run on changed packages — 0 issues

Closes bead bookshelf-h2af on merge.

zombor added 1 commit

2026-06-22 21:50:03 +00:00

fix(llm): restore comic verify to comic-processor parity — drop confidence, lenient yes/no (bookshelf-h2af)

/ JS Unit Tests (pull_request) Successful in 27s
/ Lint (pull_request) Successful in 2m26s
/ E2E API (pull_request) Successful in 2m26s
/ E2E Browser (pull_request) Successful in 3m26s
/ Integration (pull_request) Successful in 3m28s
/ Test (pull_request) Successful in 4m4s

f4b94ba895

- Replace VERDICT-format verify prompt with reference lenient wording:
  same characters/scene/arrangement; colour differences are fine; answer only yes or no
- Remove Confidence field from VerifyResult (Match + Raw only)
- Rewrite parseVerifyResponse: strip <think> blocks, accept verbose affirmative
  conclusions (not just leading "yes"), guarded by negation phrases
- Accept gate in LLMSweepBookWorkflow: first yes wins (was: high||medium confidence)
- Verify activity: try /original/ cover URL first, fall back to /scale_medium/ (mirrors comic-processor:582-587)
- Add per-candidate debug logging in LLMSweepBookWorkflow + richer audit log detail
- Regression test: book 4315 "The Avengers vs. Atlas #4" verbose response now parses Match=true

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

zombor force-pushed bd-bookshelf-h2af from f4b94ba895 to d4d88929b2

/ JS Unit Tests (pull_request) Successful in 2m58s
/ Lint (pull_request) Successful in 3m36s
/ E2E API (pull_request) Successful in 3m54s
/ Test (pull_request) Successful in 4m22s
/ Integration (pull_request) Successful in 4m35s
/ E2E Browser (pull_request) Successful in 5m6s

2026-06-23 00:26:03 +00:00

zombor merged commit 1a6c7fb79d into main

2026-06-23 00:31:16 +00:00

zombor referenced this pull request from a commit

2026-06-23 00:31:16 +00:00

Merge pull request 'fix(llm): restore comic verify to comic-processor parity — drop confidence, lenient yes/no (bookshelf-h2af)' (#713) from bd-bookshelf-h2af into main
