feat(dedup): v4 — keep-strategy dropdown + paginated UI + pre-select keeper (bookshelf-c15a.4) #578
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "bd-bookshelf-c15a.4"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
SortGroupByStrategysorts each group and returnssuggestedKeepBookID; handler validates and acceptskeep_strategyfrom JSON body or query param<select>in modal (re-scans on change); cursor-based Prev/Next pagination with page indicator; pre-select suggested keeper per group via JS.checkedproperty afterreplaceChildren; radios use uniquename="target_book_id_N"per group so they're independent radio groupsCleanFilenameComparedirect tests + handler/service coverage; go-rod browser e2e tests for strategy change, pre-selection, pagination (Next/Prev), screenshots posted to PRKey fixes discovered
DOMParsermoves<meta>elements to<head>(invisible toresultsTarget.querySelector); fixed by storing cursor indata-next-cursoron the.dedup-paginationdiv insteadname="target_book_id"making them a single browser radio group; setting.checkedon one group's radio unchecked others; fixed by uniquename="target_book_id_N"per groupTest plan
make test— unit tests passmake coverage— 100% coverage gate passesmake e2e— 242 browser tests pass (21 dedup tests: 9 new v4 + 12 existing)Closes bead bookshelf-c15a.4 on merge.
Authenticated screenshot — Find Duplicates: keep-strategy dropdown + pagination + pre-selected keeper.
Security Review — bookshelf-c15a.4 (PR #578)
Reviewer: security-review bot
Head SHA:
26790efdScope: keepStrategy enum validation, SQL injection, pagination cursor safety, multi-user scoping, resource/DoS, XSS.
Threat-model findings
keepStrategy enum validation
parseScanRequest(handler.go) validateskeep_strategyagainst theValidKeepStrategyswitch on both the JSON body path and the query-param path before it ever reaches the service layer. Unknown values return400 ErrValidation. Tests cover both paths (returns 400 for invalid keep_strategy in JSON body,returns 400 for invalid keep_strategy in query param). ✓SQL injection via keepStrategy / sort
SortGroupByStrategyand all*Lesshelpers are pure Go comparators on already-fetchedDupBookstructs. No keepStrategy value is ever interpolated into SQL. Themost_complete_metadatastrategy readsbook.metadata_match_score— an existing Grimmory V29 column, selected with a parameterized?in the query and scanned intosql.NullFloat64. ✓Cursor decode safety
DecodeCursor(service.go) base64-URL-decodes then JSON-unmarshals. Both failure modes returnfmt.Errorf("… %w: %w", middleware.ErrValidation, err), which the middleware translates to 400. No panic path. Cursor values (afterKey,afterID) are passed as bound parameters (?) in the keyset WHERE clauses — no interpolation. A crafted cursor can only skip past group keys / book IDs; all queries additionally filter bylibrary_id IN (?)from the server-resolveduserLibraryIDs, so a forged cursor cannot enumerate another user's books. ✓Multi-user / library scoping
Handler calls
getUserLibraryIDs(ctx, userID)→filterToLibrary(userLibraryIDs, libraryID)→ returnsErrNotFoundon empty scope. All five finder functions (FindDuplicatesByISBN,FindDuplicatesByExternalID,FindDuplicatesByTitleAuthor,FindDuplicatesByDirectory,FindDuplicatesByFilename) guard onlen(userLibraryIDs) == 0and embedlibrary_id IN (?)in both the inner group subquery and the outer WHERE. Addingmetadata_match_scoreto the SELECT list does not change scope. ✓Pagination boundedness
clampLimit(service.go:77) enforces[1, 200]before any query. The JS_cursorStackgrows only as pages are navigated (no server-side session state); cursor values originate from server responses. No unbounded in-memory accumulation. ✓XSS
data-next-cursorattribute:NextCursoris a*stringcontaining base64url characters only ([A-Za-z0-9\-_=]);html/templateauto-escapes it in attribute context. ✓data-suggested-keeper="true": static literal, conditional onint64comparison. ✓FileSubPathin<code>: auto-escaped byhtml/template. ✓MatchTypefallthrough:{{else}}{{$g.MatchType}}{{end}}— auto-escaped. ✓_updatePageIndicatorwrites"Page " + pagewherepageis an integer fromthis._currentPage—textContent, notinnerHTML. ✓_preselectKeepersonly sets.checked = trueon existing radio elements — no DOM injection. ✓No new Grimmory columns
metadata_match_scoreonbookis an existing Grimmory V29 column (confirmed by migration 0038 comment). No new column added. ✓REVIEW VERDICT: 0 blocker, 0 major, 0 minor
Security Re-Review — Pass 3 (SHA
60ab9641, PR #567, bead bookshelf-c15a.2)Prior findings — resolution status
[MAJOR] Cross-criterion cursor namespace collision — RESOLVED.
The new
ScanCursor.Positions map[string]CriterionPosgives each active criterion its own(afterKey, afterID). The handler readsuserLibraryIDsand buildsscopedIDsbefore the cursor is decoded, so the cursor has no influence on the library scope. Unknown keys inPositions(attacker-injected) are silently ignored via map-lookup — they cannot widen scope or cause a panic. TheafterKey/afterIDvalues are passed only as SQL parameters in a keyset WHERE clause; the library-ID filter (WHERE b.library_id IN (?)) is applied independently in every finder. Cross-user read via cursor is not possible.[MINOR] No
http.MaxBytesReader— RESOLVED.http.MaxBytesReader(w, r.Body, 4096)is applied inScanHandlerbeforeparseScanRequestis called (handler.go:128-130). The error-mapper handles*http.MaxBytesErrorviaerrors.As(checked beforeerrors.Is(ErrValidation)in the switch), so an oversized body returns 413 not 400. The handler unit test athandler_test.go:339-344confirms 413 is returned.Fresh threat-model
SQL injection — all five finder queries (
FindDuplicatesByISBN,FindDuplicatesByExternalID,FindDuplicatesByTitleAuthor,FindDuplicatesByDirectory,FindDuplicatesByFilename) pass every user-influenced value (cursorafterKey/afterID,limit+1) as positional?parameters. Library IDs are server-computed from the session and passed the same way. The LIKE patterns use%%/%%(Gofmt.Sprintfescaping to%/%) which are string literals, not user-controlled. No injection surface found.ExternalID UNION ALL arg binding — manually verified: HC branch gets
libIn, afterKey, afterKey, afterID, limit+1, libIn; GR branch gets the same pattern; total args =4N + 8matching the four%sexpansions + eight cursor/limit?s in the SQL. Both branches are independently scoped byuserLibraryIDs.Resource/DoS — request body capped at 4096 bytes via
MaxBytesReader. The cursor query param is bounded by Go's defaultMaxHeaderBytes(1 MB) and is a server-generated opaque blob with no unbounded sub-structure. Each finder's inner subquery limits tolimit+1groups (clamped to max 200 byclampLimit); the ExternalID combined outer query has no second LIMIT but is bounded by 2x(limit+1) groups, which is documented and handled correctly by the service-layer seen-map. No unbounded scan path found.Cursor decode safety —
DecodeCursorreturns a typed error (wrappingErrValidation) on invalid base64 or malformed JSON; no panic path. ThePositionsmap is deserialized into a Go struct with typed fields; oversized maps are bounded by the cursor string length, itself bounded by HTTP headers.Multi-user fail-closed — every finder has
if len(userLibraryIDs) == 0 { return nil, nil }as the first guard.filterToLibraryrejects requests for a library not in the user's accessible set (returns 404).userLibraryIDsis derived from the authenticated session, never from the request body or cursor.keptKeyspagination correctness — the seen2 loop correctly breaks onlen(seen2) > limit, excluding the peek group fromkeptKeys. ThelastPosupdate only iterateslabeledRowsinkeptKeys, so the cursor points to the last book of the last returned group, not the discarded peek group.New findings
No new blockers or majors. One minor style note:
[MINOR] internal/dedup/handler.go:53 — double-wrapped error format
%w: %wwith ErrValidation firstfmt.Errorf("dedup scan: decode body: %w: %w", middleware.ErrValidation, err)is an unusual multi-error pattern. ThestatusForswitch checkserrors.As(err, &maxBytesErr)beforeerrors.Is(ErrValidation), so the 413 path wins correctly and the test confirms it. The double-colon separator is cosmetic and non-standard but causes no correctness or security issue. The same pattern exists in the merge handler, suggesting it is an established codebase convention. No fix required.REVIEW VERDICT: 0 blocker, 0 major, 1 minor
Code Review — bookshelf-c15a.4 (Find Duplicates v4)
Phase 0: DEMO verification
No interactive DEMO block (implementation-complete bead, CI green). Proceeding to spec/code review.
Phase 1 — Spec compliance
All required items are present:
book IDas the final tiebreaker — deterministic total order confirmed.'- 201002 Cable 021.cbz'vs'201002 Cable 021.cbz'—leadingNonAlnumcounts 2 vs 0, keeps the clean one. Spec case covered inkeep_strategy_test.go.SortGroupByStrategyis an exported pure function — reusable by v5/v6.SuggestedKeepBookIDreturned;Books[0]is the suggested keeper after sort.NextCursorfrom server, client cursor STACK for Prev, page indicator, disable states correct on page 1 / last page.data-suggested-keeper="true"+_preselectKeepers()called afterreplaceChildren.MetadataMatchScorereads from existingbook.metadata_match_score, no migration).style=attributes.Phase 2 — Code quality
[MINOR] internal/dedup/service_test.go:835 — multiple Expect calls inside one It block
The new "MetadataMatchScore.Valid = true populates DupBook.MetadataMatchScore" It block contains four Expect calls (
HaveLen,Books[0].ID,MetadataMatchScorenon-nil, float value). Project convention is exactly one Expect per It (with error folded into the value assertion). Suggested fix: split into three It blocks sharing a JustBeforeEach, so a failure pinpoints which property broke.[MINOR] static/js/controllers/find_duplicates_controller.js:218 — keep_strategy sent in URL query param AND JSON body
_buildScanUrlappendskeep_strategy=...to the query string when strategy !=preferred_format._doScanalso setscriteria.keep_strategyin the JSON body (always). The handler's JSON path reads from the body and ignores the URL param — the URL param is dead weight for every browser request. Additionally,preferred_formatis silently omitted from the URL but always present in the body, creating an asymmetry. Suggested fix: removekeep_strategyfrom_buildScanUrlentirely; the JSON body is the only path used by the browser.[MINOR] templates/pages/dedup_results.html:63 — inert pagination bar visible before page indicator is populated
When groups are returned, the pagination bar always renders. On the initial scan, the page indicator span is empty until
_updatePageIndicator()fires (JS, synchronous afterreplaceChildren). In practice the indicator fills in immediately, but briefly shows← Prev [blank] Next →during the replace. Low priority; disable states are correct throughout.REVIEW VERDICT: 0 blocker, 0 major, 3 minor
Code Review — bookshelf-c15a.4 (Find Duplicates v4)
Phase 0: DEMO verification
No interactive DEMO block (implementation-complete bead, CI green). Proceeding to spec/code review.
Phase 1 — Spec compliance
All required items are present:
book IDas the final tiebreaker — deterministic total order confirmed.'- 201002 Cable 021.cbz'vs'201002 Cable 021.cbz'—leadingNonAlnumcounts 2 vs 0, keeps the clean one. Spec case covered inkeep_strategy_test.go.SortGroupByStrategyis an exported pure function — reusable by v5/v6.SuggestedKeepBookIDreturned;Books[0]is the suggested keeper after sort.NextCursorfrom server, client cursor STACK for Prev, page indicator, disable states correct on page 1 / last page.data-suggested-keeper="true"+_preselectKeepers()called afterreplaceChildren.MetadataMatchScorereads from existingbook.metadata_match_score, no migration).style=attributes.Phase 2 — Code quality
[MINOR] internal/dedup/service_test.go:835 — multiple Expect calls inside one It block
The new "MetadataMatchScore.Valid = true populates DupBook.MetadataMatchScore" It block contains four Expect calls (
HaveLen,Books[0].ID,MetadataMatchScorenon-nil, float value). Project convention is exactly one Expect per It (with error folded into the value assertion). Suggested fix: split into three It blocks sharing a JustBeforeEach, so a failure pinpoints which property broke.[MINOR] static/js/controllers/find_duplicates_controller.js:218 — keep_strategy sent in URL query param AND JSON body
_buildScanUrlappendskeep_strategy=...to the query string when strategy !=preferred_format._doScanalso setscriteria.keep_strategyin the JSON body (always). The handler's JSON path reads from the body and ignores the URL param — the URL param is dead weight for every browser request. Additionally,preferred_formatis silently omitted from the URL but always present in the body, creating an asymmetry. Suggested fix: removekeep_strategyfrom_buildScanUrlentirely; the JSON body is the only path used by the browser.[MINOR] templates/pages/dedup_results.html:63 — inert pagination bar visible before page indicator is populated
When groups are returned, the pagination bar always renders. On the initial scan, the page indicator span is empty until
_updatePageIndicator()fires (JS, synchronous afterreplaceChildren). In practice the indicator fills in immediately, but briefly shows← Prev [blank] Next →during the replace. Low priority; disable states are correct throughout.REVIEW VERDICT: 0 blocker, 0 major, 3 minor
26790efd467f8ba1d060