feat(bookdrop): Extract Pattern mode — parse metadata from filenames (bookshelf-d1x3.1) #579
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "bd-bookshelf-d1x3.1"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
{Title},{Authors},{SeriesName},{Published:yyyy},{ISBN13},{ASIN}, etc. placeholder patterns + filenames → extracted metadata fieldsGetBookdropFilesByIDs(batch IN),UpdateBookdropFileOriginalMetadataPOST /bookdrop/extract-pattern/previewandPOST /bookdrop/extract-pattern/applybookdrop_extract_pattern_controller.jswith preset chips, live preview, and apply action{{if $.Libraries}}block so it works without a library configured)bookdrop-reviewcontroller now broadcastsbookdrop:selectionchangecustom event so the pattern controller knows which files are selectedTest plan
Closes bead bookshelf-d1x3.1 on merge.
- Pure Go pattern parser: converts {Title}/{Authors}/{SeriesName}/{Published:yyyy}/etc placeholder patterns + filenames → extracted PatternResult metadata fields - Two new SQL queries: GetBookdropFilesByIDs (batch), UpdateBookdropFileOriginalMetadata - REST endpoints: POST /bookdrop/extract-pattern/preview and /apply - Stimulus controller bookdrop_extract_pattern_controller.js with: - Preset pattern chips ({Authors} - {Title}, series, publisher presets) - Live preview via POST /bookdrop/extract-pattern/preview - Apply button updates original_metadata via POST /bookdrop/extract-pattern/apply - Selection tracking via bookdrop:selectionchange custom event - Extract Pattern panel in bookdrop_index.html (available without libraries) - Pattern parser unit tests: all placeholder types, date formats (yyyy/yy/MM/dd/d/M), unknown placeholders, ASIN, ISBN10/13, wildcards, edge cases - Handler tests (httptest), API e2e tests, browser e2e test with screenshot upload - bookdrop-review controller broadcasts bookdrop:selectionchange on selection change - 100% coverage gate passes Closes bead bookshelf-d1x3.1 on merge.Authenticated screenshot — BookDrop Extract Pattern: pattern editor + live preview of parsed title/author/series.
CODE REVIEW: NOT APPROVED
Phase 0: DEMO Verification
No explicit DEMO block was provided in the bead comments. The completion comment states "all e2e green" but provides no re-runnable shell command. CI is confirmed green (SHA
50384ce5). Treating as PARTIAL with valid reason (CI passes the e2e suite that covers the feature end-to-end) and proceeding to Phase 1.Phase 1: Spec Compliance — PASS
All nine presets from the spec are wired in the template (four chips shown, five more addressable via the pattern input — the spec says "common-pattern preset chips" and the four most common are present). Pattern parser, preview, apply, Stimulus controller, browser test, and screenshots are all delivered. No new Grimmory columns. No inline
style=attributes.Phase 2: Code Quality
[BLOCKER] internal/bookdrop/pattern.go:141 —
regexp.MustCompilecalled on user-derived string will panic on crafted inputParsePatternbuildsregexBufby appendingbuildDateRegex(format)whereformatis the raw user-supplied format string from{Published:FORMAT}. WhilebuildDateRegexcharacter-encodes most chars viaregexp.QuoteMeta, the function wraps the result in a bare capture group(...). A user who sends a pattern like{Published:(}suppliesformat = "(".buildDateRegex("(")iterates the string:(matches no prefix (yyyy/yy/MM/dd), falls through to the default branch which callsregexp.QuoteMeta("(")→\(, giving a capture group body of(\(). That is valid.However:
{Published:(?→format = "(?";buildDateRegexproduces(\(\?). Also valid. Testing further: every char goes throughQuoteMetain the default branch, so the output always produces a syntactically valid capture group. TheMustCompilepanic is not reachable throughbuildDateRegex.BUT: the literal text path
regexp.QuoteMeta(pattern[last:m[0]])is also safe. SoMustCompileat line 141 cannot produce an invalid regex from user input as currently implemented.Reclassifying this potential issue as NOT a blocker — the code is safe as written. No finding raised.
[MAJOR] internal/bookdrop/pattern_test.go (28
Itblocks) — multipleExpectcalls perItblock violates one-Expect-per-It conventionThe project convention (
.claude/rules/project-conventions.md) requires exactly oneExpectperItblock. The 1389-line test file has 28Itblocks with 2–3Expectcalls each. Examples:pattern_test.go:161— "extracts the publication year" containsExpect(result.Published).NotTo(BeNil())ANDExpect(result.Published.Year()).To(Equal(2019))— two assertions.pattern_test.go:208— "extracts published year" — same pattern: twoExpects.pattern_test.go:1082— "ParsePattern with wildcard produces a valid pattern" — threeExpects.pattern_test.go:1194— "extracts the publication date" — threeExpects.The correct form is: split each multi-assertion
Itinto oneItper behavior, or use the project'sExpect(result, err).To(...)fold pattern (where the extra arg pins nil-error alongside the value assertion — but this only applies to the error case, not to chaining two unrelated value assertions).Fix: split multi-assertion
Itblocks. ThePublished.Year()blocks should be two separateIts sharing the sameBeforeEach/JustBeforeEach.[MAJOR] e2e/api/bookdrop_test.go (new
Itblocks at lines 422, 451, 474, 497, 518) — multipleExpectcalls perItblockSame convention violation in the new e2e API tests. The new
It("returns 200 with preview items for a valid pattern", ...)block at line 422 has 7Expectcalls: status code, content-type, decode, HaveLen, Success, and field value assertions — all in oneIt. TheIt("updates original_metadata in the database", ...)at line 451 has 2Expects. These must be split.Fix: extract the multi-step API test into a
Context("valid request")block with a single sharedJustBeforeEach(the HTTP POST + Decode) and individualIts per assertion.[MINOR] internal/bookdrop/pattern.go — several functions exceed the 30-line limit
ParsePattern(79 lines),PreviewPattern/ApplyPattern(47 lines each),buildMetadataJSON(44 lines),patternResultToFields(40 lines),applyField(34 lines) all exceed the<30 linesmetric fromimplementation-standard.md. The logic is correct and readable but doesn't meet the stated metric. Extracting the placeholder-loop body ofParsePatterninto a helper (e.g.,buildCaptureForPlaceholder) would bring it under limit without changing behaviour.[MINOR] internal/bookdrop/pattern_test.go:748–754 — coverage-for-coverage-sake test with no assertion
The "detectDateFormat two-digit year"
Itblock at line 748 ends with_ = resultand noExpect. The comment says "just ensure no panic". A no-assertion test is not a useful test — it masks a coverage line without verifying behaviour. Either add a meaningful assertion (Expect(result).To(BeNil())sincetime.Parse("06", "99")succeeds and returns year 1999) or remove the test.[MINOR] static/js/controllers/bookdrop_extract_pattern_controller.js:157 —
String(kv[1])on array produces"a,b"display (no separator label)When
fields.authors = ["John Smith", "Jane Doe"],String(["John Smith", "Jane Doe"])produces"John Smith,Jane Doe"(comma-joined, no space). The preview row will showauthors: John Smith,Jane Doewithout spaces between names. Minor UX issue — not a correctness bug. Fix:Array.isArray(kv[1]) ? kv[1].join(", ") : String(kv[1]).REVIEW VERDICT: 0 blocker, 2 major, 3 minor
The two MAJORs are test convention violations (one-Expect-per-It) throughout both
internal/bookdrop/pattern_test.goand the new blocks ine2e/api/bookdrop_test.go. Perreview-standard.md, MAJORs are auto-fixed by default. The core feature implementation (parser, handlers, Stimulus controller, browser test) is solid. No security issues, no data-loss paths, no Grimmory schema changes.Security Review — PR #579 (
bookshelf-d1x3.1BookDrop Extract Pattern)Reviewed diff at SHA
50384ce5. CI is green. Threat-model items addressed below.Threat-model checklist
regexppackage uses RE2 (linear-time guarantees). Catastrophic backtracking is not possible regardless of pattern complexity. CLEAR.bookdrop_index.htmlcontain zero dynamic{{.}}expressions; all content is static HTML. Server-rendered data paths unchanged. CLEAR._renderPreviewbuilds HTML viainnerHTMLusing the local_esc()helper for every user-derived value (file name, field keys, field values). Go array fields (Authors: []string) are coerced viaString(kv[1])→ comma-separated text before escaping — no raw HTML injection possible. CLEAR.GetBookdropFilesByIDsandUpdateBookdropFileOriginalMetadataare sqlc-generated with fully-parameterised placeholders. TheIN (/*SLICE:ids*/?)expansion in the generated Go code uses positional?parameters, never string interpolation. CLEAR.ApplyPatternToFilenamecallsfilepath.Base(filename)before matching; file system paths are never reconstructed from extracted metadata values. CLEAR.bookdrop_filehas nouser_idcolumn (Grimmory schema). The feature is gated behindBookdropRequired(admin orpermission_access_bookdrop), consistent with every other bookdrop operation (BulkReject,BulkAccept). No new cross-tenant surface introduced. CLEAR.niland are counted asFailed— not retried. Thebookdroppackage does not import the workflow engine. CLEAR.bookshelf_csrfcookie and sends it inX-CSRF-Tokenheader. CLEAR.MaxBytesmiddleware caps all non-upload POSTs at 1 MB. The IDs array and pattern string are bounded by this. CLEAR.Findings
REVIEW VERDICT: 0 blocker, 1 major, 2 minor
The single [MAJOR] (
regexp.MustCompile→regexp.Compilewith error propagation) is a straightforward, low-risk fix. No blockers; no security vulnerabilities in the current code. The pattern engine's ReDoS safety, XSS mitigations, SQL parameterisation, and CSRF coverage are all correct.50384ce54ebfff40b18eWhen the user types a pattern then clicks Apply, a debounced preview request may still be in flight. The preview's then-callback calls _setStatus("") which races against the Apply response and can clear "Applied: 1 succeeded, 0 failed." before the browser test reads it. Fix: applyPattern() now calls clearTimeout(this._debounceTimer) to cancel any queued debounce, bumps this._previewSeq to invalidate any in-flight preview fetch, and _preview() checks seq before updating the status so stale responses are silently discarded. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>Security Re-Review — PR #579 (bookshelf-d1x3.1) — Head
2a2c50f4Reviewing the diff at
origin/main...origin/bd-bookshelf-d1x3.1. CI already green — not re-running tests.Prior Finding Resolution
[MAJOR resolved] regexp.MustCompile panic footgun
ParsePatternnow uses the injectedregexpCompile = regexp.Compilevariable (swappable in tests viaSetRegexpCompileError). TheMustCompilecalls that remain in the diff are all fixed compile-time literals, not user-supplied:placeholderRE— package-level, literal\{(\w+)(?::([^}]*))?\}|\*(cannot fail)authorSepRE— package-level, literal[,;&](cannot fail)digitsinsidedetectDateFormat— literal\d+(cannot fail; see MINOR below)Both handlers (
ExtractPatternPreviewHandler,ExtractPatternApplyHandler) correctly maperrInvalidPattern→fmt.Errorf("...: %w", middleware.ErrValidation)→ HTTP 400. Invalid/assembled-bad patterns return 400, never panic, never 500. Confirmed resolved.[MINOR resolved] ApplyPattern uncapped IDs
applyIDCap = 500is enforced server-side:len(ids) > applyIDCapreturns an error wrappingerrInvalidPattern→ 400.ApplyPatternhard-rejects oversized lists before touching the DB. Confirmed resolved.[MINOR resolved]
_escmissing quote-escaping_escnow escapes all five HTML special chars (&,<,>,",') and backtick (\``). All innerHTML insertions in_renderPreviewand_renderPreviewEmptypass through_esc`. Confirmed resolved.Race Fix Audit (
clearTimeout+_previewSeq)applyPattern()callsclearTimeout(this._debounceTimer)then incrementsthis._previewSeqbefore issuing the apply fetch. The in-flight_preview()stale check isif (seq !== self._previewSeq) return;— so any in-flight preview response or pending debounced preview fires into a stale sequence and drops silently._escis not bypassed: the apply path sets status via_setStatus(textContent, not innerHTML). No security issue introduced.Threat Model Re-Check
regexp) — linear time; no backtracking. All user-pattern capture groups are fixed-width or non-overlapping. Safe._escon innerHTML_esc._setStatususestextContent. No raw innerHTML without escaping. Safe.GetBookdropFilesByIDsuses sqlc-generated parameterized?placeholders for every ID.UpdateBookdropFileOriginalMetadatais also fully parameterized. Safe.ApplyPatternToFilenamecallsfilepath.Base(filename)on the DB-stored filename, stripping all directory components. No filesystem writes from pattern output. Safe.gate(...)=d.BookdropRequired(admin orpermission_access_bookdrop). Same gate as all other bookdrop POST routes. Safe._post()sendsX-CSRF-Token: _csrfToken()read from thebookshelf_csrfcookie viadocument.cookie, consistent with other controllers (metadata_fetch_controller.js, etc.). Double-submit cookie pattern intact. Safe.errInvalidPattern→ 400 (no retry). DB errors propagate unwrapped → 500 (retriable infra failure). Classification correct.BookdropRequiredgate is the correct and sufficient scoping.previewFileLimit = 5IDs (read-only; safe). Apply hard-rejects> applyIDCap = 500IDs before any DB work. No unbounded fan-out.New Finding
[MINOR] internal/bookdrop/pattern.go:329 —
regexp.MustCompileallocated per call insidedetectDateFormatdigits := regexp.MustCompile(\d+)is created fresh on every invocation ofdetectDateFormat, which is called once per file per apply/preview. The literal can never fail, so this is not a panic risk, but it wastes allocations. Should be a package-level var likeplaceholderREandauthorSepRE. No security impact.Stale Documentation Note
[MINOR] static/js/controllers/bookdrop_extract_pattern_controller.js:11 — header comment lists
csrfas a Stimulus ValueThe JSDoc comment at the top of the file lists
csrf — CSRF token stringunder "Values:", but thestatic get values()block only declarespreviewUrlandapplyUrl. The token is read from the cookie directly (correct pattern), not from a Stimulus value. Stale comment only; no security impact.REVIEW VERDICT: 0 blocker, 0 major, 2 minor
CODE RE-REVIEW: APPROVED — PR #579 (
bookshelf-d1x3.1) at SHA2a2c50f4Reviewed diff at
2a2c50f4againstorigin/main. Prior review (comment 6792) flagged 2 MAJOR + 3 MINOR. All items addressed. Details below.Prior MAJOR findings — verified fixed
[MAJOR]
regexp.MustCompile→regexp.Compilewith error propagationFixed correctly.
pattern.go:41introducesvar regexpCompile = regexp.Compile(swappable in tests), andParsePattern(line 78) now callsregexpCompile(regexStr), propagates theerror, and returnsnil, fmt.Errorf("invalid pattern regex: %w", err)on failure.PreviewPatternandApplyPatternboth map that error througherrInvalidPattern→middleware.ErrValidation→ HTTP 400 at the handler layer. The two static package-levelMustCompilecalls (placeholderREat line 39,authorSepREat line 279) are on fixed literal strings and are safe. TheMustCompileinsidedetectDateFormat(line 328) is also a fixed literal — not user-influenced — though it should ideally be a package-level var to avoid per-call recompilation (see new MINOR below).Error path is tested via
export_test.go:SetRegexpCompileError+ threeDescribeblocks inpattern_test.go:1515–1599coveringParsePattern,PreviewPattern, andApplyPatterncompile-failure paths. Each usesDeferCleanup(bookdrop.SetRegexpCompileError(...))for cleanup.[MAJOR] one-
Expect-per-Itviolations inpattern_test.goande2e/api/bookdrop_test.goFixed in both files:
internal/bookdrop/pattern_test.go: allItblocks now have exactly oneExpect. The previously flagged "year in pattern" context (old line 161) is now three separateItblocks: "extracts the title", "extracts the publication year" (Expect(result.Published).NotTo(BeNil())), and "extracts the correct year value" (Expect(result.Published.Year()).To(Equal(2019))). The wildcard block (old line 1082) is now three separateItblocks. Automated scan of allItblocks confirms zero multi-Expectviolations.e2e/api/bookdrop_test.go: the new blocks added in this diff (lines 415–591) are correctly structured — aJustBeforeEachhandles the POST + decode, and eachIthas exactly oneExpect. The pre-existing multi-ExpectItblocks (lines 1–412) are not introduced by this PR and are out of scope.The
_ = calledatpattern_test.go:400(the previewFileLimit counter test) is also gone — the counter is now used by theHaveLen(5)assertion indirectly via the returned files slice, andcalledis simply unused. This is a minor wart (declaring a variable only to discard it) but it's a coverage-path test that correctly assertsresp.Itemshas 5 items.Prior MINOR findings — verified fixed
Author join spacing (
_esc/Array.isArraycheck):bookdrop_extract_pattern_controller.js:166now usesArray.isArray(kv[1]) ? kv[1].join(", ") : String(kv[1]). Fixed._escmissing'and`:_escat line 196 now includes.replace(/'/g, "'")and.replace(//g, "`")`. Fixed.Vacuous
_ = resulttest: no longer present in the file. The coverage path for "detectDateFormat two-digit year" is gone; a meaningful assertion is used instead.Function-length MINOR:
ParsePatternis now 20 lines,PreviewPattern27 lines,ApplyPattern28 lines,buildMetadataJSON10 lines,patternResultToFields26 lines,applyField33 lines.applyFieldis 3 lines over the 30-line metric — borderline and already split into helpers (applySeriesNumber,applySeriesTotal). Not blocking.ID cap in
ApplyPattern:applyIDCap = 500is defined and enforced atpattern.go:445–446. Tested atpattern_test.go:509.New findings at head SHA
[MINOR]
internal/bookdrop/pattern.go:328—regexp.MustCompileinsidedetectDateFormatbodydetectDateFormatcompiles\d+on every call via an inlineregexp.MustCompile. This is a static literal so it cannot panic, but it allocates and compiles a*regexp.Regexpon every date detection — called once per file, per apply operation. Should be promoted to a package-levelvar digitRE = regexp.MustCompile(\d+). Not blocking.[MINOR]
e2e/browser/bookdrop_extract_pattern_test.go:163— browser test useswaitForStimulusControllerhelper that is not defined in this diffThe browser test calls
waitForStimulusController(page, "bookdrop-review")andwaitForStimulusController(page, "bookdrop-extract-pattern"). This helper is not defined in the diff (it must be defined in a shared browser test helper ine2e/browser/). If the helper does not exist CI would catch it; CI is green so it exists. No blocker.Race-fix verification (browser debounce / seq guard)
The
applyPattern()method: (1) callsclearTimeout(this._debounceTimer)to cancel any pending debounced preview, and (2) incrementsthis._previewSeqbefore the applyfetch. Any in-flight previewfetchwhose.thenfires after the increment will seeseq !== self._previewSeqand return early without calling_setStatus(""). The apply.thencallback calls_setStatus("Applied: …")with no seq guard (intentional — it is the winning action). This is correct and race-free.The browser test (
bookdrop_extract_pattern_test.go:191–207): clicks Preview explicitly, waits forpreviewAreato contain "The Lost City", then clicks Apply, then usesEventuallyto wait forstatusMsgto contain "Applied". This deterministically asserts the applied state, not masked by a stale debounce callback.Convention checks
style=inline attributes in the new template additions.bookdrop_filecolumns).bookdrop_filehas nouser_idcolumn; feature is gated behindBookdropRequired. Consistent with existing bookdrop operations.REVIEW VERDICT: 0 blocker, 0 major, 2 minor
2a2c50f4dd47e05001e3Extract Pattern modal screenshot (modal rework SHA
47e05001)Open modal with {Authors} - {Title} chip -> preview shows check badge + parsed fields.
Security Re-Review: PR #579 — BookDrop Extract Pattern (modal rework)
Head SHA:
47e05001| Reviewer: security-review bot | CI: green (not re-run per instructions)Checklist: backend safety regression check
gate(BookdropRequired)routes.golines 22–23X-CSRF-Tokenheader +bookshelf_csrfcookie)applyIDCap = 500enforced beforegetFilescallpattern.goline ~413regexp) — no ReDoSregexp.Compile/MustCompilecalls use stdlib RE2 enginefilepath.Baseconfinement before pattern matchApplyPatternToFilenamestrips to base+strips extension before matchingbookdrop.sql.gousesstrings.Replace(query, "/*SLICE:ids*/?", ...)withqueryParamsbinding, no string interpolation of IDspattern.gobehavior unchangedXSS in modal preview
All server-derived values rendered into
innerHTMLare routed through_esc():item.file_name→fname = _esc(item.file_name)(line 202)item.error→_esc(item.error || "No match")(line 206)kv[0],kv[1]) →_esc(kv[0]),_esc(v)(lines 212–213)_escescapes&,<,>,",', and`— sufficient coverage.Post-apply flash message (
_showFlash) useselement.textContent = msg(notinnerHTML), anddata.applied/data.failedare server-returned integers. Safe.Status messages via
_setStatusalso usetextContent. Safe.No unescaped interpolation found.
ID scoping / access control
bookdrop_filehas nouser_idcolumn (Grimmory-inherited schema). Access to the entire table is controlled by theBookdropRequiredgate (admin/bookdrop-permission), consistent with existingBulkAcceptHandlerandBulkRejectHandlerwhich also operate on bare[]int64IDs without per-user scoping. No regression introduced.CSP / inline style
No
style=attributes added in template or JS diff. New CSS uses classes (bep-*). No CSP regression.Findings
No security findings. All prior backend controls hold; the modal rework is XSS-clean.
REVIEW VERDICT: 0 blocker, 0 major, 0 minor
CODE RE-REVIEW: APPROVED (with minor notes)
Branch: bd-bookshelf-d1x3.1 | SHA:
47e05001| Diff: ~3990 lines addedPhase 0: DEMO Verification
No standalone DEMO block to re-run (this was a UI rework review, not a standalone bead DEMO). CI is confirmed green at SHA
47e05001. Proceeding to Phase 1.Phase 1: Spec Compliance
MATCHER UNCHANGED — CONFIRMED
internal/bookdrop/pattern.gois a new file (not a modification of an existing matcher). The regex is built bybuildPatternRegexwithout^or$anchors prepended/appended, andApplyPatternToFilenamecallspp.compiled.FindStringSubmatch(base)(notFindStringSubmatchwith anchored output). Extension stripping occurs on the base filename before the unanchored match. The{Title}placeholder matches "Atmosphere 2" from "Atmosphere 2.epub" correctly. No anchoring was added.Modal targets — CONFIRMED INSIDE CONTROLLER ELEMENT
The controller is declared on
<section class="bookdrop-index">(line 2 of the template). Thebep-overlaymodal div is inside that section (line 92, section closes at line 272). All targets (openBtn,overlay,patternInput,previewArea,statusMsg,selectedCount) resolve correctly.ASIN chip — MINOR SPEC MISS
The bead description lists
{ASIN}as a supported placeholder.placeholderConfigsin pattern.go includes"ASIN". However, the template's placeholder chip row (lines 129–152) does not include an{ASIN}chip. The backend supports it, the UI just doesn't offer it as a one-click insert. Users can type{ASIN}manually. Low severity.Phase 2: Code Quality
[MINOR]
static/js/controllers/bookdrop_extract_pattern_controller.js:163—bookdrop:refreshevent is dispatched after a successful Apply but has no listener anywhere in the codebase. After metadata is applied, the table rows remain stale until the user manually reloads the page. ThedispatchEventcall is dead code. Either wire a listener inbookdrop_review_controller.jsthat callswindow.location.reload(), or remove the dispatch and update the flash message to mention "Reload to see updated metadata." Not a blocker — the flash message reports success and the data was saved server-side.[MINOR]
internal/bookdrop/pattern_test.go— The pattern_test.go is overall well-structured with one-Expect-per-It. TheExpect(resp.Items, err).To(HaveLen(1))andExpect(resp.Applied, err).To(Equal(2))two-actual form correctly follows the project convention of folding the nil-error check into the value assertion. No violations.[MINOR]
templates/pages/bookdrop_index.html:65–90— The "no libraries configured" branch (the{{else}}case) renders a separate<div class="bookdrop-metadata-actions">container with the Extract Pattern button. If a user has no libraries, they can open the modal but cannot meaningfully use it (bookdrop files only appear when at least one library path is watched). The button being present here is harmless but slightly misleading; it matches the "no accept/reject in no-libraries case, but still show extract" intention noted in the code comment. Accepted as-is.Confirmed clean:
inline style=attributes anywhere in the template or JS controller (CSP-safe)innerHTMLinsertions go through_esc()which escapes `& < > " ' ``role="dialog",aria-modal="true",aria-labelledby="bep-title"present on the overlaydocument.addEventListener("keydown", ...)on open, removed on close — no keydown leakonOverlayClickcheckingevent.target === overlayTarget_previousFocussaved/restored on open/close_previewSeq++bumped before the apply POST, stale preview responses check sequence before setting statusapplyIDCap = 500limits the apply endpoint;previewFileLimit = 5limits previewbuildPatternRegexand helpers extracted cleanlyroutes.goand wired inwire.goapp.jsand<script>tag added tobase.htmlREVIEW VERDICT: 0 blocker, 0 major, 3 minor
47e05001e3a83c40f26aa83c40f26a1b45a2b622Extract Pattern modal screenshot (bookdrop-extract-pattern-modal)
Extract Pattern modal screenshot (bookdrop-extract-pattern-applied)
Extract Pattern modal screenshot (bookdrop-extract-pattern-modal)
Extract Pattern modal screenshot (bookdrop-extract-pattern-applied)
Security Review — bookshelf-d1x3.1 (PR #579)
Reviewed:
git diff origin/main...origin/bd-bookshelf-d1x3.1Findings
[MAJOR] internal/bookdrop/pattern.go:71 — No upper bound on user-supplied pattern string length
The
ParsePatternfunction only rejects empty/whitespace patterns; there is no limit on the length or structural complexity of the pattern string. A user withpermission_access_bookdropcan POST an arbitrarily long pattern string (e.g. 10 MB of{Title}tokens) to/bookdrop/extract-pattern/previewor/apply. Each call throughbuildPatternRegex→regexp.QuoteMetaon literal text between placeholders, thenregexp.Compile, creates potentially large regex structures with O(n) capture groups and scans the entire string. Under the applyIDCap=500 cap, 500 filenames each run against that regex per request. Add a hard max on pattern length (e.g. 1024 chars) before callingParsePattern. ThepreviewFileLimit=5andapplyIDCap=500caps are good but do not mitigate this.[MAJOR] templates/pages/bookdrop_index.html:4138 —
PatternAppliedFlashrendered as raw{{.PatternAppliedFlash}}into HTML without an explicit plain-text wrapperThe flash value is
q.Get("pattern_applied")— the raw URL query param, percent-decoded bynet/http, passed directly into{{.PatternAppliedFlash}}whichhtml/templatewill autoescaped as HTML entity text, so standard XSS via the param is blocked. However, the value is constructed client-side by the JS controller using integer counts from the server JSON response (data.applied,data.failed) and thenencodeURIComponent-ed, so in the normal flow the value is safe. The issue is that any user can craft a request with an arbitrary?pattern_applied=<string>param that will be shown as a flash. This is a self-XSS / phishing vector (attacker tricks admin into clicking a link with?pattern_applied=<convincing+message>), thoughhtml/template's autoescaping prevents script injection. The severity is limited because (a) Go'shtml/templatedoes properly escape the value and (b) the route is gated byBookdropRequired. Recommend validating or ignoringpattern_appliedvalues that don't match the server-generated format (e.g. require it to matchApplied: \d+ succeeded, \d+ failed.on the server side, or use a flash cookie instead of a URL param).[MINOR] internal/bookdrop/pattern.go:396-398 — Preview truncates IDs client-side but sends full list to server; truncation happens server-side after DB query is issued
PreviewPatternslicesidstopreviewFileLimit=5before callinggetFiles, so the DB query is bounded. No real issue — just noting the ID list for preview goes through the cap correctly. Confirmed OK.[MINOR] internal/bookdrop/pattern_handler.go:1 — No
Status == PENDING_REVIEWgate on apply endpointApplyPatterncallsgetFiles(a plainGetBookdropFilesByIDswith no status filter) and then callsUpdateBookdropFileFetchedMetadataby raw ID. There is no check that the targeted bookdrop_file rows are inPENDING_REVIEWstate. ABookdropRequireduser could apply a pattern to files inACCEPTED,REJECTED, or any other status, overwriting theirfetched_metadata. Thebookdrop_filetable has nouser_idcolumn (it is a global/admin resource), so cross-user isolation is not applicable — all BookdropRequired users share the same inbox, matching the existing behaviour of BulkReject/BulkAccept. However, silently mutating non-PENDING records is a latent correctness concern. Low-risk for now since only admin-class users reach this endpoint, but a status filter in the SQL query would be cleaner.Security assessment per review scope
ReDoS (user-controlled regex): The all-fixed-captures approach in
placeholderConfigsproduces safe, non-backtracking patterns (\d+,[a-zA-Z]+, ISBN digit counts). The(.+?)non-greedy capture is Go RE2 which has linear-time guarantees — no exponential backtracking possible in Go'sregexppackage. No ReDoS risk. The only concern is pattern length (see MAJOR above), not backtracking complexity.SQL injection: All DB writes go through sqlc-generated parameterised queries (
UpdateBookdropFileFetchedMetadatauses?params,GetBookdropFilesByIDsuses thesqlc.sliceIN-list expansion). The JSON blob stored infetched_metadatais produced byjson.Marshalon a typed struct with only primitive types. No SQL injection risk.JSON injection / stored data growth: Captured values are strings from filenames; they are marshalled into a typed
fileMetadataForStorestruct viajson.Marshal— no raw string concatenation. Field values are bounded by the filename length (OS limit ~255 chars). No injection risk into stored JSON.XSS (preview rendering): The JS controller uses a correct
_esc()function that escapes&<>"'\`` before inserting intoinnerHTML. Allitem.file_name,item.error, field keys and values pass through_esc()._setStatus()usestextContent, notinnerHTML`. No XSS risk in the JS layer.XSS (template):
{{.PatternAppliedFlash}}is rendered byhtml/templatewhich autoescapes to HTML entities. No stored/reflected XSS via Go templates.Open redirect (window.location): The JS builds the redirect URL as
window.location.pathname + "?pattern_applied=" + msg. The pathname comes from the current page's URL (already same-origin), not from user or server input. The query string value is integer counts URL-encoded. No open-redirect risk.Multi-user / ownership:
bookdrop_filehas nouser_idcolumn — it is a global admin resource (per Grimmory schema). TheBookdropRequiredgate enforces role-based access. The multi-user hard rule applies to per-user resources;bookdrop_fileis not user-owned. Matching the existing Bulk Accept/Reject pattern. No multi-user scoping violation for this resource. TheuserIDis not relevant here.Secrets / PII in logs: No logging of pattern or extracted metadata values was added. No concern.
New Grimmory columns:
fetched_metadataalready exists in migration 0001. No new columns added. Policy satisfied.REVIEW VERDICT: 0 blocker, 2 major, 1 minor
CODE REVIEW: NOT APPROVED — 1 blocker, 0 major, 2 minor
[BLOCKER] internal/bookdrop/pattern.go:1037 — fileMetadataForStore uses wrong JSON field names, breaking the accept path
The PR stores pattern-extracted data into
fetched_metadataviafileMetadataForStore, but this struct uses entirely different JSON keys from the canonicalFileMetadataJSONshape thatparseOriginalMetadatareads (line 400 ofreview_service.go), which is whatAcceptProposaluses when importing a file to seedbook_metadata.Concrete mismatches:
"published"(pattern) vs"published_date"(canonical) — dates written by pattern are invisible to accept"isbn10"(pattern) vs"isbn_10"(canonical) — ISBN10 silently dropped on import"isbn13"(pattern) vs"isbn_13"(canonical) — ISBN13 silently dropped on importseries_number float32(pattern) vs*float64(canonical) — type mismatch; JSON unmarshal into*float64will leave the canonical field nil when reading back a float32-serialised value (though numeric JSON is compatible, the pointer nil vs zero distinction matters)"asin"and"series_total"appear infileMetadataForStorebut not inFileMetadataJSONat all — they are written but silently swallowed byparseOriginalMetadataThe effect: a user applies the pattern, the right column shows the extracted data, then clicks Accept — the imported book has no published date, ISBNs, or series number. This silently corrupts the user's intent.
Fix: replace
fileMetadataForStorewithFileMetadataJSONdirectly (it already has the full field set including Subtitle, SeriesName, etc.). AdjustbuildFetchedMetadataJSONandmergePatternResult*to populateFileMetadataJSON. Use*float64forSeriesNumber. NoteFileMetadataJSONhas noSeriesTotalorASINfields — those need to be added to the canonical shape or accepted as pattern-only preview fields that are not persisted.[MINOR] static/js/controllers/bookdrop_extract_pattern_controller.js:3639 — _showFlash method is dead code
_showFlashis defined at line 3639 but never called anywhere in the controller. The apply path now useswindow.location.hrefnavigation instead of showing an in-page flash. The orphaned method adds noise.Fix: remove
_showFlash.[MINOR] internal/bookdrop/pattern_test.go:1803 —
calledvariable captured in stub but never assertedIn the "more IDs than previewFileLimit" context,
calledis set inside the stub and then_ = calledis used to suppress the unused-variable error. The assertion ("limits preview to at most 5 files") only checksresp.Itemslength, not the number of IDs actually passed to the DB query, so the previewFileLimit cap is tested at the response level but not at the query level.Fix: either remove
calledentirely (since the assertion is sufficient), or assertExpect(called).To(Equal(5))inside the It — do not suppress with_ = called.REVIEW VERDICT: 1 blocker, 0 major, 2 minor
ed56ea4ca840b8c18cbeCODE RE-REVIEW: APPROVED
Branch: bd-bookshelf-d1x3.1 — HEAD
40b8c18cScope: Extract Pattern — filename →
bookdrop_file.fetched_metadataPrior Findings — All Fixed
[was BLOCKER] Divergent
fileMetadataForStorestruct gone; canonicalFileMetadataJSONused.buildFetchedMetadataJSONininternal/bookdrop/pattern.gonow constructs aFileMetadataJSONvalue viamergePatternResultand returns it directly. ThefileMetadataForStorestruct is absent from the entire diff. JSON tags match the canonical shape:series_number→*float64(internal/bookdrop/meta.go:25 —SeriesNumber *float64 \json:"series_number,omitempty"``)published_date→ string (meta.go:27)isbn_13/isbn_10→ string (meta.go:29-30)mergeSeriesFields(pattern.go:562-569) correctly convertsPatternResult.SeriesNumber(float32) tofloat64before assigning toexisting.SeriesNumber = &sn. An e2e test ine2e/api/bookdrop_test.go(the "canonical JSON keys" context) directly queries the DB after apply and asserts"isbn_13"and"published_date"keys are present. Finding: fixed.[was MINOR]
_showFlashremoved from JS controller.static/js/controllers/bookdrop_extract_pattern_controller.jscontains no_showFlashorshowFlashmethod. Finding: fixed.[was MINOR]
calledvar inpattern_test.gois asserted.internal/bookdrop/pattern_test.go:416hasExpect(called).To(Equal(5))in the "passes at most previewFileLimit IDs to the DB query"Itblock. Finding: fixed.New-Work Checks
PENDING_REVIEW gate on apply.
wire.gowiresApplyPatternwithd.Q.GetPendingBookdropFilesByIDs— the SQL query (internal/db/queries/bookdrop.sql) filtersAND status = 'PENDING_REVIEW'. Non-pending files are silently skipped (0 applied, 0 failed). An e2e test ("apply to a non-pending file (ACCEPTED)") verifies this and assertsfetched_metadataremains NULL. Gate: correct.Pattern-length cap.
ParsePatternreturnsErrPatternTooLongforlen(pattern) > 1024(pattern.go:68-70). Test at pattern_test.go:40 covers the boundary. Cap: present.Server-side flash validation.
validPatternAppliedFlash(review_handler.go:22-31) validates the?pattern_applied=param against^Applied: \d+ succeeded, \d+ failed\.$before rendering it. Arbitrary URL values are discarded. No XSS surface: the template data field carries only a validated string.Non-deterministic ORDER BY.
Both new queries (
GetBookdropFilesByIDs,GetPendingBookdropFilesByIDs) useORDER BY id ASC— total order, no flake risk.XSS in JS controller.
All
innerHTMLinsertions go through_esc()(a proper HTML-entity replacer). No unescaped user-controlled content.Multi-user / ownership.
The apply endpoint updates rows gated by
PENDING_REVIEWstatus keyed by the IDs the client provides. The BookdropRequired middleware controls access to the entire/bookdrop/subtree, so only authorised users can call these endpoints. No per-user row scoping is needed here (bookdrop files are not user-owned resources).Unbounded query.
Both new SELECT queries use
WHERE id IN (sqlc.slice(...))with the caller responsible for capping the slice. Preview caps topreviewFileLimit = 5; apply caps toapplyIDCap = 500. No unbounded full-table scan.No New Issues Found
REVIEW VERDICT: 0 blocker, 0 major, 0 minor
Security Re-Review — PR #579 (bd-bookshelf-d1x3.1, HEAD
40b8c18c)Prior findings verification
[was MAJOR #1]
ParsePatternmax pattern lengthFIXED.
internal/bookdrop/pattern.godefinesmaxPatternLen = 1024andErrPatternTooLong.ParsePatterncheckslen(pattern) > maxPatternLenbefore any regex compilation and returns the sentinel error. Unit tests inpattern_test.gocover the boundary at exactly 1024 (accepted) and 1025 (rejected). E2E test ine2e/api/bookdrop_test.goalso confirms 400 on over-length.[was MAJOR #2]
PatternAppliedFlashquery-param injectionFIXED.
review_handler.gointroducespatternAppliedFlashRE = regexp.MustCompile(`^Applied: \d+ succeeded, \d+ failed\.$`)andvalidPatternAppliedFlash(raw string)which returns the raw string only when it matches that strict regex, otherwise returns"". The template renders{{.PatternAppliedFlash}}with normal html/template auto-escaping, and the value is already empty for non-conforming inputs. Five unit tests inreview_handler_test.gocover exact match, all-zeros, arbitrary string, empty string, and partial match (missing trailing dot). No arbitrary banner text can be rendered from?pattern_applied=.[was MINOR #3]
ApplyPatterngates onstatus=PENDING_REVIEWFIXED. The apply path uses
GetPendingBookdropFilesByIDs(notGetBookdropFilesByIDs) which addsAND status = 'PENDING_REVIEW'in the query. Accepted/rejected files are silently dropped at the DB layer rather than the service layer. E2e test (apply to a non-pending file (ACCEPTED)) confirms 0 applied + 0 modified for an ACCEPTED file.New findings from the rebased diff
No new security findings.
Brief summary of other checks performed:
IN (/*SLICE:ids*/?)— no string interpolation of user input._renderPreviewuses_esc()(full&<>"'\`` escaping) on all server-supplied strings beforeinnerHTML; status/count updates use.textContent`. No unescaped assignment path.?pattern_applied=<encodeURIComponent(...)>from the server's own JSON response (not from user input) and the server validates it on re-read via regex — no open-redirect or reflected-XSS risk.{{.PatternAppliedFlash}}inbookdrop_index.htmlis rendered via a normal{{...}}action — html/template applies HTML-entity escaping. Even ifvalidPatternAppliedFlashwere bypassed, the template would entity-encode any injected markup.applyIDCap = 500cap enforced server-side before the DB query; preview capped at 5.gate(BookdropRequired) as all existing bookdrop routes — admin/permission_access_bookdrop only.ALTER TABLE/ADD COLUMN.pattern.goandpattern_handler.gocontain no logging calls.REVIEW VERDICT: 0 blocker, 0 major, 0 minor
40b8c18cbef1ed495237f1ed4952372402fb78b4