POC.2: port cover.generate to go-workflows workflow+activity (bookshelf-vguw.2) #327

Merged
zombor merged 2 commits from bd-bookshelf-vguw.2 into poc/go-workflows 2026-06-04 03:33:41 +00:00
Owner

Summary

  • Adds GenerateCoverWorkflow(bookID) + CoverActivities.Generate in internal/wfengine; activity body is a zero-reimplementation wrapper over the existing cover.BuildGenerateFunc (same pipeline as the pool handler)
  • Activity retry policy mirrors the pool policy: MaxAttempts=5, 30s first interval, 2× backoff, 5 min cap — configured via workflow.ActivityOptions.RetryOptions
  • New trigger endpoint POST /admin/workflows/_poc/cover/{bookID} (AdminRequired, explicit POST method — avoids the GET / conflict that bit POC.1)
  • Engine.New / newWithFactory extended to accept generateCover func(context.Context, int64) error; cover workflow + activity registered on the worker only when non-nil
  • Wired in app.go and cmd/pergamum/worker.go via cover.BuildGenerateFunc(appwire.Deps{...})

Test plan

Local verification (requires PERGAMUM_WFENGINE_ENABLED=true)

  1. Start the stack: make dev
  2. POST the trigger for a seeded book:
    curl -X POST http://localhost:8080/admin/workflows/_poc/cover/1
    # → {"instance_id":"<uuid>"}
    
  3. Open /admin/workflows/ — the instance should appear as Completed
  4. Check that data/images/1/cover.jpg and thumbnail.jpg exist

Retry durability check (manual)

  1. Temporarily break the cover pipeline (e.g. wrong DataDir) so the activity returns an error
  2. POST the trigger — observe the workflow retrying in the diag UI (up to 5 attempts with growing delays)
  3. Fix the break — the next retry succeeds

Worker mid-run kill (durability)

  1. Start a cover workflow for a large book that takes several seconds
  2. Kill the worker process mid-generation (docker compose kill worker)
  3. Restart — the workflow engine resumes from event history (activity schedules are re-delivered, not re-started from scratch)

CI

Standard CI runs make test + make coverage (100% gate). The enabled-path integration test (engine_integration_test.go) verifies the cover route with a stub generateCover against a real MySQL (testcontainers), catching the class of bugs that hit POC.1.

Part of POC epic bookshelf-vguw; base is poc/go-workflows, do not merge to main.

Closes bead bookshelf-vguw.2 on merge.

## Summary - Adds `GenerateCoverWorkflow(bookID)` + `CoverActivities.Generate` in `internal/wfengine`; activity body is a zero-reimplementation wrapper over the existing `cover.BuildGenerateFunc` (same pipeline as the pool handler) - Activity retry policy mirrors the pool policy: MaxAttempts=5, 30s first interval, 2× backoff, 5 min cap — configured via `workflow.ActivityOptions.RetryOptions` - New trigger endpoint `POST /admin/workflows/_poc/cover/{bookID}` (AdminRequired, explicit `POST` method — avoids the `GET /` conflict that bit POC.1) - `Engine.New` / `newWithFactory` extended to accept `generateCover func(context.Context, int64) error`; cover workflow + activity registered on the worker only when non-nil - Wired in `app.go` and `cmd/pergamum/worker.go` via `cover.BuildGenerateFunc(appwire.Deps{...})` ## Test plan ### Local verification (requires `PERGAMUM_WFENGINE_ENABLED=true`) 1. Start the stack: `make dev` 2. POST the trigger for a seeded book: ``` curl -X POST http://localhost:8080/admin/workflows/_poc/cover/1 # → {"instance_id":"<uuid>"} ``` 3. Open `/admin/workflows/` — the instance should appear as `Completed` 4. Check that `data/images/1/cover.jpg` and `thumbnail.jpg` exist ### Retry durability check (manual) 1. Temporarily break the cover pipeline (e.g. wrong `DataDir`) so the activity returns an error 2. POST the trigger — observe the workflow retrying in the diag UI (up to 5 attempts with growing delays) 3. Fix the break — the next retry succeeds ### Worker mid-run kill (durability) 1. Start a cover workflow for a large book that takes several seconds 2. Kill the worker process mid-generation (`docker compose kill worker`) 3. Restart — the workflow engine resumes from event history (activity schedules are re-delivered, not re-started from scratch) ## CI Standard CI runs `make test` + `make coverage` (100% gate). The enabled-path integration test (`engine_integration_test.go`) verifies the cover route with a stub `generateCover` against a real MySQL (testcontainers), catching the class of bugs that hit POC.1. Part of POC epic bookshelf-vguw; base is poc/go-workflows, do not merge to main. Closes bead bookshelf-vguw.2 on merge.
feat(wfengine): POC.2 — port cover.generate to go-workflows workflow+activity (bookshelf-vguw.2)
Some checks failed
/ Lint (pull_request) Successful in 2m41s
/ Test (pull_request) Successful in 3m0s
/ Integration (pull_request) Failing after 4m6s
/ E2E API (pull_request) Successful in 4m43s
/ E2E Browser (pull_request) Successful in 6m17s
0d0b5ca894
- Add GenerateCoverWorkflow + CoverActivities to internal/wfengine
- Wire activity body via BuildGenerateFunc in cover/wire.go — reuses the
  existing cover.generate pipeline with zero re-implementation
- Configure activity retry policy to mirror pool policy: 5 attempts,
  30s first interval, 2x backoff, 5min max
- Add POST /admin/workflows/_poc/cover/{bookID} trigger route (AdminRequired)
- Update Engine.New/newWithFactory to accept generateCover func dep
- Register cover workflow+activity on the worker when dep is non-nil
- Wire generateCover in both app.go and cmd/pergamum/worker.go
- 100% coverage: unit tests via workflow tester + curried stubs; enabled-path
  integration test with real MySQL + stub generateCover

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fix(wfengine): remove unused generateCalled var in integration test (CI build failure)
All checks were successful
/ Lint (pull_request) Successful in 2m23s
/ Test (pull_request) Successful in 2m35s
/ Integration (pull_request) Successful in 3m55s
/ E2E API (pull_request) Successful in 3m59s
/ E2E Browser (pull_request) Successful in 5m52s
31e13a31e2
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Author
Owner

Security Review — bd-bookshelf-vguw.2 (POC.2: cover workflow)

Scope

Reviewed git diff origin/poc/go-workflows...origin/bd-bookshelf-vguw.2 against .claude/rules/review-standard.md. CI is green; tests were not re-run.


Findings

No blockers. No majors. Three minors:


[MINOR] internal/wfengine/engine.go:180 — bookID logged at Info without a guard on non-existent book

logger.Info("wfengine: cover workflow started", "instance_id", instanceID, "book_id", bookID) fires before the activity runs and confirms the book exists. An admin can deduce that a workflow was enqueued for any int64, including IDs that don't exist in the DB (the activity will simply return an error later). For a POC behind AdminRequired this is acceptable — book IDs are not secrets and admins already have full DB access — but worth noting for when this pattern graduates to production scale.

Suggested fix: no immediate action needed for the POC; log at Info is correct. When graduating, consider logging only after the first activity heartbeat confirms book existence.


[MINOR] internal/wfengine/cover_handler.go:40 — json.Encode error discarded silently

_ = json.NewEncoder(w).Encode(result) discards the encode error. After w.WriteHeader(http.StatusAccepted) the status is committed and a partial write can't be recovered, so discarding is pragmatically harmless. However a log line on error would surface network-level issues silently:

if err := json.NewEncoder(w).Encode(result); err != nil {
    logger.Warn("wfengine: cover handler: encode response", "err", err)
}

[MINOR] internal/wfengine/engine.go (no line) — no per-bookID deduplication / queue-depth guard

An admin can POST /admin/workflows/_poc/cover/{bookID} in a tight loop, enqueuing unbounded workflow instances for the same bookID. Each call generates a fresh UUID instance ID so there is no idempotency guard. For a POC feature gated on PERGAMUM_WFENGINE_ENABLED and restricted to admins, spamming the queue is a self-inflicted problem and not a security issue — but it is a resource-exhaustion footgun that must be addressed before the pattern reaches any broader surface. File a follow-up bead when this graduates from POC.


Security checklist results

  1. AdminRequired confirmed: RegisterRoutes wraps both Ping and Cover routes with adminRequired(...). AdminRequired in internal/users/middleware.go checks JWT claims + IsAdmin flag; unauthenticated requests get 401/redirect, non-admin get 403. ✓
  2. bookID validation: strconv.ParseInt(bookIDStr, 10, 64) + bookID <= 0 guard before any further use. The int64 value is passed to sqlc-generated queries (parameterised) and into fmt.Sprintf({"book_id":%d}, bookID) — no string interpolation into SQL; no injection risk. ✓
  3. Multi-user scoping: Cover generation is library-level; it reads GetPrimaryBookFile and GetBookLibraryPath — both keyed by book_id with no per-user data read or written. No user_book_progress or other per-user tables are touched. ✓
  4. DSN/password in logs: POC.1's DSN-in-logs issue is not reintroduced. maskDSN is present in backend.go; the only log additions in this diff are "book_id" and "instance_id" — no credentials. ✓
  5. Error messages: Internal error detail is not returned to the caller. On 500, http.Error(w, "failed to start cover workflow", ...) is used; internal error is logged server-side only. The test at cover_handler_test.go:268 explicitly asserts the internal error string is not in the response body. ✓
  6. Swallowed errors: No security-relevant errors are swallowed. The _ = json.NewEncoder(w).Encode(result) discard is post-commit (minor, above). All DB and workflow errors propagate up and result in logged 500s. ✓

REVIEW VERDICT: 0 blocker, 0 major, 3 minor

## Security Review — bd-bookshelf-vguw.2 (POC.2: cover workflow) ### Scope Reviewed `git diff origin/poc/go-workflows...origin/bd-bookshelf-vguw.2` against `.claude/rules/review-standard.md`. CI is green; tests were not re-run. --- ### Findings No blockers. No majors. Three minors: --- **[MINOR] internal/wfengine/engine.go:180 — bookID logged at Info without a guard on non-existent book** `logger.Info("wfengine: cover workflow started", "instance_id", instanceID, "book_id", bookID)` fires _before_ the activity runs and confirms the book exists. An admin can deduce that a workflow was enqueued for any int64, including IDs that don't exist in the DB (the activity will simply return an error later). For a POC behind `AdminRequired` this is acceptable — book IDs are not secrets and admins already have full DB access — but worth noting for when this pattern graduates to production scale. Suggested fix: no immediate action needed for the POC; log at Info is correct. When graduating, consider logging only after the first activity heartbeat confirms book existence. --- **[MINOR] internal/wfengine/cover_handler.go:40 — json.Encode error discarded silently** `_ = json.NewEncoder(w).Encode(result)` discards the encode error. After `w.WriteHeader(http.StatusAccepted)` the status is committed and a partial write can't be recovered, so discarding is _pragmatically_ harmless. However a log line on error would surface network-level issues silently: ```go if err := json.NewEncoder(w).Encode(result); err != nil { logger.Warn("wfengine: cover handler: encode response", "err", err) } ``` --- **[MINOR] internal/wfengine/engine.go (no line) — no per-bookID deduplication / queue-depth guard** An admin can POST `/admin/workflows/_poc/cover/{bookID}` in a tight loop, enqueuing unbounded workflow instances for the same bookID. Each call generates a fresh UUID instance ID so there is no idempotency guard. For a POC feature gated on `PERGAMUM_WFENGINE_ENABLED` and restricted to admins, spamming the queue is a self-inflicted problem and not a security issue — but it is a resource-exhaustion footgun that must be addressed before the pattern reaches any broader surface. File a follow-up bead when this graduates from POC. --- ### Security checklist results 1. **AdminRequired confirmed**: `RegisterRoutes` wraps both Ping and Cover routes with `adminRequired(...)`. `AdminRequired` in `internal/users/middleware.go` checks JWT claims + `IsAdmin` flag; unauthenticated requests get 401/redirect, non-admin get 403. ✓ 2. **bookID validation**: `strconv.ParseInt(bookIDStr, 10, 64)` + `bookID <= 0` guard before any further use. The int64 value is passed to sqlc-generated queries (parameterised) and into `fmt.Sprintf(`{"book_id":%d}`, bookID)` — no string interpolation into SQL; no injection risk. ✓ 3. **Multi-user scoping**: Cover generation is library-level; it reads `GetPrimaryBookFile` and `GetBookLibraryPath` — both keyed by `book_id` with no per-user data read or written. No `user_book_progress` or other per-user tables are touched. ✓ 4. **DSN/password in logs**: POC.1's DSN-in-logs issue is not reintroduced. `maskDSN` is present in `backend.go`; the only log additions in this diff are `"book_id"` and `"instance_id"` — no credentials. ✓ 5. **Error messages**: Internal error detail is not returned to the caller. On 500, `http.Error(w, "failed to start cover workflow", ...)` is used; internal error is logged server-side only. The test at `cover_handler_test.go:268` explicitly asserts the internal error string is not in the response body. ✓ 6. **Swallowed errors**: No security-relevant errors are swallowed. The `_ = json.NewEncoder(w).Encode(result)` discard is post-commit (minor, above). All DB and workflow errors propagate up and result in logged 500s. ✓ --- REVIEW VERDICT: 0 blocker, 0 major, 3 minor
Author
Owner

Code Review — POC.2 cover.generate workflow (bookshelf-vguw.2)

Phase 0: DEMO Verification

DEMO requires a running dev stack with PERGAMUM_WFENGINE_ENABLED=true and a seeded book. PARTIAL DEMO with valid reason (local Docker + workflow engine required). Noted for human verification; proceeding to Phase 1/2.

Phase 1: Spec Compliance

All spec items met: GenerateCoverWorkflow+CoverActivities in cover_workflow.go, MaxAttempts=5 retry policy, POST trigger endpoint behind AdminRequired, BuildGenerateFunc reuses buildGenerateHandler, old pool handler intact, 100% coverage gate per CI.

Phase 2: Code Quality

[MINOR] internal/wfengine/cover_workflow.go:20 — Retry policy comment misrepresents backoff sequence. The comment says "5 attempts with exponential back-off starting at 30 s" but go-workflows retries.go line 87 computes FirstRetryInterval * BackoffCoefficient^attempt (where attempt is post-incremented before the wait), so actual delays are 60 s, 120 s, 240 s (capped 300 s), 300 s. The pool actual sequence (workerpolicy/policy.go) is [0, 30 s, 60 s, 120 s, 300 s] — first pool retry is immediate; first activity retry is 60 s. MaxAttempts=5 and the 5-min cap are correct. Suggested fix: update comment to state actual computed sequence or align delays to match the pool exactly.

[MINOR] internal/wfengine/cover_handler_test.go:109 — Two Expect calls in one It block. It("returns a generic client message (not the internal error detail)") calls Expect twice (ContainSubstring + NotTo ContainSubstring). Project convention requires one Expect per It. Split into two It blocks.

[MINOR] internal/wfengine/engine_integration_test.go:251 — Four Expect calls in one It block ("serves POST /admin/workflows/_poc/cover/{bookID} with 202 and an instance_id"). Pre-existing pattern carried from the ping route test, but still violates one-Expect-per-It.

Summary

Workflow determinism: CLEAN. new(CoverActivities) in workflow body only supplies fn.Name reflection for activity lookup; the registered instance (with real generate closure) executes. No I/O, no time.Now, no randomness in workflow body.

bookID validation: ParseInt + <= 0 guard before any workflow call. No panic on bad input.

Reuse wrapper: BuildGenerateFunc delegates to buildGenerateHandler; no-op callbacks do not drop errors. Error propagation intact.

Wiring: generateCover=nil path skips registration cleanly; StartCoverWorkflow returns typed error when not registered. Route is POST method-specific (avoids GET-/ conflict). AdminRequired applied.

REVIEW VERDICT: 0 blocker, 0 major, 3 minor

## Code Review — POC.2 cover.generate workflow (bookshelf-vguw.2) ### Phase 0: DEMO Verification DEMO requires a running dev stack with PERGAMUM_WFENGINE_ENABLED=true and a seeded book. PARTIAL DEMO with valid reason (local Docker + workflow engine required). Noted for human verification; proceeding to Phase 1/2. ### Phase 1: Spec Compliance All spec items met: GenerateCoverWorkflow+CoverActivities in cover_workflow.go, MaxAttempts=5 retry policy, POST trigger endpoint behind AdminRequired, BuildGenerateFunc reuses buildGenerateHandler, old pool handler intact, 100% coverage gate per CI. ### Phase 2: Code Quality [MINOR] internal/wfengine/cover_workflow.go:20 — Retry policy comment misrepresents backoff sequence. The comment says "5 attempts with exponential back-off starting at 30 s" but go-workflows retries.go line 87 computes FirstRetryInterval * BackoffCoefficient^attempt (where attempt is post-incremented before the wait), so actual delays are 60 s, 120 s, 240 s (capped 300 s), 300 s. The pool actual sequence (workerpolicy/policy.go) is [0, 30 s, 60 s, 120 s, 300 s] — first pool retry is immediate; first activity retry is 60 s. MaxAttempts=5 and the 5-min cap are correct. Suggested fix: update comment to state actual computed sequence or align delays to match the pool exactly. [MINOR] internal/wfengine/cover_handler_test.go:109 — Two Expect calls in one It block. It("returns a generic client message (not the internal error detail)") calls Expect twice (ContainSubstring + NotTo ContainSubstring). Project convention requires one Expect per It. Split into two It blocks. [MINOR] internal/wfengine/engine_integration_test.go:251 — Four Expect calls in one It block ("serves POST /admin/workflows/_poc/cover/{bookID} with 202 and an instance_id"). Pre-existing pattern carried from the ping route test, but still violates one-Expect-per-It. ### Summary Workflow determinism: CLEAN. new(CoverActivities) in workflow body only supplies fn.Name reflection for activity lookup; the registered instance (with real generate closure) executes. No I/O, no time.Now, no randomness in workflow body. bookID validation: ParseInt + <= 0 guard before any workflow call. No panic on bad input. Reuse wrapper: BuildGenerateFunc delegates to buildGenerateHandler; no-op callbacks do not drop errors. Error propagation intact. Wiring: generateCover=nil path skips registration cleanly; StartCoverWorkflow returns typed error when not registered. Route is POST method-specific (avoids GET-/ conflict). AdminRequired applied. REVIEW VERDICT: 0 blocker, 0 major, 3 minor
zombor merged commit 22c893f8c5 into poc/go-workflows 2026-06-04 03:33:41 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
zombor/pergamum!327
No description provided.