fix(wfengine): gate worker start on worker.Mode; client/diag always-on (bookshelf-vguw.17) #339
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "bd-bookshelf-vguw.17"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Engine.Start→Engine.StartWorkerto make the go-workflows client/worker split explicit.workerStarted boolfield soEngine.Stopis a no-op if the worker was never started.wfEngine.StartWorkerinapp.goonworker.ShouldStartPool(a.workerMode), mirroring the existing old-pool gating exactly.external/disabledmode the app still serves the diag UI (/admin/workflows/) and handles workflow-trigger endpoints (the client is always wired); only the poller is skipped.cmd/pergamum/worker.goalways callsStartWorker— it IS the worker fleet.Base: poc/go-workflows — do not merge to main.
Test plan
Engine — worker-mode gatingGinkgo suite covers embedded, external, and client-always-available casesEngine.Stoptests updated to reflect new no-op-if-not-started contractmake testgreen (all 395+ wfengine specs pass)make coverage100% (no exclusions added)make e2egreen (526 API + 296 browser specs)make lintclean for changed packagesCloses bead bookshelf-vguw.17 on merge.
Code Review: bd-bookshelf-vguw.17
Phase 0: DEMO verification
No DEMO block was required for this bead (internal refactor with CI as source of truth per review-standard.md). Proceeding to spec compliance.
Phase 1: Spec Compliance — PASS
All three requirements verified:
Gating correctness:
wfengine.RegisterRoutesis called at line 298 ofinternal/app/app.gounconditionally within thecfg.WFEngineEnabledblock, before thea.workerModecheck inRun. Routes (diag UI + trigger endpoints) are always mounted regardless of worker mode.StartWorkeris gated onworker.ShouldStartPool(a.workerMode)at line 975. Client methods (StartPingWorkflowetc.) operate one.createInstancewhich is wired at construction time and independent ofStartWorker. Correct.Engine.Stop safety:
Stopatinternal/wfengine/engine.go:563-573checks!e.workerStartedand returns nil immediately.workerStartedis only set totrueaftere.startWorker(ctx)returns nil. IfstartCronDispatcherthen fails,workerStartedis alreadytrue— but this is fine because the worker goroutines are live and need draining;Stopcorrectly callsstopWorkerin that case. No nil-deref, no double-stop.No regression: The embedded path (
ShouldStartPooltrue) and worker-binary path both callStartWorkerand get error propagation.app.go:978returns the error (not log-and-continue). The worker binary's log-and-continue pattern (cmd/pergamum/worker.go:176-178) pre-existed inpoc/go-workflowsand is unchanged by this diff — it is not a regression introduced here.Phase 2: Code Quality
[MINOR]
internal/wfengine/engine.go:10— stale package-doc comment saysEngine.Start/Engine.StopThe package-level godoc at line 10 still reads
Engine.Start/Engine.Stop manage the worker lifecycle.The method was renamed toStartWorker. Not a correctness issue, but confusing for any reader of the package doc. Update toEngine.StartWorker/Engine.Stop.[MINOR]
internal/wfengine/engine_lifecycle_test.go—Engine.StartWorkersuccess context missingWorkerStartedassertionThe
Context("when start succeeds")block (lines 42–52) assertsstartCalledanderrbut does not assertengine.WorkerStarted()istrue. The mode-gating suite added later does cover this viaExpect(engine.WorkerStarted()).To(BeTrue()), but the canonicalEngine.StartWorkerdescribe block leaves the key side-effect of the call unverified. Low risk since the gating suite covers it, but the unit describe is incomplete.[MINOR]
internal/wfengine/engine_lifecycle_test.go—Engine.StartWorkererror context missingWorkerStartedfalse assertionThe
Context("when start returns an error")block (lines 54–60) only asserts the wrapped error is returned. It does not assertengine.WorkerStarted()isfalse. SinceworkerStartedis set aftere.startWorkerreturns nil, the false-on-error behaviour is correct in code, but unasserted in tests. The gating suite does not cover this failure subpath.Summary: The core gating logic is correct.
Stopis safe as a no-op whenStartWorkerwas never called. Routes and client are always available. The worker-binary log-and-continue was pre-existing. Findings are minors only — stale comment and two missing test assertions.REVIEW VERDICT: 0 blocker, 0 major, 3 minor