fix(wfengine): gate worker start on worker.Mode; client/diag always-on (bookshelf-vguw.17) #339

Merged
zombor merged 1 commit from bd-bookshelf-vguw.17 into poc/go-workflows 2026-06-04 22:45:36 +00:00
Owner

Summary

  • Renames Engine.StartEngine.StartWorker to make the go-workflows client/worker split explicit.
  • Adds workerStarted bool field so Engine.Stop is a no-op if the worker was never started.
  • Gates wfEngine.StartWorker in app.go on worker.ShouldStartPool(a.workerMode), mirroring the existing old-pool gating exactly.
  • In external/disabled mode the app still serves the diag UI (/admin/workflows/) and handles workflow-trigger endpoints (the client is always wired); only the poller is skipped.
  • cmd/pergamum/worker.go always calls StartWorker — it IS the worker fleet.
  • Result: total concurrent activities = (worker fleet size) × MaxParallelActivityTasks; no double-fleet in external mode.

Base: poc/go-workflows — do not merge to main.

Test plan

  • New Engine — worker-mode gating Ginkgo suite covers embedded, external, and client-always-available cases
  • Existing Engine.Stop tests updated to reflect new no-op-if-not-started contract
  • make test green (all 395+ wfengine specs pass)
  • make coverage 100% (no exclusions added)
  • make e2e green (526 API + 296 browser specs)
  • make lint clean for changed packages

Closes bead bookshelf-vguw.17 on merge.

## Summary - Renames `Engine.Start` → `Engine.StartWorker` to make the go-workflows client/worker split explicit. - Adds `workerStarted bool` field so `Engine.Stop` is a no-op if the worker was never started. - Gates `wfEngine.StartWorker` in `app.go` on `worker.ShouldStartPool(a.workerMode)`, mirroring the existing old-pool gating exactly. - In `external`/`disabled` mode the app still serves the diag UI (`/admin/workflows/`) and handles workflow-trigger endpoints (the client is always wired); only the poller is skipped. - `cmd/pergamum/worker.go` always calls `StartWorker` — it IS the worker fleet. - Result: total concurrent activities = (worker fleet size) × MaxParallelActivityTasks; no double-fleet in external mode. **Base: poc/go-workflows — do not merge to main.** ## Test plan - [x] New `Engine — worker-mode gating` Ginkgo suite covers embedded, external, and client-always-available cases - [x] Existing `Engine.Stop` tests updated to reflect new no-op-if-not-started contract - [x] `make test` green (all 395+ wfengine specs pass) - [x] `make coverage` 100% (no exclusions added) - [x] `make e2e` green (526 API + 296 browser specs) - [x] `make lint` clean for changed packages Closes bead bookshelf-vguw.17 on merge.
fix(wfengine): gate worker start on worker.Mode; client/diag always-on (bookshelf-vguw.17)
All checks were successful
/ Lint (pull_request) Successful in 2m39s
/ Test (pull_request) Successful in 3m10s
/ Integration (pull_request) Successful in 4m20s
/ E2E API (pull_request) Successful in 4m46s
/ E2E Browser (pull_request) Successful in 6m26s
8f78c2543f
Rename Engine.Start → Engine.StartWorker to make the client/worker split
explicit. Add workerStarted tracking so Engine.Stop is a no-op when the
worker was never started (external/disabled mode).

app.go: gate wfEngine.StartWorker on worker.ShouldStartPool(a.workerMode),
mirroring the existing old-pool gating. In external/disabled mode the app
still serves the diag UI and handles workflow-trigger endpoints (client is
always available); only the poller is skipped.

cmd/pergamum/worker.go: always calls StartWorker (it IS the worker fleet).

Result: total concurrent activities = (worker fleet size) × MaxParallelActivityTasks.
The app is no longer an accidental extra worker fleet in external mode.
Author
Owner

Code Review: bd-bookshelf-vguw.17

Phase 0: DEMO verification

No DEMO block was required for this bead (internal refactor with CI as source of truth per review-standard.md). Proceeding to spec compliance.


Phase 1: Spec Compliance — PASS

All three requirements verified:

  1. Gating correctness: wfengine.RegisterRoutes is called at line 298 of internal/app/app.go unconditionally within the cfg.WFEngineEnabled block, before the a.workerMode check in Run. Routes (diag UI + trigger endpoints) are always mounted regardless of worker mode. StartWorker is gated on worker.ShouldStartPool(a.workerMode) at line 975. Client methods (StartPingWorkflow etc.) operate on e.createInstance which is wired at construction time and independent of StartWorker. Correct.

  2. Engine.Stop safety: Stop at internal/wfengine/engine.go:563-573 checks !e.workerStarted and returns nil immediately. workerStarted is only set to true after e.startWorker(ctx) returns nil. If startCronDispatcher then fails, workerStarted is already true — but this is fine because the worker goroutines are live and need draining; Stop correctly calls stopWorker in that case. No nil-deref, no double-stop.

  3. No regression: The embedded path (ShouldStartPool true) and worker-binary path both call StartWorker and get error propagation. app.go:978 returns the error (not log-and-continue). The worker binary's log-and-continue pattern (cmd/pergamum/worker.go:176-178) pre-existed in poc/go-workflows and is unchanged by this diff — it is not a regression introduced here.


Phase 2: Code Quality

[MINOR] internal/wfengine/engine.go:10 — stale package-doc comment says Engine.Start/Engine.Stop
The package-level godoc at line 10 still reads Engine.Start/Engine.Stop manage the worker lifecycle. The method was renamed to StartWorker. Not a correctness issue, but confusing for any reader of the package doc. Update to Engine.StartWorker/Engine.Stop.

[MINOR] internal/wfengine/engine_lifecycle_test.goEngine.StartWorker success context missing WorkerStarted assertion
The Context("when start succeeds") block (lines 42–52) asserts startCalled and err but does not assert engine.WorkerStarted() is true. The mode-gating suite added later does cover this via Expect(engine.WorkerStarted()).To(BeTrue()), but the canonical Engine.StartWorker describe block leaves the key side-effect of the call unverified. Low risk since the gating suite covers it, but the unit describe is incomplete.

[MINOR] internal/wfengine/engine_lifecycle_test.goEngine.StartWorker error context missing WorkerStarted false assertion
The Context("when start returns an error") block (lines 54–60) only asserts the wrapped error is returned. It does not assert engine.WorkerStarted() is false. Since workerStarted is set after e.startWorker returns nil, the false-on-error behaviour is correct in code, but unasserted in tests. The gating suite does not cover this failure subpath.


Summary: The core gating logic is correct. Stop is safe as a no-op when StartWorker was never called. Routes and client are always available. The worker-binary log-and-continue was pre-existing. Findings are minors only — stale comment and two missing test assertions.

REVIEW VERDICT: 0 blocker, 0 major, 3 minor

## Code Review: bd-bookshelf-vguw.17 ### Phase 0: DEMO verification No DEMO block was required for this bead (internal refactor with CI as source of truth per review-standard.md). Proceeding to spec compliance. --- ### Phase 1: Spec Compliance — PASS All three requirements verified: 1. **Gating correctness**: `wfengine.RegisterRoutes` is called at line 298 of `internal/app/app.go` unconditionally within the `cfg.WFEngineEnabled` block, before the `a.workerMode` check in `Run`. Routes (diag UI + trigger endpoints) are always mounted regardless of worker mode. `StartWorker` is gated on `worker.ShouldStartPool(a.workerMode)` at line 975. Client methods (`StartPingWorkflow` etc.) operate on `e.createInstance` which is wired at construction time and independent of `StartWorker`. Correct. 2. **Engine.Stop safety**: `Stop` at `internal/wfengine/engine.go:563-573` checks `!e.workerStarted` and returns nil immediately. `workerStarted` is only set to `true` after `e.startWorker(ctx)` returns nil. If `startCronDispatcher` then fails, `workerStarted` is already `true` — but this is fine because the worker goroutines are live and need draining; `Stop` correctly calls `stopWorker` in that case. No nil-deref, no double-stop. 3. **No regression**: The embedded path (`ShouldStartPool` true) and worker-binary path both call `StartWorker` and get error propagation. `app.go:978` returns the error (not log-and-continue). The worker binary's log-and-continue pattern (`cmd/pergamum/worker.go:176-178`) pre-existed in `poc/go-workflows` and is unchanged by this diff — it is not a regression introduced here. --- ### Phase 2: Code Quality [MINOR] `internal/wfengine/engine.go:10` — stale package-doc comment says `Engine.Start/Engine.Stop` The package-level godoc at line 10 still reads `Engine.Start/Engine.Stop manage the worker lifecycle.` The method was renamed to `StartWorker`. Not a correctness issue, but confusing for any reader of the package doc. Update to `Engine.StartWorker/Engine.Stop`. [MINOR] `internal/wfengine/engine_lifecycle_test.go` — `Engine.StartWorker` success context missing `WorkerStarted` assertion The `Context("when start succeeds")` block (lines 42–52) asserts `startCalled` and `err` but does not assert `engine.WorkerStarted()` is `true`. The mode-gating suite added later does cover this via `Expect(engine.WorkerStarted()).To(BeTrue())`, but the canonical `Engine.StartWorker` describe block leaves the key side-effect of the call unverified. Low risk since the gating suite covers it, but the unit describe is incomplete. [MINOR] `internal/wfengine/engine_lifecycle_test.go` — `Engine.StartWorker` error context missing `WorkerStarted` false assertion The `Context("when start returns an error")` block (lines 54–60) only asserts the wrapped error is returned. It does not assert `engine.WorkerStarted()` is `false`. Since `workerStarted` is set after `e.startWorker` returns nil, the false-on-error behaviour is correct in code, but unasserted in tests. The gating suite does not cover this failure subpath. --- **Summary:** The core gating logic is correct. `Stop` is safe as a no-op when `StartWorker` was never called. Routes and client are always available. The worker-binary log-and-continue was pre-existing. Findings are minors only — stale comment and two missing test assertions. REVIEW VERDICT: 0 blocker, 0 major, 3 minor
zombor merged commit 9dc3edffed into poc/go-workflows 2026-06-04 22:45:36 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
zombor/pergamum!339
No description provided.