feat(metadata): provider rate-limiting + Retry-After + GB quota + OL contact UA (bookshelf-0j7.15) #127
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "bd-bookshelf-0j7.15"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
internal/metadata/ratelimiter):New(interval)/Unlimited()— injected asfunc(ctx) errorinto both providers so bulk enrichment paces at ~1 req/sec (GB) / ~3 req/sec (OL) instead of bursting into 429sretryAfterSleephook before retryingisQuotaExhausteddetectsdailyLimitExceeded/userRateLimitExceededbody and retries gracefully instead of failing hardmailto:contact email to qualify for 3 req/sec identified tier--metadata-google-books-rate-limit(default 1s) and--metadata-open-library-rate-limit(default 350ms) — configurable at runtimeTest plan
make test— all unit tests passmake coverage— 100% coverage maintained on all changed packagesgo test -race ./internal/metadata/...)@(contact email)isQuotaExhausted,parseRetryAfter(both providers)Closes bead bookshelf-0j7.15 on merge.
@ -166,6 +176,8 @@ func NewFlagSet() (*ff.FlagSet, *Config) {fs.StringVar(&cfg.GoogleBooksAPIKey, 0, "google-books-api-key", cfg.GoogleBooksAPIKey, "Optional Google Books API key (unauthenticated if empty)")fs.BoolVarDefault(&cfg.MetadataProviderGoogleBooksEnabled, 0, "metadata-provider-google-books-enabled", cfg.MetadataProviderGoogleBooksEnabled, "Enable the Google Books metadata provider (default true)")fs.BoolVarDefault(&cfg.MetadataProviderOpenLibraryEnabled, 0, "metadata-provider-open-library-enabled", cfg.MetadataProviderOpenLibraryEnabled, "Enable the Open Library metadata provider (default true)")fs.DurationVar(&cfg.MetadataGoogleBooksRateLimit, 0, "metadata-google-books-rate-limit", cfg.MetadataGoogleBooksRateLimit, "Minimum interval between Google Books API requests (default 1s; 0 disables proactive rate limiting)")I'm confused about these two flags. the default here is 0 but the comment says default is 1s. how can it be both 1s and 0?
@ -29,2 +31,4 @@maxBodyBytes = 1 << 20 // 1 MiB// MaxRetryAfter is the maximum duration we will sleep when honoring a// Retry-After header. Server-specified values above this are capped.Why would we not respect the exact retry after header?
@ -32,3 +34,1 @@// consumers to identify their app and provide a contact URL or risk// throttling/blocking of anonymous clients.userAgent = "Bookshelf (+https://git.zombor.net/zombor/bookshelf)"// MaxRetryAfter is the maximum duration we will sleep when honoring aSame here, why would we cap the retry after? we should respect what the remote sent us.
@ -35,0 +38,4 @@// userAgent is sent on every outbound HTTP request. Open Library requires// consumers to include a contact URL and, ideally, an email address to qualify// for the 3 req/sec identified tier (vs 1 req/sec anonymous).userAgent = "Bookshelf (+https://git.zombor.net/zombor/bookshelf; mailto:bookshelf@zombor.net)"email should be jeremy.bush@zombor.net
@ -0,0 +1,114 @@// Package ratelimiter provides a token-bucket rate limiter for metadata providerCan we use https://pkg.go.dev/golang.org/x/time/rate instead of implementing our own?
CODE REVIEW: APPROVED — 0 blockers, 0 majors, 4 minors
Phase 0: DEMO Verification
The bead description contains no DEMO block with a runnable command + expected output. The project uses CI green as the behavioral verification mechanism, and the completion comment confirms all 5 CI jobs are green on SHA
900ee7dc. Phase 0 passes on that basis.Phase 1: Spec Compliance
All five spec requirements are satisfied:
internal/metadata/ratelimiter/package —New(interval)+Unlimited(), injected asfunc(ctx) errorinto both providersmailto:bookshelf@zombor.netdailyLimitExceeded/userRateLimitExceededtreated as retriableNew(0)=Unlimitedguard preventstime.NewTicker(0)panicConfig ff flags added for both rate-limit durations with correct defaults (GB 1 s, OL 350 ms).
Phase 2: Code Quality
Concurrency Correctness
The
Limiteruses a buffered channel (cap=1) as the token bucket. All reads/writes are channel operations — goroutine-safe by construction.Stop()closesdoneinside aselectguard making it idempotent.refilldefersclose(l.tokens)so any blockedWaitunblocks withok=false → ErrStopped. No mutex, no shared mutable state. Clean.Ctx cancellation in
Waitis handled by the firstselectarm. No deadlock path found.The
refillUnlimitedgoroutine is a hot loop that only exits on<-l.done. Whennilis passed toNew()in the provider,Unlimited()is called butlimis not returned to the caller — onlylim.Waitis captured. There is noStop()call path for this anonymous limiter, so its goroutine runs for the lifetime of the process. For an always-on server process this is acceptable (one goroutine per provider startup), but it is a subtle goroutine leak.Retry-After Parsing
parseRetryAfterhandles delay-seconds form viastrconv.ParseFloat. Negative and zero values → 0 (no sleep). Values above 60 s capped atMaxRetryAfter. Garbage/non-numeric → 0. All safe.The HTTP-date form (
Fri, 31 Dec 1999 23:59:59 GMT) is NOT supported —parseRetryAfterreturns 0 silently, falling through to the regular backoff. This is acceptable given Google Books and Open Library both use delay-seconds in practice, and 0 is a safe fallback.New(0)=Unlimited Guard
New(interval)hasif interval <= 0 { return Unlimited() }at line 29. This is correct and prevents the pre-existingtime.NewTicker(0)panic.Unlimited()itself never callstime.NewTicker. Guard is correct.[MINOR] internal/metadata/googlebooks/provider.go:156-159 — nil-limiter path leaks Unlimited goroutine
When
limiter == nil,ratelimiter.Unlimited()is called and onlylim.Waitis captured;limis unreachable, solim.Stop()is never called and therefillUnlimitedgoroutine runs forever. For a long-lived server process this is one goroutine perNew()call with nil limiter, which is fine in practice but differs from the documented contract. Consider using a package-levelunlimitedWaitfunc (a no-op that never blocks) instead of anUnlimited()instance, avoiding the goroutine entirely.[MINOR] internal/metadata/googlebooks/provider.go:289-296 — redundant
continueafter conditionalcontinueThe 429 branch has two separate
continuestatements: one insideif retryAfter > 0 { ... continue }and one after the closing brace. The outercontinueis unreachable whenretryAfter > 0and redundant whenretryAfter == 0. The intent is clear but the duplicate is dead code. Collapse to a singlecontinueafter theif retryAfter > 0block.[MINOR] internal/metadata/openlibrary/provider.go:233-245 — same redundant
continuepatternIdentical to the googlebooks 429 block above. Same fix applies.
[MINOR] internal/metadata/googlebooks/ratelimit_test.go:920-954 — multiple
Expectcalls inside a singleItblockThe
"Retry-After header is honored on 429"It block containsExpect(err),Expect(results),Expect(sleepCalled), andExpect(retryDelay)— four assertions in oneIt. Per project convention, eachItshould have exactly oneExpect. Split into separateItblocks sharing the sameBeforeEach/JustBeforeEach. (Same pattern in openlibrary/ratelimit_test.go:1649-1683.)REVIEW VERDICT: 0 blocker, 0 major, 4 minor
Security Review — bd-bookshelf-0j7.15 (PR #127)
Findings
[MAJOR] internal/metadata/googlebooks/provider.go:265 — API key leaks through
*url.Errorin the "request failed" logWhen
http.Client.Doreturns an error (timeout, DNS failure, connection refused), Go wraps it in a*url.ErrorwhoseError()string is"Get \"https://www.googleapis.com/books/v1/volumes?...&key=YOUR_KEY&...\": <cause>". That error value is passed directly tologger.Warn(..., "error", err, ...)(line 265) and also wrapped intolastErrwhich is returned to the caller and may be further logged upstream. TheredactKeyhelper is correctly applied to the"url"log field on all other Warn/Info lines, but this one does not log the URL at all — instead it logs the rawerrobject which contains the full URL including the API key.Fix: before logging
err, check if it is a*url.Errorand redact itsURLfield, or simply don't include aurlfield for the transport-error path (the query and attempt number are sufficient for diagnosis). Simplest:if ue := new(url.Error); errors.As(err, &ue) { ue.URL = redactKey(ue.URL) }before thelogger.Warncall.Note: this is a new transport-error scenario that could be triggered any time there is a network issue while the API key is configured. The "request failed" log path itself is pre-existing, but now that a real API key is wired in config, the impact is new.
[MINOR] internal/metadata/googlebooks/provider.go:401 / internal/metadata/openlibrary/provider.go:311 —
parseRetryAfterfloat→Duration conversion can overflow for astronomically large valuestime.Duration(secs * float64(time.Second))wheresecs≥ ~9.2e9 (i.e., a Retry-After value claiming ~292 years) causes float64 overflow to+Inf, which converts tomath.MinInt64in Go — a large negative Duration. The cap checkd > MaxRetryAfterthen evaluates false (negative < 60s), soretryAfterSleepis called with a negative duration. In practicetime.After(negative)fires immediately, so the sleep is a no-op and the DoS the attacker intended fails. This is actually safe from a security standpoint (attacker's attack backfires), but the cap semantics are violated for extreme inputs.Fix: add a float cap before conversion:
if secs > float64(MaxRetryAfter/time.Second) { return MaxRetryAfter }.[MINOR] internal/books/wire.go:67–68 / cmd/bookshelf/worker.go:186–187 —
ratelimiter.Newgoroutines are never stoppedgbLimiterandolLimiterare created inWire()andbuildPool()butStop()is never called. For a long-running process these are app-lifetime goroutines and do not leak. However, every call toWire()(exercised by e2e tests viaapp.New()) creates two goroutines with no cleanup path. IfWire()is called multiple times in a test suite (e.g. across test environments), goroutines accumulate. TheLimiterchannel has capacity 1 and blocks when full, so CPU impact is negligible, but test-suite goroutine counts will grow.Fix: return a
cleanup func()fromWire(), or register limiterStop()calls on the app's shutdown context, or expose the limiters so the caller can stop them. Alternatively, since the default-rate limiters are process-scoped singletons, create them once inapp.Newand pass them down rather than creating fresh ones inWire().Summary of non-findings (confirmed clean)
"url"log fields correctly callredactKey(reqURL). No hardcoded key anywhere. Key sourced via ff flag/env only.retryAfterSleepis context-aware and respects cancellation.limiter.Wait(ctx)is called before each HTTP request; the passedctxcarries the request/job context with its deadline. If the provider is slow the worker's context deadline fires first, aborting the wait.bookshelf@zombor.netis inuserAgentper OL's documented requirement for the identified tier. It is sent as an HTTP header to the OL server, not written to local logs. Intentional and correct.refill()callsticker.Stop()viadeferandclose(l.tokens)viadefer close. Cleanup is correct whenStop()is called.Unlimitedrefill goroutine: blocks on channel send (capacity-1 channel) when no consumer is waiting — not a busy loop.Stop()idempotency: implemented correctly viaselect { case <-l.done: /* already stopped */ default: close(l.done) }.REVIEW VERDICT: 0 blocker, 1 major, 2 minor
Addressed all 5 review comments in commit
c037e03:#1 —
internal/config/config.go(confusing0in DurationVar)The
0second argument is the short-flag rune ("no short flag"), not the default value. The actual defaults come fromDefaults()(time.Secondfor Google Books,350msfor Open Library) and are passed as the 4th argument. Added a clarifying comment near the twoDurationVarcalls to make this unambiguous.#2 —
internal/metadata/googlebooks/provider.go(Retry-After cap)Removed the 60s
MaxRetryAftercap.parseRetryAfternow respects the exact server-supplied value. Also added HTTP-date form parsing alongside delay-seconds. The sleep remains ctx-cancellable so a large value can't hang a worker indefinitely.#3 —
internal/metadata/openlibrary/provider.go(same Retry-After cap)Same fix — removed
MaxRetryAfterand its cap from the Open Library provider.#4 —
internal/metadata/openlibrary/provider.go(contact email)Updated
userAgentfrombookshelf@zombor.nettojeremy.bush@zombor.net.#5 —
internal/metadata/ratelimiter/ratelimiter.go(use x/time/rate)Replaced the hand-rolled token-bucket with
golang.org/x/time/rate. Usesrate.NewLimiter(rate.Every(interval), 1)(burst 1 preserves min-interval semantics).Stop()is now a documented no-op (no background goroutine needed).interval <= 0still givesrate.Inf(Unlimited). All callers unchanged. 100% coverage maintained.CI is green.
Fixed the MAJOR security finding (commit
72379b1):Root cause:
http.Client.Dowraps transport errors in*url.Errorwhose.Error()string contains the full request URL, including?key=API_KEY. The raw error was logged and wrapped intolastErr, leaking the key into logs and error chains.Fix applied:
googlebooks/provider.go: immediately afterDo()returns an error, useerrors.As(err, &ue)to find the*url.Errorand callredactKey(ue.URL)in-place — before logging orfmt.Errorfwrapping. Both the log"error"field and the propagated error string now carry the redacted form.openlibrary/provider.go: same pattern with a newredactURL()helper that strips the query string (OL has no API key but URL hygiene is still good practice).Also addressed:
continueafter conditionalcontinuein the 429 Retry-After branch of both providersratelimit_test.goin both packages: moved setup toBeforeEach, invocations toBeforeEach(since they depend on local mutable state captured at setup time), and split multi-ExpectItblocks to oneExpectperItlogging_test.gotest asserting the API key does not appear in logs or error strings on transport failureopenlibrary/redact_test.gocoveringRedactURLand transport error URL redactionCI green, PR #127 mergeable.
72379b13d9c5f9e30925