Skip to content

Expose DCR config in operator CRD for OAuth2 upstreams#5069

Draft
tgrunnagle wants to merge 1 commit intoissue_5039_dcr-2cfrom
issue_5040_dcr-2d
Draft

Expose DCR config in operator CRD for OAuth2 upstreams#5069
tgrunnagle wants to merge 1 commit intoissue_5039_dcr-2cfrom
issue_5040_dcr-2d

Conversation

@tgrunnagle
Copy link
Copy Markdown
Contributor

DRAFT - not ready for review

Expose DCR config in operator CRD for OAuth2 upstreams

Closes #5040

Summary

Phase 2 of the DCR story (#4978) finishes by surfacing Dynamic Client Registration (RFC 7591) in the operator API so Kubernetes users can configure DCR on VirtualMCPServer OAuth2 upstreams. The CRD changes were intentionally held back until the authserver plumbing landed, and this PR threads the new dcrConfig field from the CRD through conversion to the existing authserver.DCRUpstreamConfig runtime type. Validation lives in CEL on the CRD plus a defense-in-depth check at reconcile time, and the initial access token follows the same Secret-ref → env-var pattern already used by ClientSecretRef.

Changes Made

Operator API (cmd/thv-operator/api/v1beta1/)

  • Add DCRUpstreamConfig type with discoveryUrl, registrationEndpoint, initialAccessTokenRef (a corev1.SecretKeySelector), softwareId, and softwareStatement fields.
  • Add optional dcrConfig *DCRUpstreamConfig to OAuth2UpstreamConfig and make clientId optional.
  • CEL validation enforces: exactly one of clientId or dcrConfig on OAuth2UpstreamConfig; exactly one of discoveryUrl or registrationEndpoint inside dcrConfig; and a fail-closed discriminator on UpstreamProviderConfig so unknown future provider types are rejected at admission instead of silently accepted.
  • MaxLength=255 on softwareId and MaxLength=4096 on softwareStatement to bound CR size against etcd object limits.

Conversion (cmd/thv-operator/pkg/controllerutil/authserver.go)

  • Map DCRConfig from the CRD onto authserver.DCRUpstreamConfig in the OAuth2 upstream branch, including resolving InitialAccessTokenRef to a TOOLHIVE_UPSTREAM_DCR_INITIAL_ACCESS_TOKEN_<PROVIDER> env var sourced from the referenced Secret — same pattern as ClientSecretRef.
  • Call the exported ValidateOAuth2DCRConfig from BuildAuthServerRunConfig so malformed ClientID/DCRConfig pairs that bypass admission still fail at reconcile time rather than at authserver startup. OIDC and OAuth2 branches were split into helpers to keep gocyclo within limits.

Authserver runtime (pkg/authserver/, pkg/oauthproto/, pkg/auth/)

  • DCR credential store and resolver: pkg/authserver/runner/dcr.go, dcr_store.go. In-memory store keyed by (Issuer, RedirectURI, ScopesHash), no TTL (RFC 7591 registrations are long-lived). The key shape is designed to compose a Redis segment in Phase 3 without redefining the canonical form.
  • EmbeddedAuthServer now owns a DCRCredentialStore and resolves DCR for any OAuth2 upstream with DCRConfig before building the upstream config. The resolved ClientSecret is overlaid on the built upstream.OAuth2Config via a new applyResolutionToOAuth2Config helper.
  • Each UpstreamRunConfig element is shallow-copied and its OAuth2 sub-config deep-copied before DCR resolution to honor the "Copy Before Mutating Caller Input" rule.
  • The /oauth/register handler emits a structured Info log on success with issuer, client_id, software_id, token_endpoint_auth_method, and scopes. software_id is threaded through ValidateDCRRequest and capped at 256 printable-ASCII characters.
  • MonitoredTokenSource gains required upstream and clientID constructor parameters so DCR remediation warnings carry correlation fields, and the runner prefers the DCR-cached client_id over the statically configured one.
  • pkg/oauth is renamed to pkg/oauthproto and the DCR/discovery primitives from pkg/auth/oauth/dynamic_registration.go and pkg/auth/discovery/ are consolidated there. Callers updated.

Generated artifacts

  • zz_generated.deepcopy.go, deploy/charts/operator-crds/**/*.yaml, docs/operator/crd-api.md, and docs/server/{docs.go,swagger.json,swagger.yaml} regenerated via task gen / task crdref-gen / task operator-manifests. No hand edits.

Implementation Details

  • Fail-closed CEL discriminator on UpstreamProviderConfig. Restructured the previous binary check into a chain ending in explicit false, with a contributor-facing doc comment reminding future authors to extend both the CEL rule and validateUpstreamProvider when adding a new UpstreamProviderType.
  • No Unicode smart quotes in kubebuilder markers. gofmt's doc-comment formatter was reintroducing U+201D into != '' markers on every lint-fix. Switched to CEL's size(...) > 0 idiom so regeneration is idempotent.
  • Single error log at the boundary. The DCR resolver now wraps step failures in DCRStepError and the boundary caller (buildUpstreamConfigs) emits a single slog.Error via LogDCRStepError, instead of every error branch logging on its own.
  • URL-trailing punctuation preserved. sanitizeErrorForLog's URL-stripping regex was greedily consuming sentence-ending punctuation. Trailing terminal punctuation is now split off the match before parsing and re-appended after the query is stripped, with regression coverage for commas, periods, parens, and quoted URLs.

Testing

  • cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types_test.go: table-driven CEL validation covering dcrConfig+clientId conflict, both endpoints set, neither endpoint set, valid single-endpoint cases, OIDC-typed provider rejecting an OAuth2 dcrConfig, and the non-DCR clientId-only path.
  • cmd/thv-operator/pkg/controllerutil/authserver_test.go: new TestBuildAuthServerRunConfig_InvalidDCR suite covering all four invalid DCRConfig/ClientID pairings at the conversion layer; conversion tests for discoveryUrl/registrationEndpoint paths and initialAccessTokenRef env-var wiring; assertion that the upstream name appears exactly once in scoped error strings.
  • pkg/authserver/runner/dcr_test.go, dcr_store_test.go, embeddedauthserver_test.go: full DCR boot path against a mock authorization server, cache-hit short-circuit asserting zero additional HTTP requests, caller's original RunConfig.Upstreams slice element unchanged across calls, TestNewEmbeddedAuthServer_DCRBoot driving the full constructor and asserting dcrStore is populated after boot.
  • pkg/oauthproto/dcr_test.go, discovery_test.go: relocated and extended coverage for DCR primitives and AS metadata discovery in the consolidated package.
  • pkg/authserver/server/handlers/handler_chain_test.go: TestHandler_issuer so a real wiring bug fails loudly instead of logging issuer="".
  • task lint, task lint-fix, task license-check, and the affected unit tests all pass; CRD/deepcopy regeneration is idempotent.

API Compatibility

  • This PR does not break the v1beta1 API. dcrConfig is a new optional field, clientId becomes optional but the existing CEL rule requires exactly one of clientId or dcrConfig, so previously-valid CRs that only set clientId continue to validate.

Does this introduce a user-facing change?

Yes. Cluster admins can now configure RFC 7591 Dynamic Client Registration on VirtualMCPServer OAuth2 upstreams via the new dcrConfig block on OAuth2UpstreamConfig, including a Secret-backed initial access token, a softwareId/softwareStatement pair, and either an RFC 8414 discoveryUrl or an explicit registrationEndpoint. ToolHive registers a client with the upstream on first use and caches the resulting credentials in memory.

Special notes for reviewers

  • This unblocks Persistent DCRCredentialStore backends (memory + Redis) #4979 (Phase 3 persistent backends). The DCRCredentialStore interface is the drop-in point — no further interface churn is expected when the Redis-backed implementation lands.
  • The pkg/oauthpkg/oauthproto rename is in this PR because the DCR resolver imports the consolidated package; it touches a number of import statements but is otherwise a pure rename with no behavior change.
  • MonitoredTokenSource's upstream/clientID constructor parameters are now required (replacing the previous SetUpstreamContext setter). Call sites were updated; the change forces compile-time visibility of the correlation fields rather than relying on a post-construction setter.

@tgrunnagle tgrunnagle changed the base branch from main to issue_5039_dcr-2c April 27, 2026 13:05
@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label Apr 27, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 27, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 27, 2026

Codecov Report

❌ Patch coverage is 92.37288% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.52%. Comparing base (5f1d5ff) to head (64feb73).

Files with missing lines Patch % Lines
cmd/thv-operator/pkg/controllerutil/authserver.go 90.81% 7 Missing and 2 partials ⚠️
Additional details and impacted files
@@                  Coverage Diff                  @@
##           issue_5039_dcr-2c    #5069      +/-   ##
=====================================================
- Coverage              67.54%   67.52%   -0.02%     
=====================================================
  Files                    601      601              
  Lines                  61257    61278      +21     
=====================================================
+ Hits                   41374    41378       +4     
- Misses                 16760    16776      +16     
- Partials                3123     3124       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 27, 2026
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 27, 2026
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 27, 2026
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 28, 2026
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 29, 2026
@tgrunnagle tgrunnagle force-pushed the issue_5039_dcr-2c branch from 063e820 to 0d9a4d0 Compare May 1, 2026 13:46
@tgrunnagle tgrunnagle force-pushed the issue_5040_dcr-2d branch from 9345b3f to 96703a9 Compare May 1, 2026 13:53
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 1, 2026
Implements changes for issue #5040 (Phase 2 DCR CRD surface):

- Add DCRUpstreamConfig CRD type (discoveryUrl, registrationEndpoint,
  initialAccessTokenRef, softwareId, softwareStatement) and a new
  dcrConfig field on OAuth2UpstreamConfig so Kubernetes users can
  configure RFC 7591 Dynamic Client Registration on upstream providers.
- Make OAuth2UpstreamConfig.clientId optional and add CEL validation
  requiring exactly one of clientId or dcrConfig, and exactly one of
  discoveryUrl or registrationEndpoint inside dcrConfig. Mirror the
  checks at runtime via validateOAuth2DCRConfig for defense-in-depth.
- Wire the conversion in controllerutil/authserver.go so DCRConfig is
  mapped onto authserver.DCRUpstreamConfig. InitialAccessTokenRef is
  resolved to an env var (TOOLHIVE_UPSTREAM_DCR_INITIAL_ACCESS_TOKEN_*)
  populated from the referenced Secret, mirroring the ClientSecretRef
  pattern. Extract small helpers for env-var generation to keep
  cyclomatic complexity within lint limits.
- Regenerate zz_generated.deepcopy.go, CRD YAML manifests, and CRD API
  reference docs.
- Add table-driven validation tests covering DCR+ClientID conflict,
  both endpoints set, neither endpoint set, valid single-endpoint
  cases, and neither-auth configuration. Add conversion tests covering
  DCR discoveryUrl/registrationEndpoint paths and initial-access-token
  env var wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address code review feedback

Fixed issues from code review of the DCR CRD surface commit:

- CRITICAL: CEL markers contained a Unicode smart quote (U+201D) that
  gofmt's doc-comment formatter reintroduced on every lint-fix. Rewrote
  both markers to use CEL's size(...) > 0 idiom instead of `!= ''`, which
  sidesteps the typographic normalization entirely and keeps regeneration
  idempotent. Verified no U+2018-U+201F characters remain in source or CRDs.
- HIGH: buildUpstreamRunConfig now calls the exported
  mcpv1beta1.ValidateOAuth2DCRConfig before producing a RunConfig, so
  malformed ClientID/DCRConfig pairs that bypass admission fail at
  reconcile time rather than at authserver startup. Error propagation
  threaded through BuildAuthServerRunConfig; split OIDC and OAuth2
  branches into helpers to stay under the gocyclo limit.
- HIGH: Added table case exercising validateUpstreamProvider rejection
  of an OIDC-typed provider whose OAuth2Config carries a DCRConfig.
- MEDIUM: Added kubebuilder CEL XValidation on UpstreamProviderConfig
  enforcing oidcConfig/oauth2Config mutual exclusivity paired to the
  declared type, closing the silent-pod-failure YAML-apply gap.
- MEDIUM: Added MaxLength=255 to SoftwareID and MaxLength=4096 to
  SoftwareStatement to prevent unbounded input from inflating CRs
  beyond etcd object limits.
- MEDIUM: Pinned the "neither ClientID nor DCRConfig" error assertion to
  the scoped `oauth2Config:` prefix; added a regression case exercising
  the non-DCR OAuth2 path (ClientID only, DCRConfig nil); added a new
  TestBuildAuthServerRunConfig_InvalidDCR suite covering all four
  invalid DCR/ClientID pairings at the conversion layer.
- MEDIUM: Renamed UpstreamDCRInitialAccessTokenEnvVar to
  UpstreamDCRInitialAccessTokenEnvVarPrefix and expanded the godoc on
  both prefix constants to show the resolved <prefix>_<PROVIDER> form.

All task lint/lint-fix/license-check pass; regenerated CRDs and
deepcopy are idempotent; affected unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address iteration-2 review feedback

Polish items raised in the second review pass:

- MEDIUM: Trim duplicate upstream name from reconcile-time DCR validation
  errors. Added scopedFieldPath() helper in
  cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go so
  ValidateOAuth2DCRConfig prepends a dotted prefix only when one is
  given, and the conversion call site now passes an empty prefix so
  BuildAuthServerRunConfig's outer "upstream %q: %w" wrap is the only
  mention of the upstream name. Strengthened
  TestBuildAuthServerRunConfig_InvalidDCR to assert the upstream name
  appears exactly once in the error string.
- MEDIUM: Make the UpstreamProviderConfig CEL rule fail closed for
  unrecognized future provider types. Restructured the rule from a
  binary discriminator into a chain of equality checks ending in an
  explicit `false`, and updated the message to "type must be 'oidc'
  or 'oauth2'; ...". Added a contributor-facing doc comment reminding
  future authors to extend both the rule and validateUpstreamProvider
  when adding a new UpstreamProviderType.
- MEDIUM: Refresh the godoc on extractUpstreamSecretRefs to describe
  the actual invariants that hold post-CEL: OIDC providers can only
  return a clientSecretRef; OAuth2 providers can return both
  independently; other (currently unreachable) types return nil/nil.
  Cross-linked to the CEL rule and noted that BuildAuthServerRunConfig
  is the reconcile-time backstop callers should not rely on this
  helper to enforce.

Regenerated CRD YAMLs and crd-api.md prose. task lint, lint-fix,
license-check, and the affected unit tests pass; regeneration is
idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tgrunnagle tgrunnagle force-pushed the issue_5040_dcr-2d branch from 96703a9 to 64feb73 Compare May 1, 2026 16:55
@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels May 1, 2026
Copy link
Copy Markdown
Contributor Author

@tgrunnagle tgrunnagle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Agent Consensus Review

Agents consulted: pr-reviewer × 4 specialists (CRD/CEL design, OAuth/security, Go architecture, test quality). Codex cross-review skipped (codex CLI not installed). Posted as a COMMENT event because this PR is a draft authored by the reviewer's GitHub account.

Consensus Summary

# Finding Consensus Severity Action
1 softwareStatement 4096-char cap may reject realistic JWTs (x5c cert chains, OIDC Federation) 10/10 MEDIUM Discuss/Fix
2 DCR URL patterns allow http:// and lack whitespace constraints 9/10 MEDIUM Fix
3 ClientSecretRef + DCRConfig together silently accepted 8/10 MEDIUM Fix
4 ValidateOAuth2DCRConfig prefix parameter is unenforced convention 8/10 MEDIUM Fix
5 DCR env-var test does not assert SecretKeyRef.Key 8/10 MEDIUM Fix
6 MaxLength=2048 doc-comment rationale is incorrect 8/10 LOW Fix
7 "OIDC+DCRConfig" test misleadingly named — fires OIDC discriminator, not DCR check 8/10 LOW Fix
8 buildOAuth2UpstreamRunConfig redundant params + signature asymmetry with OIDC sibling 8/10 LOW Fix
9 CEL discriminator vs Go validator messages diverge 7/10 LOW Polish
10 dcrConfig: {} produces less-specific outer error 7/10 LOW Polish
11 softwareStatement plaintext in CR spec (no Secret-ref alternative) 7/10 LOW Discuss
12 Dead defensive nil-check on provider.OAuth2Config 7/10 LOW Polish
13 extractUpstreamSecretRefs named-return convention inconsistent 7/10 LOW Polish
14 Multi-upstream + long-name env-var coverage missing 7/10 LOW Polish
15 Boundary length tests missing 7/10 LOW Polish
16 TestBuildAuthServerRunConfig_InvalidDCR has 3 redundant cases 7/10 LOW Polish

Overall

This is a well-thought-through operator-side slice of the larger DCR feature. The CRD additions follow established kubebuilder patterns, the conversion-layer refactor is clean, and the defense-in-depth between CEL and Go validators is the right architecture. The PR description is unusually thorough — it walks reviewers through structural choices (XOR rules, fail-closed discriminator, the gofmt smart-quote trap, the URL-trailing-punctuation regression) and explains why each one exists. That level of write-up paid off: there are no architectural blockers in the diff.

The four MEDIUMs are worth resolving before this exits draft. Three of them (F1, F2, F3) are incomplete-validation cases — the XOR contracts in this PR are correct as far as they go, but they have edges that admission silently accepts: a too-tight softwareStatement cap that some real-world IdPs will cross, plaintext http:// on credential-bearing endpoints, and the missing clientSecretRefdcrConfig mutual-exclusivity check. F4 is the architectural one — the prefix-string parameter on the validator is a documented convention with no compile-time enforcement, and a future caller passing the upstream name would silently produce duplicated identifiers.

The cross-cutting theme on the LOWs is CEL ↔ Go validator alignment — F6, F9, F2 all touch the contract that the two layers should say the same things at the same scope. Worth doing once now while the contract is fresh, rather than ratcheting it after the next XValidation rule lands. Tests are solid in shape but have two specific blind spots: missing SecretKeyRef.Key assertions (F5 — the test currently can't tell if the DCR initial-access-token Key gets crossed with the client-secret Key) and missing multi-upstream / boundary scenarios (F14, F15).

Documentation

Two doc fixes are propagated artifacts — the MaxLength=2048 rationale (F6) lives in the source field comments and is auto-replicated into both CRD YAMLs and docs/operator/crd-api.md. After correcting the source comment, re-run task operator-manifests and task crdref-gen so the propagated text picks up the correction.


Generated with Claude Code

// server trusts. Bounded to 4096 characters to prevent unbounded
// user input from inflating the CR beyond etcd object limits.
// +optional
// +kubebuilder:validation:MaxLength=4096
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F1 / MEDIUM / consensus 10/10 — softwareStatement MaxLength=4096 may reject realistic signed JWTs (raised by CRD-specialist + security-specialist)

Real-world RFC 7591 software_statements with embedded x5c certificate chains, JWKS keys, or OIDC-Federation trust-framework metadata routinely exceed 4 KB. A two-cert RSA-2048 chain alone is ~3 KB base64url-encoded; with payload + signature you cross 4 KB easily. The cap will reject otherwise-valid statements at admission with a generic "too long" error.

Either raise the limit (e.g. to 16384 — still well under etcd object limits) or document the cap as a deliberate constraint with operator-facing guidance for what to do when their IdP exceeds it.

// MaxLength bounds CEL cost estimation on the parent struct's
// XValidation rule and matches the conventional URL length cap.
// +optional
// +kubebuilder:validation:Pattern=`^https?://.*$`
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F2 / MEDIUM / consensus 9/10 — DCR URL pattern allows http:// and lacks whitespace constraints (raised by security-specialist)

The ^https?://.*$ pattern on discoveryUrl (here) and registrationEndpoint (line 488) accepts plaintext http:// to arbitrary hosts. Per RFC 7591 §3 and RFC 8414 §3, the registration endpoint MUST be HTTPS — it carries the initial access token (a bearer credential) and the issued client_secret in the response. The runtime DCRUpstreamConfig.Validate() enforces HTTPS-or-loopback, but admission silently accepts the misconfiguration today, shifting the failure from kubectl apply to authserver CrashLoopBackOff. The OIDCUpstreamConfig.IssuerURL precedent in this same file (line 323) already uses ^https://.*$.

Separately, the existing UserInfo patterns (lines 181, 191) use the more restrictive ^https?://[^\s?#]+[^/\s?#]$ which rejects whitespace, fragments, and trailing slashes — the DCR fields use the looser variant.

Suggested change
// +kubebuilder:validation:Pattern=`^https?://.*$`
// +kubebuilder:validation:Pattern=`^https://[^\s?#]+[^/\s?#]$`

Apply the same change at line 488 for RegistrationEndpoint. If http:// on loopback must be supported for parity with the runtime layer's local-dev allowance, encode that explicitly rather than allowing arbitrary http://.

// Exported so the controllerutil conversion layer can reuse the same
// invariants when building runtime configs, rejecting malformed objects at
// reconcile time rather than waiting until the authserver process parses them.
func ValidateOAuth2DCRConfig(prefix string, cfg *OAuth2UpstreamConfig) error {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F3 / MEDIUM / consensus 8/10 — ClientSecretRef + DCRConfig together is silently accepted (raised by security-specialist)

ValidateOAuth2DCRConfig (and the matching CEL on OAuth2UpstreamConfig) only enforces XOR between ClientID and DCRConfig. A user can still set ClientSecretRef alongside DCRConfig, but in DCR mode the client_secret comes from the registration response (RFC 7591 §3.2.1 makes the response authoritative). The static ClientSecretRef is then either dead config or a competing source of truth — a future-debugging trap either way.

Add the rejection here and a matching CEL counterpart on OAuth2UpstreamConfig (e.g. !has(self.dcrConfig) || !has(self.clientSecretRef)), plus a new test case in both TestMCPExternalAuthConfig_validateUpstreamProvider and TestBuildAuthServerRunConfig_InvalidDCR.

// the prefix is empty, it returns just the field name so callers that already
// supply their own scope (e.g., a wrapping `fmt.Errorf("upstream %q: %w", ...)`)
// don't end up with a leading dot.
func scopedFieldPath(prefix, field string) string {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F4 / MEDIUM / consensus 8/10 — prefix parameter is an unenforced convention; scopedFieldPath is single-use (raised by Go-architecture-specialist)

The prefix parameter on ValidateOAuth2DCRConfig (above, line 1083) has two implicit modes: pass upstreamProviders[i] from admission, pass "" from the conversion layer (so BuildAuthServerRunConfig's outer upstream %q: %w wrap doesn't duplicate the name). The convention lives in two doc comments — there is no compile-time enforcement. A third caller passing the upstream name would silently produce a doubly-prefixed message, and the strings.Count(err.Error(), "acme-idp") == 1 test guards only the path that exists today.

Two cleaner shapes:

  • Split into validateOAuth2DCRConfigScoped(prefix, cfg) error (admission) and ValidateOAuth2DCRConfig(cfg) error (always unscoped, callers wrap).
  • Drop the prefix parameter entirely; always emit oauth2Config: ... and let every caller wrap with their own scope.

Either change makes scopedFieldPath (this line, single caller) disappear, which is the right outcome — it doesn't pull its weight as a separate helper.

},
},
wantEnvNames: []string{UpstreamDCRInitialAccessTokenEnvVarPrefix + "_ACME_IDP"},
wantSecretNames: []string{"acme-dcr-token"},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F5 / MEDIUM / consensus 8/10 — DCR env-var test does not assert SecretKeyRef.Key (raised by test-specialist)

wantSecretNames only checks SecretKeyRef.Name. The next test row at line 558 uses two different Keys ("client-secret" for ClientSecretRef and "token" for the DCR InitialAccessTokenRef) — the perfect case to also pin Key, but the assertion at line 600 ignores it. A refactor that swapped the Keys would still pass the test.

Add a parallel wantSecretKeys []string field to the test struct and assert envVars[i].ValueFrom.SecretKeyRef.Key alongside Name.

}

// Validate OAuth2-specific DCR / ClientID constraints (defense-in-depth with CEL).
if provider.Type == UpstreamProviderTypeOAuth2 && provider.OAuth2Config != nil {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F12 / LOW / consensus 7/10 — Dead defensive nil-check on provider.OAuth2Config (raised by Go-architecture-specialist)

The pairing check at line 1052 already guarantees: if Type is OAuth2 and we're past that check, OAuth2Config != nil. The second nil-check here is dead.

Suggested change
if provider.Type == UpstreamProviderTypeOAuth2 && provider.OAuth2Config != nil {
if provider.Type == UpstreamProviderTypeOAuth2 {

//
// Callers must not rely on the third bullet to mask an admission-bypassing
// object — `BuildAuthServerRunConfig` is the reconcile-time backstop for that.
func extractUpstreamSecretRefs(
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F13 / LOW / consensus 7/10 — Named-return convention is inconsistent with sibling helpers (raised by security + Go specialists)

extractUpstreamSecretRefs declares named returns (clientSecretRef, initialAccessTokenRef *mcpv1beta1.SecretKeyRef) but ends with an explicit return clientSecretRef, initialAccessTokenRef, not a naked return. The named returns only document the result shape; siblings (buildOIDCUpstreamRunConfig, buildOAuth2UpstreamRunConfig, buildDCRUpstreamRunConfig) all use unnamed returns. Pick one convention.

wantSecretNames: []string{"okta-secret", "github-secret"},
},
{
name: "OAuth2 provider with DCR initial access token ref emits separate env var",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F14 / LOW / consensus 7/10 — Multi-upstream + long-name env-var coverage missing (raised by test-specialist)

All new DCR rows test single-provider scenarios. The realistic case for VirtualMCPServer is multiple OAuth2 upstreams each with their own DCR. A regression that hashed names instead of sanitize-and-uppercase would not be caught by the current table.

Add a row with two OAuth2 providers, each with InitialAccessTokenRef set, asserting two distinct env vars with distinct Names and distinct per-upstream secret refs. Optionally add a row exercising a name near the 63-char DNS-label boundary.

@@ -571,6 +571,125 @@ func TestMCPExternalAuthConfig_validateUpstreamProvider(t *testing.T) {
},
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F15 / LOW / consensus 7/10 — Boundary length tests missing for softwareStatement and URLs (raised by test-specialist)

The CRD imposes MaxLength=4096 on softwareStatement and MaxLength=2048 on URLs. Test cases use only short fixed strings. If the Go validator (ValidateOAuth2DCRConfig) is the documented reconcile-time backstop, length should be enforced there too — and tested at the boundary (cap, cap+1, cap-1). If only CEL enforces lengths, the asymmetry is itself a defense-in-depth gap worth calling out.

// mutual-exclusivity invariants (matching the CEL XValidation rules on
// OAuth2UpstreamConfig and DCRUpstreamConfig). This is the reconcile-time
// defense-in-depth against stored objects that bypassed admission.
func TestBuildAuthServerRunConfig_InvalidDCR(t *testing.T) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F16 / LOW / consensus 7/10 — TestBuildAuthServerRunConfig_InvalidDCR has 3 redundant cases (raised by test-specialist)

All four cases here mirror cases in the validateUpstreamProvider table (types_test.go lines 242, 260, 278, 293). Both call paths route through ValidateOAuth2DCRConfig — same validator. The unique value this function adds is (a) the outer upstream %q: wrap and (b) the upstream-name-appears-once invariant — neither requires four cases.

Reduce to a single representative case (e.g. "both ClientID and DCRConfig set") that asserts the outer wrap, the substring count, and the error content. Or document explicitly per-case what each one uniquely contributes that the types-level table doesn't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Extra large PR: 1000+ lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Authserver DCR: expose DCR config in operator CRD (Phase 2, CRD surface)

1 participant