Skip to content

Redis is a required part of our infra#2348

Open
djeebus wants to merge 2 commits intomainfrom
redis-is-required
Open

Redis is a required part of our infra#2348
djeebus wants to merge 2 commits intomainfrom
redis-is-required

Conversation

@djeebus
Copy link
Copy Markdown
Contributor

@djeebus djeebus commented Apr 9, 2026

No description provided.

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 9, 2026

PR Summary

Medium Risk
This changes orchestrator initialization to hard-fail if Redis is unavailable, which can impact deployment/availability if Redis config or connectivity is wrong. It also makes Redis-based peer discovery and sandbox event delivery unconditional, so any unexpected nil Redis client behavior would surface at runtime.

Overview
This PR makes Redis a required orchestrator dependency by failing fast on any Redis client init error and then unconditionally wiring Redis into shutdown cleanup, peer registry/resolver creation, and sandbox event delivery via Redis Streams.

Reviewed by Cursor Bugbot for commit 8bb36ad. Bugbot is set up for automated code reviews on this repo. Configure here.

@djeebus djeebus marked this pull request as ready for review April 9, 2026 23:00
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward enforcement of Redis as a required dependency.

Extended reasoning...

Overview

Single-file change to run.go that removes the ErrRedisDisabled carve-out and all redisClient \!= nil guards, making Redis unconditionally required at startup. The companion commit (21aa127) already establishes this as an intentional infrastructure decision.

Security risks

None. The change removes optional code paths rather than introducing new ones; it tightens startup requirements rather than relaxing any security boundary.

Level of scrutiny

Low. This is a small, mechanical removal of fallback/optional logic with clear and explicit intent. No new logic is introduced — only guards are deleted. The fatal-on-error path already existed for other error types.

Other factors

One nit was flagged (inline comment): when ErrRedisDisabled is returned, the log message 'Could not connect to Redis' may mislead operators into debugging connectivity rather than checking for a missing env var. This is a diagnostic quality issue only — the behavior (fatal exit) is correct and intended. The structured zap.Error(err) field will include 'redis is disabled' for operators who look, mitigating the impact.

Comment on lines +386 to 388
if err != nil {
logger.L().Fatal(ctx, "Could not connect to Redis", zap.Error(err))
} else if err == nil {
closers = append(closers, closer{"redis client", func(context.Context) error {
return sharedFactories.CloseCleanly(redisClient)
}})
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 When neither RedisURL nor RedisClusterURL is set, NewRedisClient returns ErrRedisDisabled (no connection is ever attempted), but the new code logs 'Could not connect to Redis' — implying a network or auth failure. While zap.Error(err) will include 'redis is disabled' in the structured log, operators may waste time investigating connectivity instead of checking for a missing env var. Consider detecting ErrRedisDisabled explicitly and logging 'Redis is required but not configured'.

Extended reasoning...

What the bug is: When neither RedisURL nor RedisClusterURL is configured, sharedFactories.NewRedisClient returns ErrRedisDisabled (message: "redis is disabled") with a nil client — no TCP connection is ever attempted. After this PR removes the ErrRedisDisabled exclusion, the generic if err \!= nil branch fires and logs a fatal with the primary message "Could not connect to Redis".

The specific code path: run.go:386-388 — the new if err \!= nil block no longer carves out ErrRedisDisabled, so a missing-config error and a genuine connection failure (wrong host, auth failure, TLS mismatch) both surface identically as "Could not connect to Redis".

Why existing code doesn't prevent it: The zap.Error(err) field will emit the underlying error text "redis is disabled", which partially mitigates the confusion. However, structured-log fields are often less prominent than the message string in dashboards, alerting rules, or plain-text log tailing, so the misleading primary message is what operators see first.

Impact: An operator whose deploy fails because they forgot to set REDIS_URL or REDIS_CLUSTER_URL will see "Could not connect to Redis" and naturally start debugging network connectivity, firewall rules, TLS certificates, or Redis auth — none of which are the problem.

Addressing the refutation: The refutation is correct that the behavior (fatal exit when Redis is unconfigured) is intentional and correct — this PR explicitly makes Redis required. The bug is only in the diagnostic quality of the log message. The fix does not change semantics; it simply distinguishes the two failure modes.

Proof (concrete example): Deploy orchestrator without setting any Redis env vars → NewRedisClient short-circuits at the URL check, returns (nil, ErrRedisDisabled) (redis.go:100) → err \!= nil is true → fatal: "Could not connect to Redis" {error: "redis is disabled"}. An operator reading the log message alone diagnoses a connection failure, not a misconfiguration.

Fix: if errors.Is(err, sharedFactories.ErrRedisDisabled) { logger.L().Fatal(ctx, "Redis is required but not configured", zap.Error(err)) } else if err \!= nil { logger.L().Fatal(ctx, "Could not connect to Redis", zap.Error(err)) }

Copy link
Copy Markdown
Contributor

@levb levb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/5 (basically what Claude said ^^) Right now the error message a developer would see is "redis is disabled" wrapped inside "Could not connect to Redis", which is a bit misleading - they didn't fail to connect, they failed to configure. A dedicated config validation at startup ("REDIS_URL or REDIS_CLUSTER_URL must be set") would be clearer.

Are local runs affected by this change?

PoolSize: config.RedisPoolSize,
})
if err != nil && !errors.Is(err, sharedFactories.ErrRedisDisabled) {
if err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/5 client-proxy/main.go still has a similar check, should it be eliminated also?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants