20 changes: 9 additions & 11 deletions packages/orchestrator/pkg/factories/run.go
@@ -383,17 +383,17 @@ func run(config cfg.Config, opts Options) (success bool) {
         RedisTLSCABase64: config.RedisTLSCABase64,
         PoolSize: config.RedisPoolSize,
     })
-    if err != nil && !errors.Is(err, sharedFactories.ErrRedisDisabled) {
+    if err != nil {
Contributor

1/5 client-proxy/main.go still has a similar check. Should it be eliminated as well?

         logger.L().Fatal(ctx, "Could not connect to Redis", zap.Error(err))
-    } else if err == nil {
-        closers = append(closers, closer{"redis client", func(context.Context) error {
-            return sharedFactories.CloseCleanly(redisClient)
-        }})
     }
Comment on lines +386 to 388
🟡 When neither RedisURL nor RedisClusterURL is set, NewRedisClient returns ErrRedisDisabled (no connection is ever attempted), but the new code logs 'Could not connect to Redis' — implying a network or auth failure. While zap.Error(err) will include 'redis is disabled' in the structured log, operators may waste time investigating connectivity instead of checking for a missing env var. Consider detecting ErrRedisDisabled explicitly and logging 'Redis is required but not configured'.

Extended reasoning...

What the bug is: When neither RedisURL nor RedisClusterURL is configured, sharedFactories.NewRedisClient returns ErrRedisDisabled (message: "redis is disabled") with a nil client; no TCP connection is ever attempted. After this PR removes the ErrRedisDisabled exclusion, the generic if err != nil branch fires and logs a fatal with the primary message "Could not connect to Redis".

The specific code path: run.go:386-388. The new if err != nil block no longer carves out ErrRedisDisabled, so a missing-config error and a genuine connection failure (wrong host, auth failure, TLS mismatch) both surface identically as "Could not connect to Redis".

Why existing code doesn't prevent it: The zap.Error(err) field will emit the underlying error text "redis is disabled", which partially mitigates the confusion. However, structured-log fields are often less prominent than the message string in dashboards, alerting rules, or plain-text log tailing, so the misleading primary message is what operators see first.

Impact: An operator whose deploy fails because they forgot to set REDIS_URL or REDIS_CLUSTER_URL will see "Could not connect to Redis" and naturally start debugging network connectivity, firewall rules, TLS certificates, or Redis auth — none of which are the problem.

Addressing the refutation: The refutation is correct that the behavior (fatal exit when Redis is unconfigured) is intentional and correct — this PR explicitly makes Redis required. The bug is only in the diagnostic quality of the log message. The fix does not change semantics; it simply distinguishes the two failure modes.

Proof (concrete example): Deploy orchestrator without setting any Redis env vars → NewRedisClient short-circuits at the URL check, returns (nil, ErrRedisDisabled) (redis.go:100) → err != nil is true → fatal: "Could not connect to Redis" {error: "redis is disabled"}. An operator reading the log message alone diagnoses a connection failure, not a misconfiguration.

Fix:

    if errors.Is(err, sharedFactories.ErrRedisDisabled) {
        logger.L().Fatal(ctx, "Redis is required but not configured", zap.Error(err))
    } else if err != nil {
        logger.L().Fatal(ctx, "Could not connect to Redis", zap.Error(err))
    }


+
+    closers = append(closers, closer{"redis client", func(context.Context) error {
+        return sharedFactories.CloseCleanly(redisClient)
+    }})

     peerRegistry := peerclient.NopRegistry()
     peerResolver := peerclient.NopResolver()
-    if nodeAddress := config.NodeAddress(); redisClient != nil && nodeAddress != nil {
+    if nodeAddress := config.NodeAddress(); nodeAddress != nil {
         peerRegistry = peerclient.NewRedisRegistry(redisClient, *nodeAddress)
         peerResolver = peerclient.NewResolver(peerRegistry, *nodeAddress)
     }
@@ -453,11 +453,9 @@ func run(config cfg.Config, opts Options) (success bool) {
     logger.L().Info(ctx, "cgroup accounting enabled", zap.String("root", cgroup.RootCgroupPath))

     // Redis sandbox events delivery target
-    if redisClient != nil {
-        sbxEventsDeliveryRedis := event.NewRedisStreamsDelivery[event.SandboxEvent](redisClient, event.SandboxEventsStreamName)
-        sbxEventsDeliveryTargets = append(sbxEventsDeliveryTargets, sbxEventsDeliveryRedis)
-        closers = append(closers, closer{"sandbox events delivery for redis", sbxEventsDeliveryRedis.Close})
-    }
+    sbxEventsDeliveryRedis := event.NewRedisStreamsDelivery[event.SandboxEvent](redisClient, event.SandboxEventsStreamName)
+    sbxEventsDeliveryTargets = append(sbxEventsDeliveryTargets, sbxEventsDeliveryRedis)
+    closers = append(closers, closer{"sandbox events delivery for redis", sbxEventsDeliveryRedis.Close})

     // sandbox observer
     sandboxObserver, err := metrics.NewSandboxObserver(ctx, nodeID, serviceName, commitSHA, version, serviceInstanceID, sandboxes)