Skip to content

fix(aws): improve verify - check all accounts and improve role assumption reliability#739

Merged
ShubhamRasal merged 4 commits intodevfrom
improve-aws-verification
Mar 11, 2026
Merged

fix(aws): improve verify - check all accounts and improve role assumption reliability#739
ShubhamRasal merged 4 commits intodevfrom
improve-aws-verification

Conversation

@girish-cheedala
Copy link
Contributor

@girish-cheedala girish-cheedala commented Mar 11, 2026

Summary

  • Parallel account verification: Verify() now checks role assumption against all account IDs concurrently instead of only the first one.
    Reports all failed account IDs in a single actionable error message.
  • Resilient region discovery: New() now iterates through account IDs when the base session lacks DescribeRegions permission, instead of failing
    on the first account.
  • Explicit account ID in role assumption: createAssumedRoleSession now takes an explicit accountId parameter instead of hardcoding
    AccountIds[0].

Context

With org discovery, account ordering is arbitrary — the first account may not have the scanner role configured. Previously, verification only checked the first account in the list — if it passed, the entire verification was considered successful, even if other accounts lacked the scanner role. This meant broken accounts were silently skipped during enumeration. Now all accounts are verified in parallel, and the error message lists exactly which accounts failed, with guidance to add them to exclude_account_ids when using org discovery.

Changes

  • createAssumedRoleSession — accepts accountId param instead of using AccountIds[0]
  • New() — loops through accounts for DescribeRegions fallback until one succeeds
  • Verify() — checks all accounts in parallel using goroutines, collects and reports failures
    • Org discovery: suggests adding failed accounts to exclude_account_ids
    • Manual account_ids: reports failure without the suggestion

Summary by CodeRabbit

  • Bug Fixes
    • Multi-account region discovery now tries each configured account when the base user lacks region permissions, improving success rates.
    • Role verification runs in parallel across accounts for faster validation.
    • Error messages and reporting for multi-account failures have been clarified and consolidated, including guidance when organization-level discovery is configured.

…as parameter and improve error handling in Verify method

- Updated createAssumedRoleSession to take accountId as an argument, allowing for more flexible role assumption.
- Modified Verify method to iterate over AccountIds, capturing errors for each account and returning the last encountered error if all attempts fail.
…as parameter and improve error handling in Verify method

- Updated createAssumedRoleSession to take accountId as an argument, allowing for more flexible role assumption.
- Modified Verify method to iterate over AccountIds, capturing errors for each account and returning the last encountered error if all attempts fail.
…ating over AccountIds

- Enhanced the New function to attempt region retrieval using multiple AccountIds.
- Implemented error handling to continue attempts with subsequent accounts if the initial one fails.
- Ensured that a comprehensive error message is returned if all attempts to retrieve regions fail.
@neo-by-projectdiscovery-dev
Copy link

neo-by-projectdiscovery-dev bot commented Mar 11, 2026

Neo - PR Security Review

No security issues found

Highlights

  • Implements semaphore with capacity 200 to limit concurrent STS AssumeRole API calls during account verification
  • Addresses previous hardening note about rate limiting to prevent AWS API throttling
  • Proper goroutine lifecycle management with deferred semaphore release
Hardening Notes
  • Add timeout context to createAssumedRoleSession() calls in the region discovery loop (lines 253-264) to prevent indefinite hangs if an account's STS endpoint is slow

Comment @pdneo help for available commands. · Open in Neo

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 11, 2026

Walkthrough

The AWS provider now supports multi-account role assumption: New iterates AccountIds to build assumed-role sessions for DescribeRegions when the base user lacks permission; createAssumedRoleSession accepts accountId to construct RoleArn; Verify runs parallel per-account verification and aggregates failures.

Changes

Cohort / File(s) Summary
Multi-Account AWS Provider Flow
pkg/providers/aws/aws.go
Changed createAssumedRoleSession to accept accountId and build RoleArn with it. New now iterates AccountIds, creating assumed-role sessions per account and trying DescribeRegions until one succeeds or all fail (errors aggregated). Verify parallelizes per-account role assumption and verification with goroutines, mutex-protected failure collection, and consolidated error reporting.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Provider
    participant BaseAWS as AWS (Base User)
    participant AssumeRole as Assumed Role Session
    participant TargetAWS as AWS (Target Account)

    Client->>Provider: New(options)
    Provider->>BaseAWS: DescribeRegions()
    BaseAWS-->>Provider: Permission Denied

    loop For each AccountId in options.AccountIds
        Provider->>AssumeRole: createAssumedRoleSession(accountId)
        AssumeRole->>AssumeRole: Build RoleArn using accountId
        AssumeRole-->>Provider: Session created
        Provider->>TargetAWS: DescribeRegions(assumed role)
        alt Success
            TargetAWS-->>Provider: Regions list
            Provider->>Provider: Set regions & stop loop
        else Failure
            TargetAWS-->>Provider: Error
            Provider->>Provider: Aggregate error & continue
        end
    end

    alt Any account succeeded
        Provider-->>Client: Initialized successfully
    else All accounts failed
        Provider-->>Client: Aggregated error returned
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hop through accounts, one by one,

building ARNs till the work is done.
If one denies, I call the rest,
goroutines bustle, doing their best,
and aggregate errors settle the quest.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the main changes: improving the verify function and enhancing role assumption reliability across all accounts rather than just the first one.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch improve-aws-verification

Comment @coderabbitai help to get the list of available commands and usage tips.

@girish-cheedala girish-cheedala changed the title fix(aws): verify all accounts and improve role assumption reliability fix(aws): improve verify - check all accounts and improve role assumption reliability Mar 11, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/providers/aws/aws.go (1)

523-560: ⚠️ Potential issue | 🔴 Critical

Verify() still skips account-role validation when the base session works.

Because of the early return at Lines 524-526, this new fan-out only runs when p.verify() fails on the source credentials. If the base creds can access EC2/S3/etc. but one or more target accounts are missing the role, Verify() still returns nil and those broken accounts stay undetected.

💡 Suggested fix
-	err := p.verify()
-	if err == nil {
-		return nil
-	}
+	baseErr := p.verify()
 
 	if p.options.AssumeRoleName != "" && len(p.options.AccountIds) > 0 {
 		var mu sync.Mutex
 		var failedAccounts []string
 		var wg sync.WaitGroup
 
@@
 		if len(failedAccounts) > 0 {
 			msg := fmt.Sprintf("failed to assume role %s in accounts: %s", p.options.AssumeRoleName, strings.Join(failedAccounts, ", "))
 			if p.options.OrgDiscoveryRoleArn != "" {
 				msg += ". Add these to exclude_account_ids if they should not be part of discovery"
 			}
 			return errors.New(msg)
 		}
 		return nil
 	}
-	return err
+	return baseErr
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/aws/aws.go` around lines 523 - 560, The Verify() flow currently
returns early when p.verify() succeeds so per-account assume-role checks never
run; instead, always run the fan-out when p.options.AssumeRoleName != "" &&
len(p.options.AccountIds) > 0 regardless of the base verify result: call
p.verify() and keep its error value, then iterate accounts and for each use
createAssumedRoleSession (with p.session/p.session.Config and the account id),
init services on the temporary Provider (tempProvider.initServices) and call
tempProvider.verify(), collecting failedAccounts via sync.Mutex as done now;
after the fan-out, if any failedAccounts exist return a constructed error
including p.options.AssumeRoleName and p.options.OrgDiscoveryRoleArn guidance,
otherwise return the original base verify error (which may be nil). Ensure
identifiers referenced: Verify(), createAssumedRoleSession,
Provider.initServices, tempProvider.verify(), p.options.AssumeRoleName,
p.options.AccountIds, p.options.OrgDiscoveryRoleArn.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/providers/aws/aws.go`:
- Around line 252-266: The loop currently overwrites regionErr for each account
so callers only see the last failure; change the logic in the block using
regionErr/createAssumedRoleSession/tempRC.DescribeRegions and options.AccountIds
to collect per-account errors (e.g., map or slice of errors keyed by accountId)
and append or wrap each failure instead of replacing regionErr, then when all
accounts fail return a single error that aggregates the per-account failures
(include accountId and original error text for each) so the caller sees every
account-level failure rather than just the final one.
- Around line 529-553: The account verification loop spawns an unbounded
goroutine per p.options.AccountIds and ignores ctx, so add a bounded worker
semaphore (e.g., a buffered chan semaphore with a configurable maxWorkers like
10) and acquire it before launching each goroutine and release it at the end to
cap fan-out, and make the goroutine respect cancellation by selecting on
ctx.Done() at start and before appending to failedAccounts; also update calls to
createAssumedRoleSession and Provider.verify to accept and use ctx (or wrap them
so they return early if ctx is done) and keep using mu to protect failedAccounts
and wg for completion so cancellation stops further work and prevents STS
throttling.

---

Outside diff comments:
In `@pkg/providers/aws/aws.go`:
- Around line 523-560: The Verify() flow currently returns early when p.verify()
succeeds so per-account assume-role checks never run; instead, always run the
fan-out when p.options.AssumeRoleName != "" && len(p.options.AccountIds) > 0
regardless of the base verify result: call p.verify() and keep its error value,
then iterate accounts and for each use createAssumedRoleSession (with
p.session/p.session.Config and the account id), init services on the temporary
Provider (tempProvider.initServices) and call tempProvider.verify(), collecting
failedAccounts via sync.Mutex as done now; after the fan-out, if any
failedAccounts exist return a constructed error including
p.options.AssumeRoleName and p.options.OrgDiscoveryRoleArn guidance, otherwise
return the original base verify error (which may be nil). Ensure identifiers
referenced: Verify(), createAssumedRoleSession, Provider.initServices,
tempProvider.verify(), p.options.AssumeRoleName, p.options.AccountIds,
p.options.OrgDiscoveryRoleArn.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fa099f27-b4e6-4f7e-be88-4043ad3718d0

📥 Commits

Reviewing files that changed from the base of the PR and between c53a8ef and c8d044f.

📒 Files selected for processing (1)
  • pkg/providers/aws/aws.go

@girish-cheedala girish-cheedala requested review from ShubhamRasal and removed request for ShubhamRasal March 11, 2026 12:29
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
pkg/providers/aws/aws.go (1)

529-556: ⚠️ Potential issue | 🟠 Major

Context cancellation is still not honored in the parallel verification loop.

The semaphore limiting concurrency to 200 was added (good), but the loop and goroutines still don't respect ctx:

  1. Line 536 (sem <- struct{}{}) can block indefinitely without checking ctx.Done()
  2. The goroutine body doesn't check for cancellation before performing work

If the caller cancels the context, all spawned goroutines will continue executing their STS AssumeRole calls.

💡 Suggested fix to honor ctx
 		sem := make(chan struct{}, 200)
 
 		for _, accountId := range p.options.AccountIds {
+			select {
+			case <-ctx.Done():
+				break
+			default:
+			}
 			wg.Add(1)
-			sem <- struct{}{}
+			select {
+			case sem <- struct{}{}:
+			case <-ctx.Done():
+				wg.Done()
+				continue
+			}
 			go func(id string) {
 				defer wg.Done()
 				defer func() { <-sem }()
+
+				if ctx.Err() != nil {
+					return
+				}
+
 				tempSession, err := createAssumedRoleSession(p.options, p.session, p.session.Config, id)

As per coding guidelines, "Always pass and honor context.Context in Resources() and long-running API calls to support cancellation".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/providers/aws/aws.go` around lines 529 - 556, The parallel verification
loop must honor the incoming ctx: change the semaphore send (sem <- struct{}{})
to a ctx-aware select that returns when ctx.Done() is closed so you don't block
forever, and update the goroutine to early-return if ctx is cancelled before
doing work; also propagate ctx into long-running calls by calling
createAssumedRoleSession and Provider.verify with ctx (or their ctx-aware
variants) and check ctx.Done() between operations (after acquiring the semaphore
and before initServices/verify) so that goroutines add to failedAccounts only
when appropriate and exit promptly on cancellation (references: sem, wg,
failedAccounts, createAssumedRoleSession, Provider.initServices,
Provider.verify, p.options.AccountIds, ctx).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@pkg/providers/aws/aws.go`:
- Around line 529-556: The parallel verification loop must honor the incoming
ctx: change the semaphore send (sem <- struct{}{}) to a ctx-aware select that
returns when ctx.Done() is closed so you don't block forever, and update the
goroutine to early-return if ctx is cancelled before doing work; also propagate
ctx into long-running calls by calling createAssumedRoleSession and
Provider.verify with ctx (or their ctx-aware variants) and check ctx.Done()
between operations (after acquiring the semaphore and before
initServices/verify) so that goroutines add to failedAccounts only when
appropriate and exit promptly on cancellation (references: sem, wg,
failedAccounts, createAssumedRoleSession, Provider.initServices,
Provider.verify, p.options.AccountIds, ctx).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 51cc0213-3737-4cc4-94e3-5ae2f7f2bf33

📥 Commits

Reviewing files that changed from the base of the PR and between c8d044f and 1d7e604.

📒 Files selected for processing (1)
  • pkg/providers/aws/aws.go

@ShubhamRasal ShubhamRasal merged commit d88768a into dev Mar 11, 2026
10 checks passed
@ShubhamRasal ShubhamRasal deleted the improve-aws-verification branch March 11, 2026 13:29
@girish-cheedala girish-cheedala restored the improve-aws-verification branch March 11, 2026 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants