Skip to content

fix(gitlab): restore Docker v2 fallback for connecting registries (SUB-7198)#29

Open
rotemamsa wants to merge 2 commits intomainfrom
fix/gitlab-discovery-regression-SUB-7198
Open

fix(gitlab): restore Docker v2 fallback for connecting registries (SUB-7198)#29
rotemamsa wants to merge 2 commits intomainfrom
fix/gitlab-discovery-regression-SUB-7198

Conversation

@rotemamsa
Copy link
Copy Markdown

@rotemamsa rotemamsa commented May 7, 2026

Summary

  • Root cause: In registryx v0.0.31, GetAllRepositories switched from Docker Registry v2 _catalog to GitLab API (/api/v4/projects). When RegistryURL points to the registry host (e.g. gitlab-si-reg.hefr.ch) rather than the GitLab web host, the API calls fail and the customer cannot connect their registry at all.
  • Fix — connect path: GetAllRepositories now falls back to the Docker Registry v2 _catalog endpoint when the GitLab API path fails, restoring the pre-v0.0.31 behavior as a safety net.
  • Fix — scan path: GetImagesToScan now tries the raw (user-entered) URL for registry host discovery before applying heuristic transforms, and extractRegistryHost() properly strips scheme/path so name.NewRegistry() receives a valid hostname.
  • Also: Added getRawAPIBaseURL() (preserves user-entered hostname verbatim) and raw-URL fallback in getRepositoriesFromGitLabAPI().

Test plan

  • All existing tests pass (60+ tests)
  • New tests for getRawAPIBaseURL, extractRegistryHost, discovery via raw URL, scheme-stripping fallback, heuristic fallback, and GetAllRepositories raw URL fallback
  • Bump registryx in operator and verify customer Hefr can connect their self-hosted GitLab registry

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved repository discovery with automatic fallback mechanisms for more reliable GitLab repository detection
    • Enhanced support for various GitLab URL formats and registry hostname handling
  • Tests

    • Added comprehensive test coverage for repository discovery and URL handling scenarios

rotemamsa and others added 2 commits May 7, 2026 12:45
…allback (SUB-7198)

The refactor in 2947947 introduced two regressions for self-hosted GitLab:
1. discoverRegistryHost used getGitLabAPIBaseURL() heuristics which mangle
   non-standard hostnames, preventing discovery from reaching the GitLab API
2. The fallback path passed RegistryURL with scheme/path to name.NewRegistry
   which expects a bare hostname

Now discovery tries the raw URL first (preserving the user-entered host),
falls back to the heuristic URL if different, and extractRegistryHost()
strips scheme/path for the final fallback. GetAllRepositories also falls
back to the raw URL when the heuristic URL fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When RegistryURL points to the registry host (not the GitLab web host),
the GitLab API calls fail because the registry host doesn't serve the
GitLab API. This restores the pre-v0.0.31 behavior as a fallback: if
the GitLab API path fails, fall back to the Docker Registry v2 _catalog
endpoint which works directly against the registry host.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 7, 2026 10:17
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 7, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

The PR introduces GitLab container registry hostname discovery with Docker Registry v2 fallback. New URL helper functions normalize API base URLs and extract bare hostnames. Repository and image discovery now retry using raw API base URLs, then fall back to Docker Registry _catalog endpoint or extracted hostnames.

Changes

GitLab Registry Discovery with Docker Fallback

Layer / File(s) Summary
URL Helper Functions
registryclients/gitlab.go
getRawAPIBaseURL() preserves user-entered hostname with scheme normalization and appends /api/v4. extractRegistryHost() strips scheme and path from RegistryURL for Docker fallback hostname.
URL Helper Unit Tests
registryclients/gitlab_test.go
TestGitLabRegistryClient_getRawAPIBaseURL validates URL construction across schemes, ports, path stripping, and whitespace. TestGitLabRegistryClient_extractRegistryHost validates hostname extraction from various URL formats.
Repository Discovery Implementation
registryclients/gitlab.go
GetAllRepositories() tries GitLab API then falls back to Docker Registry v2. getRepositoriesFromGitLabAPI() retries with raw API base URL on failure. getRepositoriesFromDockerAPI() constructs Docker client from extracted hostname.
Image Scanning Discovery Sequence
registryclients/gitlab.go
GetImagesToScan() discovers hostname via raw API base URL, retries with heuristic base URL on failure, and falls back to extracted hostname from RegistryURL.
Discovery Integration Tests
registryclients/gitlab_test.go
TestGitLabRegistryClient_GetImagesToScan_discoveryViaRawURL validates raw URL discovery with mocked GitLab API. TestGitLabRegistryClient_GetImagesToScan_fallbackStripsScheme verifies fallback uses bare hostname. TestGitLabRegistryClient_GetImagesToScan_heuristicFallbackDiscovery validates heuristic fallback when raw discovery fails. TestGitLabRegistryClient_getRepositoriesFromGitLabAPI_fallbackToRawURL confirms repositories returned on raw URL fallback success.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • armosec/registryx#26: Modifies GitLab API base-URL/host-derivation logic and related tests with new helpers for detecting/normalizing GitLab hosts and updating discovery/fallback wiring.
  • armosec/registryx#28: Implements GitLab container-registry hostname auto-discovery and updates GetImagesToScan/GetAllRepositories to prefer discovered registry host with fallbacks.

Suggested reviewers

  • matthyx
  • jnathangreeg

Poem

🐰 A rabbit hops through registries bold,
With URLs raw and hostnames gold,
When GitLab paths fail, Docker takes the call—
Smart fallback chains, we discover them all! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the primary fix: restoring Docker v2 fallback for GitLab registries, matching the main change in the PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/gitlab-discovery-regression-SUB-7198

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Restores resilience for GitLab registry connections by reintroducing a Docker Registry v2 _catalog fallback when GitLab API-based repository discovery fails, and improves registry host discovery by trying the raw (user-entered) URL before heuristic transformations.

Changes:

  • GetAllRepositories now falls back to Docker Registry v2 _catalog when GitLab API discovery fails.
  • Added getRawAPIBaseURL() and extractRegistryHost() to improve host parsing and discovery behavior for self-hosted GitLab instances.
  • Expanded GitLab client tests to cover raw URL base building and scheme/path stripping behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
registryclients/gitlab.go Adds Docker _catalog fallback for repository listing; improves host/base-URL derivation for GitLab API + scanning flows.
registryclients/gitlab_test.go Adds/updates tests for raw base URL parsing and host extraction; adjusts GetImagesToScan test coverage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread registryclients/gitlab.go
return g.getRepositoriesFromGitLabAPI(ctx)
repos, err := g.getRepositoriesFromGitLabAPI(ctx)
if err != nil {
return g.getRepositoriesFromDockerAPI(ctx)
Comment thread registryclients/gitlab.go
Comment on lines +116 to +123
func (g *GitLabRegistryClient) extractRegistryHost() string {
raw := strings.TrimSpace(g.Registry.RegistryURL)
raw = strings.TrimPrefix(raw, "https://")
raw = strings.TrimPrefix(raw, "http://")
if idx := strings.IndexAny(raw, "/?#"); idx != -1 {
raw = raw[:idx]
}
return raw
Comment thread registryclients/gitlab.go
Comment on lines +77 to +93
// getRepositoriesFromDockerAPI lists repositories using the Docker Registry v2
// _catalog endpoint. This is used as a fallback when the GitLab API is
// unreachable (e.g. RegistryURL points to the registry host, not the GitLab web host).
func (g *GitLabRegistryClient) getRepositoriesFromDockerAPI(ctx context.Context) ([]string, error) {
registryHost := g.extractRegistryHost()
registry, err := name.NewRegistry(registryHost)
if err != nil {
return nil, err
}
iRegistry, err := defaultregistry.NewRegistry(&authn.AuthConfig{Username: g.Registry.Username, Password: g.Registry.AccessToken}, &registry, g.Options)
if err != nil {
return nil, err
}

iRegistry.SetMaxPageSize(1000)
return getAllRepositories(ctx, iRegistry)
}
Comment on lines +567 to +623
func TestGitLabRegistryClient_GetImagesToScan_heuristicFallbackDiscovery(t *testing.T) {
const discoveredHost = "actual-registry.example.test"
const repoPath = "group/myproject"

mux := http.NewServeMux()
mux.HandleFunc("/api/v4/projects", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode([]gitLabProject{{ID: 1, PathWithNamespace: "group"}})
})
mux.HandleFunc("/api/v4/projects/1/registry/repositories", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
_ = json.NewEncoder(w).Encode([]gitLabRepository{{
ID: 10,
Path: repoPath,
Location: fmt.Sprintf("%s/%s", discoveredHost, repoPath),
}})
})
srv := httptest.NewServer(mux)
defer srv.Close()

srvHost := strings.TrimPrefix(srv.URL, "http://")

// Verify the heuristic fallback path: raw URL ≠ heuristic URL, raw fails,
// heuristic succeeds. We use "registry.gitlab-<host>" as RegistryURL:
// raw URL → http://registry.gitlab-<host>/api/v4 (unreachable)
// heuristic → https://gitlab-<host>/api/v4 (scheme mismatch with mock)
// Since httptest only serves HTTP and getGitLabAPIBaseURL always returns https://,
// we test the fallback logic at the discoverRegistryHost level instead.
client := &GitLabRegistryClient{
Registry: &armotypes.GitlabImageRegistry{
RegistryURL: "registry.gitlab-" + srvHost,
},
Options: &common.RegistryOptions{},
}
client.Registry.Repositories = []string{repoPath}

rawBaseURL := client.getRawAPIBaseURL()
heuristicBaseURL := client.getGitLabAPIBaseURL()
if rawBaseURL == heuristicBaseURL {
t.Fatalf("raw and heuristic URLs should differ for this test, got %q", rawBaseURL)
}

// Raw URL should fail (unreachable host)
host := client.discoverRegistryHost(context.Background(), rawBaseURL)
if host != "" {
t.Errorf("raw URL discovery should fail, got %q", host)
}

// Heuristic URL reaches the mock (after stripping "registry." prefix)
// We need to use HTTP to reach the mock, so build the URL manually
heuristicHTTP := fmt.Sprintf("http://gitlab-%s/api/v4", srvHost)
_ = heuristicHTTP // can't reach mock with heuristic either due to DNS

// Instead, verify the getGitLabAPIBaseURL strips "registry." correctly
if !strings.Contains(heuristicBaseURL, "gitlab-"+srvHost) {
t.Errorf("heuristic URL should contain gitlab-%s, got %q", srvHost, heuristicBaseURL)
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
registryclients/gitlab.go (1)

64-74: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate a hard failure when every repository lookup fails.

getProjectRepositories() errors are always skipped here. If /projects succeeds but every /registry/repositories call returns 403/5xx, this method returns []string, nil, so GetAllRepositories() never falls back to Docker v2 even though GitLab discovery effectively failed.

Suggested fix
 	var allRepos []string
 	httpClient := &http.Client{}
+	var reposLookupErr error
+	successfulLookups := 0
 
 	for _, project := range projects {
 		repos, err := g.getProjectRepositories(ctx, httpClient, baseURL, project.ID)
 		if err != nil {
+			reposLookupErr = err
 			continue
 		}
+		successfulLookups++
 		for _, repo := range repos {
 			allRepos = append(allRepos, repo.Path)
 		}
 	}
+
+	if successfulLookups == 0 && reposLookupErr != nil {
+		return nil, fmt.Errorf("failed to get project repositories: %w", reposLookupErr)
+	}
 
 	return allRepos, nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@registryclients/gitlab.go` around lines 64 - 74, The loop over projects
swallows all errors from getProjectRepositories so if every repository lookup
fails you still return ([]string, nil); change the logic in the function
containing the projects loop (the one building allRepos using projects and
calling getProjectRepositories) to track failures: keep a counter (or boolean)
and the last error when getProjectRepositories returns an error, increment on
each failure, and after the loop if failures == len(projects) return nil with
that error (or wrap it) instead of returning allRepos,nil; if at least one
project succeeded, continue to append repos to allRepos and return allRepos,nil
as before.
🧹 Nitpick comments (1)
registryclients/gitlab_test.go (1)

626-661: 🏗️ Heavy lift

The Docker _catalog fallback is still untested.

This case succeeds through the raw GitLab-API retry path, so it never executes getRepositoriesFromDockerAPI(). That leaves the actual regression fix in GetAllRepositories() unprotected by tests.

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@registryclients/gitlab.go`:
- Around line 116-123: The extractRegistryHost method fails on mixed-case
schemes because strings.TrimPrefix is case-sensitive; update extractRegistryHost
(operating on g.Registry.RegistryURL) to parse the URL case-insensitively by
using net/url's url.Parse (or normalize the scheme with strings.ToLower before
trimming) and then return the parsed URL.Host (stripping any port if needed);
this ensures inputs like "HTTPS://registry.example.com/foo" correctly yield
"registry.example.com" instead of "HTTPS:".

---

Outside diff comments:
In `@registryclients/gitlab.go`:
- Around line 64-74: The loop over projects swallows all errors from
getProjectRepositories so if every repository lookup fails you still return
([]string, nil); change the logic in the function containing the projects loop
(the one building allRepos using projects and calling getProjectRepositories) to
track failures: keep a counter (or boolean) and the last error when
getProjectRepositories returns an error, increment on each failure, and after
the loop if failures == len(projects) return nil with that error (or wrap it)
instead of returning allRepos,nil; if at least one project succeeded, continue
to append repos to allRepos and return allRepos,nil as before.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9cbbd3be-794a-45ed-9b52-e58303dfda57

📥 Commits

Reviewing files that changed from the base of the PR and between f87b89f and 9033923.

📒 Files selected for processing (2)
  • registryclients/gitlab.go
  • registryclients/gitlab_test.go

Comment thread registryclients/gitlab.go
Comment on lines +116 to +123
func (g *GitLabRegistryClient) extractRegistryHost() string {
raw := strings.TrimSpace(g.Registry.RegistryURL)
raw = strings.TrimPrefix(raw, "https://")
raw = strings.TrimPrefix(raw, "http://")
if idx := strings.IndexAny(raw, "/?#"); idx != -1 {
raw = raw[:idx]
}
return raw
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

extractRegistryHost() breaks on mixed-case schemes.

Inputs like HTTPS://registry.example.com/foo currently become HTTPS: because the prefix trim is case-sensitive and the slash trim then cuts at the first /. That makes the final hostname fallback invalid.

Suggested fix
 func (g *GitLabRegistryClient) extractRegistryHost() string {
 	raw := strings.TrimSpace(g.Registry.RegistryURL)
-	raw = strings.TrimPrefix(raw, "https://")
-	raw = strings.TrimPrefix(raw, "http://")
+	if !strings.Contains(raw, "://") {
+		raw = "https://" + raw
+	}
+	if u, err := url.Parse(raw); err == nil && u.Host != "" {
+		return u.Host
+	}
+	lower := strings.ToLower(raw)
+	switch {
+	case strings.HasPrefix(lower, "https://"):
+		raw = raw[len("https://"):]
+	case strings.HasPrefix(lower, "http://"):
+		raw = raw[len("http://"):]
+	}
 	if idx := strings.IndexAny(raw, "/?#"); idx != -1 {
 		raw = raw[:idx]
 	}
 	return raw
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@registryclients/gitlab.go` around lines 116 - 123, The extractRegistryHost
method fails on mixed-case schemes because strings.TrimPrefix is case-sensitive;
update extractRegistryHost (operating on g.Registry.RegistryURL) to parse the
URL case-insensitively by using net/url's url.Parse (or normalize the scheme
with strings.ToLower before trimming) and then return the parsed URL.Host
(stripping any port if needed); this ensures inputs like
"HTTPS://registry.example.com/foo" correctly yield "registry.example.com"
instead of "HTTPS:".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants