feat(source/cloud-storage): add Cloud Storage source with list_objects and read_object tools by huangjiahua · Pull Request #3081 · googleapis/mcp-toolbox

huangjiahua · 2026-04-16T20:26:57Z

Description

Adds Google Cloud Storage as a first-class source in MCP Toolbox, enabling LLM agents to work with objects across buckets in a GCP project. The source is project-scoped and authenticates via Application Default Credentials, mirroring Firestore/Bigtable.

This first PR ships the source plus two read-only tools from the approved design (14 total):

cloud-storage-list-objects — prefix filter, delimiter-based grouping (returns prefixes), and pagination via max_results / page_token. Passes through whatever metadata the GCS client returns (*storage.ObjectAttrs) so we don't have to plumb new fields later.
cloud-storage-read-object — reads an object's bytes, textual data only, with optional HTTP-style byte ranges (bytes=0-999, bytes=-500, bytes=500-).
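The three range forms listed for cloud-storage-read-object can be sketched as a small parser. This is a minimal illustration, not the PR's actual parseRange helper: the function name parseByteRange and the (offset, length) return convention are assumptions. The convention is modeled on the way the Go storage client's NewRangeReader takes an offset and a length, where a negative offset counts from the end of the object and a length of -1 reads to the end.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseByteRange is a hypothetical helper that converts a single HTTP-style
// range into (offset, length). A length of -1 means "read to the end"; a
// negative offset means "count from the end of the object".
func parseByteRange(spec string) (int64, int64, error) {
	rest, ok := strings.CutPrefix(spec, "bytes=")
	if !ok {
		return 0, 0, fmt.Errorf("range %q must start with \"bytes=\"", spec)
	}
	first, last, ok := strings.Cut(rest, "-")
	if !ok || (first == "" && last == "") {
		return 0, 0, fmt.Errorf("range %q must be start-end, start-, or -suffix", spec)
	}
	switch {
	case first == "": // bytes=-500: the last 500 bytes
		n, err := strconv.ParseInt(last, 10, 64)
		if err != nil || n <= 0 {
			return 0, 0, fmt.Errorf("range %q has an invalid suffix length", spec)
		}
		return -n, -1, nil
	case last == "": // bytes=500-: from byte 500 to the end
		start, err := strconv.ParseInt(first, 10, 64)
		if err != nil || start < 0 {
			return 0, 0, fmt.Errorf("range %q has an invalid start", spec)
		}
		return start, -1, nil
	default: // bytes=0-999: inclusive on both ends, so 1000 bytes
		start, err1 := strconv.ParseInt(first, 10, 64)
		end, err2 := strconv.ParseInt(last, 10, 64)
		if err1 != nil || err2 != nil || start < 0 || end < start {
			return 0, 0, fmt.Errorf("range %q has invalid bounds", spec)
		}
		return start, end - start + 1, nil
	}
}
```

Under these assumptions, "bytes=0-999" yields offset 0 and length 1000, "bytes=-500" yields offset -500 and length -1, and "bytes=500-" yields offset 500 and length -1.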

GCS-aware error categorization (per DEVELOPER.md) is implemented in a new cloudstoragecommon helper that maps GCS sentinels and *googleapi.Error codes to Agent errors (missing bucket/object, bad request, unsatisfiable range) vs. Server errors (auth, IAM denial, quota, 5xx, context cancellation). This replaces the coarse util.ProcessGcpError for the two new tools.
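The Agent/Server split described above could look roughly like the following. This is a sketch using only the standard library, not the cloudstoragecommon code itself: statusError stands in for *googleapi.Error, the two sentinels stand in for the storage client's missing-bucket and missing-object errors, and the choice to treat unrecognized errors as Server errors is an assumption (the PR mentions a fallback case but does not say which way it goes).

```go
package main

import (
	"context"
	"errors"
	"net/http"
)

// Category says who is expected to fix the failure.
type Category int

const (
	AgentError  Category = iota // the caller can fix the request and retry
	ServerError                 // infrastructure, auth, or quota problem
)

// Hypothetical stand-ins for the GCS client's sentinel errors.
var (
	errBucketNotExist = errors.New("bucket doesn't exist")
	errObjectNotExist = errors.New("object doesn't exist")
)

// statusError is a hypothetical stand-in for an API error carrying an HTTP code.
type statusError struct{ Code int }

func (e *statusError) Error() string { return http.StatusText(e.Code) }

// classify maps an error to a Category, checking cancellation first so a
// cancelled request is never reported as something the agent can fix.
func classify(err error) Category {
	if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
		return ServerError
	}
	if errors.Is(err, errBucketNotExist) || errors.Is(err, errObjectNotExist) {
		return AgentError
	}
	var se *statusError
	if errors.As(err, &se) {
		switch se.Code {
		case http.StatusBadRequest, // 400: malformed request
			http.StatusNotFound, // 404: missing bucket or object
			http.StatusRequestedRangeNotSatisfiable: // 416: bad byte range
			return AgentError
		}
	}
	// 401, 403, 429, 5xx, and anything unrecognized (assumed fallback).
	return ServerError
}
```

Because the checks use errors.Is and errors.As, the classification also holds for errors wrapped with additional context.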

Remaining 12 tools from the design doc (list_buckets, create_bucket, copy/move/delete_object, etc.) will land in follow-up PRs.

CI note: the cloud-storage shard in .ci/integration.cloudbuild.yaml expects CLOUD_STORAGE_PROJECT=$PROJECT_ID and requires the test service account to have a Cloud Storage admin role in the test project. The integration test creates its own UUID-suffixed bucket and removes it with defer-based cleanup.

PR Checklist

Make sure you reviewed CONTRIBUTING.md
Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea (communicated internally)
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)
Make sure to add ! if this involves a breaking change

What's included

New source: internal/sources/cloudstorage/ (+ YAML-parse unit tests)
Two tools: internal/tools/cloudstorage/cloudstoragelistobjects/, .../cloudstoragereadobject/ (+ YAML-parse + range-parser unit tests)
New cloudstoragecommon error classifier (+ 17-case unit test covering sentinels, HTTP statuses, context.Canceled/DeadlineExceeded, and fallback)
Integration test: tests/cloudstorage/cloud_storage_integration_test.go — 12 sub-tests against a real bucket (self-created, self-cleaned)
Docs: docs/en/integrations/cloud-storage/ (source + both tool pages; passes .ci/lint-docs-{source,tool}-page.sh)
CI shard: cloud-storage in .ci/integration.cloudbuild.yaml
Dependency: cloud.google.com/go/storage v1.62.1

Opening as draft for initial review — happy to split the error-classifier refactor into a separate commit if reviewers prefer.

gemini-code-assist

Code Review

This pull request adds Google Cloud Storage integration, introducing a new source and tools for listing and reading objects. The implementation includes configuration, error handling, and tests. Feedback recommends capping listing page sizes at 1000 for consistency, implementing memory safety limits when reading objects, and updating documentation titles to include the 'Tool' suffix.

…s and read_object tools Adds a new project-scoped `cloud-storage` source using ADC, plus two read-only tools: `cloud-storage-list-objects` (with prefix/delimiter/pagination) and `cloud-storage-read-object` (with HTTP-style byte range and base64 payload). Introduces a GCS-aware error classifier in `cloudstoragecommon` that splits failures into Agent errors (missing bucket/object, bad request, unsatisfiable range) and Server errors (auth, IAM denial, quota, 5xx, cancellation) per DEVELOPER.md, replacing the coarse-grained `util.ProcessGcpError`. Ships YAML-parse unit tests, an error-classifier unit test, a range-parser unit test, a live-GCS integration test (12 sub-tests, UUID-suffixed bucket with self-cleanup), docs under `docs/en/integrations/cloud-storage/`, and a `cloud-storage` CI shard. The remaining 12 tools from the approved design doc land in follow-up PRs.

…dObject at 1 MiB - ListObjects: pageSize() now clamps to the GCS API max of 1000 so callers that pass a larger max_results don't pre-allocate oversized buffers. - ReadObject: reject objects/ranges over 1 MiB with the new sentinel cloudstoragecommon.ErrReadSizeLimitExceeded, which the classifier maps to an Agent error so the LLM can retry with a narrower 'range'. - Docs + integration tests updated (two new sub-tests: oversize rejection and oversize-narrowed-by-range success).

… MiB 8 MiB gives agents more headroom for typical text/JSON/log payloads while still guarding against OOM. Doc and the oversize integration seed updated to match.
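One way to picture the size guard is as a check on how many bytes a request would actually return, so that an oversized object still passes when the range narrows it. The sketch below is an assumption about the shape of that check, not the PR's code: maxReadBytes and errReadSizeLimitExceeded are local stand-ins for DefaultMaxReadBytes and ErrReadSizeLimitExceeded, and the (offset, length) convention (negative offset for a suffix, length -1 for "to the end") is assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// maxReadBytes mirrors the 8 MiB cap described in the commit message.
const maxReadBytes int64 = 8 << 20

// errReadSizeLimitExceeded is a local stand-in for the PR's sentinel.
var errReadSizeLimitExceeded = errors.New("read size limit exceeded")

// bytesToRead returns how many bytes a read would produce for an object of
// the given size. A negative offset selects a suffix; a negative length
// means "to the end of the object".
func bytesToRead(objectSize, offset, length int64) int64 {
	if offset < 0 {
		if -offset < objectSize {
			return -offset
		}
		return objectSize
	}
	remaining := objectSize - offset
	if remaining < 0 {
		remaining = 0
	}
	if length < 0 || length > remaining {
		return remaining
	}
	return length
}

// checkReadSize rejects reads that would exceed the cap, wrapping the
// sentinel so callers can still match it with errors.Is.
func checkReadSize(objectSize, offset, length int64) error {
	if n := bytesToRead(objectSize, offset, length); n > maxReadBytes {
		return fmt.Errorf("%w: would read %d bytes, limit is %d; retry with a narrower range",
			errReadSizeLimitExceeded, n, maxReadBytes)
	}
	return nil
}
```

With this shape, a whole-object read of a 9 MiB object is rejected, while the same object read with a 1000-byte range is accepted, which matches the two oversize sub-tests mentioned in the earlier commit.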

…ckage DefaultMaxReadBytes doesn't belong in errors.go — the limit is a source-side invariant, not an error-classification concern. The sentinel ErrReadSizeLimitExceeded stays in cloudstoragecommon because the classifier still needs to recognize it.

…geSize bounds Cleanup loop in the integration test was treating any iterator error as iterator.Done; now distinguishes the two and logs non-Done errors so flaky teardowns are debuggable. Also adds an internal unit test for pageSize covering 0, negative, in-range, and over-cap inputs.

MCP tool results only carry text today, so the previous base64-encoded content was unusable by the LLM. Validate object bytes with utf8.Valid and return plain-text content; non-UTF-8 objects surface as an agent-fixable ErrBinaryContent error. TODO notes mark the spots to revisit once MCP supports embedded resources.
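The UTF-8 gate described here is small enough to sketch directly. The helper name textContent and the local errBinaryContent sentinel are illustrative stand-ins for the PR's own names; utf8.Valid is the standard library call the commit message refers to.

```go
package main

import (
	"errors"
	"unicode/utf8"
)

// errBinaryContent is a local stand-in for the PR's ErrBinaryContent sentinel.
var errBinaryContent = errors.New("object content is not valid UTF-8 text")

// textContent returns the object bytes as a string only when they form
// valid UTF-8; anything else is reported as binary content.
func textContent(data []byte) (string, error) {
	if !utf8.Valid(data) {
		return "", errBinaryContent
	}
	return string(data), nil
}
```

One consequence worth keeping in mind with a check like this: a byte range that starts or ends in the middle of a multi-byte character would fail validation even though the full object is valid text.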

Yuan325

Hi @huangjiahua Thank you for the contribution! Please let me know if you need any clarifications

Yuan325 · 2026-04-20T20:13:14Z

+
+type Source struct {
+	Config
+	Client *storage.Client


Suggested change

-	Client *storage.Client
+	client *storage.Client

Probably don't need to export this~

Yuan325 · 2026-04-20T20:17:23Z

can we move this test function into cloudstorage_test.go instead?

Yuan325 · 2026-04-20T20:26:10Z

+// results at 1000 per page; we enforce the same cap here so callers don't
+// pre-allocate larger buffers and so the contract matches the tool's
+// 'max_results' documentation.
+func pageSize(maxResults int) int {


Should we trigger an AgentError in the tool during parameter extraction if the value exceeds 1,000? This makes the limit explicit to the agent/user, preventing confusion when the returned page count is lower than requested.

Yes, I did that in internal/tools/cloudstorage/cloudstoragelistobjects/cloudstoragelistobjects.go

Yuan325 · 2026-04-20T20:31:25Z

+func (s *Source) ListObjects(ctx context.Context, bucket, prefix, delimiter string, maxResults int, pageToken string) (map[string]any, error) {
+	it := s.Client.Bucket(bucket).Objects(ctx, &storage.Query{
+		Prefix:    prefix,
+		Delimiter: delimiter,


Just confirming: will this be okay if these two values are ""?

Can confirm. I've also added an integration test with these two values set to "".

Yuan325 · 2026-04-20T20:32:09Z

+		Prefix:    prefix,
+		Delimiter: delimiter,
+	})
+	pager := iterator.NewPager(it, pageSize(maxResults), pageToken)


Will this be okay if pageToken is ""?

Yes. Also added an integration test case.

Yuan325 · 2026-04-20T20:38:01Z

Let's just move this test to cloudstoragereadobject_test.go

Yuan325 · 2026-04-20T20:42:06Z

+}
+
+func initStorageClient(ctx context.Context) (*storage.Client, error) {
+	return storage.NewClient(ctx, option.WithUserAgent("genai-toolbox-integration-test"))


Suggested change

-	return storage.NewClient(ctx, option.WithUserAgent("genai-toolbox-integration-test"))
+	return storage.NewClient(ctx)

Can we just init without the user agent option for the integration test?

Yuan325 · 2026-04-20T20:45:27Z

+		t.Fatalf("toolbox didn't start successfully: %s", err)
+	}
+
+	runCloudStorageToolGetTest(t)


Probably wouldn't need this function. We can just utilize this existing function

mcp-toolbox/tests/tool.go

Line 90 in 2375ffc

func RunToolGetTestByName(t *testing.T, name string, want map[string]any) {

ref:

mcp-toolbox/tests/looker/looker_integration_test.go

Lines 358 to 366 in 2375ffc

tests.RunToolGetTestByName(t, "get_models",
	map[string]any{
		"get_models": map[string]any{
			"description":  "Simple tool to test end to end functionality.",
			"authRequired": []any{},
			"parameters":   []any{},
		},
	},
)

Do you mean we can remove this test?

Yuan325 · 2026-04-20T20:47:20Z

+	}
+}
+
+func runCloudStorageListObjectsTest(t *testing.T, bucket string) {


For this function and runCloudStorageReadObjectTest(), is it possible to use Go's table-driven tests? There's probably a lot of duplication here in each t.Run().

Reference:

mcp-toolbox/tests/tool.go

Line 231 in 2375ffc

func RunToolInvokeTest(t *testing.T, select1Want string, options ...InvokeTestOption) {

The storage.Client is an implementation detail; external callers that need it use the StorageClient() accessor, so the field itself doesn't need to be exported.

…e tests into single test file per package Merge TestPageSize into cloudstorage_test.go and TestParseRange into cloudstoragereadobject_test.go. Both test files now use the internal package so they can exercise the unexported pageSize and parseRange helpers directly, removing the need for separate *_internal_test.go files.

…with AgentError Previously, values above the GCS per-page cap of 1000 were silently clamped by the pageSize helper, which could confuse agents when the returned page was smaller than requested. Validate max_results during Invoke and return an AgentError so the limit is explicit. Docs and the parameter description are updated to match; the pageSize clamp remains as defense in depth. A unit test covers the rejection path and an integration test exercises it over HTTP.
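The two layers described in this commit, explicit rejection at the tool boundary plus a clamp kept as defense in depth, can be sketched as follows. This is an illustration under assumptions rather than the PR's code: validateMaxResults is a hypothetical name for the check done during Invoke, it returns a plain error where the PR returns an AgentError, and falling back to the cap for zero or negative input is an assumed behavior that the thread does not spell out.

```go
package main

import "fmt"

// maxPageSize is the per-page cap of 1000 cited in the review thread.
const maxPageSize = 1000

// validateMaxResults is a hypothetical boundary check: it rejects values
// above the cap so the limit is explicit to the caller.
func validateMaxResults(maxResults int) error {
	if maxResults > maxPageSize {
		return fmt.Errorf("max_results %d exceeds the per-page limit of %d", maxResults, maxPageSize)
	}
	return nil
}

// pageSize clamps whatever reaches the source layer. Treating zero and
// negative values as "use the cap" is an assumption made for this sketch.
func pageSize(maxResults int) int {
	if maxResults <= 0 || maxResults > maxPageSize {
		return maxPageSize
	}
	return maxResults
}
```

Keeping the clamp means the source stays safe even if another caller skips the validation step.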

…age_token inputs Add two integration sub-tests confirming that empty-string inputs are accepted by the GCS client as expected: ListObjects with empty prefix and delimiter returns an unfiltered listing, and an empty page_token returns the first page rather than erroring. These cases address review questions about whether the values passed through to storage.Query and iterator.NewPager are safe when unset.

… simplify storage client init in integration test Drop the initStorageClient wrapper and the option.WithUserAgent call; the integration test now uses storage.NewClient(ctx) directly, matching the suggestion in review and removing a needless indirection.

… table-drive integration tests and reuse RunToolGetTestByName Replace the bespoke runCloudStorageToolGetTest with two tests.RunToolGetTestByName calls that assert the full manifest for each tool. Convert the list_objects and read_object sub-tests to table-driven form: each case declares a request body plus substring, content, or contentType expectations, driven by a single assertion loop. The inherently two-step pagination test stays as its own t.Run. Behaviour is unchanged; the file is ~220 lines shorter in boilerplate.
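The table-driven shape described in this commit might look like the sketch below: each row declares a request body and its expectations, and one shared assertion step checks every row. The invokeCase type, its fields, and checkCase are hypothetical names chosen for illustration; the PR's actual test also checks content and contentType expectations, which are left out here.

```go
package main

import (
	"fmt"
	"strings"
)

// invokeCase is one row of the table: a request body plus expectations.
type invokeCase struct {
	name        string
	requestBody string
	wantSubstr  []string // substrings that must appear in the response
	wantErr     bool     // whether the invocation is expected to fail
}

// checkCase is the single assertion step shared by every row. It returns
// nil when the observed response and error match the row's expectations.
func checkCase(c invokeCase, got string, err error) error {
	if (err != nil) != c.wantErr {
		return fmt.Errorf("%s: error = %v, wantErr = %t", c.name, err, c.wantErr)
	}
	for _, s := range c.wantSubstr {
		if !strings.Contains(got, s) {
			return fmt.Errorf("%s: response %q is missing %q", c.name, got, s)
		}
	}
	return nil
}

// In a _test.go file the table would typically be walked with t.Run:
//
//	for _, c := range cases {
//		t.Run(c.name, func(t *testing.T) {
//			got, err := invoke(c.requestBody) // hypothetical HTTP helper
//			if e := checkCase(c, got, err); e != nil {
//				t.Error(e)
//			}
//		})
//	}
```

Adding a scenario then means adding one row to the table instead of another hand-written t.Run block.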

… drop tool-get manifest test Remove runCloudStorageToolGetTest entirely. The manifest-shape check it performed was redundant: unit tests already cover ParseFromYaml for each config, and the invoke sub-tests exercise the tool handlers over HTTP. Keeping a full-manifest deep-equal here just duplicates that coverage and has to be updated whenever parameter docs change.

gemini-code-assist bot reviewed Apr 16, 2026


Comment thread docs/en/integrations/cloud-storage/tools/cloud-storage-list-objects.md

Comment thread docs/en/integrations/cloud-storage/tools/cloud-storage-read-object.md

Comment thread internal/sources/cloudstorage/cloudstorage.go

Comment thread internal/sources/cloudstorage/cloudstorage.go

huangjiahua marked this pull request as ready for review April 16, 2026 23:28

huangjiahua requested a review from a team as a code owner April 16, 2026 23:28

blunderbuss-gcf bot assigned duwenxin99 Apr 16, 2026

huangjiahua added 6 commits April 17, 2026 19:29

feat(source/cloud-storage): raise ReadObject size cap from 1 MiB to 8…

39be640


huangjiahua force-pushed the feat/cloud-storage-source branch from 91a222a to 4919821 Compare April 17, 2026 19:30

Yuan325 requested changes Apr 20, 2026


huangjiahua added 7 commits April 21, 2026 00:12

refactor(source/cloud-storage): unexport Source.client field

8b1aa46


