perf(common): Parallelize metadata fetching during query planning by fordN · Pull Request #2017 · edgeandnode/amp

fordN · 2026-03-24T16:35:08Z

Background

During query planning, Parquet file metadata (footers) is fetched one file at a time. On a cold cache with many files, this serializes hundreds of network round trips. In one large example query, 684 files are fetched; at 50 ms per file, that is ~34 seconds spent fetching metadata before any data is read. Parallelizing the fetches with `.buffered(32)` brings the theoretical time for those 684 files down to ~1 second.
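The figures above follow from a simple model in which every footer fetch takes the same 50 ms and nothing else contributes to planning time. The Rust sketch below encodes that idealized model; the function names are hypothetical, and real fetch latencies will vary.

```rust
/// Idealized model: every footer fetch costs the same `latency_ms`,
/// and nothing else contributes to planning time.
fn sequential_ms(files: u64, latency_ms: u64) -> u64 {
    // One fetch after another: latencies simply add up.
    files * latency_ms
}

/// With up to `concurrency` fetches in flight and uniform latency, the files
/// finish in ceil(files / concurrency) rounds of `latency_ms` each.
fn buffered_ms(files: u64, latency_ms: u64, concurrency: u64) -> u64 {
    assert!(concurrency > 0, "concurrency must be at least 1");
    let rounds = (files + concurrency - 1) / concurrency; // ceiling division
    rounds * latency_ms
}
```

For the example query this gives 684 × 50 ms = 34.2 s sequentially, and ceil(684 / 32) = 22 rounds × 50 ms = 1.1 s with 32 fetches in flight, consistent with the ~34 s and ~1 s figures.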

On a warm cache the impact is minimal, since cache lookups are sub-millisecond, but cold starts, new tables, and large time-range queries benefit significantly.

Changes

- Replace the sequential `.then()` with `.buffered(N)` in `resolve_file_groups()`, so up to N Parquet metadata fetches run concurrently during query planning. Results are yielded in input order, keeping round-robin partition assignment deterministic.
- Add a configurable `metadata_fetch_concurrency` setting (default: 32), set via the config file or the `AMP_CONFIG_METADATA_FETCH_CONCURRENCY` env var.
  - The value flows through: config file (`metadata_fetch_concurrency`) → `Config` → server/worker `Config` → `ExecEnv` → `ExecContextBuilder` → `QueryableSnapshot` → `resolve_file_groups()`.
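As a rough illustration of resolving this setting, here is a stdlib-only Rust sketch. The function names are hypothetical, and both the precedence (env var over config file over the default of 32) and the rejection of 0 are assumptions of this sketch, not confirmed behavior of amp's `Config` loading.

```rust
use std::env;

/// Default stated in the PR description.
const DEFAULT_METADATA_FETCH_CONCURRENCY: usize = 32;

/// Hypothetical resolver. ASSUMPTION: an env var value (if set) overrides the
/// config file value, which overrides the default.
fn resolve_metadata_fetch_concurrency(
    file_value: Option<usize>,
    env_value: Option<&str>,
) -> Result<usize, String> {
    let value = match env_value {
        Some(raw) => raw.trim().parse::<usize>().map_err(|e| {
            format!("invalid AMP_CONFIG_METADATA_FETCH_CONCURRENCY {raw:?}: {e}")
        })?,
        None => file_value.unwrap_or(DEFAULT_METADATA_FETCH_CONCURRENCY),
    };
    // Sketch's own choice: zero in-flight fetches could never make progress.
    if value == 0 {
        return Err("metadata fetch concurrency must be at least 1".to_string());
    }
    Ok(value)
}

/// Reads the real environment; `file_value` would come from the parsed config file.
fn metadata_fetch_concurrency_from_env(file_value: Option<usize>) -> Result<usize, String> {
    let env_value = env::var("AMP_CONFIG_METADATA_FETCH_CONCURRENCY").ok();
    resolve_metadata_fetch_concurrency(file_value, env_value.as_deref())
}
```

Keeping the pure resolver separate from the function that reads the process environment makes the precedence rules easy to test without mutating env vars.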

Replace sequential `.then()` with `.buffered(32)` in `resolve_file_groups()` so that up to 32 Parquet metadata fetches run concurrently. `.buffered()` preserves input order, so round-robin partition assignment stays deterministic.
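The following sketch illustrates why ordering matters. It is not the PR's code: amp uses async streams with `.buffered(N)`, while this stand-in uses `std::thread::scope` and a channel so it runs with only the standard library. `fetch_ordered_bounded` and `assign_round_robin` are hypothetical names. The first keeps at most `limit` fetches in flight and returns results in input order (the same order that collecting `.buffered(limit)` produces); the second assigns file i to partition i % n.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs `fetch` over `inputs` with at most `limit` calls in flight and returns
/// the results in input order. Threads stand in for async tasks here.
fn fetch_ordered_bounded<T, R, F>(inputs: Vec<T>, limit: usize, fetch: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    assert!(limit > 0, "a limit of 0 would never start a fetch");
    let mut slots: Vec<Option<R>> = (0..inputs.len()).map(|_| None).collect();
    let fetch = &fetch;
    thread::scope(|s| {
        let (tx, rx) = mpsc::channel::<(usize, R)>();
        let mut pending = inputs.into_iter().enumerate();
        let mut in_flight = 0;
        loop {
            // Top up the window until `limit` fetches are running.
            while in_flight < limit {
                match pending.next() {
                    Some((idx, item)) => {
                        let tx = tx.clone();
                        s.spawn(move || {
                            // Each worker reports its result tagged with its input index.
                            let _ = tx.send((idx, fetch(item)));
                        });
                        in_flight += 1;
                    }
                    None => break,
                }
            }
            if in_flight == 0 {
                break; // input exhausted and every fetch has reported back
            }
            // Wait for any one fetch to finish, then store it in its original slot.
            let (idx, result) = rx.recv().expect("worker exited without sending");
            slots[idx] = Some(result);
            in_flight -= 1;
        }
    });
    slots.into_iter().map(|r| r.expect("every slot is filled")).collect()
}

/// Assigns files to `partitions` groups round-robin: file i goes to group i % partitions.
/// The result depends only on input order, which is why that order must be preserved.
fn assign_round_robin<T>(files: Vec<T>, partitions: usize) -> Vec<Vec<T>> {
    assert!(partitions > 0, "need at least one partition");
    let mut groups: Vec<Vec<T>> = (0..partitions).map(|_| Vec::new()).collect();
    for (i, file) in files.into_iter().enumerate() {
        groups[i % partitions].push(file);
    }
    groups
}
```

Because `assign_round_robin` depends only on position, an unordered variant such as `.buffer_unordered(N)` from the `futures` crate would let fast fetches overtake slow ones and could move files between partitions from one run to the next.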

LNSD

LGTM ✅

fordN force-pushed the ford/optimizations/parallel-metadata-fetching branch from 76d9b1f to db3133e March 24, 2026 16:54

fordN requested review from JohnSwan1503 and LNSD March 24, 2026 16:54

fordN changed the title ~~perf(common: Parallelize metadata fetching during query planning~~ perf(common): Parallelize metadata fetching during query planning Mar 24, 2026

fordN force-pushed the ford/optimizations/parallel-metadata-fetching branch from db3133e to 8e4d432 March 24, 2026 17:06

feat(common): make metadata fetch concurrency configurable at startup

b47416c

Configurable via the config file or the `AMP_CONFIG_METADATA_FETCH_CONCURRENCY` env var (default: 32). Controls how many Parquet footer fetches run concurrently during query planning.

fordN force-pushed the ford/optimizations/parallel-metadata-fetching branch from 8e4d432 to b47416c March 24, 2026 17:15

LNSD approved these changes Mar 24, 2026

fordN merged commit 4d815eb into main Mar 24, 2026
8 checks passed

fordN deleted the ford/optimizations/parallel-metadata-fetching branch March 24, 2026 17:20

fordN added the performance label Mar 24, 2026

fordN merged 2 commits into main from ford/optimizations/parallel-metadata-fetching

2 participants