ci: add riscv64 C++ CI job on RISE native runners #1

Open
gounthar wants to merge 8 commits into main from feat/riscv64-cpp-ci

gounthar (Owner) commented Mar 22, 2026

Test PR for validating the Arrow C++ CI on native riscv64 RISE runners.

Changes to .github/workflows/cpp.yml:

  1. New matrix entry: riscv64 / ubuntu-24.04-riscv / Ubuntu 24.04 / LLVM 18
  2. Widened Python setup condition to include RISE runners

This follows the same pattern as the ARM64 entry (native runner, Docker-based build via archery).

Per pitrou's request on apache#49556: "Can we perhaps start by having a regular C++ CI job on RISC-V?"

Arrow C++ has already been validated on native riscv64 hardware (BananaPi F3, 1h13m build).

Summary by CodeRabbit

  • Chores
    • CI now includes RISC‑V 64‑bit builds on an Ubuntu 24.04 RISC‑V runner with LLVM/Clang 18.
    • Self‑hosted runner setup updated so Python setup runs on RISC‑V and user-local binaries are added to PATH.
    • RISC‑V runner installs the Docker Compose CLI plugin and validates Docker availability.
    • Docker build step respects a buildkit setting from the CI matrix (configurable per run).

Signed-off-by: Bruno Verachten <gounthar@gmail.com>
@gemini-code-assist
Note

Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported.

coderabbitai bot commented Mar 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Added a RISC‑V Docker build matrix entry (riscv64) using an ubuntu-24.04-riscv runner and ubuntu-cpp image (LLVM/clang-tools 18); updated self-hosted Python setup to export $HOME/.local/bin to GITHUB_PATH; added a RISC‑V-only step to install Docker Compose CLI binary and validate Docker, and made DOCKER_BUILDKIT configurable from the matrix.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GitHub Actions workflow: .github/workflows/cpp.yml | Added riscv64 matrix entry (ubuntu-24.04-riscv, ubuntu-cpp, clang-tools-18) and set its matrix.buildkit to "0". |
| Self-hosted runner setup: .github/workflows/cpp.yml | Updated "Setup Python on self-hosted runner" condition to include ubuntu-24.04-riscv and appended "$HOME/.local/bin" to GITHUB_PATH. |
| RISC‑V-specific Docker setup & build: .github/workflows/cpp.yml | Added a Setup Docker on RISE runner step (runs only on ubuntu-24.04-riscv) that downloads the docker-compose-linux-riscv64 binary (Compose CLI plugin), installs/validates it (docker compose version), runs docker info and docker buildx ls. Extended "Execute Docker Build" to set DOCKER_BUILDKIT from matrix.buildkit (defaults to 1). |

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer (PR)
    participant GH as GitHub Actions
    participant Runner as ubuntu-24.04-riscv (self-hosted)
    participant Image as ubuntu-cpp (clang/llvm 18)
    participant DockerSvc as Docker daemon
    participant Compose as Docker Compose CLI (binary)

    Dev->>GH: Push workflow update
    GH->>Runner: Schedule job (matrix: riscv64)
    Runner->>Image: Pull/start build container
    Runner->>Runner: Run "Setup Python" (export $HOME/.local/bin to GITHUB_PATH)
    Runner->>Compose: Download/install `docker-compose-linux-riscv64` (chmod + move)
    Runner->>Compose: docker compose version
    Runner->>DockerSvc: docker info
    Runner->>DockerSvc: docker buildx ls
    Runner->>GH: Report job status

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped into CI with a nimble twitch,
RISC‑V lands and clang made quite a switch,
I fetched a tiny compose, gave it a spin,
Docker hummed, builds queued — the carrots will win.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely identifies the main change: adding a riscv64 C++ CI job on RISE native runners. |
| Description check | ✅ Passed | The description addresses the template sections with relevant context: rationale (adding riscv64 validation per request), changes (matrix entry and Python setup condition), and testing approach (follows ARM64 pattern). |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/workflows/cpp.yml (1)

138-139: Prefer exact match for the RISC-V branch in this condition.

contains(matrix.runs-on, 'ubuntu-24.04-riscv') is substring-based for string values. Exact equality is more precise here.

Proposed refinement
       - name: Setup Python on self-hosted runner
         if: |
           contains(matrix.runs-on, 'self-hosted') ||
-          contains(matrix.runs-on, 'ubuntu-24.04-riscv')
+          matrix.runs-on == 'ubuntu-24.04-riscv'
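As a rough illustration of the distinction in the nitpick above, here is a shell analogy. It is not how GitHub Actions evaluates expressions, and the label `ubuntu-24.04-riscv-large` is a made-up example of a longer label that a substring check would also accept.

```shell
#!/bin/sh
# Shell analogy only: GitHub Actions evaluates contains() and == itself.

# Succeeds when $2 occurs anywhere inside $1 (substring semantics).
matches_substring() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds only when $1 and $2 are identical.
matches_exact() {
  [ "$1" = "$2" ]
}

# Hypothetical label, invented for this example.
label='ubuntu-24.04-riscv-large'

if matches_substring "$label" 'ubuntu-24.04-riscv'; then
  echo "substring check: matched"
fi
if ! matches_exact "$label" 'ubuntu-24.04-riscv'; then
  echo "exact check: not matched"
fi
```

With the invented label, the substring check matches while the exact check does not, which is the over-matching the nitpick is guarding against.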
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cpp.yml around lines 138 - 139, The condition uses
contains(matrix.runs-on, 'ubuntu-24.04-riscv') which matches substrings; change
it to an exact equality check (matrix.runs-on == 'ubuntu-24.04-riscv') so the
RISC-V branch is matched precisely — update the conditional expression that
currently reads contains(matrix.runs-on, 'ubuntu-24.04-riscv') to use equality
instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/cpp.yml:
- Around line 105-111: Cache keys for Docker layers currently use only the image
name and can collide across architectures; update the workflow's cache key
generation in the Docker/cache steps to include the matrix.arch (and optional
matrix.ubuntu) so keys become architecture-aware. Locate the matrix entries
(e.g., the riscv64 entry with "arch: riscv64" and "image: ubuntu-cpp") and
change the cache key expression used in the cache or actions/cache steps to
incorporate "${{ matrix.arch }}" (and/or "${{ matrix.ubuntu }}") alongside the
image name so separate caches are created per-architecture.
- Line 109: The matrix entry sets "runs-on: ubuntu-24.04-riscv", but the
conditional that currently uses contains(matrix.runs-on, 'self-hosted') || ...
matches that label loosely; update the condition to:
contains(matrix.runs-on, 'self-hosted') || matrix.runs-on ==
'ubuntu-24.04-riscv' so it follows the exact-matching style used for
matrix.runs-on == 'ubuntu-latest'.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef3dffa2-e70a-490e-ae34-996fb4158f96

📥 Commits

Reviewing files that changed from the base of the PR and between d08d5e6 and deca166.

📒 Files selected for processing (1)
  • .github/workflows/cpp.yml

Copilot AI left a comment

Pull request overview

Adds a new native RISC-V (riscv64) entry to the Arrow C++ GitHub Actions workflow to validate builds/tests on RISE runners, following the existing Docker/archery-based pattern used for other architectures.

Changes:

  • Added a new matrix configuration for riscv64 targeting ubuntu-24.04-riscv with LLVM/clang-tools 18.
  • Extended the “Setup Python on self-hosted runner” step condition to also run on the RISC-V runner label.

Comment on lines +105 to +111
          - arch: riscv64
            clang-tools: 18
            image: ubuntu-cpp
            llvm: 18
            runs-on: ubuntu-24.04-riscv
            title: RISCV64 Ubuntu 24.04 C++
            ubuntu: 24.04
Copilot AI Mar 22, 2026

The new riscv64 job will likely run with reduced filesystem test coverage: on riscv64, ci/scripts/install_minio.sh and ci/scripts/install_gcs_testbench.sh currently skip installation for unsupported architectures, and ci/scripts/cpp_test.sh then excludes arrow-s3fs-test / arrow-gcsfs-test when minio / storage-testbench aren’t present. If the intent is a “regular” C++ CI job, consider adding riscv64 support for these dependencies (or making the reduced coverage explicit in the workflow/job config).

Suggested change
-          - arch: riscv64
-            clang-tools: 18
-            image: ubuntu-cpp
-            llvm: 18
-            runs-on: ubuntu-24.04-riscv
-            title: RISCV64 Ubuntu 24.04 C++
-            ubuntu: 24.04
+          # NOTE: On riscv64, MinIO and the GCS storage testbench are not installed,
+          # so S3/GCS filesystem tests (arrow-s3fs-test / arrow-gcsfs-test) run with
+          # reduced coverage or are skipped. This job is intentionally marked as such.
+          - arch: riscv64
+            clang-tools: 18
+            image: ubuntu-cpp
+            llvm: 18
+            runs-on: ubuntu-24.04-riscv
+            title: RISCV64 Ubuntu 24.04 C++ (reduced filesystem tests)
+            ubuntu: 24.04
+            filesystem-tests: reduced

         if: |
-          contains(matrix.runs-on, 'self-hosted')
+          contains(matrix.runs-on, 'self-hosted') ||
+          contains(matrix.runs-on, 'ubuntu-24.04-riscv')
Copilot AI Mar 22, 2026

The Python-setup condition is now coupled to a specific runner label (contains(matrix.runs-on, 'ubuntu-24.04-riscv')). For maintainability (e.g., if the runner label changes or additional RISC-V labels are introduced), consider keying this off the matrix data (matrix.arch == 'riscv64') or a dedicated matrix flag (e.g., needs-python-apt: true) instead of matching on the runs-on string.

Suggested change
-          contains(matrix.runs-on, 'ubuntu-24.04-riscv')
+          matrix.arch == 'riscv64'

Signed-off-by: Bruno Verachten <gounthar@gmail.com>
@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
.github/workflows/cpp.yml (1)

105-111: ⚠️ Potential issue | 🟠 Major

Prevent cross-architecture Docker cache collisions for ubuntu-cpp.

With the new riscv64 entry, both arm64v8 and riscv64 share image: ubuntu-cpp, but Line 128 keys cache by image only. That can restore incompatible .docker cache data across architectures.

Proposed fix
       - name: Cache Docker Volumes
         uses: actions/cache@v5
         with:
           path: .docker
-          key: ${{ matrix.image }}-${{ hashFiles('cpp/**') }}
-          restore-keys: ${{ matrix.image }}-
+          key: ${{ matrix.image }}-${{ matrix.arch }}-${{ hashFiles('cpp/**') }}
+          restore-keys: |
+            ${{ matrix.image }}-${{ matrix.arch }}-
+            ${{ matrix.image }}-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cpp.yml around lines 105 - 111, The Docker cache for the
"ubuntu-cpp" image can collide across architectures (arm64v8 vs riscv64); modify
the cache key used for the "ubuntu-cpp" job so it includes the architecture
(e.g., append matrix.arch or runner.arch to the cache key string) or include
arch in the cache scope/name used where "image: ubuntu-cpp" is referenced so
caches are separated per arch; update the cache step that currently keys by
image only to incorporate the arch identifier to prevent restoring incompatible
.docker cache data for the "ubuntu-cpp" job with arch: riscv64.
ℹ️ Review info
⚙️ Run configuration

Run ID: 2a01bd22-2c60-4559-a401-8c276705ab52

📥 Commits

Reviewing files that changed from the base of the PR and between deca166 and 5a14094.

📒 Files selected for processing (1)
  • .github/workflows/cpp.yml

Signed-off-by: Bruno Verachten <gounthar@gmail.com>
@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
.github/workflows/cpp.yml (1)

105-111: ⚠️ Potential issue | 🟠 Major

Cache key is still architecture-agnostic and can cross-restore wrong Docker cache.

With the new riscv64 entry sharing image: ubuntu-cpp, Line 128/Line 129 keys can collide across arm64v8 and riscv64.

Proposed fix
       - name: Cache Docker Volumes
         uses: actions/cache@v5
         with:
           path: .docker
-          key: ${{ matrix.image }}-${{ hashFiles('cpp/**') }}
-          restore-keys: ${{ matrix.image }}-
+          key: ${{ matrix.image }}-${{ matrix.arch }}-${{ matrix.ubuntu }}-${{ hashFiles('cpp/**') }}
+          restore-keys: |
+            ${{ matrix.image }}-${{ matrix.arch }}-${{ matrix.ubuntu }}-
+            ${{ matrix.image }}-${{ matrix.arch }}-
+            ${{ matrix.image }}-

ℹ️ Review info
⚙️ Run configuration

Run ID: 61414bfe-8f46-49fe-8890-73e1e2f87e8c

📥 Commits

Reviewing files that changed from the base of the PR and between 5a14094 and eb80b0d.

📒 Files selected for processing (1)
  • .github/workflows/cpp.yml

…4 repos)

Signed-off-by: Bruno Verachten <gounthar@gmail.com>
@gounthar (Owner, Author)

@coderabbitai Good catch on the cache key. However, this is a pre-existing issue in the upstream workflow - arm64v8 and amd64 already share the same matrix.image-only key. Fixing it here would change behavior for existing arches and widen the PR scope.

I'll note this as a potential follow-up improvement when submitting upstream, but keep this PR focused on adding the riscv64 matrix entry.

coderabbitai bot commented Mar 22, 2026

@gounthar That makes complete sense — matching the existing pattern for arm64v8 and amd64 is the right call here to keep the PR focused and avoid unintended side effects on those arches. Noted that you'll track it as a follow-up for the upstream submission.

Would you like me to open a GitHub issue on this repository to track the cache key improvement as a follow-up? It could serve as a useful reference when you submit upstream.

(^_^) ~🐇


✏️ Learnings added
Learnt from: gounthar
URL: https://github.com/gounthar/arrow/pull/1

Timestamp: 2026-03-22T14:31:10.418Z
Learning: In gounthar/arrow (.github/workflows/cpp.yml), the Docker build cache key uses only `matrix.image` (not including `matrix.llvm` or `matrix.arch`). This is a pre-existing pattern shared by the arm64v8 and amd64 matrix entries in the upstream workflow. The author intentionally kept the same pattern for the riscv64 entry to avoid widening the PR scope; a cache key improvement is planned as a follow-up when submitting upstream (apache/arrow#49556).

Dump existing daemon.json for debugging, then reset to a clean config.
Fixes 'Client sent an HTTP request to an HTTPS server' from
misconfigured registry mirrors on the runner.

Signed-off-by: Bruno Verachten <gounthar@gmail.com>
BuildKit causes 'HTTP request to HTTPS server' on RISE runner.
No systemd available, so can't restart dockerd.
Disable BuildKit for riscv64 (use legacy builder) and add debug output.

Signed-off-by: Bruno Verachten <gounthar@gmail.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/workflows/cpp.yml (1)

149-153: Pin Docker Compose to a repo-controlled version.

Resolving releases/latest at runtime makes this leg non-reproducible and lets an upstream Compose release break Arrow CI without any repo change. Prefer a tested version in env/matrix and update it deliberately.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/cpp.yml around lines 149 - 153, The workflow currently
resolves COMPOSE_VERSION at runtime via releases/latest which makes the job
non-reproducible; change it to use a repo-controlled version variable instead
(e.g., a matrix/env variable like DOCKER_COMPOSE_VERSION) and stop querying the
GitHub API. Update the steps that reference COMPOSE_VERSION (the curl/grep
assignment and the download URL lines that use
/${COMPOSE_VERSION}/docker-compose-linux-riscv64 and the chmod step) to read
from that stable variable, and add a guard that fails the job with a clear error
if the variable is unset so CI fails fast when the pinned version is missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/cpp.yml:
- Around line 105-111: The riscv64 job leg ("arch: riscv64", "title: RISCV64
Ubuntu 24.04 C++") needs a longer GitHub Actions timeout to avoid flakes; add or
increase the workflow/job-level timeout (e.g., set timeout-minutes: 180 or at
least 120) for that job or the containing workflow so the riscv64 leg has
sufficient headroom for Python bootstrap, Compose setup, Docker restart and
Archery install before the build starts.
- Around line 158-166: The workflow currently prints and entirely replaces
/etc/docker/daemon.json via a heredoc (using sudo cat /etc/docker/daemon.json
and sudo tee /etc/docker/daemon.json), which exposes runner-local settings and
clobbers unrelated keys; instead update only the targeted key and avoid leaking
the original: read the file into a temp backup, use a JSON-aware tool (jq) to
merge or set .features["containerd-snapshotter"]=true into the existing JSON (or
create an object if absent), write the merged result to a temp file and
atomically move it to /etc/docker/daemon.json, then restart docker;
alternatively perform this change in runner provisioning so the workflow never
prints or permanently overwrites the host daemon.json.
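The jq-based merge described in the second inline comment could look roughly like the sketch below. It assumes `jq` is installed on the runner, which the thread does not confirm. The function name `set_snapshotter` and the `registry-mirrors` value are invented for the demo, and the demo runs against a scratch file rather than /etc/docker/daemon.json.

```shell
#!/bin/sh
# Sketch: set one key in a daemon.json-style file while keeping every other key.
# Assumes jq is installed; the function name and demo values are hypothetical.

set_snapshotter() {
  file="$1"
  # Treat a missing or empty file as an empty JSON object.
  [ -s "$file" ] || printf '{}\n' > "$file"
  # Write next to the target so the final mv stays on one filesystem.
  tmp="$(mktemp "${file}.XXXXXX")" || return 1
  if jq '.features["containerd-snapshotter"] = true' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Demo on a scratch file rather than /etc/docker/daemon.json.
if command -v jq >/dev/null 2>&1; then
  demo="$(mktemp)"
  printf '{"registry-mirrors": ["https://mirror.example"]}\n' > "$demo"
  set_snapshotter "$demo" && jq -c . "$demo"
  rm -f "$demo"
fi
```

On a real runner the write would need root, and the daemon would still have to pick up the new file; the later commit message in this thread notes that systemd is not available to restart dockerd.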

ℹ️ Review info
⚙️ Run configuration

Run ID: a2fad128-fe69-4f5c-80b2-fb37b61f2f4f

📥 Commits

Reviewing files that changed from the base of the PR and between eb80b0d and 70b63e9.

📒 Files selected for processing (1)
  • .github/workflows/cpp.yml

Comment on lines +105 to +111
          - arch: riscv64
            clang-tools: 18
            image: ubuntu-cpp
            llvm: 18
            runs-on: ubuntu-24.04-riscv
            title: RISCV64 Ubuntu 24.04 C++
            ubuntu: 24.04
⚠️ Potential issue | 🟠 Major

Give the riscv64 leg more timeout headroom.

The PR notes roughly 1h13m on native hardware, and this job still has Python bootstrap, Compose setup, Docker restart, and Archery install before the build even starts. With the shared 75-minute cap, this new leg is very likely to flake on cold caches or slower runners.
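The headroom concern can be checked with quick arithmetic. The sketch below uses only the two figures quoted in this thread; the 1h13m build was measured on a BananaPi F3, so it may not reflect timing on the RISE runners.

```shell
#!/bin/sh
# Figures from the PR description and the review comment.
build_minutes=$((1 * 60 + 13))   # 1h13m measured on the BananaPi F3
cap_minutes=75                   # shared job timeout cited in the review

headroom=$((cap_minutes - build_minutes))
echo "headroom before setup steps: ${headroom} minutes"
```

That leaves two minutes for every setup step listed above, before any cold-cache slowdown.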

🧰 Tools
🪛 actionlint (1.7.11)

[error] 109-109: label "ubuntu-24.04-riscv" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2025-vs2026", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26-large", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xlarge", "macos-14-large", "macos-14", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/cpp.yml:
- Around line 149-154: The workflow installs an unpinned Compose binary into
/usr/local/lib/docker/cli-plugins using COMPOSE_VERSION resolved from
releases/latest which mutates the self-hosted runner; instead, pin a specific
Compose release (avoid using releases/latest), verify the binary checksum before
use, install it into a job-scoped directory (e.g. $RUNNER_TEMP or another
ephemeral path) rather than /usr/local/lib/docker/cli-plugins, and remove the
binary at the end of the job; update the steps that set COMPOSE_VERSION and the
target path (/usr/local/lib/docker/cli-plugins/docker-compose) to use the pinned
version, checksum verification, ephemeral install location, and a cleanup step,
or alternatively provision Compose into the runner image and remove the install
logic entirely.
- Around line 142-145: The workflow currently calls "python3 -m pip install -U
pip" and adds "$HOME/.local/bin" to GITHUB_PATH, which will fail under PEP 668;
instead create a job-local virtual environment (use python3 -m venv <envdir>),
install/upgrade pip into that venv, and add the venv's bin directory to
GITHUB_PATH so subsequent steps (e.g., "Setup Archery") use the venv; update the
steps surrounding the existing "python3 -m pip install -U pip" and the
"$HOME/.local/bin" echo to reference the venv directory you create.

ℹ️ Review info
⚙️ Run configuration

Run ID: f8b9d5d1-7fd2-47d1-ad37-66b072d73e79

📥 Commits

Reviewing files that changed from the base of the PR and between 70b63e9 and cc2a07b.

📒 Files selected for processing (1)
  • .github/workflows/cpp.yml

Comment on lines 142 to +145
sudo apt update
sudo apt install -y --no-install-recommends python3 python3-dev python3-pip
python3 -m pip install -U pip
echo "$HOME/.local/bin" >> $GITHUB_PATH
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Read the relevant section of the cpp.yml workflow file
sed -n '125,165p' .github/workflows/cpp.yml

Repository: gounthar/arrow

Length of output: 1919


🏁 Script executed:

# Check if there are any other references to Ubuntu 24.04 or Python management in the repo
rg -i "ubuntu-24|externally.managed|pep.668" --type yaml --type markdown

Repository: gounthar/arrow

Length of output: 1042


🌐 Web query:

Ubuntu 24.04 Python externally managed PEP 668

💡 Result:

In Ubuntu 24.04 LTS (Noble Numbat), the system Python 3.12 installation is marked as "externally managed" per PEP 668. This prevents pip from installing packages into the system Python environment to avoid conflicts with the apt package manager. The marker file EXTERNALLY-MANAGED is located at /usr/lib/python3.12/EXTERNALLY-MANAGED. When pip detects this file, it blocks installations (system-wide or --user) and displays an error recommending alternatives.

Recommended ways to install Python packages:

  1. Use apt for available packages: sudo apt install python3-
  2. Use virtual environments: sudo apt install python3-full python3-venv; python3 -m venv /path/to/venv; source /path/to/venv/bin/activate; pip install
  3. For standalone apps: sudo apt install pipx; pipx install
  4. Override (not recommended): pip install --break-system-packages

See /usr/share/doc/python3.12/README.venv for Ubuntu-specific venv guidance. This behavior started in Ubuntu 23.04 and continues in 24.04.


🏁 Script executed:

# Check if there are any discussions or issues about this Python setup in the codebase
rg -i "break-system-packages|venv|virtualenv|pip install" .github/workflows/ -A 2 -B 2 | head -80

Repository: gounthar/arrow

Length of output: 4234


🏁 Script executed:

# Check if there are any related GitHub runner or CI discussions in comments
rg -i "riscv|ubuntu-24|pip|venv" .github/workflows/cpp.yml -B 3 -A 3 | grep -A 5 -B 5 "riscv64"

Repository: gounthar/arrow

Length of output: 1204


Use a venv for the RISC-V Python bootstrap to avoid PEP 668 externally-managed Python restrictions.

Ubuntu 24.04 marks the system Python as externally managed (per PEP 668), which blocks pip from installing to the system environment. The current code will fail at line 144 (python3 -m pip install -U pip) and again at the "Setup Archery" step that follows. Create a job-local virtualenv and add its bin directory to GITHUB_PATH instead.

Suggested fix
-          sudo apt install -y --no-install-recommends python3 python3-dev python3-pip
-          python3 -m pip install -U pip
-          echo "$HOME/.local/bin" >> $GITHUB_PATH
+          sudo apt install -y --no-install-recommends python3 python3-dev python3-pip python3-venv
+          python3 -m venv .venv
+          echo "$GITHUB_WORKSPACE/.venv/bin" >> "$GITHUB_PATH"
+          . .venv/bin/activate
+          python -m pip install -U pip
Comment on lines +149 to +154
# Install Docker Compose plugin (not in Ubuntu riscv64 repos)
COMPOSE_VERSION=$(curl -s https://api.github.com/repos/docker/compose/releases/latest | grep -oP '"tag_name": "\K[^"]+')
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL "https://github.com/docker/compose/releases/download/${COMPOSE_VERSION}/docker-compose-linux-riscv64" \
-o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Read the workflow file to understand the context
cat -n .github/workflows/cpp.yml

Repository: gounthar/arrow

Length of output: 17410


Don't install a floating Compose binary into /usr/local from the workflow.

This mutates shared host state on the self-hosted runner and resolves releases/latest on every run. On a reused runner, later jobs inherit whichever Compose version last ran. Please either provision Compose on the runner image, or install a pinned, checksum-verified version into a job-scoped location and clean it up after the job.

Use ${archs[$arch]:-} default value to avoid 'unbound variable' error
when running on architectures not in the archs array (e.g. riscv64).
The script already handles missing arches gracefully with exit 0, but
set -eu causes it to crash before reaching that check.

Signed-off-by: Bruno Verachten <gounthar@gmail.com>
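A minimal sketch of the behaviour this commit message describes, assuming bash 4 or newer for associative arrays. The contents of `archs` and the helper `lookup_arch` are hypothetical; the real script's mapping is not shown in this thread, and the real script exits with status 0 where this sketch only prints a message.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical mapping from `uname -m` names to download names.
declare -A archs=([x86_64]=amd64 [aarch64]=arm64)

# Print the mapped name for $1, or nothing if the arch is not listed.
lookup_arch() {
  local arch="$1"
  # ${archs[$arch]:-} expands to "" for a missing key instead of
  # tripping `set -u` with an "unbound variable" error.
  printf '%s' "${archs[$arch]:-}"
}

mapped="$(lookup_arch riscv64)"
if [ -z "${mapped}" ]; then
  echo "riscv64 is not in the archs array; skipping install"
fi
```

Without the `:-` default, the expansion for `riscv64` aborts the script under `set -u` before the empty-value check can run.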
- Use --break-system-packages for pip (PEP 668 on Ubuntu 24.04)
- Pin docker compose to v5.1.1 instead of floating latest
- Use matrix.arch == 'riscv64' instead of matching on runner label
- Remove debug output (docker info/buildx ls)

Signed-off-by: Bruno Verachten <gounthar@gmail.com>
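The pinning change could be combined with the fail-fast guard that the earlier review asked for, roughly as sketched here. The URL pattern and the `v5.1.1` pin come from this thread; the helper `compose_url` is hypothetical, and `DOCKER_COMPOSE_VERSION` is the variable name the review suggested rather than one confirmed to be in the workflow.

```shell
#!/bin/sh
# Build the release URL for a pinned Compose version; fail if none is given.
compose_url() {
  version="${1:-}"
  if [ -z "$version" ]; then
    echo "error: Compose version is not pinned" >&2
    return 1
  fi
  printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-linux-riscv64\n' "$version"
}

# DOCKER_COMPOSE_VERSION is the name suggested in the review; v5.1.1 is the
# pin named in the commit message above.
DOCKER_COMPOSE_VERSION="${DOCKER_COMPOSE_VERSION:-v5.1.1}"
compose_url "$DOCKER_COMPOSE_VERSION"
```

The sketch only prints the URL; downloading, checksum verification, and the install location are left out because the thread does not show how the final workflow handles them.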
2 participants