
[FLINK-XXXXX][docs] Add AGENTS.md and AI disclosure to PR template#27776

Draft
MartijnVisser wants to merge 2 commits into apache:master from MartijnVisser:agents.md

Conversation

@MartijnVisser
Contributor

What is the purpose of the change

Make the Flink codebase accessible to AI coding agents and establish lightweight guidelines for AI-assisted contributions.

Brief change log

  • Added AGENTS.md at the repository root with guidelines for AI coding agents: prerequisites, build/test commands, repository structure, architecture boundaries, common change patterns, coding standards, testing standards, commit conventions, and boundaries
  • Added a dedicated AI disclosure section to the PR template with a checkbox and commented-out Generated-by tag, aligned with the ASF Generative Tooling Guidance

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

Was generative AI tooling used to co-author this PR?

  • Yes: Claude Code 1.0.33 (Claude Opus 4.6, 1M context)

@flinkbot
Collaborator

flinkbot commented Mar 16, 2026

CI report:

Bot commands: the @flinkbot bot supports the following commands:
  • `@flinkbot run azure` to re-run the last Azure build

Add AGENTS.md with guidelines for AI coding agents: prerequisites,
build/test commands, repository structure, architecture boundaries,
common change patterns, coding standards, testing standards, commit
conventions, and boundaries.

Add AI-assisted contribution disclosure checkbox to the PR template,
linking to ASF Generative Tooling Guidance.

Generated-by: Claude Code 1.0.33 (Claude Opus 4.6, 1M context)
- Clarify Java 11 syntax requirement (Java 17 syntax only in flink-tests-java17)
- Remove hardcoded checkstyle version, reference pom.xml property instead
- Add no new Scala code rule (FLIP-265)
- Add red-green test verification requirement for bug fixes
- Add "concisely" to PR template guidance
- Add CHECKSTYLE:ON/OFF and @SuppressWarnings to Never section

Generated-by: Claude Code 1.0.33 (Claude Opus 4.6, 1M context)
Contributor

@gustavodemorais gustavodemorais left a comment


Hey @MartijnVisser, thanks for driving this. In general, I'm +1 on the initiative; keeping this up to date and correct will be continuous work. I've added several suggestions for the first version.

One more piece of feedback for the whole PR: for the Table/SQL stack, semantic tests are the recommended tests instead of ITCases. Semantic tests can also perform validation checks and cover most of what we test with plan tests, so they should be preferred. Also, restore tests should be used for cases where state is involved and the restore mechanism needs to be tested.


---

##### Was generative AI tooling used to co-author this PR?
Contributor


What's the goal with this section?

It makes the template longer, and I personally don't see much benefit from it. Eventually 99% of people will probably be using AI, so the answer will always be yes. Also, if people want to share which tool they're using, they can already add "Generated-by:" to their commits, which most code assistants fill in automatically when you use them to commit, and which you also added below.


- Format code (Java + Scala): `./mvnw spotless:apply`
- Check formatting: `./mvnw spotless:check`
- Checkstyle config: `tools/maven/checkstyle.xml` (v10.18.2)
Contributor


Suggested change
- Checkstyle config: `tools/maven/checkstyle.xml` (v10.18.2)
- Checkstyle config: `tools/maven/checkstyle.xml`


This section maps common types of Flink changes to the modules they touch and the verification they require.

### Adding a new SQL built-in function
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've recently added BuiltInProcessTableFunction as well

2. Implement in `flink-table-runtime` under `functions/scalar/` (extend `BuiltInScalarFunction`, implement `eval()` method)
3. Add tests in `flink-table-planner` (planner integration) and `flink-table-runtime` (unit tests)
4. Document in `flink-docs`
5. Verify: planner tests + runtime tests + SQL ITCase
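As a rough illustration of the `eval()` contract that step 2 describes, here is a plain-Java sketch. It is hypothetical: the real implementation extends Flink's `BuiltInScalarFunction` in `flink-table-runtime`, which is omitted here so the snippet stays self-contained, and the class and method names are assumptions.

```java
// Hypothetical sketch of the eval() shape for a scalar function.
// In real Flink code, eval() is resolved reflectively by the table
// runtime and the class extends BuiltInScalarFunction.
public class StartsWithFunctionSketch {

    // Boxed return type so SQL NULL can be represented as Java null.
    public Boolean eval(String s, String prefix) {
        if (s == null || prefix == null) {
            return null; // SQL three-valued logic: NULL in, NULL out
        }
        return s.startsWith(prefix);
    }

    public static void main(String[] args) {
        StartsWithFunctionSketch f = new StartsWithFunctionSketch();
        System.out.println(f.eval("flink-table", "flink")); // true
        System.out.println(f.eval(null, "x"));              // null
    }
}
```

The boxed `Boolean` (rather than primitive `boolean`) is the detail agents most often get wrong here, since primitives cannot carry SQL NULL.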
Contributor


Suggested change
5. Verify: planner tests + runtime tests + SQL ITCase
5. Extend support in the Table API
6. Verify: planner tests + semantic tests (ITCase should be avoided) + restore tests if you require state

4. Document in `flink-docs` if user-facing
5. Verify: unit test for default value, ITCase for behavior change

### Adding a new table operator (e.g., join type, aggregate)
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite simplified. The natural order, still simplified, would actually be the opposite:

  1. Runtime operator
  2. Add harness test for runtime operator
  3. Add ExecNode
  4. Add Physical Node + Physical Rules
  5. Add Logical Node + Planner rule
  6. Add semantic tests, plan tests, restore tests

- **API stability annotations:** Every user-facing API class and method must have a stability annotation. `@Public` (stable across minor releases), `@PublicEvolving` (may change in minor releases), `@Experimental` (may change at any time). These are all part of the public API surface that users build against. `@Internal` marks APIs with no stability guarantees that users should not depend on.
- **Logging:** Use parameterized log statements (SLF4J `{}` placeholders), never string concatenation.
- **No Java serialization** for new features (except internal RPC message transport).
- Full code style guide: https://flink.apache.org/how-to-contribute/code-style-and-quality-preamble/
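To make the logging rule above concrete, here is a minimal sketch of why `{}` placeholders beat string concatenation. This is not the real SLF4J API; `SketchLogger` is an invented stand-in that mimics the relevant behavior: when the level is disabled, the message is never formatted, whereas concatenation would always build the string up front.

```java
// Illustrative stand-in for SLF4J-style parameterized logging.
// SketchLogger and its methods are assumptions for this sketch only.
final class SketchLogger {
    private final boolean debugEnabled;

    SketchLogger(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    // Substitutes "{}" placeholders with arguments, but only when the
    // level is enabled; otherwise no formatting work happens at all.
    String debug(String template, Object... args) {
        if (!debugEnabled) {
            return null; // zero formatting cost for suppressed levels
        }
        StringBuilder sb = new StringBuilder();
        int argIdx = 0;
        int from = 0;
        int at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        sb.append(template.substring(from));
        return sb.toString();
    }
}

public class LoggingSketch {
    public static void main(String[] args) {
        SketchLogger log = new SketchLogger(true);
        System.out.println(log.debug("Restored {} channels for task {}", 4, "map-1"));
        // prints "Restored 4 channels for task map-1"
    }
}
```

With real SLF4J the call site looks the same, `LOG.debug("Restored {} channels for task {}", count, taskName)`, and the arguments are only rendered if debug logging is on.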
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Always use final variables where applicable
  • Do not add unnecessary comments that simply states what one can understand from the code. Add comments that explain "the why" for places which are relevant.


- `[FLINK-XXXX][component] Description` where FLINK-XXXX is the JIRA issue number
- `[hotfix][component] Description` for typo fixes without JIRA
- Each commit must have a meaningful message including the JIRA ID
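The two commit-subject shapes above can be checked mechanically. The helper below is hypothetical (not part of Flink's tooling) and simply encodes the `[FLINK-XXXX][component]` and `[hotfix][component]` patterns as a regex:

```java
import java.util.regex.Pattern;

// Hypothetical checker for the commit-subject conventions described
// above; the class name and regex are assumptions for this sketch.
public class CommitSubjectCheck {

    // Accepts "[FLINK-<digits>][component] Description"
    // or "[hotfix][component] Description".
    private static final Pattern VALID =
            Pattern.compile("^\\[(FLINK-\\d+|hotfix)\\]\\[[^\\]]+\\] .+");

    static boolean isValid(String subject) {
        return VALID.matcher(subject).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("[FLINK-12345][docs] Add AGENTS.md"));  // true
        System.out.println(isValid("[hotfix][table] Fix typo in comment")); // true
        System.out.println(isValid("Fix stuff"));                           // false
    }
}
```

A rule like this could back a local commit-msg hook, though nothing in the PR mandates one.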
Contributor


Suggested change
- Each commit must have a meaningful message including the JIRA ID
- Each commit must have a meaningful message including the JIRA ID
- If you don't know what the ticket number is, ask me.

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how my local rule is written, and it works for me 🙂

Contributor

@gustavodemorais gustavodemorais left a comment


A couple more, these with some input from me plus help from AI itself:


### Build

- Full build (Java 17 default): `./mvnw clean package -DskipTests -Djdk17 -Pjava17-target`
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues with the build commands:

  1. The -Djdk17, -Djdk11, -Djdk21 flags don't exist as properties in the root pom.xml. Only the -P profiles exist (java17-target, java11-target, java21-target). An agent following these commands would pass a useless flag silently.

  2. Consider adding a fast build command for development. The project has a -Pfast profile (root pom.xml line 1271) that skips checkstyle, rat, spotless, enforcer, javadoc, japicmp, and cyclonedx all at once. Combined with -Pskip-webui-build (skips the Node.js frontend build), this gives agents a much faster iteration cycle.

Suggested build section:

  • Fast dev build: ./mvnw clean install -T4 -DskipTests -DskipITs -Pfast -Pskip-webui-build -fn
  • Full build: ./mvnw clean package -DskipTests
  • Java 11 target: ./mvnw clean package -DskipTests -Pjava11-target
  • Java 21 target: ./mvnw clean package -DskipTests -Pjava21-target
  • Single module: ./mvnw clean package -DskipTests -pl flink-core-api
  • Full build with tests: ./mvnw clean verify


- `flink-state-backends/`
- `flink-statebackend-rocksdb` — RocksDB state backend
- `flink-statebackend-forst` — ForSt state backend (experimental, RocksDB-based with JNI bindings)
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ForSt is not "RocksDB-based" - it's a fork of RocksDB (the name literally stands for "Fork of RocksDB"). The dependency is forstjni, not rocksdbjni. Saying "RocksDB-based" implies it wraps RocksDB.

Suggestion: ForSt state backend (experimental; ForSt is a fork of RocksDB)

2. **JobManager** (`flink-runtime`) orchestrates execution: receives jobs, creates the execution graph, manages scheduling, coordinates checkpoints, and handles failover. Never runs user code directly.
3. **TaskManager** (`flink-runtime`) executes the user's operators in task slots. Manages network buffers, state backends, and I/O.
4. **Table Planner** (`flink-table-planner`) translates SQL/Table API programs into DataStream programs. The planner is loaded in a separate classloader (`flink-table-planner-loader`) to isolate Calcite dependencies.
5. **Connectors** communicate with external systems. Source connectors implement the `Source` API (FLIP-27); sinks implement the `SinkV2` API. Most connectors are externalized to separate repositories.
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no class named SinkV2 in the codebase. The actual interface is Sink in package org.apache.flink.api.connector.sink2. An agent searching for a SinkV2 class would find nothing. This is referenced in multiple places in this file (here, line 229 in Common Change Patterns, and line 317 in Boundaries).

Suggestion: use Sink API (package sink2) throughout.


### Adding a new table operator (e.g., join type, aggregate)

1. **Planner rule** in `flink-table-planner` under `plan/rules/physical/stream/` (match conditions, transform to physical node)
Contributor


The paths here only mention stream/ subdirectories, but there are equivalent batch/ directories for both planner rules (plan/rules/physical/batch/) and ExecNodes (plan/nodes/exec/batch/). An agent adding a batch operator would be left without guidance.

Consider mentioning both stream/ and batch/ paths.



4 participants