feat(incident): add IncidentTcrsSyncHandler for Task→TCRS sync by IceS2 · Pull Request #26759 · open-metadata/OpenMetadata

IceS2 · 2026-03-25T07:20:35Z

Summary

Adds aboutEntityLink field to Task schema (EntityLink format encoding testCase FQN + incident stateId), backed by generated DB columns and index in MySQL/PostgreSQL
Implements IncidentTcrsSyncHandler — an async EntityLifecycleEventHandler that syncs Task lifecycle events to TCRS records (New on creation, Ack on InProgress, Assigned on assignee change, Resolved on terminal status)
Adds TCRS guard in openOrAssignTask() to prevent duplicate Task creation when a workflow-managed Task already exists
SetupImpl builds aboutEntityLink for IncidentResolution/TestCaseResolution task types
Adds testCaseStatus to WorkflowTriggerFields enum (matches the actual field name in entity changeDescription)
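
The Task→TCRS mapping described in the summary could be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: `TcrsMappingSketch`, `TcrsType`, and the `map` signature are hypothetical names, the terminal status set is an assumption, and the order in which the conditions are checked is a guess.

```java
import java.util.Optional;
import java.util.Set;

// Illustrative only: every name here is hypothetical, not the PR's actual API.
final class TcrsMappingSketch {
  enum TcrsType { NEW, ACK, ASSIGNED, RESOLVED }

  // Assumption: which statuses count as terminal; the real set is configured elsewhere.
  static final Set<String> TERMINAL_STATUSES = Set.of("Completed", "Rejected");

  /**
   * @param created true when the event is the Task's creation
   * @param assigneeChanged true when the assignee list changed
   * @param newStatus the status after the event, or null if it did not change
   */
  static Optional<TcrsType> map(boolean created, boolean assigneeChanged, String newStatus) {
    if (created) return Optional.of(TcrsType.NEW);
    if (newStatus != null && TERMINAL_STATUSES.contains(newStatus)) {
      return Optional.of(TcrsType.RESOLVED);
    }
    if ("InProgress".equals(newStatus)) return Optional.of(TcrsType.ACK);
    if (assigneeChanged) return Optional.of(TcrsType.ASSIGNED);
    return Optional.empty(); // no TCRS record needed for this event
  }
}
```

Returning an empty `Optional` for unrelated updates keeps the "nothing to sync" case explicit instead of relying on null.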

Test plan

IncidentTcrsSyncHandlerTest — 10 unit tests covering isIncidentTask() boundary conditions, handler properties, and EntityLink parsing
ManualTaskOutboxIT — E2E integration test verifying full pipeline: workflow trigger → Task creation with aboutEntityLink → TCRS(Ack) on InProgress → TCRS(Resolved) on Completed → workflow FINISHED

Adds a generic, configurable-status human task node for governance workflows. The node creates an OM Task, waits for status transitions via IntermediateCatchEvent messages, and routes based on terminal vs non-terminal statuses.

Key components:
- ManualTask.java: BPMN subprocess builder (setup → wait → route → close)
- SetupDelegate/SetupImpl: Task creation, idempotent on cycle re-entry
- CheckTerminalDelegate: Validates status against template
- CloseTaskDelegate/CloseTaskImpl: Closes task on terminal status
- SetResultDelegate: Propagates status to parent for edge routing
- ManualTaskTemplateResolver: Template-based status configuration
- ManualTaskDefinition JSON schema + nodeType/nodeSubType registration

The node is domain-agnostic: incident/approval behavior lives in the workflow graph around the node, not inside it.

Remove inputNamespaceMapExpr and configMapExpr from BaseDelegate. Each delegate now declares its own Expression fields, preventing NullPointerExceptions in delegates that don't use these fields (e.g., SetResultDelegate, CheckTerminalDelegate, CloseTaskDelegate).

- Fix: isAlreadyClosed now only checks task.getResolution() != null. Previously it also checked terminalStatuses.contains(currentStatus), which always returned true when CloseTask runs (the PATCH already set the terminal status), leaving tasks permanently unresolved.
- Remove unused terminalStatuses parameter from closeTask/CloseTaskDelegate
- Rename taskCreated → taskAlreadyExists for clarity: the variable means "should we enter the message-waiting phase" (true on re-entry, false on first creation)
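
To make the isAlreadyClosed fix concrete, here is a small sketch that contrasts the two checks. The `Task` class below is a hypothetical two-field stand-in for the real entity, and the method names are illustrative; only the two conditions themselves come from the commit message.

```java
import java.util.Set;

final class CloseCheckSketch {
  // Hypothetical minimal task view: only the fields the check needs.
  static final class Task {
    final String status;
    final String resolution; // stays null until a resolution is recorded

    Task(String status, String resolution) {
      this.status = status;
      this.resolution = resolution;
    }
  }

  // Earlier behaviour as described: a terminal status alone counted as "closed".
  static boolean isAlreadyClosedOld(Task task, Set<String> terminalStatuses) {
    return task.resolution != null || terminalStatuses.contains(task.status);
  }

  // Fixed behaviour as described: only a recorded resolution counts as "closed".
  static boolean isAlreadyClosed(Task task) {
    return task.resolution != null;
  }
}
```

With a task whose status is already terminal but whose resolution is still null, the old check reports "closed" and the close step is skipped, while the fixed check lets it proceed.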

Implements the bridge that connects Task status changes to the ManualTask workflow node via Flowable message delivery.

Bridge chain: TaskRepository.postUpdate() detects status change → TaskWorkflowHandler.transitionManualTaskStatus() (sends updatedBy) → WorkflowHandler.sendManualTaskMessage() with async exponential retry

Key design decisions:
- postUpdate wrapped in try-catch: workflow failures never break PATCH
- Async retry via ScheduledExecutorService + resilience4j IntervalFunction: 500ms → 1s → 2s → 4s → 5s cap (~12.5s total coverage)
- First attempt synchronous (fast path), retries non-blocking
- CloseTaskImpl uses actual user from PATCH, falls back to governance-bot

Also includes:
- WorkflowDefinitionRepository/WorkflowInstanceStateRepository updates
- CollectionDAO, ListFilter, EntityResource supporting changes
- SQL migration (2.0.0) for stageResult generated column
- ManualTaskWorkflowTest: full E2E lifecycle test
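
The retry schedule quoted above (500ms → 1s → 2s → 4s → 5s cap, about 12.5s in total) can be reproduced with plain arithmetic. The sketch below deliberately does not use resilience4j; `BackoffScheduleSketch` and its parameters are hypothetical, and the retry count of 5 is inferred from the listed delays rather than stated in the commit.

```java
import java.util.ArrayList;
import java.util.List;

final class BackoffScheduleSketch {
  /** Returns capped exponential delays in milliseconds, one per retry. */
  static List<Long> delaysMillis(long initialMillis, double multiplier, long capMillis, int retries) {
    List<Long> delays = new ArrayList<>();
    double next = initialMillis;
    for (int i = 0; i < retries; i++) {
      delays.add(Math.min((long) next, capMillis)); // never wait longer than the cap
      next *= multiplier;
    }
    return delays;
  }

  /** Sum of all delays: the window of time the retries cover. */
  static long totalMillis(List<Long> delays) {
    long total = 0;
    for (long d : delays) total += d;
    return total;
  }
}
```

Calling `delaysMillis(500, 2.0, 5_000, 5)` yields 500, 1000, 2000, 4000, 5000, which sums to 12500 ms and matches the "~12.5s total coverage" figure.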

- Fix: catch FlowableOptimisticLockingException in tryDeliverMessage and return false to trigger retry (concurrent modification means another thread may have consumed the subscription)
- Fix: nonTerminalReachable BFS now iterates the full edges list at every step, not just the unfiltered outgoingEdges map. Prevents following terminal-condition edges from intermediate nodes.
- Refactor: hoist IntervalFunction to a static final constant instead of recreating on each retry. Made interval fields final.
- Fix: remove IF NOT EXISTS from PostgreSQL migration for consistency with MySQL pattern (Flyway handles migration idempotency)

…tConsumer Add Entity.TASK to validEntityTypes, detect workflow-managed task status changes via isWorkflowManagedTaskStatusChange, and enqueue them to the outbox table via enqueueTaskMessage before the existing signal broadcast.

Add 4 reflection-based unit tests for isWorkflowManagedTaskStatusChange covering early-return conditions: non-update events, non-task entity types, missing changeDescription, and non-status field changes.
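
The four early-return conditions named above suggest a filter shaped roughly like the sketch below. The `Event` class is a hypothetical, simplified stand-in for the real ChangeEvent, the string constants are assumptions, and the additional "workflow-managed" check that the real method name implies is left out.

```java
import java.util.List;

final class TaskStatusChangeFilterSketch {
  // Hypothetical, simplified event shape; the real change event carries more fields.
  static final class Event {
    final String eventType;           // assumed value for updates: "entityUpdated"
    final String entityType;          // assumed value for tasks: "task"
    final List<String> changedFields; // null models a missing changeDescription

    Event(String eventType, String entityType, List<String> changedFields) {
      this.eventType = eventType;
      this.entityType = entityType;
      this.changedFields = changedFields;
    }
  }

  static boolean isTaskStatusChange(Event event) {
    if (!"entityUpdated".equals(event.eventType)) return false; // non-update event
    if (!"task".equals(event.entityType)) return false;         // non-task entity type
    if (event.changedFields == null) return false;              // missing changeDescription
    return event.changedFields.contains("status");              // non-status changes fall out here
  }
}
```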

Remove the TaskRepository.postUpdate override that synchronously called TransitionManualTaskStatus, the TaskWorkflowHandler.transitionManualTaskStatus method it depended on, and the WorkflowHandler async retry infrastructure (sendManualTaskMessage, scheduleMessageRetry, tryDeliverMessage, and their backing constants and ScheduledExecutorService). Task status transitions are now delivered exclusively via the Transactional Outbox pattern.

…andler Start the drainer after the process engine is built in the constructor. Restart it when initializeNewProcessEngine() rebuilds the engine at runtime. Shut it down gracefully via WorkflowHandler.shutdown(), which is called from ManagedShutdown.stop() in OpenMetadataApplication.

…tency The E2E test must tolerate up to 10s CE poll + 30s drainer poll plus margin. Raise all Awaitility atMost() values to 90 seconds.

…roadcast disruption A DB failure during outbox INSERT should not prevent the signal broadcast path from executing. Log the error and continue.

…an older row SKIP LOCKED skips individual locked rows, not entire task groups. Without this guard, Worker B could grab a newer status while Worker A still holds the oldest — violating per-task ordering. The fix queries the absolute oldest createdAt (no lock) and skips the task if the locked row is newer.

…ycle Collapse findDistinctPendingTaskIds + per-task findAndLockOldestPending + per-task findOldestPendingCreatedAt into a single findAndLockAllOldestPending query using MIN(createdAt) JOIN with FOR UPDATE SKIP LOCKED. Per-task ordering is preserved naturally: if the oldest row for a task is locked by another worker, the JOIN produces no match for that task.

- C2: Replace MIN(createdAt) JOIN with ROW_NUMBER() PARTITION BY taskId to guarantee exactly one row per task even with identical timestamps.
- C3: Split bulk-lock transaction into bulk-read (no lock) + per-entry transactions. Row locks now held only during single-entry delivery, not the entire batch. Flowable API calls no longer inside a DB transaction holding locks on other rows.
- I2: Add MAX_ATTEMPTS=100. Entries exceeding this are excluded from the drain query and effectively dead-lettered for investigation.
- I3: Call cleanupDelivered() at end of each drain cycle with 7-day retention to prevent unbounded table growth.
- I4: Extract workflowInstanceId from ChangeEvent entity payload instead of fetching from DB. Eliminates extra round trip per task status change event.
- I5: Move signal broadcast before outbox enqueue so it always fires. Wrap enqueue in resilience4j retry (3 attempts) for transient DB errors. Unhandled failure propagates to event publisher for retry.
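
The effect of the ROW_NUMBER() change in C2 can be illustrated in memory: pick exactly one row per taskId, the oldest, even when two rows share a timestamp. This is a sketch of the selection semantics only, not the DAO query; `Entry` is a hypothetical row shape, and breaking ties by `id` is an assumption about the window's ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OldestPerTaskSketch {
  // Hypothetical outbox row; the real table has more columns.
  static final class Entry {
    final String id;
    final String taskId;
    final long createdAt;

    Entry(String id, String taskId, long createdAt) {
      this.id = id;
      this.taskId = taskId;
      this.createdAt = createdAt;
    }
  }

  // In-memory analogue of "row number 1 per taskId, ordered by createdAt then id".
  static List<Entry> oldestPerTask(List<Entry> pending) {
    Comparator<Entry> order =
        Comparator.comparingLong((Entry e) -> e.createdAt).thenComparing(e -> e.id);
    Map<String, Entry> oldest = new LinkedHashMap<>();
    for (Entry e : pending) {
      // Keep whichever row sorts first; the id tie-break makes the choice deterministic.
      oldest.merge(e.taskId, e, (a, b) -> order.compare(a, b) <= 0 ? a : b);
    }
    return new ArrayList<>(oldest.values());
  }
}
```

A MIN(createdAt) join would match both rows when two entries for the same task share a timestamp; a total order per partition avoids that.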

Add LIMIT 500 with ORDER BY attempts ASC, createdAt ASC to prevent unbounded result sets and prioritize fresh messages over stuck ones. Separate cleanup into its own try-catch for cleaner error diagnostics.
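
The ordering and limit described here can be mirrored in memory as follows. The sketch only illustrates the `ORDER BY attempts ASC, createdAt ASC` plus `LIMIT` semantics; `DrainBatchSketch` and `Entry` are hypothetical names, and the real selection happens in SQL.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class DrainBatchSketch {
  // Hypothetical outbox row with only the columns the ordering uses.
  static final class Entry {
    final String id;
    final int attempts;
    final long createdAt;

    Entry(String id, int attempts, long createdAt) {
      this.id = id;
      this.attempts = attempts;
      this.createdAt = createdAt;
    }
  }

  // Fewest attempts first (fresh messages), then oldest first, capped at `limit` rows.
  static List<Entry> nextBatch(List<Entry> candidates, int limit) {
    return candidates.stream()
        .sorted(Comparator.comparingInt((Entry e) -> e.attempts)
            .thenComparingLong(e -> e.createdAt))
        .limit(limit)
        .collect(Collectors.toList());
  }
}
```

Sorting by attempts first means an entry that keeps failing drifts to the back of the batch instead of starving newer messages.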

…OutboxIT Move E2E test to integration-tests module where the full application stack is running (CE pipeline, schedulers, drainer). The test verifies the complete outbox delivery pipeline through observable outcomes:
1. Deploy workflow → create table → workflow triggers → task created
2. PATCH task InProgress → PATCH task Completed
3. Poll workflow instance until FINISHED status
4. Assert stage results contain expected status transitions

github-actions · 2026-03-25T07:56:34Z

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

github-actions · 2026-03-25T07:57:53Z

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

…ncident-tcrs-sync-hook Resolve conflicts in SetupImpl.java and ManualTaskOutboxIT.java by keeping HEAD (aboutEntityLink + TCRS sync additions).

github-actions · 2026-03-30T16:10:37Z

❌ Lint Check Failed — ESLint + Prettier (core-components)

The following files have style issues that need to be fixed:

Fix locally (fast — only for changed files in the branch):

make ui-checkstyle-core-components-changed

Or to fix all files:

make ui-checkstyle-core-components

github-actions · 2026-03-30T16:14:22Z

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

yan-3005 · 2026-03-31T07:19:32Z

bootstrap/sql/migrations/native/2.0.0/mysql/schemaChanges.sql


+-- aboutEntityLink: hierarchical entity identity for lifecycle handler lookups
+ALTER TABLE task_entity ADD COLUMN IF NOT EXISTS aboutEntityLink varchar(1024)
+  GENERATED ALWAYS AS (json_unquote(json_extract(`json`, _utf8mb4'$.aboutEntityLink'))) STORED;


Is MySQL not utf8 by default?

I think it should be, but it does not hurt to be defensive, right?

In SQL files it's okay. At runtime, in cases where we know what the value is, it's better to code for that value rather than being defensive, because defensive code makes debugging a pain. As this is in a SQL file, it's okay!

yan-3005 · 2026-03-31T07:23:00Z

...rc/main/java/org/openmetadata/service/events/lifecycle/handlers/IncidentTcrsSyncHandler.java

+    String testCaseFqn = link.getEntityFQN();
+    UUID stateId = UUID.fromString(link.getArrayFieldName());
+
+    if (tcrRecordExists(stateId)) {


nit: can we give the functions more descriptive names? I had to spend 1-2 mins recalling what tcr stands for, which does not help when debugging.

This is fair, will do it!

…timize BFS
- Rename tcrRecordExists/insertTcrsRecord/mapTaskChangeToTcrsType to descriptive names (incidentResolutionStatusExists, etc.) per reviewer feedback on abbreviation clarity
- Guard extractStringValue against single-char edge case
- Use outgoingEdges adjacency list in nonTerminalReachable BFS instead of scanning all edges per node (O(N+E) vs O(N*E))

gitar-bot · 2026-03-31T13:32:01Z

Code Review ✅ Approved 2 resolved / 2 findings

Adds IncidentTcrsSyncHandler for syncing Tasks to TCRS with proper error handling for string edge cases and optimized graph traversal. All findings resolved.

✅ 2 resolved

✅ Edge Case: extractStringValue crashes on single-quote string edge case

📄 openmetadata-service/src/main/java/org/openmetadata/service/events/lifecycle/handlers/IncidentTcrsSyncHandler.java:226-228
In IncidentTcrsSyncHandler.extractStringValue(), the expression s.substring(1, s.length() - 1) is called when s.startsWith("\""). If s is exactly a single double-quote character (length 1), this becomes substring(1, 0), which throws StringIndexOutOfBoundsException. While unlikely in practice (FieldChange status values are well-formed), this is a latent crash in the lifecycle event handler.
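
One way to guard against this edge case is to strip quotes only when the string is long enough to hold both of them. The sketch below is an assumption about the shape of the fix, not the merged code; `QuoteStrippingSketch.stripQuotes` is a hypothetical name.

```java
final class QuoteStrippingSketch {
  /** Removes one pair of surrounding double quotes, if and only if both are present. */
  static String stripQuotes(String s) {
    // The length check rules out the single-character case where substring(1, 0) would throw.
    if (s != null && s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
      return s.substring(1, s.length() - 1);
    }
    return s;
  }
}
```

A lone quote character is returned unchanged, and an empty quoted string (two quote characters) becomes the empty string.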

✅ Performance: nonTerminalReachable BFS iterates all edges per node (O(N*E))

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/WorkflowDefinitionRepository.java:420-430
In WorkflowDefinitionRepository.nonTerminalReachable(), the BFS inner loop iterates over ALL edges for each node in the queue to find outgoing edges: for (EdgeDefinition edge : edges). Since the outgoingEdges adjacency list is already built at lines 250-267 and passed to buildCycleCheckGraph, it could be reused here instead of scanning all edges. This is O(N*E) instead of O(N+E). It only impacts validation time, not a hot path, but is easy to fix.
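
For reference, a BFS over an adjacency list visits each node and each edge at most once, which is where the O(N+E) bound comes from. The sketch below is generic: `GraphReachSketch` is a hypothetical name, node ids are plain strings, and the terminal-condition edge filtering that nonTerminalReachable applies is omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GraphReachSketch {
  /** Returns every node reachable from start, including start itself. */
  static Set<String> reachable(String start, Map<String, List<String>> outgoing) {
    Set<String> visited = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    visited.add(start);
    queue.add(start);
    while (!queue.isEmpty()) {
      String node = queue.poll();
      // Only this node's outgoing edges are scanned, not the full edge list.
      for (String next : outgoing.getOrDefault(node, List.of())) {
        if (visited.add(next)) { // add() returns false for already-visited nodes
          queue.add(next);
        }
      }
    }
    return visited;
  }
}
```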

github-actions · 2026-03-31T13:38:11Z

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

IceS2 and others added 30 commits March 16, 2026 14:41

Implement IntermediateCatchEventBuilder

010f3ea

Implement IntermediateCatchEventBuilder

9403129

Update generated TypeScript types

77d0d61

Merge branch 'feat/incident-lifecycle-workflow' into feat/ilw-pr2-man…

bb29b37

…ual-task-node

Address PR review: configurable assignees, null-guard, required fields

0ab66d0

Update generated TypeScript types

1a5c2b8

fix: safe boolean cast and dedup assignees in resolveAssignees

a5020bc

feat(outbox): add task_workflow_outbox table migrations

aa52ef1

feat(outbox): add OutboxEntry POJO and TaskWorkflowOutboxDAO

5e78598

feat(outbox): add TaskWorkflowOutboxDrainer with unit tests

f431868

test(outbox): add consumer routing filter tests

7d39894

Add 4 reflection-based unit tests for isWorkflowManagedTaskStatusChange covering early-return conditions: non-update events, non-task entity types, missing changeDescription, and non-status field changes.

test(outbox): increase ManualTaskWorkflowTest timeouts for polling la…

4cb3ed6

…tency The E2E test must tolerate up to 10s CE poll + 30s drainer poll plus margin. Raise all Awaitility atMost() values to 90 seconds.

fix(outbox): wrap enqueueTaskMessage in try-catch to prevent signal b…

521d2f4

…roadcast disruption A DB failure during outbox INSERT should not prevent the signal broadcast path from executing. Log the error and continue.

fix(outbox): rename index prefix from idx_two_ to idx_outbox_ for cla…

35205ef

…rity

fix(outbox): add batch limit and prioritized ordering to drain query

926787c

Add LIMIT 500 with ORDER BY attempts ASC, createdAt ASC to prevent unbounded result sets and prioritize fresh messages over stuck ones. Separate cleanup into its own try-catch for cleaner error diagnostics.

fix(outbox): handle raw string FieldChange.newValue in enqueueTaskMes…

f73830b

…sage

style: spotless formatting on IncidentTaskIntegrationIT

56ec631

fix(outbox): wrap enqueue retry exhaustion in EventPublisherException

b5512e0

IceS2 and others added 4 commits March 25, 2026 08:51

Merge branch 'feat/ilw-pr2-manual-task-node' into feat/ilw-pr3-task-w…

5380b1b

…orkflow-bridge

Merge branch 'feat/ilw-pr3-task-workflow-bridge' into feat/ilw-item2-…

ea7a316

…incident-tcrs-sync-hook

Merge branch 'feat/ilw-item2-incident-tcrs-sync-hook' of github.com:o…

0bc1464

…pen-metadata/OpenMetadata into feat/ilw-item2-incident-tcrs-sync-hook

Update generated TypeScript types

5386602

Merge branch 'feat/incident-lifecycle-workflow' into feat/ilw-item2-i…

a3df3e2

…ncident-tcrs-sync-hook Resolve conflicts in SetupImpl.java and ManualTaskOutboxIT.java by keeping HEAD (aboutEntityLink + TCRS sync additions).

IceS2 temporarily deployed to test March 30, 2026 16:19 — with GitHub Actions Inactive

IceS2 had a problem deploying to test March 30, 2026 16:19 — with GitHub Actions Failure

yan-3005 reviewed Mar 31, 2026

IceS2 temporarily deployed to test March 31, 2026 13:42 — with GitHub Actions Inactive

IceS2 had a problem deploying to test March 31, 2026 13:42 — with GitHub Actions Failure

yan-3005 approved these changes Mar 31, 2026

IceS2 merged commit 6b6a1a7 into feat/incident-lifecycle-workflow Mar 31, 2026
21 of 50 checks passed

IceS2 deleted the feat/ilw-item2-incident-tcrs-sync-hook branch March 31, 2026 13:54

IceS2 merged 41 commits into feat/incident-lifecycle-workflow from feat/ilw-item2-incident-tcrs-sync-hook
