
Fixes #25933 added-pipeline-lineage#26786

Open
varun-lakhyani wants to merge 8 commits into open-metadata:main from varun-lakhyani:databricks-lineage-pipeline

Conversation

@varun-lakhyani
Member

@varun-lakhyani varun-lakhyani commented Mar 26, 2026

Describe your changes:

Fixes #25933

Earlier, only jobs were covered in ingestion and lineage, while pipelines were added only during ingestion. This change extends lineage to pipelines and refactors naming from Job to Entity, since the code now handles both jobs and pipelines.


Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Naming refactor:
    • Renamed DATABRICKS_GET_TABLE_LINEAGE_FOR_JOB to DATABRICKS_GET_TABLE_LINEAGE for broader applicability
    • Renamed DATABRICKS_GET_COLUMN_LINEAGE_FOR_JOB to DATABRICKS_GET_COLUMN_LINEAGE
    • Updated variable names from job_* to entity_* across DatabricksClient class
  • Query expansion:
    • Modified SQL queries to include both JOB and PIPELINE entity types in WHERE clause
  • Pipeline lineage support:
    • Updated DatabrickspipelineSource.yield_pipeline_lineage_details() to use pipeline_id fallback when job_id unavailable
  • Tests:
    • Added test_databricks_dlt_pipeline_lineage() covering DLT pipeline lineage with and without column lineage
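The pipeline_id fallback described in the summary can be sketched roughly as follows. This is a hypothetical illustration of the behavior, not the actual `DatabrickspipelineSource` code; the function name `resolve_entity_id` and its signature are invented for this sketch:

```python
from typing import Optional


def resolve_entity_id(job_id: Optional[str], pipeline_id: Optional[str]) -> Optional[str]:
    """Prefer job_id; fall back to pipeline_id for DLT pipelines without one.

    Hypothetical helper mirroring the fallback described in
    yield_pipeline_lineage_details(): lineage rows for Databricks DLT
    pipelines carry a pipeline_id rather than a job_id.
    """
    return job_id if job_id is not None else pipeline_id


# A job run keeps its job_id; a DLT pipeline run falls back to pipeline_id.
assert resolve_entity_id("job-123", "pipe-abc") == "job-123"
assert resolve_entity_id(None, "pipe-abc") == "pipe-abc"
```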

This will update automatically on new commits.

@varun-lakhyani varun-lakhyani requested a review from a team as a code owner March 26, 2026 04:26
@github-actions
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@varun-lakhyani varun-lakhyani added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Mar 26, 2026
@github-actions
Contributor

github-actions bot commented Mar 26, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpam-modules CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-modules-bin CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-runtime CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam0g CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (39)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.12.7 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.13.4 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.15.2 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.16.1 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
Authlib CVE-2026-27962 🔥 CRITICAL 1.6.6 1.6.9
Authlib CVE-2026-28490 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28498 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28802 🚨 HIGH 1.6.6 1.6.7
PyJWT CVE-2026-32597 🚨 HIGH 2.10.1 2.12.0
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6, 2.11.1
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.5 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.5 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.5 3.1.8
apache-airflow-providers-http CVE-2025-69219 🚨 HIGH 5.6.0 6.0.0
azure-core CVE-2026-21226 🚨 HIGH 1.37.0 1.38.0
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
google-cloud-aiplatform CVE-2026-2472 🚨 HIGH 1.130.0 1.131.0
google-cloud-aiplatform CVE-2026-2473 🚨 HIGH 1.130.0 1.133.0
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
pyasn1 CVE-2026-23490 🚨 HIGH 0.6.1 0.6.2
pyasn1 CVE-2026-30922 🚨 HIGH 0.6.1 0.6.3
python-multipart CVE-2026-24486 🚨 HIGH 0.0.20 0.0.22
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
tornado CVE-2026-31958 🚨 HIGH 6.5.3 6.5.5
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
stdlib CVE-2025-68121 🔥 CRITICAL v1.25.5 1.24.13, 1.25.7, 1.26.0-rc.3
stdlib CVE-2025-61726 🚨 HIGH v1.25.5 1.24.12, 1.25.6
stdlib CVE-2025-61728 🚨 HIGH v1.25.5 1.24.12, 1.25.6
stdlib CVE-2026-25679 🚨 HIGH v1.25.5 1.25.8, 1.26.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

@github-actions
Contributor

github-actions bot commented Mar 26, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (38)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.12.7 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.13.4 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.15.2 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (15)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6, 2.11.1
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.5 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.5 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.5 3.1.8
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

@github-actions
Contributor

github-actions bot commented Mar 26, 2026

🟡 Playwright Results — all passed (29 flaky)

✅ 3410 passed · ❌ 0 failed · 🟡 29 flaky · ⏭️ 216 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 452 0 3 2
🟡 Shard 2 602 0 6 32
🟡 Shard 3 607 0 7 27
🟡 Shard 4 609 0 8 47
🟡 Shard 5 586 0 1 67
🟡 Shard 6 554 0 4 41
🟡 29 flaky test(s) (passed on retry)
  • Features/CustomizeDetailPage.spec.ts › Container - customization should work (shard 1, 1 retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/DataQuality/TestCaseImportExportE2eFlow.spec.ts › Admin: Complete export-import-validate flow (shard 2, 1 retry)
  • Features/DataQuality/TestCaseIncidentPermissions.spec.ts › User with only VIEW cannot PATCH incidents (shard 2, 1 retry)
  • Features/DataQuality/TestCaseResultPermissions.spec.ts › User with TEST_CASE.EDIT_ALL can see edit action on test case (shard 2, 1 retry)
  • Features/EntitySummaryPanel.spec.ts › should display summary panel for table (shard 2, 1 retry)
  • Features/EntitySummaryPanel.spec.ts › should display summary panel for tableColumn (shard 2, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Features/QueryEntity.spec.ts › Query Entity (shard 3, 1 retry)
  • Features/TestSuiteMultiPipeline.spec.ts › TestSuite multi pipeline support (shard 3, 1 retry)
  • Flow/CustomizeWidgets.spec.ts › Following Assets Widget (shard 3, 2 retries)
  • Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Flow/SchemaTable.spec.ts › schema table test (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContractInheritance.spec.ts › Full Contract Inheritance - Asset inherits full contract from Data Product (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for SearchIndex (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Verify domain tags and glossary terms (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Rename domain with subdomains attached verifies subdomain accessibility (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Rename domain with tags and glossary terms preserves associations (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Domain Rbac (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Glossary Term Add, Update and Remove (shard 4, 1 retry)
  • Pages/EntityDataSteward.spec.ts › User as Owner Add, Update and Remove (shard 5, 1 retry)
  • Pages/Glossary.spec.ts › Column dropdown drag-and-drop functionality for Glossary Terms table (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
  • Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)
  • VersionPages/EntityVersionPages.spec.ts › Directory (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Member

@ulixius9 ulixius9 left a comment


@varun-lakhyani can you add unit tests for this

@varun-lakhyani
Member Author

@varun-lakhyani can you add unit tests for this

Added now

@varun-lakhyani
Member Author

💡 Bug: for/else misuse: debug log triggers on normal loop completion
In yield_pipeline_lineage_details (metadata.py line 1331), there is a for/else construct on the for table_lineage in table_lineage_list loop. The else block logs "No source or target table full name found for {entity_id}", but Python's for/else triggers the else when the loop completes without a break — i.e., on every normal iteration through all items. This means the debug message fires after successfully processing all lineage entries, which is misleading. This is a pre-existing pattern, but the diff changes the variable used (entity_id replacing job_id), so worth noting.
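The for/else semantics behind this finding can be demonstrated in isolation (the loop body here is illustrative, not the actual metadata.py code):

```python
# Minimal demonstration of Python's for/else: the else clause runs whenever
# the loop finishes WITHOUT hitting break -- including normal, successful
# completion of every iteration.
def process_all(items):
    handled = []
    for item in items:
        if item is None:
            break  # only a break suppresses the else clause
        handled.append(item)
    else:
        # Reached on normal completion too, which is why using it for a
        # "nothing found" debug log fires even when every entry succeeded.
        handled.append("else-ran")
    return handled


assert process_all([1, 2]) == [1, 2, "else-ran"]  # else fires on success
assert process_all([1, None, 2]) == [1]           # break skips the else
```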


@gitar-bot Fixed in 3b8731d - the for/else block has been removed entirely.

@gitar-bot

gitar-bot bot commented Mar 26, 2026

Thanks for confirming! The for/else block was removed in 3b8731d, so the misleading debug log no longer fires on normal loop completion.

@varun-lakhyani varun-lakhyani requested a review from ulixius9 March 27, 2026 02:14
@varun-lakhyani
Member Author

Reviewing your code

Code Review ⚠️ Changes requested 3 resolved / 6 findings
Adds Databricks lineage pipeline support with some incomplete fixes: certification tags are silently lost during CSV import due to overly broad tag deletion, pagination may stop early if API responses lack total_entries field, and DAG detail building logic is duplicated across AirflowApiClient and MWAAClient.

⚠️ Bug: CSV import silently loses certification tags from tag_usage
updateTagsForImport() calls deleteTagsByTarget(fqn) which deletes ALL tags for the entity, including certification tags. When updateCertification() runs later, it compares the in-memory original and updated certification objects — if they are equal (certification unchanged), it returns without re-inserting the cert tag. The net result is that any entity with an existing certification will silently lose it during CSV bulk import.

The non-import updateTags() correctly filters out certification tags from addedTags/deletedTags (line 6767-6771), but updateTagsForImport() has no such guard because it does a blanket DELETE + re-INSERT.

To fix: either modify updateTagsForImport to preserve cert tags by using a more targeted delete that excludes certification prefix, or ensure updateCertification() always re-applies when the entity has a certification (regardless of whether it changed).

Suggested fix

protected void updateTagsForImport(
    String fqn, String fieldName, List<TagLabel> origTags, List<TagLabel> updatedTags) {
  origTags = listOrEmpty(origTags);
  updatedTags = Optional.ofNullable(updatedTags).orElse(new ArrayList<>());
  if (origTags.isEmpty() && updatedTags.isEmpty()) {
    return;
  }
  // Delete all tags except certification (handled by updateCertification)
  String certClassification = getCertificationClassification();
  if (certClassification != null) {
    daoCollection.tagUsageDAO().deleteNonCertTagsByTarget(fqn, certClassification);
  } else {
    daoCollection.tagUsageDAO().deleteTagsByTarget(fqn);
  }
  // ... rest unchanged

💡 Bug: Pagination stops early if API omits total_entries field
In both AirflowApiClient._paginate() and MWAAClient._paginate(), total is initialized to limit (e.g. 100). If the API response does not include total_entries, the fallback total = response.get("total_entries", len(result)) sets total to the number of items seen so far. When the first page is full (100 items), the offset advances to 100 and the loop exits, potentially missing remaining pages.

Airflow's stable API always returns total_entries, so this is mainly a risk with future API changes or MWAA quirks. Consider using a sentinel or checking if len(page) == limit as an additional continuation condition.
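The suggested continuation check can be sketched as a generic offset-based paginator. Note the client classes and the `total_entries` field come from the review finding itself (the PR author later disputes that they exist in this codebase), so this is a hedged illustration of the general pattern, not verified project code:

```python
# Offset-based pagination that terminates on an authoritative total when
# present, and otherwise falls back to "stop on the first short page" --
# the extra continuation condition suggested in the finding.
def paginate(fetch_page, limit=100):
    """Yield items across pages from fetch_page(offset=..., limit=...)."""
    offset = 0
    while True:
        response = fetch_page(offset=offset, limit=limit)
        page = response.get("items", [])
        yield from page
        total = response.get("total_entries")
        if total is not None and offset + len(page) >= total:
            break  # authoritative total says we are done
        if len(page) < limit:
            break  # short (or empty) page is the fallback stop signal
        offset += limit


# Even with total_entries absent, a 250-item source paginates fully instead
# of stopping after the first full page of 100.
data = list(range(250))
fake_api = lambda offset, limit: {"items": data[offset : offset + limit]}
assert list(paginate(fake_api)) == data
```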

💡 Quality: Duplicate DAG detail building logic in AirflowApiClient and MWAAClient
build_dag_details(), get_dag_runs(), get_task_instances_for_run(), _paginate(), and tag-parsing logic are nearly identical between AirflowApiClient and MWAAClient. This duplication makes it easy for bug fixes to be applied in one place but not the other (the pagination issue above is a good example). Consider extracting shared parsing/model-building logic into a common base class or utility module, keeping only the HTTP transport layer different.

✅ 3 resolved
✅ Quality: Incomplete rename: query constants and alias still say 'job'

📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:90 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:92 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:103 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:105 📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:351 📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:379
The renaming from job-specific to entity-generic was applied to the client's instance variables and method parameters, but the SQL query constants are still named DATABRICKS_GET_TABLE_LINEAGE_FOR_JOB / DATABRICKS_GET_COLUMN_LINEAGE_FOR_JOB, and the column alias is still entity_id AS job_id. The cache_lineage method also still accesses row.job_id (line 351, 379). While this works functionally (because the alias ensures the attribute name), it's inconsistent with the rest of the renaming effort and will confuse future readers who see 'job' in the query but 'entity' in the client code.

✅ Bug: Log message still says "all jobs" instead of "all entities"

📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:289
At line 289 of client.py, the log message reads "performing bulk lineage fetch for all jobs" but this method now fetches lineage for both jobs and pipelines. This is a minor inconsistency with the rename from job-specific to entity-generic terminology applied elsewhere in the PR.

✅ Bug: for/else misuse: debug log triggers on normal loop completion

📄 ingestion/src/metadata/ingestion/source/pipeline/databrickspipeline/metadata.py:1331-1334
In yield_pipeline_lineage_details (metadata.py line 1331), there is a for/else construct on the for table_lineage in table_lineage_list loop. The else block logs "No source or target table full name found for {entity_id}", but Python's for/else triggers the else when the loop completes without a break — i.e., on every normal iteration through all items. This means the debug message fires after successfully processing all lineage entries, which is misleading. This is a pre-existing pattern, but the diff changes the variable used (entity_id replacing job_id), so worth noting.

🤖 Prompt for agents

Code Review: Adds Databricks lineage pipeline support with some incomplete fixes: certification tags are silently lost during CSV import due to overly broad tag deletion, pagination may stop early if API responses lack total_entries field, and DAG detail building logic is duplicated across AirflowApiClient and MWAAClient.

1. ⚠️ Bug: CSV import silently loses certification tags from tag_usage

   `updateTagsForImport()` calls `deleteTagsByTarget(fqn)` which deletes ALL tags for the entity, including certification tags. When `updateCertification()` runs later, it compares the in-memory original and updated certification objects — if they are equal (certification unchanged), it returns without re-inserting the cert tag. The net result is that any entity with an existing certification will silently lose it during CSV bulk import.
   
   The non-import `updateTags()` correctly filters out certification tags from addedTags/deletedTags (line 6767-6771), but `updateTagsForImport()` has no such guard because it does a blanket DELETE + re-INSERT.
   
   To fix: either modify `updateTagsForImport` to preserve cert tags by using a more targeted delete that excludes certification prefix, or ensure `updateCertification()` always re-applies when the entity has a certification (regardless of whether it changed).

   Suggested fix:
   protected void updateTagsForImport(
       String fqn, String fieldName, List<TagLabel> origTags, List<TagLabel> updatedTags) {
     origTags = listOrEmpty(origTags);
     updatedTags = Optional.ofNullable(updatedTags).orElse(new ArrayList<>());
     if (origTags.isEmpty() && updatedTags.isEmpty()) {
       return;
     }
     // Delete all tags except certification (handled by updateCertification)
     String certClassification = getCertificationClassification();
     if (certClassification != null) {
       daoCollection.tagUsageDAO().deleteNonCertTagsByTarget(fqn, certClassification);
     } else {
       daoCollection.tagUsageDAO().deleteTagsByTarget(fqn);
     }
     // ... rest unchanged

2. 💡 Bug: Pagination stops early if API omits total_entries field

   In both `AirflowApiClient._paginate()` and `MWAAClient._paginate()`, `total` is initialized to `limit` (e.g. 100). If the API response does not include `total_entries`, the fallback `total = response.get("total_entries", len(result))` sets total to the number of items seen so far. When the first page is full (100 items), the offset advances to 100 and the loop exits, potentially missing remaining pages.
   
   Airflow's stable API always returns `total_entries`, so this is mainly a risk with future API changes or MWAA quirks. Consider using a sentinel or checking if `len(page) == limit` as an additional continuation condition.

3. 💡 Quality: Duplicate DAG detail building logic in AirflowApiClient and MWAAClient

   `build_dag_details()`, `get_dag_runs()`, `get_task_instances_for_run()`, `_paginate()`, and tag-parsing logic are nearly identical between `AirflowApiClient` and `MWAAClient`. This duplication makes it easy for bug fixes to be applied in one place but not the other (the pagination issue above is a good example). Consider extracting shared parsing/model-building logic into a common base class or utility module, keeping only the HTTP transport layer different.


@gitar-bot All three findings are false positives: AirflowApiClient/MWAAClient don't exist in this codebase, total_entries appears nowhere in the code, and certification tags are stored in entity JSON (not tag_usage), so deleteTagsByTarget doesn't affect them

@varun-lakhyani
Member Author

Code Review ⚠️ Changes requested 3 resolved / 8 findings
Adds Databricks lineage pipeline support but has three open issues: CSV import silently loses certification tags due to deleteTagsByTarget() removing all tags, pagination stops early when API omits total_entries field, and duplicate DAG detail building logic across AirflowApiClient and MWAAClient. Two items were resolved (incomplete rename and log message).

⚠️ Bug: CSV import silently loses certification tags from tag_usage
updateTagsForImport() calls deleteTagsByTarget(fqn) which deletes ALL tags for the entity, including certification tags. When updateCertification() runs later, it compares the in-memory original and updated certification objects — if they are equal (certification unchanged), it returns without re-inserting the cert tag. The net result is that any entity with an existing certification will silently lose it during CSV bulk import.

The non-import updateTags() correctly filters out certification tags from addedTags/deletedTags (line 6767-6771), but updateTagsForImport() has no such guard because it does a blanket DELETE + re-INSERT.

To fix: either modify updateTagsForImport to preserve cert tags by using a more targeted delete that excludes certification prefix, or ensure updateCertification() always re-applies when the entity has a certification (regardless of whether it changed).

Suggested fix

protected void updateTagsForImport(
    String fqn, String fieldName, List<TagLabel> origTags, List<TagLabel> updatedTags) {
  origTags = listOrEmpty(origTags);
  updatedTags = Optional.ofNullable(updatedTags).orElse(new ArrayList<>());
  if (origTags.isEmpty() && updatedTags.isEmpty()) {
    return;
  }
  // Delete all tags except certification (handled by updateCertification)
  String certClassification = getCertificationClassification();
  if (certClassification != null) {
    daoCollection.tagUsageDAO().deleteNonCertTagsByTarget(fqn, certClassification);
  } else {
    daoCollection.tagUsageDAO().deleteTagsByTarget(fqn);
  }
  // ... rest unchanged

⚠️ Bug: SLF4J migration drops exception message from log text
Several String.format → SLF4J conversions have one fewer {} placeholder than arguments. When the last argument is a Throwable, SLF4J treats it as the exception for stack-trace logging and does NOT substitute it into the message. This means the exception's message text is silently dropped from the log line.

Affected locations:

  • OpenMetadataConnectionBuilder.java:80: message ends with "due to " but has no {} for the exception. Original format: "Could not initialize bot for pipeline [%s] due to [%s]".
  • v150/MigrationUtil.java:108: "Error updating automator [{}] due to " — missing second {} for ex.
  • v159/MigrationUtil.java:65: "Error migrating app extension [{}] due to " — same issue.

In each case the exception stack trace is still printed (because SLF4J auto-detects the trailing Throwable), but the human-readable message loses the exception text, making quick log scanning harder.

Suggested fix

// OpenMetadataConnectionBuilder.java:80
LOG.warn(
    "Could not initialize bot for pipeline [{}] due to [{}]",
    ingestionPipeline.getPipelineType(),
    e.getMessage(),
    e);

// v150/MigrationUtil.java:108
LOG.warn("Error updating automator [{}] due to [{}]", row, ex.getMessage(), ex);

// v159/MigrationUtil.java:65
LOG.warn("Error migrating app extension [{}] due to [{}]", row, ex.getMessage(), ex);

💡 Bug: Pagination stops early if API omits total_entries field
In both AirflowApiClient._paginate() and MWAAClient._paginate(), total is initialized to limit (e.g. 100). If the API response does not include total_entries, the fallback total = response.get("total_entries", len(result)) sets total to the number of items seen so far. When the first page is full (100 items), the offset advances to 100 and the loop exits, potentially missing remaining pages.

Airflow's stable API always returns total_entries, so this is mainly a risk with future API changes or MWAA quirks. Consider using a sentinel or checking if len(page) == limit as an additional continuation condition.

💡 Quality: Duplicate DAG detail building logic in AirflowApiClient and MWAAClient
build_dag_details(), get_dag_runs(), get_task_instances_for_run(), _paginate(), and tag-parsing logic are nearly identical between AirflowApiClient and MWAAClient. This duplication makes it easy for bug fixes to be applied in one place but not the other (the pagination issue above is a good example). Consider extracting shared parsing/model-building logic into a common base class or utility module, keeping only the HTTP transport layer different.
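One way to factor the shared logic is an abstract base holding the parsing and pagination, with subclasses supplying only transport. The class and endpoint names below are illustrative, not the actual OpenMetadata classes:

```python
from abc import ABC, abstractmethod


class BaseAirflowRestClient(ABC):
    """Shared pagination/parsing; subclasses provide only the HTTP layer."""

    @abstractmethod
    def _get(self, endpoint: str, params: dict) -> dict:
        """Perform one GET request and return the decoded JSON body."""

    def _paginate(self, endpoint: str, key: str, limit: int = 100) -> list:
        result, offset = [], 0
        while True:
            response = self._get(endpoint, {"limit": limit, "offset": offset})
            page = response.get(key, [])
            result.extend(page)
            offset += len(page)
            total = response.get("total_entries")
            if not page or (total is not None and offset >= total):
                break
            if total is None and len(page) < limit:
                break
        return result

    def get_dag_runs(self, dag_id: str) -> list:
        # Shared model-building would live here, written once
        return self._paginate(f"dags/{dag_id}/dagRuns", key="dag_runs")


class InMemoryClient(BaseAirflowRestClient):
    """Toy transport standing in for AirflowApiClient/MWAAClient."""

    def __init__(self, data):
        self.data = data

    def _get(self, endpoint, params):
        items = self.data[endpoint]
        page = items[params["offset"]: params["offset"] + params["limit"]]
        return {"dag_runs": page, "total_entries": len(items)}
```

A fix to `_paginate` then lands in both clients at once, instead of needing to be applied twice.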

💡 Quality: Log messages missing separator before placeholder
Two LOG.error calls concatenate the message text directly with the exception message with no space, colon, or other separator before {}:

  • "Error in sending invite to User{}" → renders as "Error in sending invite to UserNullPointerException"
  • "Error in sending mail for reset password{}" → similar.

This was inherited from the original string-concatenation code, but the SLF4J migration was an opportunity to fix it.

Suggested fix

LOG.error("Error in sending invite to User: {}", ex.getMessage());
LOG.error("Error in sending mail for reset password: {}", ex.getMessage());

✅ 3 resolved
✅ Quality: Incomplete rename: query constants and alias still say 'job'

📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:90 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:92 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:103 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:105 📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:351 📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:379
The renaming from job-specific to entity-generic was applied to the client's instance variables and method parameters, but the SQL query constants are still named DATABRICKS_GET_TABLE_LINEAGE_FOR_JOB / DATABRICKS_GET_COLUMN_LINEAGE_FOR_JOB, and the column alias is still entity_id AS job_id. The cache_lineage method also still accesses row.job_id (lines 351 and 379). This works functionally, because the alias preserves the attribute name, but it is inconsistent with the rest of the renaming effort and will confuse future readers who see 'job' in the query while the client code says 'entity'.

✅ Bug: Log message still says "all jobs" instead of "all entities"

📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:289
At line 289 of client.py, the log message reads "performing bulk lineage fetch for all jobs" but this method now fetches lineage for both jobs and pipelines. This is a minor inconsistency with the rename from job-specific to entity-generic terminology applied elsewhere in the PR.

✅ Bug: for/else misuse: debug log triggers on normal loop completion

📄 ingestion/src/metadata/ingestion/source/pipeline/databrickspipeline/metadata.py:1331-1334
In yield_pipeline_lineage_details (metadata.py line 1331), a for/else construct wraps the for table_lineage in table_lineage_list loop. The else block logs "No source or target table full name found for {entity_id}", but Python's for/else runs the else whenever the loop finishes without a break — that is, after every normal pass through all items. As a result, the debug message fires even after all lineage entries are processed successfully, which is misleading. This is a pre-existing pattern, but the diff changes the variable used (entity_id replacing job_id), so it is worth noting.
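A minimal illustration of the pitfall and one fix, replacing for/else with an explicit flag. The function and variable names are illustrative, not the actual metadata.py code:

```python
def process_lineage(table_lineage_list, entity_id, log):
    """Yield (source, target) pairs; log once only if none were found."""
    found = False
    results = []
    for table_lineage in table_lineage_list:
        source = table_lineage.get("source")
        target = table_lineage.get("target")
        if source and target:
            found = True
            results.append((source, target))
    if not found:
        # A for/else `else` would run here after EVERY normal completion;
        # the flag restricts the message to the genuinely-empty case.
        log(f"No source or target table full name found for {entity_id}")
    return results
```

With for/else, the `else` body is skipped only when the loop exits via `break`, so a loop that never breaks always logs the "not found" message.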



@gitar-bot these are all unrelated issues caused by Update Branch, not pr changes

@gitar-bot

gitar-bot bot commented Mar 30, 2026

Code Review ⚠️ Changes requested 3 resolved / 8 findings

Databricks lineage pipeline addition addresses the incomplete rename and log message issues, but three critical problems remain: CSV import silently drops certification tags during update, pagination stops early when total_entries is omitted, and SLF4J migration loses exception messages in logs.

⚠️ Bug: CSV import silently loses certification tags from tag_usage

updateTagsForImport() calls deleteTagsByTarget(fqn) which deletes ALL tags for the entity, including certification tags. When updateCertification() runs later, it compares the in-memory original and updated certification objects — if they are equal (certification unchanged), it returns without re-inserting the cert tag. The net result is that any entity with an existing certification will silently lose it during CSV bulk import.

The non-import updateTags() correctly filters out certification tags from addedTags/deletedTags (line 6767-6771), but updateTagsForImport() has no such guard because it does a blanket DELETE + re-INSERT.

To fix: either modify updateTagsForImport to preserve cert tags by using a more targeted delete that excludes certification prefix, or ensure updateCertification() always re-applies when the entity has a certification (regardless of whether it changed).

Suggested fix
protected void updateTagsForImport(
    String fqn, String fieldName, List<TagLabel> origTags, List<TagLabel> updatedTags) {
  origTags = listOrEmpty(origTags);
  updatedTags = Optional.ofNullable(updatedTags).orElse(new ArrayList<>());
  if (origTags.isEmpty() && updatedTags.isEmpty()) {
    return;
  }
  // Delete all tags except certification (handled by updateCertification)
  String certClassification = getCertificationClassification();
  if (certClassification != null) {
    daoCollection.tagUsageDAO().deleteNonCertTagsByTarget(fqn, certClassification);
  } else {
    daoCollection.tagUsageDAO().deleteTagsByTarget(fqn);
  }
  // ... rest unchanged
⚠️ Bug: SLF4J migration drops exception message from log text

Several String.format → SLF4J conversions have one fewer {} placeholder than arguments. When the last argument is a Throwable, SLF4J treats it as the exception for stack-trace logging and does NOT substitute it into the message. This means the exception's message text is silently dropped from the log line.

Affected locations:

  • OpenMetadataConnectionBuilder.java:80: message ends with "due to " but has no {} for the exception. Original format: "Could not initialize bot for pipeline [%s] due to [%s]".
  • v150/MigrationUtil.java:108: "Error updating automator [{}] due to " — missing second {} for ex.
  • v159/MigrationUtil.java:65: "Error migrating app extension [{}] due to " — same issue.

In each case the exception stack trace is still printed (because SLF4J auto-detects the trailing Throwable), but the human-readable message loses the exception text, making quick log scanning harder.

Suggested fix
// OpenMetadataConnectionBuilder.java:80
LOG.warn(
    "Could not initialize bot for pipeline [{}] due to [{}]",
    ingestionPipeline.getPipelineType(),
    e.getMessage(),
    e);

// v150/MigrationUtil.java:108
LOG.warn("Error updating automator [{}] due to [{}]", row, ex.getMessage(), ex);

// v159/MigrationUtil.java:65
LOG.warn("Error migrating app extension [{}] due to [{}]", row, ex.getMessage(), ex);
💡 Bug: Pagination stops early if API omits total_entries field

In both AirflowApiClient._paginate() and MWAAClient._paginate(), total is initialized to limit (e.g. 100). If the API response does not include total_entries, the fallback total = response.get("total_entries", len(result)) sets total to the number of items seen so far. When the first page is full (100 items), the offset advances to 100 and the loop exits, potentially missing remaining pages.

Airflow's stable API always returns total_entries, so this is mainly a risk with future API changes or MWAA quirks. Consider using a sentinel or checking if len(page) == limit as an additional continuation condition.

💡 Quality: Duplicate DAG detail building logic in AirflowApiClient and MWAAClient

build_dag_details(), get_dag_runs(), get_task_instances_for_run(), _paginate(), and tag-parsing logic are nearly identical between AirflowApiClient and MWAAClient. This duplication makes it easy for bug fixes to be applied in one place but not the other (the pagination issue above is a good example). Consider extracting shared parsing/model-building logic into a common base class or utility module, keeping only the HTTP transport layer different.

💡 Quality: Log messages missing separator before placeholder

Two LOG.error calls concatenate the message text directly with the exception message with no space, colon, or other separator before {}:

  • "Error in sending invite to User{}" → renders as "Error in sending invite to UserNullPointerException"
  • "Error in sending mail for reset password{}" → similar.

This was inherited from the original string-concatenation code, but the SLF4J migration was an opportunity to fix it.

Suggested fix
LOG.error("Error in sending invite to User: {}", ex.getMessage());
LOG.error("Error in sending mail for reset password: {}", ex.getMessage());
✅ 3 resolved
Quality: Incomplete rename: query constants and alias still say 'job'

📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:90 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:92 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:103 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:105 📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:351 📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:379
The renaming from job-specific to entity-generic was applied to the client's instance variables and method parameters, but the SQL query constants are still named DATABRICKS_GET_TABLE_LINEAGE_FOR_JOB / DATABRICKS_GET_COLUMN_LINEAGE_FOR_JOB, and the column alias is still entity_id AS job_id. The cache_lineage method also still accesses row.job_id (lines 351 and 379). This works functionally, because the alias preserves the attribute name, but it is inconsistent with the rest of the renaming effort and will confuse future readers who see 'job' in the query while the client code says 'entity'.

Bug: Log message still says "all jobs" instead of "all entities"

📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:289
At line 289 of client.py, the log message reads "performing bulk lineage fetch for all jobs" but this method now fetches lineage for both jobs and pipelines. This is a minor inconsistency with the rename from job-specific to entity-generic terminology applied elsewhere in the PR.

Bug: for/else misuse: debug log triggers on normal loop completion

📄 ingestion/src/metadata/ingestion/source/pipeline/databrickspipeline/metadata.py:1331-1334
In yield_pipeline_lineage_details (metadata.py line 1331), a for/else construct wraps the for table_lineage in table_lineage_list loop. The else block logs "No source or target table full name found for {entity_id}", but Python's for/else runs the else whenever the loop finishes without a break — that is, after every normal pass through all items. As a result, the debug message fires even after all lineage entries are processed successfully, which is misleading. This is a pre-existing pattern, but the diff changes the variable used (entity_id replacing job_id), so it is worth noting.



@sonarqubecloud


Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Databricks Pipeline Lineage Missing

2 participants