
Fixes #25933: added pipeline lineage #26786

Open

varun-lakhyani wants to merge 5 commits into open-metadata:main from varun-lakhyani:databricks-lineage-pipeline

Conversation


varun-lakhyani (Member) commented Mar 26, 2026

Describe your changes:

Fixes #25933

Earlier, only jobs were covered by both ingestion and lineage, while pipelines were covered by ingestion only. This change extends lineage extraction to pipelines and renames the Job-specific identifiers to Entity, since the code now handles both jobs and pipelines.


Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • Refactoring for pipeline support:
    • Renamed query constants and variables from job-specific to entity-generic (DATABRICKS_GET_TABLE_LINEAGE, entity_table_lineage, etc.)
    • Updated SQL queries to include both JOB and PIPELINE entity types in lineage filters
  • Pipeline lineage handling:
    • Modified pipeline metadata source to extract entity ID from either job or pipeline, enabling lineage for DLT pipelines
  • Testing:
    • Added unit test for DLT pipeline lineage with table and column lineage scenarios
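The query change described in the summary can be sketched as follows. This is a hedged illustration only: the constant name DATABRICKS_GET_TABLE_LINEAGE comes from the summary above, but the SQL body (and the `system.access.table_lineage` source) is an assumption for illustration, not the PR's actual query.

```python
# Hypothetical sketch of widening a job-only lineage filter to cover both
# jobs and DLT pipelines. The constant name is taken from the PR summary;
# the SQL text itself is illustrative, not the connector's actual query.
DATABRICKS_GET_TABLE_LINEAGE = """
SELECT
    entity_id,
    source_table_full_name,
    target_table_full_name
FROM system.access.table_lineage
WHERE entity_type IN ('JOB', 'PIPELINE')  -- was: entity_type = 'JOB'
"""

# Both entity types now pass the filter:
for entity_type in ("JOB", "PIPELINE"):
    assert f"'{entity_type}'" in DATABRICKS_GET_TABLE_LINEAGE
```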

This will update automatically on new commits.

varun-lakhyani requested a review from a team as a code owner March 26, 2026 04:26
github-actions bot (Contributor) commented:

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

varun-lakhyani added the Ingestion and safe to test labels Mar 26, 2026

github-actions bot commented Mar 26, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpam-modules CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-modules-bin CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-runtime CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam0g CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (39)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.12.7 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.13.4 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.15.2 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.16.1 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
Authlib CVE-2026-27962 🔥 CRITICAL 1.6.6 1.6.9
Authlib CVE-2026-28490 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28498 🚨 HIGH 1.6.6 1.6.9
Authlib CVE-2026-28802 🚨 HIGH 1.6.6 1.6.7
PyJWT CVE-2026-32597 🚨 HIGH 2.10.1 2.12.0
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6, 2.11.1
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.5 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.5 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.5 3.1.8
apache-airflow-providers-http CVE-2025-69219 🚨 HIGH 5.6.0 6.0.0
azure-core CVE-2026-21226 🚨 HIGH 1.37.0 1.38.0
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
google-cloud-aiplatform CVE-2026-2472 🚨 HIGH 1.130.0 1.131.0
google-cloud-aiplatform CVE-2026-2473 🚨 HIGH 1.130.0 1.133.0
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
pyasn1 CVE-2026-23490 🚨 HIGH 0.6.1 0.6.2
pyasn1 CVE-2026-30922 🚨 HIGH 0.6.1 0.6.3
python-multipart CVE-2026-24486 🚨 HIGH 0.0.20 0.0.22
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
tornado CVE-2026-31958 🚨 HIGH 6.5.3 6.5.5
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: usr/bin/docker

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
stdlib CVE-2025-68121 🔥 CRITICAL v1.25.5 1.24.13, 1.25.7, 1.26.0-rc.3
stdlib CVE-2025-61726 🚨 HIGH v1.25.5 1.24.12, 1.25.6
stdlib CVE-2025-61728 🚨 HIGH v1.25.5 1.24.12, 1.25.6
stdlib CVE-2026-25679 🚨 HIGH v1.25.5 1.25.8, 1.26.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found


github-actions bot commented Mar 26, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (38)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.12.7 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.13.4 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-core GHSA-72hv-8253-57qq 🚨 HIGH 2.15.2 2.18.6, 2.21.1, 3.1.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.airlift:aircompressor CVE-2025-67721 🚨 HIGH 0.27 2.0.3
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.spark:spark-core_2.12 CVE-2025-54920 🚨 HIGH 3.5.6 3.5.7
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (15)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6, 2.11.1
apache-airflow CVE-2026-26929 🚨 HIGH 3.1.5 3.1.8
apache-airflow CVE-2026-28779 🚨 HIGH 3.1.5 3.1.8
apache-airflow CVE-2026-30911 🚨 HIGH 3.1.5 3.1.8
cryptography CVE-2026-26007 🚨 HIGH 42.0.8 46.0.5
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
pyOpenSSL CVE-2026-27459 🚨 HIGH 24.1.0 26.0.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found


github-actions bot commented Mar 26, 2026

🟡 Playwright Results — all passed (19 flaky)

✅ 3397 passed · ❌ 0 failed · 🟡 19 flaky · ⏭️ 216 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 451 0 4 2
🟡 Shard 2 601 0 3 32
🟡 Shard 3 604 0 5 27
🟡 Shard 4 601 0 2 47
🟡 Shard 5 586 0 1 67
🟡 Shard 6 554 0 4 41
🟡 19 flaky test(s) (passed on retry)
  • Features/CustomizeDetailPage.spec.ts › Glossary - customization should work (shard 1, 1 retry)
  • Flow/Metric.spec.ts › Verify Related Metrics Update (shard 1, 1 retry)
  • Flow/Tour.spec.ts › Tour should work from URL directly (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/ColumnBulkOperations.spec.ts › should keep latest search results when responses arrive out of order (shard 2, 1 retry)
  • Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Features/TestSuiteMultiPipeline.spec.ts › TestSuite multi pipeline support (shard 3, 1 retry)
  • Flow/AddRoleAndAssignToUser.spec.ts › Verify assigned role to new user (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Glossary Term Add, Update and Remove (shard 4, 2 retries)
  • Pages/EntityDataConsumer.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
  • Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)
  • VersionPages/EntityVersionPages.spec.ts › Directory (shard 6, 1 retry)
  • VersionPages/GlossaryVersionPage.spec.ts › GlossaryTerm (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

ulixius9 (Member) left a comment:


@varun-lakhyani can you add unit tests for this

varun-lakhyani (Author) replied:

@varun-lakhyani can you add unit tests for this

Added now


varun-lakhyani commented Mar 26, 2026

@gitar-bot all 3 are already resolved with latest commit


varun-lakhyani commented Mar 26, 2026

Before code changes: no pipeline lineage (screenshot: "Screenshot 2026-03-26 at 8 15 10 AM").
After changes: pipeline lineage present (screenshot: "Pipeline lineage").

gitar-bot commented Mar 26, 2026

Code Review 👍 Approved with suggestions (2 resolved / 3 findings)

Adds pipeline lineage extraction for Databricks ingestion, resolving incomplete naming and logging issues. Consider refactoring the debug log in yield_pipeline_lineage_details to avoid triggering on normal loop completion.

💡 Bug: for/else misuse: debug log triggers on normal loop completion

📄 ingestion/src/metadata/ingestion/source/pipeline/databrickspipeline/metadata.py:1331-1334

In yield_pipeline_lineage_details (metadata.py line 1331), there is a for/else construct on the for table_lineage in table_lineage_list loop. The else block logs "No source or target table full name found for {entity_id}", but Python's for/else triggers the else when the loop completes without a break — i.e., on every normal iteration through all items. This means the debug message fires after successfully processing all lineage entries, which is misleading. This is a pre-existing pattern, but the diff changes the variable used (entity_id replacing job_id), so worth noting.
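The semantics the finding describes can be seen in a minimal, self-contained sketch (generic names; this is not the connector's code):

```python
def demo_for_else(items):
    """Trace showing when a loop's `else` clause fires."""
    trace = []
    for item in items:
        trace.append(f"processed:{item}")
    else:
        # `else` fires whenever the loop ends without `break`,
        # so it also fires after a fully successful pass.
        trace.append("else-clause-ran")
    return trace

# Fires even though every item was handled:
print(demo_for_else(["t1", "t2"]))  # ['processed:t1', 'processed:t2', 'else-clause-ran']

# The intended "nothing found" check is an explicit emptiness test:
def demo_fixed(items):
    trace = [f"processed:{item}" for item in items]
    if not items:
        trace.append("no-lineage-found")
    return trace

print(demo_fixed([]))  # ['no-lineage-found']
```

A `for/else` is only meaningful when the loop body contains a `break`; without one, the `else` block is unconditional and is better written as a plain emptiness check as above.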

✅ 2 resolved
Quality: Incomplete rename: query constants and alias still say 'job'

📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:90 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:92 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:103 📄 ingestion/src/metadata/ingestion/source/database/databricks/queries.py:105 📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:351 📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:379
The renaming from job-specific to entity-generic was applied to the client's instance variables and method parameters, but the SQL query constants are still named DATABRICKS_GET_TABLE_LINEAGE_FOR_JOB / DATABRICKS_GET_COLUMN_LINEAGE_FOR_JOB, and the column alias is still entity_id AS job_id. The cache_lineage method also still accesses row.job_id (line 351, 379). While this works functionally (because the alias ensures the attribute name), it's inconsistent with the rest of the renaming effort and will confuse future readers who see 'job' in the query but 'entity' in the client code.
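Why `row.job_id` keeps working despite the rename can be shown with a small sqlite3 sketch (sqlite3 stands in for the Databricks client here; the table and values are invented for illustration):

```python
import sqlite3

# sqlite3 stands in for the Databricks SQL client; table and values invented.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows addressable by result-column name
conn.execute("CREATE TABLE table_lineage (entity_id TEXT)")
conn.execute("INSERT INTO table_lineage VALUES ('dlt-pipeline-42')")

# The alias, not the underlying column, names the result field. This is why
# code reading row.job_id keeps working after the source column is renamed
# to entity_id, as long as the query keeps `AS job_id`.
row = conn.execute("SELECT entity_id AS job_id FROM table_lineage").fetchone()
print(row["job_id"])  # dlt-pipeline-42
print(row.keys())     # ['job_id']
```

The behavior is correct but brittle: the alias silently couples the SQL text to the attribute access in the client, which is exactly the readability concern the finding raises.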

✅ Bug: Log message still says "all jobs" instead of "all entities"

📄 ingestion/src/metadata/ingestion/source/database/databricks/client.py:289
At line 289 of client.py, the log message reads "performing bulk lineage fetch for all jobs" but this method now fetches lineage for both jobs and pipelines. This is a minor inconsistency with the rename from job-specific to entity-generic terminology applied elsewhere in the PR.

🤖 Prompt for agents
Code Review: Adds pipeline lineage extraction for Databricks ingestion, resolving incomplete naming and logging issues. Consider refactoring the debug log in `yield_pipeline_lineage_details` to avoid triggering on normal loop completion.

1. 💡 Bug: for/else misuse: debug log triggers on normal loop completion
   Files: ingestion/src/metadata/ingestion/source/pipeline/databrickspipeline/metadata.py:1331-1334

   In `yield_pipeline_lineage_details` (metadata.py line 1331), there is a `for/else` construct on the `for table_lineage in table_lineage_list` loop. The `else` block logs `"No source or target table full name found for {entity_id}"`, but Python's `for/else` triggers the `else` when the loop completes *without* a `break` — i.e., on every normal iteration through all items. This means the debug message fires after successfully processing all lineage entries, which is misleading. This is a pre-existing pattern, but the diff changes the variable used (`entity_id` replacing `job_id`), so worth noting.

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

varun-lakhyani (Author) replied:

💡 Bug: for/else misuse: debug log triggers on normal loop completion

@gitar-bot Fixed in 3b8731d - the for/else block has been removed entirely.

gitar-bot commented Mar 26, 2026

Thanks for confirming! The for/else block was removed in 3b8731d, so the misleading debug log no longer fires on normal loop completion.


varun-lakhyani requested a review from ulixius9 March 27, 2026 02:14

Labels

Ingestion, safe to test (Add this label to run secure Github workflows on PRs)


Development

Successfully merging this pull request may close these issues.

Databricks Pipeline Lineage Missing
