Rewrite schema merge for efficiency and referential equality #1735

Merged
kmcginnes merged 3 commits into aws:main from kmcginnes:rewrite-schema-merge on Apr 29, 2026

kmcginnes (Collaborator) commented on Apr 28, 2026

Description

Rewrites updateSchemaFromEntities and its supporting functions to be more efficient and preserve referential equality at every level of the schema when nothing changes. This reduces unnecessary React re-renders when entities are merged into an already-up-to-date schema.
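To make the referential-equality goal concrete, here is a minimal sketch of a single-pass vertex merge. It is not the code from this PR: the `Vertex` and `VertexTypeConfig` shapes are simplified, hypothetical stand-ins, and only the `{ configs, newIris }` return shape follows the description. The idea is that configs are copied lazily, so an input that adds nothing returns the original array itself.

```typescript
// Hypothetical, simplified shapes; the real types in the codebase carry more fields.
type Vertex = { id: string; types: string[]; attributes: Record<string, unknown> };
type VertexTypeConfig = { type: string; attributes: { name: string }[] };

function mergeVertices(
  existing: VertexTypeConfig[],
  vertices: Vertex[],
): { configs: VertexTypeConfig[]; newIris: Set<string> } {
  const newIris = new Set<string>();
  // Index existing configs by type. Map keeps insertion order, so existing
  // configs stay in place and newly discovered types are appended.
  const byType = new Map<string, VertexTypeConfig>();
  for (const config of existing) byType.set(config.type, config);
  // Types whose config object was created or copied during this call.
  const copied = new Set<string>();

  for (const vertex of vertices) {
    for (const type of vertex.types) {
      const found = byType.get(type);
      let config: VertexTypeConfig = found ?? { type, attributes: [] };
      if (!found) {
        byType.set(type, config);
        copied.add(type);
        newIris.add(type);
      }
      for (const name of Object.keys(vertex.attributes)) {
        if (config.attributes.some(attr => attr.name === name)) continue;
        if (!copied.has(type)) {
          // Copy on first write so the caller's object is never mutated.
          config = { ...config, attributes: [...config.attributes] };
          byType.set(type, config);
          copied.add(type);
        }
        config.attributes.push({ name });
        newIris.add(name);
      }
    }
  }

  // Nothing created or copied: hand back the very same array reference.
  if (copied.size === 0) return { configs: existing, newIris };
  return { configs: [...byType.values()], newIris };
}

const schema: VertexTypeConfig[] = [
  { type: "ex:Person", attributes: [{ name: "ex:name" }] },
];
const noop = mergeVertices(schema, [
  { id: "v1", types: ["ex:Person"], attributes: { "ex:name": "Ada" } },
]);
console.log(noop.configs === schema); // true
```

Because unchanged configs keep their identity even when the array itself is rebuilt, a consumer comparing with `===` can skip work at both the array and the item level.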

Changes

  • Single-pass merge instead of pre-check + merge — The old code ran shouldUpdateSchemaFromEntities + hasNewPrefixNamespaces to decide if an update was needed, then updateSchemaFromEntities repeated the same work to do the merge. The new code does one merge pass that preserves referential equality when nothing changes, making the pre-check redundant and removing it entirely.

  • Merge directly from entities: mergeVertices and mergeEdges work directly from Vertex[]/Edge[] instead of first converting to intermediate VertexTypeConfig[]/EdgeTypeConfig[] via mapVertexToTypeConfigs/mapEdgeToTypeConfig, avoiding throwaway allocations.

  • Merge functions return discovered IRIs: mergeVertices and mergeEdges return { configs, newIris } where newIris contains only the type/attribute IRIs that were actually new. This keeps prefix scanning focused on what changed.

  • Optimized prefix scanning: mergePrefixes only scans newly-discovered IRIs and entity IDs, rather than re-scanning all existing schema types on every call.

  • Fixed activeSchemaSelector setter churn — The setter allocated a new Map unconditionally, even when the update was a no-op (e.g., returning the same value, or deleting a key that was not present). Now it checks newValue === prev before cloning and guards the RESET/undefined path against no-op deletes.

  • Added createPrefixTypeConfig factory — Accepts plain strings { prefix, uri, inferred? } and produces the branded RdfPrefix/IriNamespace types, reducing manual as casts.

  • Added debug logging — Logs when the schema is updated, when it is already up to date, and when new vertex types, edge types, attributes, or prefixes are discovered.
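The setter fix in the list above can be illustrated with a small sketch. This is an assumption-laden model rather than the actual selector: `setEntry`, `SchemaEntry`, and the local `RESET` symbol are hypothetical names, and the real code lives inside a state-library setter. It shows only the two guards described: skip the clone when the value is unchanged, and skip it when deleting a key that is not present.

```typescript
// Hypothetical stand-ins for the state library's reset sentinel and the stored value.
const RESET = Symbol("RESET");
type SchemaEntry = { vertices: string[] };

function setEntry(
  prev: Map<string, SchemaEntry>,
  key: string,
  next: SchemaEntry | typeof RESET | undefined,
): Map<string, SchemaEntry> {
  if (next === RESET || next === undefined) {
    // Guard 1: deleting a missing key is a no-op, so keep the same Map.
    if (!prev.has(key)) return prev;
    const withoutKey = new Map(prev);
    withoutKey.delete(key);
    return withoutKey;
  }
  // Guard 2: writing back the identical value is a no-op as well.
  if (prev.get(key) === next) return prev;
  const withKey = new Map(prev);
  withKey.set(key, next);
  return withKey;
}

const entry: SchemaEntry = { vertices: ["ex:Person"] };
const store = new Map<string, SchemaEntry>([["config-1", entry]]);
console.log(setEntry(store, "config-1", entry) === store); // true
console.log(setEntry(store, "config-2", RESET) === store); // true
```

Returning `prev` unchanged is what lets subscribers that compare by reference avoid re-rendering.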

Benchmarks (70k vertex types, 3 attributes each)

The old code called getSchemaUris on every merge, which iterated all 70k types and 210k attributes to build a Set<string> — even when nothing changed.

| Scenario | Old | New | Speedup |
| --- | --- | --- | --- |
| No-op (50 entities already in schema) | 935ms | 6.5ms | 144x |
| 10 new types added | 1,306ms | 6.4ms | 203x |
| Mixed (40 existing + 10 new) | 1,293ms | 6.9ms | 188x |
| Empty schema, 100 new entities | 0.99ms | 1.18ms | ~1x |

The empty schema case is roughly equivalent since there is no existing schema to skip.
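A rough sketch of why the first three rows improve: if prefix discovery looks only at the IRIs reported as new, its cost follows the size of the incoming batch rather than the size of the schema. The code below is illustrative only. `namespaceOf`, `uniqueName`, and the naming scheme (last namespace segment plus a numeric suffix on collision) are assumptions chosen to resemble the `country`, `country2`, `country3` names mentioned in the tests, not the project's actual rules.

```typescript
type PrefixTypeConfig = { prefix: string; uri: string; inferred?: boolean };

// Assumed rule: the namespace is everything up to the last "#" or "/".
function namespaceOf(iri: string): string | null {
  const cut = Math.max(iri.lastIndexOf("#"), iri.lastIndexOf("/"));
  return cut > 0 ? iri.slice(0, cut + 1) : null;
}

// Assumed naming: reuse the base name, appending 2, 3, ... on collision.
function uniqueName(base: string, used: Set<string>): string {
  if (!used.has(base)) return base;
  let n = 2;
  while (used.has(`${base}${n}`)) n++;
  return `${base}${n}`;
}

function mergePrefixes(
  existing: PrefixTypeConfig[],
  newIris: Iterable<string>,
): PrefixTypeConfig[] {
  const knownUris = new Set(existing.map(p => p.uri));
  const usedNames = new Set(existing.map(p => p.prefix));
  const added: PrefixTypeConfig[] = [];

  // Only the newly discovered IRIs are scanned, never the full schema.
  for (const iri of newIris) {
    const uri = namespaceOf(iri);
    if (uri === null || knownUris.has(uri)) continue;
    const base = uri.split(/[\/#]/).filter(Boolean).pop() ?? "ns";
    const prefix = uniqueName(base.toLowerCase(), usedNames);
    knownUris.add(uri);
    usedNames.add(prefix);
    added.push({ prefix, uri, inferred: true });
  }

  // Preserve the input reference when no namespace was new.
  return added.length === 0 ? existing : [...existing, ...added];
}

const prefixes: PrefixTypeConfig[] = [
  { prefix: "country", uri: "http://a.example/country/" },
];
console.log(mergePrefixes(prefixes, ["http://a.example/country/FR"]) === prefixes); // true
```

With an empty `newIris`, the loop body never runs and the existing array is returned as-is, which matches the no-op row's behavior in spirit.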

New tests

  • Multi-label vertex adds all types to schema
  • Prefix deduplication stability across incremental updates (country, country2, country3)
  • New attribute namespace on existing type generates a prefix
  • New types get prefixes but existing types do not
  • schemaAtom not churned when active config has no schema entry
  • schemaAtom not churned when entities already match schema
  • Vertices/edges/prefixes array references preserved when only the other changes

Validation

  • pnpm checks passes (lint, format, types)
  • pnpm test passes (1646 tests across 152 files)
  • Manually tested by emptying the schema via debug action and verifying incremental rediscovery through search

Check List

  • I confirm that my contribution is made under the terms of the Apache 2.0 license.
  • I have verified pnpm checks passes with no errors.
  • I have verified pnpm test passes with no failures.
  • I have covered newly added functionality with unit tests if necessary.
  • I have updated documentation if necessary.

kmcginnes force-pushed the rewrite-schema-merge branch from eb62c74 to efa4fea on April 28, 2026 20:53
kmcginnes force-pushed the rewrite-schema-merge branch from efa4fea to 5843307 on April 28, 2026 20:54
kmcginnes changed the title from "Rewrite schema merge for referential equality" to "Rewrite schema merge for efficiency and referential equality" on Apr 28, 2026
kmcginnes marked this pull request as ready for review on April 28, 2026 22:31
kmcginnes merged commit dc88dcc into aws:main on Apr 29, 2026
3 checks passed
kmcginnes deleted the rewrite-schema-merge branch on April 29, 2026 14:58