Releases: datajoint/datajoint-python
v2.2.0
What's Changed
For a comprehensive overview of all new features, see What's New in DataJoint 2.2.
Added
- Graph-driven cascade delete and restrict on Diagram (#1407, fixes #865, #1110): New `Diagram.cascade()`, `Diagram.restrict()`, `Diagram.prune()`, and `Diagram.counts()` methods replace the error-driven cascade approach. Delete and drop operations now use the pipeline DAG to determine affected tables before executing, with full dry-run support via `safemode=True`.
- Thread-safe mode with `dj.Instance` (#1404): New `dj.Instance()` class provides independent database connections with connection-scoped configuration. Enables safe concurrent access from multiple threads (e.g., web servers, parallel workers).
- Directory references in `<filepath@store>` (#1415, fixes #1410): Filepath storage now supports directory references. `is_dir` is detected dynamically; existence checks and storage operations handle directories correctly.
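To illustrate what "use the pipeline DAG to determine affected tables before executing" means, here is a minimal pure-Python sketch. It does not use DataJoint: the table names and the `cascade_order` helper are hypothetical, and `Diagram.cascade()` may work differently. A post-order depth-first walk from the restricted table lists every downstream table with children before parents, so the whole deletion plan is known (and can be shown as a dry run) before any row is touched.

```python
def cascade_order(children: dict[str, list[str]], start: str) -> list[str]:
    """Hypothetical helper: tables reachable from `start`, children first.

    `children` maps each table name to the tables that depend on it.
    """
    order: list[str] = []
    visited: set[str] = set()

    def visit(node: str) -> None:
        visited.add(node)
        for child in children.get(node, []):
            if child not in visited:
                visit(child)
        # Appended only after all of its descendants, so dependents
        # always precede the tables they reference.
        order.append(node)

    visit(start)
    return order


# Hypothetical pipeline: Subject -> Session -> {Recording -> Spikes, Behavior}
children = {
    "Subject": ["Session"],
    "Session": ["Recording", "Behavior"],
    "Recording": ["Spikes"],
}
plan = cascade_order(children, "Session")
print("Would delete, in order:", plan)  # dry run: nothing is deleted here
```

Because the plan is computed from the graph rather than discovered from foreign-key errors, it does not depend on backend-specific error codes.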
Fixed
- `populate()` with `reserve_jobs=True` ignores restrictions (#1414, fixes #1413): Restrictions are now correctly applied when fetching pending keys in distributed mode, matching the behavior of direct (non-distributed) populate.
- Populate antijoin uses `.proj()` for correct pending key computation (#1405): Fixes cases where overlapping secondary attributes caused incorrect pending key calculations.
- Allow attribute names starting with 'index' in declarations (#1412, fixes #1411): Table definitions with attribute names like `index_value` no longer raise parse errors.
- Cascade delete failures on MySQL 8 (fixes #1110): The graph-driven cascade in #1407 eliminates the error code mismatch (1217 vs 1451) that caused cascade delete failures on MySQL 8.
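The first two fixes concern how pending keys are computed. Conceptually: restrict the key source first, then remove keys already present in the target, comparing on primary-key attributes only (the role `.proj()` plays in the antijoin). The sketch below models this with plain lists of dicts; `pending_keys` and the attribute names are hypothetical and this is not DataJoint's query code.

```python
def pending_keys(key_source, target_rows, primary_key, restriction=None):
    """Hypothetical model: keys in key_source (optionally restricted)
    that are not yet present in target_rows."""

    def pk(row):
        # Compare on primary-key attributes only, so secondary attributes
        # that happen to share a name cannot affect the match.
        return tuple(row[name] for name in primary_key)

    done = {pk(row) for row in target_rows}
    # Apply the restriction before subtracting, in every populate mode.
    candidates = [r for r in key_source if restriction is None or restriction(r)]
    return [r for r in candidates if pk(r) not in done]


key_source = [
    {"subject": 1, "session": 1},
    {"subject": 1, "session": 2},
    {"subject": 2, "session": 1},
]
# The target row carries a secondary attribute that the comparison ignores.
target_rows = [{"subject": 1, "session": 1, "quality": 0.9}]

all_pending = pending_keys(key_source, target_rows, ["subject", "session"])
restricted = pending_keys(
    key_source, target_rows, ["subject", "session"],
    restriction=lambda k: k["subject"] == 2,
)
```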
Changed
- Backend-agnostic quoting and adapter abstractions (#1419): Refactored identifier quoting, table name construction, and schema queries into adapter methods for cleaner multi-backend (MySQL + PostgreSQL) support.
- `skip_duplicates=True` behavior documented for PostgreSQL (#1417, fixes #1049): PostgreSQL already enforces secondary unique constraints when `skip_duplicates=True` (raises `DuplicateError` on secondary unique conflicts, unlike MySQL which skips silently). This asymmetry is now documented and tested.
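The asymmetry resembles the difference between a blanket "ignore all conflicts" insert and a conflict clause that targets only the primary key. The sketch below uses SQLite from the standard library purely as a stand-in to show both behaviors side by side; it is an assumption-labeled analogy, not the SQL DataJoint actually emits on either backend, and the table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: primary key plus a secondary UNIQUE attribute.
conn.execute("CREATE TABLE subject (subject_id TEXT PRIMARY KEY, tag TEXT UNIQUE)")
conn.execute("INSERT INTO subject VALUES ('s1', 'a')")

# Blanket ignore (analogous to MySQL's silent skipping): the secondary
# unique conflict on `tag` is skipped without an error.
conn.execute("INSERT OR IGNORE INTO subject VALUES ('s2', 'a')")

# Conflict clause targeted at the primary key only: a duplicate key is skipped...
conn.execute("INSERT INTO subject VALUES ('s1', 'b') ON CONFLICT(subject_id) DO NOTHING")

# ...but a secondary unique conflict still raises.
try:
    conn.execute("INSERT INTO subject VALUES ('s3', 'a') ON CONFLICT(subject_id) DO NOTHING")
    secondary_conflict_raised = False
except sqlite3.IntegrityError:
    secondary_conflict_raised = True

row_count = conn.execute("SELECT COUNT(*) FROM subject").fetchone()[0]
```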
Full Changelog: v2.1.1...v2.2.0
v2.1.1
What's Changed
Bug Fixes
- Atomic job reservation to prevent race condition (#1399, fixes #1398): `Job.reserve()` now uses a single atomic `UPDATE ... WHERE status='pending'` instead of a non-atomic SELECT→UPDATE pattern, preventing multiple workers from reserving the same key.
- Hide comments from table preview display (#1393): SQL comments in table definitions are no longer shown in `.preview()` output.
- Correct Part table names in diagrams (#1392): Part tables now display correctly in diagrams by properly stripping the module prefix.
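The atomic reservation pattern can be demonstrated with the standard-library `sqlite3` module as a stand-in database. The `jobs` table, its columns, and the `reserve` function are hypothetical and do not reproduce DataJoint's jobs schema. The check ("is it still pending?") and the claim happen in one statement, and the affected-row count tells each worker whether it won.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal jobs table.
conn.execute("CREATE TABLE jobs (job_key TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO jobs VALUES ('subject=1', 'pending')")
conn.commit()


def reserve(conn, job_key):
    """Claim a job; True only for the caller whose UPDATE changed a row."""
    # The WHERE clause matches only while the job is still pending, so a
    # second worker's identical UPDATE affects zero rows.
    cur = conn.execute(
        "UPDATE jobs SET status = 'reserved' WHERE job_key = ? AND status = 'pending'",
        (job_key,),
    )
    conn.commit()
    return cur.rowcount == 1


first = reserve(conn, "subject=1")   # worker A wins
second = reserve(conn, "subject=1")  # worker B finds nothing to claim
```

The sequential calls only show the rowcount check; with real concurrent workers, the database's locking on the single UPDATE is what removes the gap that a separate SELECT would leave.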
Removals
- Remove `size_on_disk` (#1395): Removed `size_on_disk` property from `Table` and `Schema` classes. Use database-native tools for storage metrics.
Full Changelog: v2.1.0...v2.1.1
v0.14.9
What's Changed
Bug Fix
- Skip redundant S3 upload when file already exists (#1400, fixes #1397): After a transaction rollback, `upload_filepath` no longer re-uploads files that already exist in S3 with matching size and contents hash. This avoids unnecessary network transfers and potential timeouts on large files.
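The skip logic amounts to "upload unless the remote object already matches in size and content hash." A minimal sketch, assuming an MD5 hex digest as the contents hash (the release note does not name the algorithm); `md5_of` and `should_upload` are hypothetical helpers, and fetching the remote size and hash from S3 is left out.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def should_upload(local_path, remote_size, remote_hash):
    """Hypothetical check; remote_size/remote_hash are None if the object is absent."""
    if remote_size is None:
        return True
    # Cheap size comparison first; hash the file only when sizes agree.
    if os.path.getsize(local_path) != remote_size:
        return True
    return md5_of(local_path) != remote_hash


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
path = tmp.name
already_there = should_upload(path, 5, hashlib.md5(b"hello").hexdigest())
not_there = should_upload(path, None, None)
size_differs = should_upload(path, 4, hashlib.md5(b"hello").hexdigest())
os.remove(path)
```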
Maintenance
Full Changelog: v0.14.8...v0.14.9
Release 2.1.0
What's Changed
Added
- PostgreSQL backend support: DataJoint now supports PostgreSQL as an alternative to MySQL. Use `dj.config['database.backend'] = 'postgresql'` to connect to PostgreSQL databases. (#1338, #1339, #1340)
- Diagram improvements (#1345)
  - New `collapse()` method for high-level pipeline views
  - Mermaid output format support via `output='mermaid'`
  - Schema grouping with module labels
  - Direction control (`direction='LR'` or `direction='TB'`)
  - Default diagram direction changed from TB to LR
- Singleton tables: Support for tables with empty primary keys (#1341)
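For readers unfamiliar with Mermaid, `LR` and `TB` are the flowchart directions in Mermaid's own syntax (left-to-right and top-to-bottom). The hypothetical helper below renders dependency edges as Mermaid text to show what such output looks like; it is not DataJoint's exporter, and the real `output='mermaid'` result will include more detail (schema grouping, labels).

```python
def to_mermaid(edges, direction="LR"):
    """Hypothetical helper: render (parent, child) edges as a Mermaid flowchart."""
    if direction not in ("LR", "TB"):
        raise ValueError("direction must be 'LR' or 'TB'")
    lines = [f"flowchart {direction}"]
    for parent, child in edges:
        lines.append(f"    {parent} --> {child}")
    return "\n".join(lines)


text = to_mermaid([("Subject", "Session"), ("Session", "Recording")])
print(text)
```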
Changed
- Performance: Lazy-load `deepdiff` and `tqdm` in autopopulate for faster imports (#1349)
- Packaging: Switched from setuptools to hatchling for build system (#1358)
Deprecated
- The `migrate` module now shows a deprecation warning (#1373)
Fixed
- Allow table class names with underscores (with warning) (#1375)
Documentation
- Converted all docstrings to NumPy style (#1378)
Full Changelog: v2.0.2...v2.1.0
DataJoint 2.0.2
Bug Fixes
- fix: Support 'KEY' in fetch() for backward compatibility (#1384)
  - Restores `fetch('KEY')` syntax from DataJoint 0.14
  - Fixes #1381
- fix: Handle inhomogeneous array shapes in to_arrays() (#1382)
  - Correctly handles blob arrays with different shapes
  - Fixes #1380
- fix: Handle semantic_check for job table operations (#1383)
  - Fixes `populate(reserve_jobs=True)` when `keep_completed=True`
  - Fixes #1379
- fix: Handle missing SSL context in multiprocess populate (#1377)
  - Prevents errors when SSL context is not available in child processes
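Why inhomogeneous shapes need special handling: `np.array()` tries to stack its inputs into one rectangular array, and recent NumPy versions raise `ValueError` for ragged input. One common idiom, shown here as an illustration rather than as what `to_arrays()` does internally, is to allocate a 1-D object array and fill it element by element, so each blob keeps its own shape.

```python
import numpy as np

# Hypothetical blobs fetched from three rows, each with a different shape.
blobs = [np.zeros(3), np.zeros((2, 2)), np.arange(5)]

# Allocate first, then assign: element-wise assignment stores each array
# as an object instead of letting NumPy try to broadcast them together.
column = np.empty(len(blobs), dtype=object)
for i, blob in enumerate(blobs):
    column[i] = blob
```

Passing `dtype=object` directly to `np.array()` can still fail when the arrays share a leading dimension, which is why the allocate-then-fill form is the safer choice.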
Installation
```shell
pip install datajoint==2.0.2
```
DataJoint 0.14.8
Bug Fixes
- fix: Add config option to skip filepath checksum on insert (#1387)
  - New `filepath_checksum_size_limit_insert` config option
  - Prevents transaction timeouts when inserting large files with filepath attributes in three-part `make()` methods
  - Fixes #1386
Usage
```python
import datajoint as dj

# Skip checksum on insert for files > 1 GB (1024**3 bytes)
dj.config['filepath_checksum_size_limit_insert'] = 1024 * 1024 * 1024
```
Installation
```shell
pip install datajoint==0.14.8
```
Release 2.0.1
⚡️ Enhancements
- fix: Remove setuptools, ipython, matplotlib, faker, and urllib3 from runtime dependencies (#1372) @dimitri-yatsenko
🐛 Bug Fixes
- fix: Allow table class names with underscores (with warning) (#1375) @dimitri-yatsenko
- fix: make `fetch` a class method of user tables for backward compatibility with pre-v2.0 (#1375) @dimitri-yatsenko
Full Changelog: v2.0.0...v2.0.1
Release 2.0.0
DataJoint 2.0 - Computational Foundation for Agentic Data Pipelines
This is a major release representing a complete rewrite of the DataJoint Python library. It introduces a modernized architecture with an extensible type system, object-augmented schemas, semantic matching, and improved developer experience.
Related:
- PR #1311 — Complete rewrite implementation
- Discussion #1235 — DataJoint 2.0 design
- Discussion #1354 — Object-Augmented Schemas (OAS)
- Discussion #1256 — Extensible type system
- Discussion #1243 — Semantic matching and lineage
💥 Breaking Changes
Platform Requirements
- Python 3.10+ required - Dropped support for Python 3.9 and earlier
- MySQL 8.0+ required - Dropped support for MySQL 5.x and pre-8.0 versions
Architecture Changes
- New package structure - Source code moved to `src/datajoint/`
- Extensible Type/Codec System - New `<codec>` syntax replaces hardcoded blob/attach handling. Custom codecs extend `dj.Codec` with `encode()`/`decode()` methods
- Object-Augmented Schemas (OAS) - Schema-addressed storage (`<object@>`, `<npy@>`) creates browsable paths mirroring database structure
- Semantic Matching with Lineage - `~lineage` table tracks attribute origins. Joins/restrictions enforce that homologous namesakes share lineage
- Table-Specific Jobs Tables - Each Computed/Imported table has its own `~~table_name` jobs table (replaces shared jobs table)
- New Configuration System - pydantic-settings based config with `datajoint.json`, `.secrets/` directory, and `DJ_*` environment variables
- New Test Infrastructure - Uses testcontainers for automatic MySQL/MinIO management (no manual docker-compose required)
Removed/Deprecated Features
- `dj.conn()` interactive prompts - Use environment variables or config file
- `dj.kill()` and `dj.kill_quick()` - Use database administration tools
- `otumat` dependency - S3 credential management simplified
- Positional tuple inserts deprecated - Use dict with explicit field names
- `~log` table deprecated - Schema-level logging table no longer used
🚀 Major Features
Core Type System
Scientist-friendly type names with portable semantics:
- Numeric: `float32`, `float64`, `int64`, `int32`, `int16`, `int8`, `bool`
- Special: `uuid` (binary(16)), `json`, `bytes` (longblob)
- Temporal: `date`, `datetime`
- String: `char(n)`, `varchar(n)`, `enum(...)`
- Fixed-point: `decimal(m,n)`
Extensible Codec System
```python
import datajoint as dj

class GraphCodec(dj.Codec):
    name = "graph"
    def get_dtype(self, is_store): return "<blob>"  # stored via the built-in <blob> codec
    def encode(self, value, *, key=None, store_name=None): ...
    def decode(self, stored, *, key=None): ...

# Use in definitions: data : <graph>
```
Built-in codecs: `<blob>`, `<blob@>`, `<attach>`, `<attach@>`, `<hash@>`, `<object@>`, `<npy@>`, `<filepath@>`
Object-Augmented Schemas (OAS)
- Hash-addressed (`<blob@>`, `<attach@>`, `<hash@>`): Content-addressed with MD5 deduplication (base32-encoded, 26 chars). Paths: `_hash/{hash[:2]}/{hash[2:4]}/{hash}`
- Schema-addressed (`<object@>`, `<npy@>`): Paths mirror schema structure: `{schema}/{table}/{pk}/{attribute}`
- Filepath references (`<filepath@>`): Reference existing files in stores without copying
- Lazy references: `NpyRef` and `ObjectRef` provide metadata access without I/O
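The "base32-encoded, 26 chars" figure follows from the digest size: MD5 yields 128 bits, base32 carries 5 bits per character, and ⌈128 / 5⌉ = 26 once the `=` padding is dropped. The sketch below builds a path in the documented `_hash/{hash[:2]}/{hash[2:4]}/{hash}` layout. Lowercasing and padding removal are assumptions, and `hash_path` is a hypothetical helper, not DataJoint's function.

```python
import base64
import hashlib


def hash_path(content: bytes) -> str:
    """Hypothetical helper: content-addressed path for a stored object."""
    digest = hashlib.md5(content).digest()  # 16 bytes = 128 bits
    # Assumption: padding stripped and lowercased; 128 bits -> 26 base32 chars.
    h = base64.b32encode(digest).decode("ascii").rstrip("=").lower()
    return f"_hash/{h[:2]}/{h[2:4]}/{h}"


path = hash_path(b"example payload")
# Identical content always maps to the same path, which is what makes
# deduplication possible.
same_path = hash_path(b"example payload") == path
```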
Semantic Matching
- Lineage tracking identifies attribute origins (`schema.table.attribute`)
- Binary operations (join, restrict, union, aggr) enforce lineage compatibility
- Use `schema.rebuild_lineage()` for legacy schema migration
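A simplified model of the lineage rule: attributes that two operands share by name may be matched only if they trace back to the same origin. In the sketch below each operand is a hypothetical dict mapping attribute name to a `schema.table.attribute` lineage string; `check_lineage` is illustrative only and is not DataJoint's implementation.

```python
def check_lineage(left, right):
    """Hypothetical check: return shared attribute names, or raise on mismatch."""
    shared = left.keys() & right.keys()
    for name in shared:
        if left[name] != right[name]:
            raise ValueError(
                f"attribute {name!r}: lineage {left[name]} vs {right[name]}"
            )
    return sorted(shared)  # attributes a join would match on


session = {"subject_id": "lab.subject.subject_id", "session_id": "lab.session.session_id"}
recording = {
    "subject_id": "lab.subject.subject_id",
    "session_id": "lab.session.session_id",
    "channel": "lab.recording.channel",
}
join_on = check_lineage(session, recording)

# Same name, different origins: matching on `id` would be meaningless.
try:
    check_lineage({"id": "lab.stimulus.id"}, {"id": "lab.scan.id"})
    blocked = False
except ValueError:
    blocked = True
```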
Jobs 2.0
- Per-table job queues with `~~table_name` naming pattern
- Composite index `(status, priority, scheduled_time)` for efficient job fetching
- Improved error tracking and job status management
New Query Operator
- `extend(other)` - Left-joins a functionally dependent table, preserving primary key and row count
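Why a left join onto a functionally dependent table preserves row count: each row on the left matches at most one row on the right, and unmatched rows are kept with empty (NULL-like) values. The list-of-dicts sketch below illustrates the semantics; `extend` here is a hypothetical function, not DataJoint's operator.

```python
def extend(rows, other, on):
    """Hypothetical left join of `other` onto `rows` by the `on` attributes."""
    # Functional dependency: each `on` key identifies at most one row of `other`.
    lookup = {tuple(o[a] for a in on): o for o in other}
    extra = sorted({a for o in other for a in o} - set(on))
    result = []
    for row in rows:
        match = lookup.get(tuple(row[a] for a in on), {})
        # Unmatched rows get None, like NULLs in a SQL LEFT JOIN.
        result.append({**row, **{a: match.get(a) for a in extra}})
    return result


sessions = [
    {"session": 1, "rig": "A"},
    {"session": 2, "rig": "B"},
    {"session": 3, "rig": "Z"},  # no matching rig entry
]
rigs = [{"rig": "A", "room": "101"}, {"rig": "B", "room": "102"}]
extended = extend(sessions, rigs, on=["rig"])
```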
Modernized Output Methods
- `keys()` - Returns list of primary key dicts
- `to_arrays(*attrs)` - Returns tuple of numpy arrays
- `to_dicts()` - Returns list of dictionaries
- `to_pandas()` - Returns pandas DataFrame
- `to_polars()` - Returns Polars DataFrame
- `to_arrow()` - Returns PyArrow Table
- `fetch()` preserved with deprecation warning for backward compatibility
Configuration Enhancements
- `datajoint.json` project config with parent directory search
- `.secrets/` directory for sensitive values (gitignore this)
- `database.database_prefix` setting for automatic schema name prefixing
- `database.create_tables` setting to control automatic table creation
- `dj.config.override()` context manager for temporary config changes
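"Parent directory search" means a notebook or script nested deep inside a project still picks up the project's `datajoint.json`. Below is a minimal sketch of such a search with `pathlib`. `find_config` is hypothetical, and DataJoint's actual lookup (stopping conditions, precedence relative to `.secrets/` and `DJ_*` variables) may differ.

```python
import tempfile
from pathlib import Path


def find_config(start, filename="datajoint.json"):
    """Hypothetical helper: nearest `filename` in `start` or any parent, else None."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as root:
    project = Path(root).resolve()
    (project / "datajoint.json").write_text("{}")
    nested = project / "analysis" / "notebooks"
    nested.mkdir(parents=True)
    # Searching from the nested directory walks up to the project root.
    found_ok = find_config(nested) == project / "datajoint.json"
```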
📚 Documentation
Documentation has been moved to a dedicated repository and completely rewritten using the Diátaxis framework:
- Live site: https://docs.datajoint.com
- Repository: https://github.com/datajoint/datajoint-docs
Structure:
- Tutorials — Learn by building real pipelines (Jupyter notebooks)
- How-To Guides — Practical task-oriented guides
- Explanation — Understanding concepts and design
- Reference — Specifications and API documentation
- Migration Guide — Upgrade from legacy versions
⚖️ License Change
DataJoint 2.0 is released under Apache 2.0 license (previously LGPLv2.1).
0.14.7
🐛 Bug Fixes
- fix: Pass make_kwargs to make_fetch in tripartite pattern (#1360) @dimitri-yatsenko
When using generator-based make (`make_fetch`, `make_compute`, `make_insert`), `make_kwargs` passed to `populate()` were not being forwarded to `make_fetch`. This caused a `TypeError` when using `make_kwargs` with the tripartite pattern.
Fixes #1350
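To illustrate the fix, here is a hypothetical driver for the three-part pattern in which keyword arguments are forwarded to `make_fetch`. Neither `run_tripartite` nor `ToyTable` is DataJoint code; the sketch only assumes the fetch → compute → insert data flow described above, with fetched values passed on to `make_compute` and computed values passed on to `make_insert`.

```python
def run_tripartite(table, key, **make_kwargs):
    """Hypothetical driver: fetch -> compute -> insert for one key."""
    # The point of the fix: make_kwargs reach make_fetch too.
    fetched = table.make_fetch(key, **make_kwargs)
    computed = table.make_compute(key, *fetched)
    table.make_insert(key, *computed)


class ToyTable:
    """Hypothetical stand-in for a Computed table."""

    def __init__(self):
        self.inserted = []

    def make_fetch(self, key, scale=1):
        # `scale` arrives here only if the driver forwards make_kwargs.
        return (key["x"], scale)

    def make_compute(self, key, x, scale):
        return (x * scale,)

    def make_insert(self, key, result):
        self.inserted.append({**key, "result": result})


table = ToyTable()
run_tripartite(table, {"x": 3}, scale=10)
```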
⚠️ End-of-Life Notice
This is the final maintenance release for the 0.14.x branch.
- No further 0.14.x releases are planned
- There will be no v0.15 — the next major version is v2.0
- Security fixes only will be considered on a case-by-case basis
We encourage all users on 0.14.x to plan their migration to v2.0.
Full Changelog: v0.14.6...v0.14.7
Release 0.14.6
⚡️ Enhancements
- update documentation and devcontainer (#1250) @dimitri-yatsenko
- Update version 0.14.5 (#1249) @kavenk
📝 Documentation
- update documentation and devcontainer (#1250) @dimitri-yatsenko
- Update version 0.14.5 (#1249) @kavenk
Full Changelog: v0.14.5...v0.14.6