Releases · datajoint/datajoint-python

01 Apr 15:51

dimitri-yatsenko

v2.2.0

865bd29

v2.2.0 Latest


What's Changed

For a comprehensive overview of all new features, see What's New in DataJoint 2.2.

Added

Graph-driven cascade delete and restrict on Diagram (#1407, fixes #865, #1110): New Diagram.cascade(), Diagram.restrict(), Diagram.prune(), and Diagram.counts() methods replace the error-driven cascade approach. Delete and drop operations now use the pipeline DAG to determine affected tables before executing, with full dry-run support via safemode=True.
Thread-safe mode with dj.Instance (#1404): New dj.Instance() class provides independent database connections with connection-scoped configuration. Enables safe concurrent access from multiple threads (e.g., web servers, parallel workers).
Directory references in <filepath@store> (#1415, fixes #1410): Filepath storage now supports directory references. is_dir is detected dynamically; existence checks and storage operations handle directories correctly.
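
The graph-driven cascade in the first item can be pictured with a small sketch. This is not DataJoint's implementation: the `children` mapping and both functions are hypothetical, and the table names are invented. It only shows the general idea of walking the dependency DAG to find the affected tables, and a safe deletion order, before any row is touched.

```python
from collections import deque

# Hypothetical pipeline DAG: parent table -> tables that reference it.
children = {
    "Subject": ["Session"],
    "Session": ["Scan", "Behavior"],
    "Scan": ["Segmentation"],
    "Behavior": [],
    "Segmentation": [],
}

def affected_tables(graph, start):
    """Return every table reachable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for child in graph.get(table, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def cascade_order(graph, start):
    """Order affected tables so that dependents come before their parents."""
    affected = affected_tables(graph, start)
    order, visited = [], set()

    def visit(table):
        if table in visited:
            return
        visited.add(table)
        for child in graph.get(table, []):
            if child in affected:
                visit(child)
        order.append(table)  # post-order: children are appended first

    visit(start)
    return order

# Plan a delete that starts at Session; Subject is upstream and untouched.
plan = cascade_order(children, "Session")
```

Because the plan is computed up front, it can be inspected (as in a dry run) without relying on foreign-key errors from the database to discover dependents.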

Fixed

populate() with reserve_jobs=True ignores restrictions (#1414, fixes #1413): Restrictions are now correctly applied when fetching pending keys in distributed mode, matching the behavior of direct (non-distributed) populate.
Populate antijoin uses .proj() for correct pending key computation (#1405): Fixes cases where overlapping secondary attributes caused incorrect pending key calculations.
Allow attribute names starting with 'index' in declarations (#1412, fixes #1411): Table definitions with attribute names like index_value no longer raise parse errors.
Cascade delete failures on MySQL 8 (fixes #1110): The graph-driven cascade in #1407 eliminates the error code mismatch (1217 vs 1451) that caused cascade delete failures on MySQL 8.
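
The antijoin fix (#1405) can be illustrated with plain dictionaries. The sketch below is an assumption-laden model, not DataJoint code: rows are dicts, `proj` and `antijoin` are hypothetical helpers, and the attribute names are invented. It shows why projecting the target onto its primary key before the antijoin matters when a secondary attribute has the same name in both tables.

```python
# Rows are plain dicts; `primary_key` lists the key attributes.
key_source = [
    {"session_id": 1, "notes": "baseline"},
    {"session_id": 2, "notes": "treatment"},
]
target = [
    # Session 1 is already computed, but its secondary attribute `notes`
    # holds a different value than the row in key_source.
    {"session_id": 1, "notes": "processed"},
]
primary_key = ["session_id"]

def proj(rows, attrs):
    """Keep only the named attributes of each row."""
    return [{a: r[a] for a in attrs} for r in rows]

def antijoin(left, right):
    """Keep rows of `left` that match no row of `right` on all shared attributes."""
    def matches(l, r):
        shared = l.keys() & r.keys()
        return all(l[a] == r[a] for a in shared)
    return [l for l in left if not any(matches(l, r) for r in right)]

# Without projection, `notes` takes part in the match and session 1 looks pending.
naive_pending = proj(antijoin(key_source, target), primary_key)

# With projection, only the primary key is compared.
correct_pending = proj(antijoin(key_source, proj(target, primary_key)), primary_key)
```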

Changed

Backend-agnostic quoting and adapter abstractions (#1419): Refactored identifier quoting, table name construction, and schema queries into adapter methods for cleaner multi-backend (MySQL + PostgreSQL) support.
skip_duplicates=True behavior documented for PostgreSQL (#1417, fixes #1049): PostgreSQL already enforces secondary unique constraints when skip_duplicates=True (it raises DuplicateError on secondary unique conflicts, unlike MySQL, which skips them silently). This asymmetry is now documented and tested.

Full Changelog: v2.1.1...v2.2.0


17 Feb 19:54

dimitri-yatsenko

v2.1.1

4a7e1e8

v2.1.1

What's Changed

Bug Fixes

Atomic job reservation to prevent race condition (#1399, fixes #1398): Job.reserve() now uses a single atomic UPDATE ... WHERE status='pending' instead of a non-atomic SELECT→UPDATE pattern, preventing multiple workers from reserving the same key.
Hide comments from table preview display (#1393): SQL comments in table definitions are no longer shown in .preview() output.
Correct Part table names in diagrams (#1392): Part tables now display correctly in diagrams by properly stripping the module prefix.
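
The atomic reservation pattern from #1399 can be sketched with the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the `jobs` table, its columns, and the `reserve` function are hypothetical and do not reflect DataJoint's actual jobs schema or its `Job.reserve()` signature. The point is that a single conditional UPDATE succeeds for at most one worker, and the affected-row count tells each worker whether it won.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (job_key TEXT PRIMARY KEY, status TEXT, worker TEXT)"
)
conn.execute("INSERT INTO jobs VALUES ('k1', 'pending', NULL)")
conn.commit()

def reserve(conn, job_key, worker):
    """Try to claim a job; return True only if this call changed the row."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'reserved', worker = ? "
        "WHERE job_key = ? AND status = 'pending'",
        (worker, job_key),
    )
    conn.commit()
    # rowcount is 1 if the row was still pending, 0 if someone else got it.
    return cur.rowcount == 1

first = reserve(conn, "k1", "worker-a")
second = reserve(conn, "k1", "worker-b")
owner = conn.execute(
    "SELECT worker FROM jobs WHERE job_key = 'k1'"
).fetchone()[0]
```

A SELECT followed by an UPDATE leaves a window in which two workers can both read `pending`; folding the check into the UPDATE's WHERE clause closes that window.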

Removals

Remove size_on_disk (#1395): Removed size_on_disk property from Table and Schema classes. Use database-native tools for storage metrics.

Full Changelog: v2.1.0...v2.1.1


17 Feb 19:40

dimitri-yatsenko

v0.14.9

1e6cc11

v0.14.9

What's Changed

Bug Fix

Skip redundant S3 upload when file already exists (#1400, fixes #1397): After a transaction rollback, upload_filepath no longer re-uploads files that already exist in S3 with matching size and contents hash. This avoids unnecessary network transfers and potential timeouts on large files.
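
A minimal sketch of the skip-if-identical check follows. It does not use S3 or DataJoint's `upload_filepath`: the `remote` dict stands in for a bucket, `upload_if_needed` is a hypothetical helper, and SHA-256 is an assumed hash chosen for illustration (the release note does not name the algorithm).

```python
import hashlib

def content_hash(data: bytes) -> str:
    # Assumed hash for this sketch; the real algorithm may differ.
    return hashlib.sha256(data).hexdigest()

remote = {}    # hypothetical stand-in for a bucket: path -> (size, hash)
uploads = []   # records which paths were actually transferred

def upload_if_needed(path: str, data: bytes) -> bool:
    """Upload only when no object with the same size and hash exists."""
    meta = (len(data), content_hash(data))
    if remote.get(path) == meta:
        return False  # identical object already present; skip the transfer
    remote[path] = meta
    uploads.append(path)
    return True

first = upload_if_needed("scans/a.bin", b"abc")
retry = upload_if_needed("scans/a.bin", b"abc")    # e.g. a retry after a rollback
changed = upload_if_needed("scans/a.bin", b"abcd")  # content differs, so upload
```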

Maintenance

Pin setuptools<82 for pkg_resources compatibility (#1396, fixes #1394)

Full Changelog: v0.14.8...v0.14.9


05 Feb 21:43

dimitri-yatsenko

v2.1.0

a1ab055

Release 2.1.0

What's Changed

Added

PostgreSQL backend support — DataJoint now supports PostgreSQL as an alternative to MySQL. Use dj.config['database.backend'] = 'postgresql' to connect to PostgreSQL databases. (#1338, #1339, #1340)
Diagram improvements (#1345)
- New collapse() method for high-level pipeline views
- Mermaid output format support via output='mermaid'
- Schema grouping with module labels
- Direction control (direction='LR' or direction='TB')
- Default diagram direction changed from TB to LR
Singleton tables — Support for tables with empty primary keys (#1341)

Changed

Performance: Lazy-load deepdiff and tqdm in autopopulate for faster imports (#1349)
Packaging: Switched from setuptools to hatchling for build system (#1358)

Deprecated

The migrate module now shows a deprecation warning (#1373)

Fixed

Allow table class names with underscores (with warning) (#1375)

Documentation

Converted all docstrings to NumPy style (#1378)

Full Changelog: v2.0.2...v2.1.0


05 Feb 18:35

dimitri-yatsenko

v2.0.2

da0cc0c

DataJoint 2.0.2

Bug Fixes

fix: Support 'KEY' in fetch() for backward compatibility (#1384)
- Restores fetch('KEY') syntax from DataJoint 0.14
- Fixes #1381
fix: Handle inhomogeneous array shapes in to_arrays() (#1382)
- Correctly handles blob arrays with different shapes
- Fixes #1380
fix: Handle semantic_check for job table operations (#1383)
- Fixes populate(reserve_jobs=True) when keep_completed=True
- Fixes #1379
fix: Handle missing SSL context in multiprocess populate (#1377)
- Prevents errors when SSL context is not available in child processes

Installation

pip install datajoint==2.0.2


05 Feb 20:11

dimitri-yatsenko

v0.14.8

60a8f05

DataJoint 0.14.8

Bug Fixes

fix: Add config option to skip filepath checksum on insert (#1387)
- New filepath_checksum_size_limit_insert config option
- Prevents transaction timeouts when inserting large files with filepath attributes in three-part make() methods
- Fixes #1386

Usage

import datajoint as dj

# Skip checksum on insert for files > 1GB
dj.config['filepath_checksum_size_limit_insert'] = 1024 * 1024 * 1024

Installation

pip install datajoint==0.14.8


03 Feb 19:46

github-actions

v2.0.1

5a78063

Release 2.0.1

⚡️ Enhancements

fix: Remove setuptools, ipython, matplotlib, faker, and urllib3 from runtime dependencies (#1372) @dimitri-yatsenko

🐛 Bug Fixes

fix: Allow table class names with underscores (with warning) (#1375) @dimitri-yatsenko
fix: Make fetch a class method of user tables for backward compatibility with pre-v2.0 (#1375) @dimitri-yatsenko

Full Changelog: v2.0.0...v2.0.1

Contributors

dimitri-yatsenko


03 Feb 01:55

dimitri-yatsenko

v2.0.0

3a1c2db

Release 2.0.0

DataJoint 2.0 - Computational Foundation for Agentic Data Pipelines

This is a major release representing a complete rewrite of the DataJoint Python library. It introduces a modernized architecture with an extensible type system, object-augmented schemas, semantic matching, and improved developer experience.

Related:

PR #1311 — Complete rewrite implementation
Discussion #1235 — DataJoint 2.0 design
Discussion #1354 — Object-Augmented Schemas (OAS)
Discussion #1256 — Extensible type system
Discussion #1243 — Semantic matching and lineage

💥 Breaking Changes

Platform Requirements

Python 3.10+ required - Dropped support for Python 3.9 and earlier
MySQL 8.0+ required - Dropped support for MySQL 5.x and pre-8.0 versions

Architecture Changes

New package structure - Source code moved to src/datajoint/
Extensible Type/Codec System - New <codec> syntax replaces hardcoded blob/attach handling. Custom codecs extend dj.Codec with encode()/decode() methods
Object-Augmented Schemas (OAS) - Schema-addressed storage (<object@>, <npy@>) creates browsable paths mirroring database structure
Semantic Matching with Lineage - ~lineage table tracks attribute origins. Joins and restrictions require that homologous namesakes (same-named attributes) share lineage
Table-Specific Jobs Tables - Each Computed/Imported table has its own ~~table_name jobs table (replaces shared jobs table)
New Configuration System - pydantic-settings based config with datajoint.json, .secrets/ directory, and DJ_* environment variables
New Test Infrastructure - Uses testcontainers for automatic MySQL/MinIO management (no manual docker-compose required)

Removed/Deprecated Features

dj.conn() interactive prompts - Use environment variables or config file
dj.kill() and dj.kill_quick() - Use database administration tools
otumat dependency - S3 credential management simplified
Positional tuple inserts deprecated - Use dict with explicit field names
~log table deprecated - Schema-level logging table no longer used

🚀 Major Features

Core Type System

Scientist-friendly type names with portable semantics:

Numeric: float32, float64, int64, int32, int16, int8, bool
Special: uuid (binary(16)), json, bytes (longblob)
Temporal: date, datetime
String: char(n), varchar(n), enum(...)
Fixed-point: decimal(m,n)

Extensible Codec System

import datajoint as dj

class GraphCodec(dj.Codec):
    name = "graph"
    def get_dtype(self, is_store): return "<blob>"
    def encode(self, value, *, key=None, store_name=None): ...
    def decode(self, stored, *, key=None): ...

# Use in definitions: data : <graph>

Built-in codecs: <blob>, <blob@>, <attach>, <attach@>, <hash@>, <object@>, <npy@>, <filepath@>

Object-Augmented Schemas (OAS)

Hash-addressed (<blob@>, <attach@>, <hash@>): Content-addressed with MD5 deduplication (base32-encoded, 26 chars). Paths: _hash/{hash[:2]}/{hash[2:4]}/{hash}
Schema-addressed (<object@>, <npy@>): Paths mirror schema structure: {schema}/{table}/{pk}/{attribute}
Filepath references (<filepath@>): Reference existing files in stores without copying
Lazy references: NpyRef and ObjectRef provide metadata access without I/O
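
The hash-addressed layout above can be sketched with the standard library. This is an illustration rather than DataJoint's code: `hash_token` and `hash_path` are hypothetical names, and the base32 alphabet, casing, and padding handling are assumptions. What the sketch does confirm is the arithmetic: an MD5 digest is 16 bytes, and its base32 encoding without padding is 26 characters.

```python
import base64
import hashlib

def hash_token(data: bytes) -> str:
    """MD5 digest of the content, base32-encoded with padding removed."""
    digest = hashlib.md5(data).digest()        # 16 bytes
    encoded = base64.b32encode(digest)         # 32 chars, 6 of them '=' padding
    return encoded.decode("ascii").rstrip("=")  # 26 chars

def hash_path(data: bytes) -> str:
    """Build a path of the form _hash/{hash[:2]}/{hash[2:4]}/{hash}."""
    h = hash_token(data)
    return f"_hash/{h[:2]}/{h[2:4]}/{h}"

token = hash_token(b"example payload")
path_a = hash_path(b"example payload")
path_b = hash_path(b"example payload")  # same content maps to the same path
path_c = hash_path(b"other payload")
```

Identical content always resolves to the same path, which is what makes deduplication possible; the two-level prefix directories keep any single directory from growing too large.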

Semantic Matching

Lineage tracking identifies attribute origins (schema.table.attribute)
Binary operations (join, restrict, union, aggr) enforce lineage compatibility
Use schema.rebuild_lineage() for legacy schema migration
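
As a rough model of the lineage rule, the sketch below treats each table heading as a mapping from attribute name to a lineage string of the form schema.table.attribute. Everything here is hypothetical (the `LineageError` class, the `check_join` function, and the schema and table names); DataJoint's real checks live inside its query operators.

```python
class LineageError(ValueError):
    """Raised when same-named attributes come from different origins."""

def check_join(left: dict, right: dict) -> list:
    """Return the shared attribute names, or raise if their lineages differ."""
    shared = sorted(left.keys() & right.keys())
    for name in shared:
        if left[name] != right[name]:
            raise LineageError(
                f"attribute {name!r}: {left[name]} != {right[name]}"
            )
    return shared

session = {
    "subject_id": "lab.subject.subject_id",
    "session_id": "lab.session.session_id",
}
scan = {
    "session_id": "lab.session.session_id",
    "scan_id": "imaging.scan.scan_id",
}
stimulus = {
    # Same name as in `session`, but it originates in a different table.
    "session_id": "stim.block.session_id",
}

joined_on = check_join(session, scan)

try:
    check_join(session, stimulus)
    rejected = False
except LineageError:
    rejected = True
```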

Jobs 2.0

Per-table job queues with ~~table_name naming pattern
Composite index (status, priority, scheduled_time) for efficient job fetching
Improved error tracking and job status management

New Query Operator

extend(other) - Left-joins a functionally dependent table, preserving primary key and row count
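
The semantics of `extend` can be approximated on lists of dicts. This is a sketch under assumptions, not the library's implementation: the function signature below (with explicit `key` and `other_attrs` arguments) is invented for illustration, whereas the real operator takes only the other table. It shows the two properties named above: every row of the left operand survives, and no row is duplicated, because the other table has at most one row per key.

```python
def extend(rows, other, key, other_attrs):
    """Left-join `other` onto `rows`; unmatched rows get None for new attributes."""
    # `key` is assumed to be the primary key of `other`, so each lookup
    # finds at most one row and the row count of `rows` is preserved.
    lookup = {tuple(o[k] for k in key): o for o in other}
    out = []
    for row in rows:
        match = lookup.get(tuple(row[k] for k in key))
        extra = {a: (match[a] if match else None) for a in other_attrs}
        out.append({**row, **extra})
    return out

sessions = [
    {"session_id": 1, "subject_id": 10},
    {"session_id": 2, "subject_id": 11},
]
subjects = [{"subject_id": 10, "species": "mouse"}]

extended = extend(sessions, subjects, key=["subject_id"], other_attrs=["species"])
```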

Modernized Output Methods

keys() - Returns list of primary key dicts
to_arrays(*attrs) - Returns tuple of numpy arrays
to_dicts() - Returns list of dictionaries
to_pandas() - Returns pandas DataFrame
to_polars() - Returns Polars DataFrame
to_arrow() - Returns PyArrow Table
fetch() preserved with deprecation warning for backward compatibility

Configuration Enhancements

datajoint.json project config with parent directory search
.secrets/ directory for sensitive values (gitignore this)
database.database_prefix setting for automatic schema name prefixing
database.create_tables setting to control automatic table creation
dj.config.override() context manager for temporary config changes
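
The parent directory search for datajoint.json can be pictured as follows. This is a generic sketch, not DataJoint's lookup code: `find_config` is a hypothetical helper, and whether the real search stops at a project boundary is not stated here, so the sketch simply walks up to the filesystem root.

```python
import tempfile
from pathlib import Path

def find_config(start, filename="datajoint.json"):
    """Return the nearest `filename` in `start` or any of its parents, else None."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None

# Demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    project = Path(root) / "project"
    nested = project / "analysis" / "notebooks"
    nested.mkdir(parents=True)
    (project / "datajoint.json").write_text("{}")  # empty placeholder config

    found = find_config(nested)
    found_name = found.name
    found_in_project = found.parent == project.resolve()
```

A search like this lets scripts and notebooks in subdirectories share one project-level config file without each needing its own copy.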

📚 Documentation

Documentation has been moved to a dedicated repository and completely rewritten using the Diátaxis framework:

Live site: https://docs.datajoint.com
Repository: https://github.com/datajoint/datajoint-docs

Structure:

Tutorials — Learn by building real pipelines (Jupyter notebooks)
How-To Guides — Practical task-oriented guides
Explanation — Understanding concepts and design
Reference — Specifications and API documentation
Migration Guide — Upgrade from legacy versions

⚖️ License Change

DataJoint 2.0 is released under Apache 2.0 license (previously LGPLv2.1).


02 Feb 19:25

dimitri-yatsenko

v0.14.7

832d92b

0.14.7

🐛 Bug Fixes

fix: Pass make_kwargs to make_fetch in tripartite pattern (#1360) @dimitri-yatsenko

When using a generator-based make (make_fetch, make_compute, make_insert), make_kwargs passed to populate() were not forwarded to make_fetch. This caused a TypeError when make_kwargs was used with the tripartite pattern.

Fixes #1350

⚠️ End-of-Life Notice

This is the final maintenance release for the 0.14.x branch.

No further 0.14.x releases are planned
There will be no v0.15 — the next major version is v2.0
Only security fixes will be considered, on a case-by-case basis

We encourage all users on 0.14.x to plan their migration to v2.0.

Full Changelog: v0.14.6...v0.14.7

Contributors

dimitri-yatsenko


31 Jul 22:06

github-actions

v0.14.6

701f5ad

Release 0.14.6

⚡️ Enhancements

Update documentation and devcontainer (#1250) @dimitri-yatsenko
Update version 0.14.5 (#1249) @kavenk

📝 Documentation

Update documentation and devcontainer (#1250) @dimitri-yatsenko
Update version 0.14.5 (#1249) @kavenk

Full Changelog: v0.14.5...v0.14.6

Contributors

dimitri-yatsenko and kavenk
