@@ -18,46 +18,48 @@ View the metadata cache information of the External Catalog in the currently con

## Table Information

| Column Name | Type | Description |
| ------------ | ---- | ----------------------- |
| CATALOG_NAME | text | The name of the Catalog |
| CACHE_NAME | text | The name of the cache |
| METRIC_NAME | text | The name of the metric |
| METRIC_VALUE | text | The value of the metric |

One row represents one cache entry on one FE for one external catalog.

| Column Name | Type | Description |
| ------------ | ---- | ----------- |
| FE_HOST | text | FE host that reports the stats |
| CATALOG_NAME | text | Catalog name |
| ENGINE_NAME | text | Meta cache engine name, such as `hive`, `iceberg`, `paimon` |
| ENTRY_NAME | text | Cache entry name inside the engine, such as `schema`, `file`, `manifest` |
| EFFECTIVE_ENABLED | boolean | Whether the cache is effectively enabled after evaluating `enable` / `ttl-second` / `capacity` |
| CONFIG_ENABLED | boolean | Raw `enable` flag from the cache config |
| AUTO_REFRESH | boolean | Whether async refresh-after-write is enabled for this entry |
| TTL_SECOND | bigint | TTL in seconds. `0` means disabled; `-1` means no expiration |
| CAPACITY | bigint | Max entry count |
| ESTIMATED_SIZE | bigint | Estimated current cache size |
| REQUEST_COUNT | bigint | Total requests |
| HIT_COUNT | bigint | Cache hits |
| MISS_COUNT | bigint | Cache misses |
| HIT_RATE | double | Hit rate |
| LOAD_SUCCESS_COUNT | bigint | Successful loads |
| LOAD_FAILURE_COUNT | bigint | Failed loads |
| TOTAL_LOAD_TIME_MS | bigint | Total load time in milliseconds |
| AVG_LOAD_PENALTY_MS | double | Average load time in milliseconds |
| EVICTION_COUNT | bigint | Evicted entries |
| INVALIDATE_COUNT | bigint | Explicit invalidations |
| LAST_LOAD_SUCCESS_TIME | text | Last successful load time |
| LAST_LOAD_FAILURE_TIME | text | Last failed load time |
| LAST_ERROR | text | Latest load error message |


## Usage Example

```text
+----------------------+-----------------------------+----------------------+---------------------+
| CATALOG_NAME | CACHE_NAME | METRIC_NAME | METRIC_VALUE |
+----------------------+-----------------------------+----------------------+---------------------+
| hive_iceberg_minio | iceberg_table_cache | eviction_count | 0 |
| hive_iceberg_minio | iceberg_table_cache | hit_ratio | 0.8235294117647058 |
| hive_iceberg_minio | iceberg_table_cache | average_load_penalty | 5.480102048333334E8 |
| hive_iceberg_minio | iceberg_table_cache | estimated_size | 6 |
| hive_iceberg_minio | iceberg_table_cache | hit_count | 28 |
| hive_iceberg_minio | iceberg_table_cache | read_count | 34 |
| hive_iceberg_minio | iceberg_snapshot_list_cache | eviction_count | 0 |
| hive_iceberg_minio | iceberg_snapshot_list_cache | hit_ratio | 1.0 |
| hive_iceberg_minio | iceberg_snapshot_list_cache | average_load_penalty | 0.0 |
| hive_iceberg_minio | iceberg_snapshot_list_cache | estimated_size | 0 |
| hive_iceberg_minio | iceberg_snapshot_list_cache | hit_count | 0 |
| hive_iceberg_minio | iceberg_snapshot_list_cache | read_count | 0 |
| hive_iceberg_minio | iceberg_snapshot_cache | eviction_count | 0 |
| hive_iceberg_minio | iceberg_snapshot_cache | hit_ratio | 0.45454545454545453 |
| hive_iceberg_minio | iceberg_snapshot_cache | average_load_penalty | 5.604907246666666E8 |
| hive_iceberg_minio | iceberg_snapshot_cache | estimated_size | 6 |
| hive_iceberg_minio | iceberg_snapshot_cache | hit_count | 5 |
| hive_iceberg_minio | iceberg_snapshot_cache | read_count | 11 |
```

```sql
SELECT catalog_name, engine_name, entry_name,
       effective_enabled, ttl_second, capacity,
       estimated_size, hit_rate, last_error
FROM information_schema.catalog_meta_cache_statistics
ORDER BY catalog_name, engine_name, entry_name;
```

The METRIC_NAME column contains the following Caffeine cache performance metrics:
- eviction_count: The number of entries that have been evicted from the cache
- hit_ratio: The ratio of cache requests which were hits (ranges from 0.0 to 1.0)
- average_load_penalty: The average time spent loading new values (in nanoseconds)
- estimated_size: The approximate number of entries in the cache
- hit_count: The number of times cache lookup methods have returned a cached value
- read_count: The total number of times cache lookup methods have been called

Typical usage:

- Use `ENGINE_NAME` + `ENTRY_NAME` to identify one logical cache entry.
- Use `EFFECTIVE_ENABLED`, `TTL_SECOND`, and `CAPACITY` to confirm the applied cache policy.
- Use `HIT_RATE`, `ESTIMATED_SIZE`, `LOAD_FAILURE_COUNT`, and `LAST_ERROR` to diagnose behavior.
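
As a hedged illustration of the diagnostic use described above, the following query lists enabled cache entries that have recorded load failures or show a low hit rate. The `0.5` threshold and the `request_count > 0` guard are arbitrary assumptions made for this sketch, not recommended values.

```sql
-- Sketch: surface cache entries that may need attention.
-- The 0.5 hit-rate threshold is an illustrative assumption.
SELECT fe_host, catalog_name, engine_name, entry_name,
       hit_rate, request_count, load_failure_count,
       last_load_failure_time, last_error
FROM information_schema.catalog_meta_cache_statistics
WHERE effective_enabled = true
  AND (load_failure_count > 0
       OR (request_count > 0 AND hit_rate < 0.5))
ORDER BY load_failure_count DESC, hit_rate ASC;
```

Because each row is reported per FE, including `fe_host` in the output makes it easier to see whether a problem is limited to a single node.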

72 changes: 72 additions & 0 deletions docs/lakehouse/catalogs/hive-catalog.mdx
@@ -76,6 +76,78 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

The CommonProperties section is for entering common attributes. Please see the "Common Properties" section in the [Catalog Overview](../catalog-overview.md).

## Metadata Cache {#meta-cache}

To improve the performance of accessing external data sources, Apache Doris caches Hive metadata. Metadata includes table structure (Schema), partition lists, partition properties, and file lists.

:::tip
For versions before Doris 4.1.x, metadata caching is mainly controlled globally by FE configuration items. For details, see [Metadata Cache](../meta-cache.md).
Starting from Doris 4.1.x, Hive Catalog's external metadata cache is configured using the unified `meta.cache.*` keys.
:::

### Unified Property Model (4.1.x+) {#meta-cache-unified-model}

Each engine's cache entry uses a unified configuration key format: `meta.cache.<engine>.<entry>.{enable,ttl-second,capacity}`.

| Property | Example | Meaning |
|---|---|---|
| `enable` | `true/false` | Whether to enable this cache module. |
| `ttl-second` | `600`, `0`, `-1` | `0` disables the cache (takes effect immediately; useful when queries must see the latest data); `-1` means entries never expire; any positive integer is the TTL in seconds, counted from the last access. |
| `capacity` | `10000` | Maximum number of cache entries (by count). `0` disables the cache. |

**Effective Logic:** A cache module takes effect only when `enable=true`, `ttl-second != 0`, and `capacity > 0` all hold.
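
One way to check this rule in practice is to compare the raw flag with the evaluated result in the `information_schema.catalog_meta_cache_statistics` system table, which reports them as `CONFIG_ENABLED` and `EFFECTIVE_ENABLED`. The sketch below lists Hive cache modules that are enabled in the configuration but switched off by `ttl-second` or `capacity`; `hive_ctl` is a placeholder catalog name.

```sql
-- Sketch: modules with enable=true that are still not in effect,
-- which per the rule above implies ttl-second = 0 or capacity = 0.
SELECT entry_name, config_enabled, ttl_second, capacity, effective_enabled
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'hive_ctl'
  AND engine_name = 'hive'
  AND config_enabled = true
  AND effective_enabled = false;
```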

### Cache Modules {#meta-cache-unified-modules}

Hive Catalog includes the following cache modules:

| Module (`<entry>`) | Property Key Prefix | Cached Content and Impact |
|---|---|---|
| `schema` | `meta.cache.hive.schema.` | Caches table structure. Impact: Visibility of table column information. If disabled, the latest Schema is pulled for each query. |
| `partition_values` | `meta.cache.hive.partition_values.` | Caches partition values/names list. Impact: Partition pruning and enumeration. If disabled, new external partitions can be seen in real time. |
| `partition` | `meta.cache.hive.partition.` | Caches partition properties (Location, input format, etc.). Impact: Specific metadata of partitions. |
| `file` | `meta.cache.hive.file.` | Caches file lists. Impact: Reduces remote LIST operation overhead. If disabled, file changes can be seen in real time. |

### Legacy Parameter Mapping and Conversion {#meta-cache-mapping}

In version 4.1.x and later, unified keys are recommended. The following is the mapping between legacy Catalog properties and 4.1.x+ unified keys:

| Legacy Property Key | 4.1.x+ Unified Key | Description |
|---|---|---|
| `schema.cache.ttl-second` | `meta.cache.hive.schema.ttl-second` | Expiration time of table structure cache |
| `partition.cache.ttl-second` | `meta.cache.hive.partition_values.ttl-second` | Expiration time of partition value cache |
| `file.meta.cache.ttl-second` | `meta.cache.hive.file.ttl-second` | Expiration time of file list cache |

### Best Practices {#meta-cache-best-practices}

* **Real-time access to the latest data**: If each query must see the latest partition or file changes in the external data source, set the corresponding `ttl-second` to `0`.
    ```sql
    -- Disable the file list cache to see file changes in real time
    ALTER CATALOG hive_ctl SET PROPERTIES ("meta.cache.hive.file.ttl-second" = "0");
    -- Disable the partition value cache to see new partitions in real time
    ALTER CATALOG hive_ctl SET PROPERTIES ("meta.cache.hive.partition_values.ttl-second" = "0");
    ```
* **Performance optimization**: When metadata changes infrequently, increase `capacity` and `ttl-second` to reduce the load on Hive Metastore and the file system.

:::caution
**Hive Catalog Note**: Changes to `meta.cache.hive.*` properties **do not support hot-reload**. To ensure new configurations take effect, you must recreate the catalog or restart the FE node.
:::
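
Because Hive Catalog cache properties are not hot-reloaded, one option is to recreate the catalog with the desired `meta.cache.hive.*` keys. The following is a minimal sketch under stated assumptions: the metastore URI is a placeholder, the TTL and capacity values are arbitrary, and a real catalog will usually need additional connection and storage properties that must be preserved when recreating it.

```sql
-- Sketch only: the metastore URI and the cache values are placeholders.
-- Recreate the catalog so the new cache settings are picked up.
DROP CATALOG IF EXISTS hive_ctl;
CREATE CATALOG hive_ctl PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083",
    "meta.cache.hive.file.ttl-second" = "3600",
    "meta.cache.hive.file.capacity" = "100000"
);
```

Record the existing catalog definition before dropping it, since every property has to be supplied again in the `CREATE CATALOG` statement.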

### Observability {#meta-cache-unified-observability}

Cache metrics can be observed through the `information_schema.catalog_meta_cache_statistics` system table:

```sql
SELECT catalog_name, engine_name, entry_name,
       effective_enabled, ttl_second, capacity,
       estimated_size, hit_rate, load_failure_count, last_error
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'hive_ctl' AND engine_name = 'hive'
ORDER BY entry_name;
```

See the documentation for this system table: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).

### Supported Hive Versions

Supports Hive 1.x, 2.x, 3.x, and 4.x.
64 changes: 64 additions & 0 deletions docs/lakehouse/catalogs/hudi-catalog.md
@@ -51,6 +51,70 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
| ------------------------------- | -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- |
| `hudi.use_hive_sync_partition` | `use_hive_sync_partition` | Whether to use the partition information already synchronized by Hive Metastore. If true, partition information will be obtained directly from Hive Metastore. Otherwise, it will be obtained from the metadata file of the file system. Obtaining information from Hive Metastore is more efficient, but users need to ensure that the latest metadata has been synchronized to Hive Metastore. | false |

## Metadata Cache {#meta-cache}

To improve the performance of accessing external data sources, Apache Doris caches Hudi metadata. Metadata includes table structure (Schema), partition information, FS View, and Meta Client objects.

:::tip
For versions before Doris 4.1.x, metadata caching is mainly controlled globally by FE configuration items. For details, see [Metadata Cache](../meta-cache.md).
Starting from Doris 4.1.x, Hudi-related external metadata cache is configured using the unified `meta.cache.*` keys.
:::

### Unified Property Model (4.1.x+) {#meta-cache-unified-model}

Each engine's cache entry uses a unified configuration key format: `meta.cache.<engine>.<entry>.{enable,ttl-second,capacity}`.

| Property | Example | Meaning |
|---|---|---|
| `enable` | `true/false` | Whether to enable this cache module. |
| `ttl-second` | `600`, `0`, `-1` | `0` disables the cache (takes effect immediately; useful when queries must see the latest data); `-1` means entries never expire; any positive integer is the TTL in seconds, counted from the last access. |
| `capacity` | `10000` | Maximum number of cache entries (by count). `0` disables the cache. |

**Effective Logic:** A cache module takes effect only when `enable=true`, `ttl-second != 0`, and `capacity > 0` all hold.

### Cache Modules {#meta-cache-unified-modules}

Hudi Catalog includes the following cache modules:

| Module (`<entry>`) | Property Key Prefix | Cached Content and Impact |
|---|---|---|
| `schema` | `meta.cache.hudi.schema.` | Caches table structure. Impact: Visibility of table column information. If disabled, the latest Schema is pulled for each query. |
| `partition` | `meta.cache.hudi.partition.` | Caches Hudi partition-related metadata. Impact: Used for partition discovery and pruning. |
| `fs_view` | `meta.cache.hudi.fs_view.` | Caches Hudi filesystem view related metadata. |
| `meta_client` | `meta.cache.hudi.meta_client.` | Caches Hudi Meta Client objects. Impact: Reduces redundant loading of Hudi metadata. |

### Legacy Parameter Mapping and Conversion {#meta-cache-mapping}

In version 4.1.x and later, unified keys are recommended. The following is the mapping between legacy Catalog properties and 4.1.x+ unified keys:

| Legacy Property Key | 4.1.x+ Unified Key | Description |
|---|---|---|
| `schema.cache.ttl-second` | `meta.cache.hudi.schema.ttl-second` | Expiration time of table structure cache |

### Best Practices {#meta-cache-best-practices}

* **Real-time access to the latest data**: If each query must see the latest data changes or schema changes of Hudi tables, set the `ttl-second` of `schema` or `partition` to `0`.
    ```sql
    -- Disable the partition metadata cache to detect the latest partition changes in Hudi tables
    ALTER CATALOG hudi_ctl SET PROPERTIES ("meta.cache.hudi.partition.ttl-second" = "0");
    ```
* **Hot-reload**: Changes made via `ALTER CATALOG ... SET PROPERTIES` are hot-reloaded for Hudi (through the HMS catalog property update path).
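
For workloads where Hudi metadata changes rarely, a sketch of raising the TTL and capacity of the `meta_client` module through the hot-reload path could look like the following. The values `3600` and `2000` are arbitrary assumptions for illustration, and `hudi_ctl` is a placeholder catalog name.

```sql
-- Sketch: keep Hudi Meta Client objects cached longer.
-- The TTL and capacity values are illustrative, not recommendations.
ALTER CATALOG hudi_ctl SET PROPERTIES (
    "meta.cache.hudi.meta_client.ttl-second" = "3600",
    "meta.cache.hudi.meta_client.capacity" = "2000"
);
```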

### Observability {#meta-cache-unified-observability}

Cache metrics can be observed through the `information_schema.catalog_meta_cache_statistics` system table:

```sql
SELECT catalog_name, engine_name, entry_name,
       effective_enabled, ttl_second, capacity,
       estimated_size, hit_rate, load_failure_count, last_error
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'hudi_ctl' AND engine_name = 'hudi'
ORDER BY entry_name;
```

See the documentation for this system table: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).

### Supported Hudi Versions

Doris currently depends on Hudi 0.15. Accessing Hudi data written with version 0.14 or later is recommended.
72 changes: 72 additions & 0 deletions docs/lakehouse/catalogs/iceberg-catalog.mdx
@@ -85,6 +85,78 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

The CommonProperties section is for entering general properties. See the [Catalog Overview](../catalog-overview.md) for details on common properties.

## Metadata Cache {#meta-cache}

To improve the performance of accessing external data sources, Apache Doris caches Iceberg metadata. Metadata includes table structure (Schema), table objects, view objects, and manifest details.

:::tip
For versions before Doris 4.1.x, metadata caching is mainly controlled globally by FE configuration items. For details, see [Metadata Cache](../meta-cache.md).
Starting from Doris 4.1.x, Iceberg Catalog's external metadata cache is configured using the unified `meta.cache.*` keys.
:::

### Unified Property Model (4.1.x+) {#meta-cache-unified-model}

Each engine's cache entry uses a unified configuration key format: `meta.cache.<engine>.<entry>.{enable,ttl-second,capacity}`.

| Property | Example | Meaning |
|---|---|---|
| `enable` | `true/false` | Whether to enable this cache module. |
| `ttl-second` | `600`, `0`, `-1` | `0` disables the cache (takes effect immediately; useful when queries must see the latest data); `-1` means entries never expire; any positive integer is the TTL in seconds, counted from the last access. |
| `capacity` | `10000` | Maximum number of cache entries (by count). `0` disables the cache. |

**Effective Logic:** A cache module takes effect only when `enable=true`, `ttl-second != 0`, and `capacity > 0` all hold.

### Cache Modules {#meta-cache-unified-modules}

Iceberg Catalog includes the following cache modules:

| Module (`<entry>`) | Property Key Prefix | Cached Content and Impact |
|---|---|---|
| `schema` | `meta.cache.iceberg.schema.` | Caches table structure. Impact: Visibility of table column information. If disabled, the latest Schema is pulled for each query. |
| `table` | `meta.cache.iceberg.table.` | Caches Iceberg table metadata objects. Impact: Reduces Catalog/Metastore round-trips. |
| `view` | `meta.cache.iceberg.view.` | Caches Iceberg View metadata objects. |
| `manifest` | `meta.cache.iceberg.manifest.` | Caches manifest details. Impact: Reduces repeated manifest access overhead. Note: This module is disabled by default and must be enabled manually. |
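
Since the `manifest` module is off by default, it can be useful to confirm its current state before and after changing it. The query below is a sketch that reads the `information_schema.catalog_meta_cache_statistics` system table; `iceberg_ctl` is a placeholder catalog name.

```sql
-- Sketch: check whether the Iceberg manifest cache is configured
-- and actually in effect for one catalog.
SELECT entry_name, config_enabled, effective_enabled, ttl_second, capacity
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'iceberg_ctl'
  AND engine_name = 'iceberg'
  AND entry_name = 'manifest';
```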

### Legacy Parameter Mapping and Conversion {#meta-cache-mapping}

In version 4.1.x and later, unified keys are recommended. The following is the mapping between legacy Catalog properties and 4.1.x+ unified keys:

| Legacy Property Key | 4.1.x+ Unified Key | Description |
|---|---|---|
| `schema.cache.ttl-second` | `meta.cache.iceberg.schema.ttl-second` | Expiration time of table structure cache |
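
To illustrate the mapping, the sketch below expresses a schema cache TTL with the unified key instead of the legacy one. The value `300` and the catalog name `iceberg_ctl` are assumptions chosen for the example.

```sql
-- Legacy form (catalog property before 4.1.x):
--   "schema.cache.ttl-second" = "300"
-- Unified form (4.1.x+) with the same intent:
ALTER CATALOG iceberg_ctl SET PROPERTIES (
    "meta.cache.iceberg.schema.ttl-second" = "300"
);
```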

### Best Practices {#meta-cache-best-practices}

* **Real-time access to the latest data**: If you want each query to see the latest snapshots or schema changes for Iceberg tables, you can set the `ttl-second` for `schema` or `table` to `0`.
    ```sql
    -- Disable table object cache to detect snapshot changes
    ALTER CATALOG iceberg_ctl SET PROPERTIES ("meta.cache.iceberg.table.ttl-second" = "0");
    ```
* **Performance optimization**:
    * Enabling manifest cache can significantly speed up query planning for large tables:
        ```sql
        ALTER CATALOG iceberg_ctl SET PROPERTIES (
            "meta.cache.iceberg.manifest.enable" = "true",
            "meta.cache.iceberg.manifest.ttl-second" = "600"
        );
        ```
    * Changes via `ALTER CATALOG ... SET PROPERTIES` support hot-reload in Iceberg Catalog.

### Observability {#meta-cache-unified-observability}

Cache metrics can be observed through the `information_schema.catalog_meta_cache_statistics` system table:

```sql
SELECT catalog_name, engine_name, entry_name,
       effective_enabled, ttl_second, capacity,
       estimated_size, hit_rate, load_failure_count, last_error
FROM information_schema.catalog_meta_cache_statistics
WHERE catalog_name = 'iceberg_ctl' AND engine_name = 'iceberg'
ORDER BY entry_name;
```

See the documentation for this system table: [catalog_meta_cache_statistics](../../admin-manual/system-tables/information_schema/catalog_meta_cache_statistics.md).

### Supported Iceberg Versions

| Doris Version | Iceberg SDK Version |