From 08f1198e4d20729e812743881228e1e24a32a63b Mon Sep 17 00:00:00 2001
From: "Christopher L. Shannon"
Date: Fri, 21 Mar 2025 11:17:26 -0400
Subject: [PATCH 1/3] Add updates for merge changes in Accumulo 4.0

---
 _docs-2/administration/merging.md | 89 +++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)
 create mode 100644 _docs-2/administration/merging.md

diff --git a/_docs-2/administration/merging.md b/_docs-2/administration/merging.md
new file mode 100644
index 000000000..8ba330260
--- /dev/null
+++ b/_docs-2/administration/merging.md
@@ -0,0 +1,89 @@
+---
+title: Merging
+category: administration
+order: 6
+---
+
+Accumulo 4.0 has improved tablet merging support, including:
+
+* Merging no longer requires "chop" compactions.
+* Merging is now managed by FATE.
+* Accumulo now supports auto merging of tablets.
+
+## New Merge Design
+
+Merge used to be a slow operation because tablets had to be compacted before merging. This was necessary because RFiles may contain data outside the tablet range and this data needed to be removed.
+The updated merge algorithm works by "fencing" the RFiles in a tablet by the valid range. This operation is a fast metadata operation and the valid range of a file is now inserted into the file column.
+Scans will only return data in the specified range so compactions are no longer required. The normal system compaction process will eventually remove the data outside the range.
+
+## Auto Merge
+
+Accumulo supports auto merging tablets that are below a certain threshold, similar to splitting tablets that are above a threshold.
+The manager runs a task that periodically looks for ranges of tablets that can be merged. For a range of tablets to be eligible to be merged the following must be true:
+
+1. All tablets in the range must be marked as eligible to be merged using the per tablet `TabletMergeability` setting. (more below)
+2. The combined number of files must be less than `table.merge.file.max`.
+3. The total size must be less than `table.mergeability.threshold`. This is defined as the combined size of RFiles as a percentage of the split threshold.
+
+## Configuration
+
+The following properties are used to configure merging:
+
+* `manager.tablet.mergeability.interval` -Time to wait between scanning tables to identify ranges of tablets that can be auto-merged (default is `24h`)
+* `table.mergeability.threshold` - A range of tablets are eligible for automatic merging until the combined size of RFiles reaches this percentage of the split threshold. (default is `.25`)
+* `table.merge.file.max` - The maximum number of files that a merge operation will process (default is `10000`)
+
+## Tablet Mergeability
+
+Each tablet can be marked individually with a value to indicate if/when it can be auto merged by the system.
+The following are the possible settings:
+
+* `NEVER` - Tablets are never eligible for automatic merging
+* `ALWAYS` - Tablets are always eligible for automatic merging
+* `DELAY` - Tablets are eligible to be merged after the configured delay, relative to the Manager time.
+
+### Tablet Mergeability Defaults
+
+* System generated splits - Defaults to `ALWAYS` mergeable. Any system created tablets are always eligible to be merged.
+* User added splits - Defaults to `NEVER` mergeable if not specified.
+
+### Configuring Tablets with the API
+
+#### Adding/updating splits
+
+There is a new `putSplits()` method that takes a map of splits and mergeability settings and will either create those splits or update existing splits with the settings.
+
+```java
+// Adding splits or updating existing splits
+String tableName = "table";
+SortedMap<Text,TabletMergeability> splits = new TreeMap<>();
+// Mark each split with its mergeability setting
+splits.put(new Text(String.format("%09d", 333)), TabletMergeability.always());
+splits.put(new Text(String.format("%09d", 444)), TabletMergeability.always());
+splits.put(new Text(String.format("%09d", 666)), TabletMergeability.never());
+splits.put(new Text(String.format("%09d", 999)),
+    TabletMergeability.after(Duration.ofDays(1)));
+// add or update splits
+client.tableOperations().putSplits(tableName, splits);
+```
+
+`TabletInformation` contains information describing the current mergeability state inside `TabletMergeabilityInfo`.
+
+#### Listing TabletMergeabilityInfo
+```java
+try (Stream<TabletInformation> tabletInfo =
+    client.tableOperations().getTabletInformation(table, new Range())) {
+  tabletInfo.forEach(ti -> {
+    TabletMergeabilityInfo tmi = ti.getTabletMergeabilityInfo();
+    // Some examples of the API usage
+    // Gets the optional delay that is configured
+    Optional<Duration> delay = tmi.getDelay();
+    // Whether the tablet is currently eligible for merging
+    boolean mergeable = tmi.isMergeable();
+    // Optional estimated elapsed time since the delay was set
+    Optional<Duration> elapsed = tmi.getElapsed();
+    // Optional estimated remaining time before the tablet is eligible for merging
+    Optional<Duration> remaining = tmi.getRemaining();
+  });
+}
+```

From 9cfd876486fff224e6db411f05c8c7304c25dfa0 Mon Sep 17 00:00:00 2001
From: "Christopher L. Shannon"
Date: Thu, 27 Mar 2025 15:26:11 -0400
Subject: [PATCH 2/3] Updates based on feedback

---
 _docs-2/administration/merging.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/_docs-2/administration/merging.md b/_docs-2/administration/merging.md
index 8ba330260..798090045 100644
--- a/_docs-2/administration/merging.md
+++ b/_docs-2/administration/merging.md
@@ -29,9 +29,9 @@ The manager runs a task that periodically looks for ranges of tablets that can b
 
 The following properties are used to configure merging:
 
-* `manager.tablet.mergeability.interval` -Time to wait between scanning tables to identify ranges of tablets that can be auto-merged (default is `24h`)
+* `manager.tablet.mergeability.interval` - Time to wait between scanning tables to identify ranges of tablets that can be auto-merged (default is `24h`)
 * `table.mergeability.threshold` - A range of tablets are eligible for automatic merging until the combined size of RFiles reaches this percentage of the split threshold. (default is `.25`)
-* `table.merge.file.max` - The maximum number of files that a merge operation will process (default is `10000`)
+* `table.merge.file.max` - The maximum number of files that a merge operation will process (default is `10000`). This property also applies to merges requested through the API.
 
 ## Tablet Mergeability
 
@@ -47,6 +47,11 @@ The following are the possible settings:
 * System generated splits - Defaults to `ALWAYS` mergeable. Any system created tablets are always eligible to be merged.
 * User added splits - Defaults to `NEVER` mergeable if not specified.
 
+### Upgrade
+
+During upgrade all existing tablets will be marked with a default of `NEVER` for the TabletMergeability column to preserve
+the previous behavior. Only new tablets that are generated by system splits will be marked as `ALWAYS`.
+
 ### Configuring Tablets with the API
 
 #### Adding/updating splits

From 0a42f55998fe0b7c2964cfc1c7a95452ef1e0b4f Mon Sep 17 00:00:00 2001
From: Dave Marion
Date: Mon, 28 Apr 2025 19:30:11 +0000
Subject: [PATCH 3/3] Moved new merging page to 4.x docs

---
 {_docs-2 => _docs-4}/administration/merging.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {_docs-2 => _docs-4}/administration/merging.md (100%)

diff --git a/_docs-2/administration/merging.md b/_docs-4/administration/merging.md
similarity index 100%
rename from _docs-2/administration/merging.md
rename to _docs-4/administration/merging.md
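
Review note on the "Auto Merge" section of `merging.md`: the three eligibility conditions listed there can be illustrated with a small standalone sketch. This is not Accumulo code and uses no Accumulo classes. `MergeEligibilitySketch`, `TabletSummary`, and `rangeEligible` are hypothetical names, and the strict "less than" comparisons simply follow the wording of the page. How the manager actually evaluates the boundaries is an assumption here. The 1 GiB split threshold in the example is also only an illustrative value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the auto-merge eligibility rules described in merging.md.
class MergeEligibilitySketch {

  // Hypothetical stand-in for the per-tablet facts a merge scan would consider.
  static final class TabletSummary {
    final boolean mergeable; // result of the tablet's TabletMergeability setting
    final int fileCount;     // number of RFiles referenced by the tablet
    final long sizeBytes;    // combined size of those RFiles

    TabletSummary(boolean mergeable, int fileCount, long sizeBytes) {
      this.mergeable = mergeable;
      this.fileCount = fileCount;
      this.sizeBytes = sizeBytes;
    }
  }

  // Returns true only if all three documented conditions hold for the range.
  static boolean rangeEligible(List<TabletSummary> range, long splitThresholdBytes,
      double mergeabilityThreshold, int maxFiles) {
    long totalFiles = 0;
    long totalSize = 0;
    for (TabletSummary t : range) {
      if (!t.mergeable) {
        return false; // condition 1: every tablet must be marked mergeable
      }
      totalFiles += t.fileCount;
      totalSize += t.sizeBytes;
    }
    // condition 2: combined file count below the configured maximum
    // condition 3: combined size below the given fraction of the split threshold
    return totalFiles < maxFiles
        && totalSize < splitThresholdBytes * mergeabilityThreshold;
  }

  public static void main(String[] args) {
    long mib = 1024L * 1024L;
    long splitThreshold = 1024L * mib; // illustrative 1 GiB split threshold

    List<TabletSummary> small = Arrays.asList(
        new TabletSummary(true, 2, 50 * mib),
        new TabletSummary(true, 2, 50 * mib),
        new TabletSummary(true, 2, 50 * mib));
    List<TabletSummary> pinned = Arrays.asList(
        new TabletSummary(true, 2, 50 * mib),
        new TabletSummary(false, 2, 50 * mib));
    List<TabletSummary> large = Arrays.asList(
        new TabletSummary(true, 2, 100 * mib),
        new TabletSummary(true, 2, 100 * mib),
        new TabletSummary(true, 2, 100 * mib));

    System.out.println(rangeEligible(small, splitThreshold, 0.25, 10000));  // true
    System.out.println(rangeEligible(pinned, splitThreshold, 0.25, 10000)); // false
    System.out.println(rangeEligible(large, splitThreshold, 0.25, 10000));  // false
  }
}
```

With the documented defaults (`.25` and `10000`) and the assumed 1 GiB split threshold, the size limit works out to 256 MiB. The first range totals 150 MiB and passes, the second contains a tablet that is not mergeable, and the third totals 300 MiB and exceeds the limit.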