diff --git a/docs/content.zh/docs/deployment/tasks-scheduling/_index.md b/docs/content.zh/docs/deployment/tasks-scheduling/_index.md
index 06fbdd801c4a3..c166d764c2101 100644
--- a/docs/content.zh/docs/deployment/tasks-scheduling/_index.md
+++ b/docs/content.zh/docs/deployment/tasks-scheduling/_index.md
@@ -1,5 +1,5 @@
 ---
-title: Tasks Scheduling
+title: Task 调度
 bookCollapseSection: true
 weight: 9
 ---
diff --git a/docs/content.zh/docs/deployment/tasks-scheduling/balanced_tasks_scheduling.md b/docs/content.zh/docs/deployment/tasks-scheduling/balanced_tasks_scheduling.md
index 7006cc56e0c88..cadfc32fe35e9 100644
--- a/docs/content.zh/docs/deployment/tasks-scheduling/balanced_tasks_scheduling.md
+++ b/docs/content.zh/docs/deployment/tasks-scheduling/balanced_tasks_scheduling.md
@@ -1,5 +1,5 @@
 ---
-title: Balanced Tasks Scheduling
+title: Task 均衡调度
 weight: 5
 type: docs
@@ -23,101 +23,73 @@ specific language governing permissions and limitations
 under the License.
 -->
-# Balanced Tasks Scheduling
+# Task 均衡调度
-This page describes the background and principle of balanced tasks scheduling,
-how to use it when running streaming jobs.
+本文档描述了 Task 均衡调度的背景和原理,以及如何在流处理作业中使用它。
-## Background
+## 背景
-When the parallelism of all vertices within a Flink streaming job is inconsistent,
-the [default strategy]({{< ref "docs/deployment/config" >}}#taskmanager-load-balance-mode)
-of Flink to deploy tasks sometimes leads some `TaskManagers` have more tasks while others have fewer tasks,
-resulting in excessive resource utilization at some `TaskManagers`
-that contain more tasks and becoming a bottleneck for the entire job processing.
+当 Flink 流处理作业中所有顶点的并行度不一致时,Flink 部署 Task 的[默认策略]({{< ref "docs/deployment/config" >}}#taskmanager-load-balance-mode)有时会导致某些 TaskManager 部署的 Task 较多,而其他 TaskManager 部署的 Task 较少,从而造成部署 Task 较多的 TaskManager 资源使用过度,成为整个作业处理的瓶颈。
-{{< img src="/fig/deployments/tasks-scheduling/tasks_scheduling_skew_case.svg" alt="The Skew Case of Tasks Scheduling" class="offset" width="50%" >}}
+{{< img src="/fig/deployments/tasks-scheduling/tasks_scheduling_skew_case.svg" alt="Task 调度倾斜示例" class="offset" width="50%" >}}
-As shown in figure (a), given a Flink job comprising two vertices, `JobVertex-A (JV-A)` and `JobVertex-B (JV-B)`,
-with parallelism degrees of `6` and `3` respectively,
-and both vertices sharing the same slot sharing group.
-Under the default tasks scheduling strategy, as illustrated in figure (b),
-the distribution of tasks across `TaskManagers` may result in significant disparities in task load.
-Specifically, the `TaskManager`s with the highest number of tasks may host `4` tasks,
-while the one with the lowest load may have only `2` tasks.
-Consequently, the `TaskManager`s bearing 4 tasks is prone to become a performance bottleneck for the entire job.
+如图(a)所示,假设 Flink 作业包含两个顶点:`JobVertex-A (JV-A)` 和 `JobVertex-B (JV-B)`,并行度分别为 `6` 和 `3`,且两个顶点属于同一个 Slot 共享组。在默认 Task 调度策略下,如图(b)所示,Task 在 TaskManager 之间的分布可能导致 Task 负载显著不均。具体来说,Task 数量最多的 TaskManager 可能承载 `4` 个 Task,而负载最低的 TaskManager 可能只有 `2` 个 Task。因此,承载 `4` 个 Task 的 TaskManager 容易成为整个作业的性能瓶颈。
-Therefore, Flink provides the task-quantity-based balanced tasks scheduling capability.
-Within the job's resource view, it aims to ensure that the number of tasks
-scheduled to each `TaskManager` as close as possible to,
-thereby improving the resource usage skew among `TaskManagers`.
+因此,Flink 提供了基于 Task 数量的 Task 均衡调度能力。在作业的资源视图中,它旨在确保分配给每个 TaskManager 的 Task 数量尽可能接近,从而改善 TaskManager 之间的资源使用倾斜。
-Note The presence of inconsistent parallelism does not imply that this strategy must be used, as this is not always the case in practice.
+注意:并非并行度不一致就必须使用此策略,需根据实际情况判断。
-## Principle
+## 原理
-The task-quantity-based load balancing tasks scheduling strategy completes the assignment of tasks to `TaskManagers` in two phases:
-- The tasks-to-slots assignment phase
-- The slots-to-TaskManagers assignment phase
+基于 Task 数量的负载均衡调度策略将 Task 分配给 TaskManager 的过程分为两个阶段:
+- Task 到 Slot 的分配阶段
+- Slot 到 TaskManager 的分配阶段
-This section will use two examples to illustrate the simplified process and principle of
-how the task-quantity-based tasks scheduling strategy handles the assignments in these two phases.
+本节将通过两个示例,说明基于 Task 数量的调度策略在上述两个阶段中处理 Task 分配的简化流程及其原理。
-### The tasks-to-slots assignment phase
+### Task 到 Slot 的分配阶段
-Taking the job shown in figure (c) as an example, it contains five job vertices with parallelism degrees of `1`, `4`, `4`, `2`, and `3`, respectively.
-All five job vertices belong to the default slot sharing group.
+以图(c)所示的作业为例,它包含五个作业顶点,并行度分别为 `1`、`4`、`4`、`2` 和 `3`。这五个作业顶点都属于默认 Slot 共享组。
-{{< img src="/fig/deployments/tasks-scheduling/tasks_to_slots_allocation_principle.svg" alt="The Tasks To Slots Allocation Principle Demo" class="offset" width="65%" >}}
+{{< img src="/fig/deployments/tasks-scheduling/tasks_to_slots_allocation_principle.svg" alt="Task 到 Slot 分配原理示例" class="offset" width="65%" >}}
-During the tasks-to-slots assignment phase, this tasks scheduling strategy:
-- First directly assigns the tasks of the vertices with the highest parallelism to the `i-th` slot.
+在 Task 到 Slot 的分配阶段,该调度策略:
+- 首先直接将并行度最高顶点的 Task 分配到第 `i` 个 Slot。
-  That is, task `JV-Bi` is assigned directly to `sloti`, and task `JV-Ci` is assigned directly to `sloti`.
+  即将 Task `JV-Bi` 直接分配到 `sloti`,将 Task `JV-Ci` 也直接分配到 `sloti`。
-- Next, for tasks belonging to job vertices with sub-maximal parallelism, they are assigned in a round-robin fashion across the slots within the current
-slot sharing group until all tasks are allocated.
+- 接下来,对于属于次高并行度作业顶点的 Task,以轮询方式分配给当前 Slot 共享组内的 Slot,直到所有 Task 分配完毕。
-As shown in figure (e), under the task-quantity-based assignment strategy, the range (max-min difference) of the number of tasks per slot is `1`,
-which is better than the range of `3` under the default strategy shown in figure (d).
+如图(e)所示,在基于 Task 数量的分配策略下,每个 Slot 上 Task 数量的极差(最大值与最小值之差)为 `1`,优于图(d)所示默认策略下的极差 `3`。
-Thus, this ensures a more balanced distribution of the number of tasks across slots.
+因此,这确保了 Task 数量在各个 Slot 之间的分布更加均衡。
-### The slots-to-TaskManagers assignment phase
+### Slot 到 TaskManager 的分配阶段
-As shown in figure (f), given a Flink job comprising two vertices, `JV-A` and `JV-B`, with parallelism of `6` and `3` respectively,
-and both vertices sharing the same slot sharing group.
+如图(f)所示,假设 Flink 作业包含两个顶点 `JV-A` 和 `JV-B`,并行度分别为 `6` 和 `3`,且两个顶点属于同一个 Slot 共享组。
-{{< img src="/fig/deployments/tasks-scheduling/slots_to_taskmanagers_allocation_principle.svg" alt="The Slots to TaskManagers Allocation Principle Demo" class="offset" width="75%" >}}
+{{< img src="/fig/deployments/tasks-scheduling/slots_to_taskmanagers_allocation_principle.svg" alt="Slot 到 TaskManager 分配原理示例" class="offset" width="75%" >}}
-The assignment result after the first phase is shown in figure (g),
-where `Slot0`, `Slot1`, and `Slot2` each contain `2` tasks, while the remaining slots contain `1` task each.
+第一阶段的分配结果如图(g)所示,其中 `Slot0`、`Slot1` 和 `Slot2` 各包含 `2` 个 Task,其余 Slot 各包含 `1` 个 Task。
-Subsequently:
-- The strategy submits all slot requests and waits until all slot resources required for the current job are ready.
+随后:
+- 策略提交所有 Slot 请求,并等待当前作业所需的所有 Slot 资源准备就绪。
-Once the slot resources are ready:
-- The strategy then sorts all slot requests in descending order based on the number of tasks contained in each request.
-Afterward, it sequentially assigns each slot request to the `TaskManager` with the smallest current tasks loading.
-This process continues until all slot requests have been allocated.
+Slot 资源准备就绪后:
+- 首先,根据每个请求包含的 Task 数量,对所有 Slot 请求按降序排序。然后,依次将每个 Slot 请求分配给当前 Task 负载最小的 TaskManager,直到所有 Slot 请求分配完毕。
-The final assignment result is shown in figure (i), where each `TaskManager` ends up with exactly `3` tasks,
-resulting in a task count difference of `0` between `TaskManagers`. In contrast, the scheduling result under the default strategy,
-shown in figure (h), has a task count difference of `2` between `TaskManagers`.
+最终分配结果如图(i)所示,每个 TaskManager 最终恰好承载 `3` 个 Task,TaskManager 之间的 Task 数量差为 `0`。相比之下,图(h)所示默认策略的调度结果中,TaskManager 之间的 Task 数量差为 `2`。
-Therefore, if you are seeing performance bottlenecks of the sort described above,
-then using this load balancing tasks scheduling strategy can improve performance.
-Be aware that you should not use this strategy, if you are not seeing these bottlenecks,
-as you may experience performance degradation.
+因此,如果你观察到上述性能瓶颈,使用这种 Task 负载均衡调度策略可以提升性能。请注意,如果没有遇到这些瓶颈,则不应使用此策略,否则可能导致性能下降。
-## Usage
+## 使用方法
-You can enable balanced tasks scheduling through the following configuration item:
+你可以通过以下配置项启用 Task 均衡调度:
 - `taskmanager.load-balance.mode`: `tasks`
-## More details
+## 更多详情
-See the FLIP-370 for more details.
+更多详细信息请参阅 [FLIP-370](https://cwiki.apache.org/confluence/x/U56zDw)。
 {{< top >}}
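The two-phase strategy documented in this patch can be sketched as follows. This is a minimal illustration of the idea, not Flink's actual scheduler implementation; the function names and the shared round-robin cursor are assumptions made for demonstration only.

```python
def assign_tasks_to_slots(parallelisms):
    """Phase 1: subtask i of a max-parallelism vertex goes straight to slot i;
    tasks of sub-maximal vertices are round-robined over the slots of the
    slot sharing group (slot count = max parallelism)."""
    num_slots = max(parallelisms)
    slots = [[] for _ in range(num_slots)]
    cursor = 0  # assumed shared round-robin position for sub-maximal vertices
    for vertex, parallelism in enumerate(parallelisms):
        for subtask in range(parallelism):
            if parallelism == num_slots:
                slots[subtask].append((vertex, subtask))
            else:
                slots[cursor % num_slots].append((vertex, subtask))
                cursor += 1
    return slots


def assign_slots_to_taskmanagers(slots, num_tms, slots_per_tm):
    """Phase 2: sort slot requests by task count (descending), then give each
    request to the TaskManager with the lowest current task load."""
    loads = [{"tasks": 0, "slots": 0} for _ in range(num_tms)]
    assigned = [[] for _ in range(num_tms)]
    for slot in sorted(slots, key=len, reverse=True):
        candidates = [i for i in range(num_tms) if loads[i]["slots"] < slots_per_tm]
        target = min(candidates, key=lambda i: loads[i]["tasks"])
        assigned[target].append(slot)
        loads[target]["tasks"] += len(slot)
        loads[target]["slots"] += 1
    return assigned


# The JV-A (p=6) / JV-B (p=3) example from figures (f)-(i):
# phase 1 yields slot sizes [2, 2, 2, 1, 1, 1], and phase 2 leaves every
# TaskManager with exactly 3 tasks.
slots = assign_tasks_to_slots([6, 3])
tms = assign_slots_to_taskmanagers(slots, num_tms=3, slots_per_tm=2)
print([sum(len(s) for s in tm) for tm in tms])  # [3, 3, 3]
```

On the five-vertex example from figure (c) (`1, 4, 4, 2, 3`), the same sketch yields a per-slot task-count range of `1`, matching the figure (e) result.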