Add mon groups for resctrl. #2523
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -158,32 +158,38 @@ init process will block waiting for the parent to finish setup. | |
| ### IntelRdt | ||
|
|
||
| Intel platforms with new Xeon CPU support Resource Director Technology (RDT). | ||
| Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) are | ||
| two sub-features of RDT. | ||
| Cache Allocation Technology (CAT), Cache Monitoring Technology (CMT), | ||
| Memory Bandwidth Allocation (MBA) and Memory Bandwidth Monitoring (MBM) are | ||
| four sub-features of RDT. | ||
|
|
||
| Cache Allocation Technology (CAT) provides a way for the software to restrict | ||
| cache allocation to a defined 'subset' of L3 cache which may be overlapping | ||
| with other 'subsets'. The different subsets are identified by class of | ||
| service (CLOS) and each CLOS has a capacity bitmask (CBM). | ||
|
|
||
| Cache Monitoring Technology (CMT) supports monitoring of the last-level cache (LLC) occupancy | ||
| for each running thread simultaneously. | ||
|
|
||
| Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle | ||
| over memory bandwidth for the software. A user controls the resource by | ||
| indicating the percentage of maximum memory bandwidth or memory bandwidth limit | ||
| in MBps unit if MBA Software Controller is enabled. | ||
| indicating the percentage of maximum memory bandwidth or memory bandwidth | ||
| limit in MBps unit if MBA Software Controller is enabled. | ||
|
|
||
| Memory Bandwidth Monitoring (MBM) supports monitoring of total and local memory bandwidth | ||
| for each running thread simultaneously. | ||
|
|
||
| It can be used to handle L3 cache and memory bandwidth resources allocation | ||
| for containers if hardware and kernel support Intel RDT CAT and MBA features. | ||
| More details about Intel RDT CAT and MBA can be found in sections 17.18 and 17.19, Volume 3 | ||
| of Intel Software Developer Manual: | ||
| https://software.intel.com/en-us/articles/intel-sdm | ||
|
|
||
| In Linux 4.10 kernel or newer, the interface is defined and exposed via | ||
| About Intel RDT kernel interface: | ||
| In Linux 4.14 kernel or newer, the interface is defined and exposed via | ||
| "resource control" filesystem, which is a "cgroup-like" interface. | ||
|
|
||
| Compared with cgroups, it has similar process management lifecycle and | ||
| interfaces in a container. But unlike cgroups' hierarchy, it has single level | ||
| filesystem layout. | ||
|
|
||
| CAT and MBA features are introduced in Linux 4.10 and 4.12 kernel via | ||
| "resource control" filesystem. | ||
|
|
||
| Intel RDT "resource control" filesystem hierarchy: | ||
| ``` | ||
|
|
||
| mount -t resctrl resctrl /sys/fs/resctrl | ||
|
|
@@ -194,25 +200,46 @@ tree /sys/fs/resctrl | |
| | | |-- cbm_mask | ||
| | | |-- min_cbm_bits | ||
| | | |-- num_closids | ||
| | |-- L3_MON | ||
| | | |-- max_threshold_occupancy | ||
| | | |-- mon_features | ||
|
|
||
| | | |-- num_rmids | ||
| | |-- MB | ||
| | |-- bandwidth_gran | ||
| | |-- delay_linear | ||
| | |-- min_bandwidth | ||
| | |-- num_closids | ||
| |-- ... | ||
| |-- mon_groups | ||
| |-- <rmid> | ||
| |-- ... | ||
| |-- mon_data | ||
| |-- mon_L3_00 | ||
| |-- llc_occupancy | ||
| |-- mbm_local_bytes | ||
| |-- mbm_total_bytes | ||
| |-- ... | ||
|
|
||
| |-- tasks | ||
| |-- schemata | ||
| |-- tasks | ||
| |-- <container_id> | ||
| |-- <clos> | ||
| |-- ... | ||
| |-- schemata | ||
| |-- mon_data | ||
| |-- mon_L3_00 | ||
| |-- llc_occupancy | ||
| |-- mbm_local_bytes | ||
| |-- mbm_total_bytes | ||
| |-- ... | ||
| |-- tasks | ||
|
|
||
| |-- schemata | ||
| |-- ... | ||
| ``` | ||
|
|
||
|
|
||
| For runc, we can make use of `tasks` and `schemata` configuration for L3 | ||
| cache and memory bandwidth resources constraints. | ||
| cache and memory bandwidth resource constraints, and the `mon_data` directory for | ||
| CMT and MBM statistics. | ||
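As a rough illustration of how the `mon_data` statistics might be consumed, the sketch below reads the three counters from one monitoring domain directory. It assumes each file holds a single decimal number. The names `monStats`, `readCounter`, `readMonStats` and `demoReadMonStats` are hypothetical and are not runc's API. The demo uses a temporary directory with made-up values, so it runs without RDT hardware.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// monStats holds the three counters listed under mon_data/mon_L3_<id>.
// The type and field names are illustrative only.
type monStats struct {
	LLCOccupancy  uint64
	MBMLocalBytes uint64
	MBMTotalBytes uint64
}

// readCounter parses a file that is assumed to contain one decimal number.
func readCounter(path string) (uint64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
}

// readMonStats reads the counters from one domain directory, for example
// /sys/fs/resctrl/<clos>/mon_data/mon_L3_00.
func readMonStats(domainDir string) (monStats, error) {
	var s monStats
	fields := []struct {
		name string
		dst  *uint64
	}{
		{"llc_occupancy", &s.LLCOccupancy},
		{"mbm_local_bytes", &s.MBMLocalBytes},
		{"mbm_total_bytes", &s.MBMTotalBytes},
	}
	for _, f := range fields {
		v, err := readCounter(filepath.Join(domainDir, f.name))
		if err != nil {
			return monStats{}, fmt.Errorf("%s: %w", f.name, err)
		}
		*f.dst = v
	}
	return s, nil
}

// demoReadMonStats fills a temporary directory with made-up sample values
// that stand in for a real mon_L3_00 directory, then reads them back.
func demoReadMonStats() (monStats, error) {
	dir, err := os.MkdirTemp("", "mon-demo")
	if err != nil {
		return monStats{}, err
	}
	defer os.RemoveAll(dir)
	samples := map[string]string{
		"llc_occupancy":   "65536\n",
		"mbm_local_bytes": "1024\n",
		"mbm_total_bytes": "2048\n",
	}
	for name, val := range samples {
		if err := os.WriteFile(filepath.Join(dir, name), []byte(val), 0o644); err != nil {
			return monStats{}, err
		}
	}
	return readMonStats(dir)
}

func main() {
	s, err := demoReadMonStats()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", s)
}
```

On a real system, `readMonStats` would be pointed at a directory under `/sys/fs/resctrl` instead of the temporary one.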
|
|
||
| The file `tasks` has a list of tasks that belong to this group (e.g., | ||
| "<container_id>" group). Tasks can be added to a group by writing the task ID | ||
| "<clos>" group). Tasks can be added to a group by writing the task ID | ||
| to the "tasks" file (which will automatically remove them from the previous | ||
| group to which they belonged). New tasks created by fork(2) and clone(2) are | ||
| added to the same group as their parent. | ||
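The step of adding a task by writing its ID to the `tasks` file can be sketched as follows. This is a minimal illustration, not runc's implementation: `addTask` and `demoAddTask` are hypothetical names, and the demo writes to a temporary directory that stands in for a group under `/sys/fs/resctrl`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// addTask writes pid to the "tasks" file of the group at groupDir.
// The helper name and signature are illustrative only.
func addTask(groupDir string, pid int) error {
	f, err := os.OpenFile(filepath.Join(groupDir, "tasks"), os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(strconv.Itoa(pid))
	return err
}

// demoAddTask exercises addTask against a temporary directory that stands in
// for a resctrl group, so it runs without RDT hardware or root privileges.
func demoAddTask() (string, error) {
	dir, err := os.MkdirTemp("", "resctrl-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	// On a real resctrl mount the kernel provides this file already.
	if err := os.WriteFile(filepath.Join(dir, "tasks"), nil, 0o644); err != nil {
		return "", err
	}
	if err := addTask(dir, 1234); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(dir, "tasks"))
	return string(data), err
}

func main() {
	got, err := demoAddTask()
	if err != nil {
		panic(err)
	}
	fmt.Println("tasks file now contains:", got)
}
```

The file is opened without `O_CREATE` on purpose: in this sketch a missing `tasks` file is treated as a sign that the group directory does not exist.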
|
|
@@ -224,7 +251,7 @@ L3 cache schema: | |
| It has allocation bitmasks/values for L3 cache on each socket, which | ||
| contains L3 cache id and capacity bitmask (CBM). | ||
| ``` | ||
|
|
||
| Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..." | ||
| Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..." | ||
| ``` | ||
|
|
||
| For example, on a two-socket machine, the schema line could be "L3:0=ff;1=c0" | ||
| which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0. | ||
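To make the L3 schema format concrete, here is a small sketch that splits a line such as "L3:0=ff;1=c0" into cache ids and capacity bitmasks. `parseL3Schema` is a hypothetical helper written for this illustration and is not part of runc. It assumes cache ids are decimal and CBMs are hexadecimal without a `0x` prefix, as in the example above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseL3Schema is a hypothetical helper that turns a schema line such as
// "L3:0=ff;1=c0" into a map from L3 cache id to capacity bitmask (CBM).
func parseL3Schema(line string) (map[int]uint64, error) {
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("not an L3 schema line: %q", line)
	}
	masks := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %w", entry, err)
		}
		// CBMs are assumed to be hexadecimal, e.g. "ff" or "c0".
		cbm, err := strconv.ParseUint(strings.TrimSpace(parts[1]), 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %w", entry, err)
		}
		masks[id] = cbm
	}
	return masks, nil
}

func main() {
	masks, err := parseL3Schema("L3:0=ff;1=c0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("cache 0: %#x, cache 1: %#x\n", masks[0], masks[1])
}
```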
|
|
@@ -240,7 +267,7 @@ Memory bandwidth schema: | |
| It has allocation values for memory bandwidth on each socket, which contains | ||
| L3 cache id and memory bandwidth. | ||
| ``` | ||
|
|
||
| Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." | ||
| Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." | ||
| ``` | ||
|
|
||
| For example, on a two-socket machine, the schema line could be "MB:0=20;1=70" | ||
|
|
||
|
|
@@ -251,8 +278,10 @@ that is allocated is also dependent on the CPU model and can be looked up at | |
| min_bw + N * bw_gran. Intermediate values are rounded to the next control | ||
| step available on the hardware. | ||
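The rounding rule can be illustrated with a short sketch. It assumes that "rounded to the next control step" means rounding up to the nearest value of the form min_bw + N * bw_gran, and that percentages are capped at 100. Both are assumptions of this example, and `roundToControlStep` is a hypothetical name.

```go
package main

import "fmt"

// roundToControlStep rounds a requested bandwidth percentage up to the
// nearest value of the form minBw + N*bwGran. Rounding up and the cap at 100
// are assumptions of this sketch.
func roundToControlStep(requested, minBw, bwGran int) int {
	if requested <= minBw {
		return minBw
	}
	// Number of granularity steps above minBw, rounded up.
	steps := (requested - minBw + bwGran - 1) / bwGran
	value := minBw + steps*bwGran
	if value > 100 {
		value = 100
	}
	return value
}

func main() {
	// Sample inputs with min_bw = 10 and bw_gran = 10.
	for _, req := range []int{5, 20, 25, 97} {
		fmt.Printf("requested %d%% -> %d%%\n", req, roundToControlStep(req, 10, 10))
	}
}
```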
|
|
||
| If MBA Software Controller is enabled through mount option "-o mba_MBps" | ||
| If MBA Software Controller is enabled through mount option "-o mba_MBps": | ||
| ``` | ||
| mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl | ||
|
|
||
| ``` | ||
| We could specify memory bandwidth in "MBps" (Mega Bytes per second) unit | ||
| instead of "percentages". The kernel underneath would use a software feedback | ||
| mechanism or a "Software Controller" which reads the actual bandwidth using | ||
|
|
@@ -263,11 +292,12 @@ For example, on a two-socket machine, the schema line could be | |
| "MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0 | ||
| and 7000 MBps memory bandwidth limit on socket 1. | ||
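As an illustration of the memory bandwidth schema format, the sketch below builds a schema line from per-cache-id limits. `formatMBSchema` is a hypothetical helper, not runc's API. Whether the values are read as percentages or MBps depends on the mount option described above; the function only formats them.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatMBSchema is a hypothetical helper that renders per-cache-id bandwidth
// limits as "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...".
func formatMBSchema(limits map[int]int) string {
	ids := make([]int, 0, len(limits))
	for id := range limits {
		ids = append(ids, id)
	}
	// Sort the ids so the output does not depend on map iteration order.
	sort.Ints(ids)
	entries := make([]string, 0, len(ids))
	for _, id := range ids {
		entries = append(entries, fmt.Sprintf("%d=%d", id, limits[id]))
	}
	return "MB:" + strings.Join(entries, ";")
}

func main() {
	// Limits taken from the two-socket MBps example above.
	fmt.Println(formatMBSchema(map[int]int{0: 5000, 1: 7000}))
}
```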
|
|
||
| For more information about Intel RDT kernel interface: | ||
| For more information about Intel RDT kernel interface: | ||
| https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt | ||
|
|
||
| ``` | ||
|
|
||
| An example for runc: | ||
| ``` | ||
| Consider a two-socket machine with two L3 caches where the default CBM is | ||
| 0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10% | ||
| with a memory bandwidth granularity of 10%. | ||
|
|
@@ -281,7 +311,17 @@ maximum memory bandwidth of 20% on socket 0 and 70% on socket 1. | |
| "closID": "guaranteed_group", | ||
|
|
||
| "l3CacheSchema": "L3:0=7f0;1=1f", | ||
| "memBwSchema": "MB:0=20;1=70" | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
| Another example: | ||
| ``` | ||
| We only want to monitor memory bandwidth and llc occupancy. | ||
| "linux": { | ||
| "intelRdt": { | ||
| "enableMBM": true, | ||
| "enableCMT": true | ||
| } | ||
|
Comment on lines +311 to +324
Contributor
In my opinion, this config interface may confuse users. They cannot tell what it really implies: "monitoring only" or "empty allocation config"? What do you think about adding a flag to the `linux.intelRdt` config to indicate whether a CTRL_MON group or a MON group will be created for the container? Something like:
Author
I left a related comment. |
||
| } | ||
| ``` | ||
|
|
||
|
|
||
|
Add |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| package configs_test | ||
|
|
||
| import ( | ||
| "encoding/json" | ||
| "reflect" | ||
| "testing" | ||
|
|
||
| "github.com/opencontainers/runc/libcontainer/configs" | ||
| ) | ||
|
|
||
| func TestUnmarshalIntelRDT(t *testing.T) { | ||
| testCases := []struct { | ||
| JSON string | ||
| Expected configs.IntelRdt | ||
| }{ | ||
| { | ||
| "{\"enableMBM\": true}", | ||
| configs.IntelRdt{EnableMBM: true, EnableCMT: false}, | ||
| }, | ||
| { | ||
| "{\"enableMBM\": true,\"enableCMT\": false}", | ||
| configs.IntelRdt{EnableMBM: true, EnableCMT: false}, | ||
| }, | ||
| { | ||
| "{\"enableMBM\": false,\"enableCMT\": true}", | ||
| configs.IntelRdt{EnableMBM: false, EnableCMT: true}, | ||
| }, | ||
| } | ||
|
|
||
| for _, tc := range testCases { | ||
| got := configs.IntelRdt{} | ||
|
|
||
| err := json.Unmarshal([]byte(tc.JSON), &got) | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
|
|
||
| if !reflect.DeepEqual(tc.Expected, got) { | ||
| t.Errorf("expected unmarshalled IntelRDT config %+v, got %+v", tc.Expected, got) | ||
| } | ||
| } | ||
| } |
Is there no actual change here? Is only the wrapping width adjusted?

Nearly the entire `SPEC.md` is a copy of the large comment in `libcontainer/intelrdt/intelrdt.go`, so I want to unify the two. This is just the adjusted text.