Goal:
YARN parameters such as mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores cannot hard-limit CPU utilization; they are only resource requests used for scheduling. This article explains how to configure YARN to use Control Groups (cgroups) when you want to limit and monitor the CPU resources that are available to YARN containers on a node.
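For contrast, here is a minimal sketch of how those vcore parameters are usually passed per job; the jar path and values are placeholders, and the request only influences scheduling, it does not cap actual CPU usage:
# vcores are only a scheduling hint, not a hard CPU limit
# (jar path and values below are illustrative assumptions)
hadoop jar hadoop-mapreduce-examples.jar pi \
  -Dmapreduce.map.cpu.vcores=2 \
  -Dmapreduce.reduce.cpu.vcores=2 \
  10 1000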
Env:
MapR 5.1 with Hadoop 2.7.0
CentOS 6.5 or CentOS 7.1
Solution:
For example: each node has 4 CPU cores, and I want all YARN applications together to use only 1 CPU core (25%).
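As a quick sanity check, you can confirm the number of CPU cores on each node first; either command below works on CentOS 6/7:
# Count the CPU cores on this node
nproc
grep -c ^processor /proc/cpuinfo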
1.a Install the libcgroup package on all nodes (CentOS 6 only)
yum install libcgroup
Then make sure the cgconfig service is running:
# service cgconfig status
Running
You can see that the virtual file system /cgroup below contains all subsystems:
# ls -altr /cgroup/
total 8
dr-xr-xr-x. 27 root root 4096 Mar 10 10:23 ..
drwxr-xr-x   3 root root    0 Mar 10 10:23 cpuset
drwxr-xr-x   3 root root    0 Mar 10 10:23 cpu
drwxr-xr-x   3 root root    0 Mar 10 10:23 cpuacct
drwxr-xr-x   3 root root    0 Mar 10 10:23 memory
drwxr-xr-x   3 root root    0 Mar 10 10:23 devices
drwxr-xr-x   3 root root    0 Mar 10 10:23 freezer
drwxr-xr-x   2 root root    0 Mar 10 10:23 net_cls
drwxr-xr-x   3 root root    0 Mar 10 10:23 blkio
drwxr-xr-x. 10 root root 4096 Jul 12 12:27 .
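If cgconfig is not running, a minimal sketch for starting it now and enabling it at boot on CentOS 6 (assuming the stock init scripts) is:
# Start the cgconfig service and enable it on boot (CentOS 6)
service cgconfig start
chkconfig cgconfig on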
1.b Unmount the cpu cgroup (CentOS 7 only)
On CentOS 7 we do not need to install libcgroup, because cgroups are already mounted:
$ mount -v|grep -i cgr
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
However, we need to unmount the cpu cgroup, otherwise we cannot mount the cpu cgroup under "/mycgroup" in the following steps:
umount /sys/fs/cgroup/cpu,cpuacct
After that, the remaining steps are the same for CentOS 7 as for CentOS 6.
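Before continuing, you can verify that the cpu controller is no longer mounted (this check is just a suggestion, not from the original steps):
# Should return no output once the cpu,cpuacct cgroup has been unmounted
mount -v | grep "cpu,cpuacct"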
2. Create a mount point for Cgroup
mkdir -p /mycgroup/cpu
chown mapr:mapr /mycgroup/cpu
Note: We change the ownership to "mapr" because the RM and NM are started by the "mapr" user in my lab.
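A quick check that the mount point exists with the expected ownership (the output shown is what you would expect, not captured from a real node):
# Verify the mount point and its ownership
ls -ld /mycgroup/cpu
# Expected output similar to:
# drwxr-xr-x 2 mapr mapr 4096 ... /mycgroup/cpu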
3. Add the YARN configuration on all RM and NM nodes
For example, put the following in yarn-site.xml:
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>mapr</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/hadoop-yarn</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
  <value>/mycgroup</value>
</property>
<property>
  <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
  <value>25</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
  <value>true</value>
</property>
Note:
a. yarn.nodemanager.linux-container-executor.group should match the yarn.nodemanager.linux-container-executor.group setting in container-executor.cfg (see the example after these notes). The default value is "mapr".
b. yarn.nodemanager.resource.percentage-physical-cpu-limit is set to 25 in this example, which means all YARN jobs/containers together can only use 25% of the total CPU on this node. Since this node has 4 CPU cores, 1 CPU core is the hard limit for YARN.
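For reference, a minimal sketch of what container-executor.cfg might contain; the file location and the extra keys shown here are assumptions and may differ in your MapR/Hadoop installation:
# container-executor.cfg (path varies by distribution; values are illustrative)
yarn.nodemanager.linux-container-executor.group=mapr
banned.users=bin
min.user.id=500
allowed.system.users=mapr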
4. Restart RM and NM
maprcli node services -name resourcemanager -action restart -filter csvc=="resourcemanager"
maprcli node services -name nodemanager -action restart -filter csvc=="nodemanager"
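After the NodeManager comes back up, you can confirm that it mounted the cpu cgroup under /mycgroup and created the /hadoop-yarn hierarchy (the exact file list may vary by kernel version):
# The cpu controller should now be mounted under /mycgroup/cpu by the NM
mount -v | grep mycgroup
# The YARN hierarchy and its CPU control files should exist
ls /mycgroup/cpu/hadoop-yarn/
# Expect entries such as cpu.shares, cpu.cfs_period_us and cpu.cfs_quota_us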
5. Test by running a large job
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar pi 10 50000000000000
Monitor the total CPU utilization of all YARN containers on a single node using the "top" command:
Tasks: 201 total, 1 running, 200 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.9%us, 0.4%sy, 0.0%ni, 73.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8062400k total, 6902780k used, 1159620k free, 151708k buffers
Swap: 8208376k total, 201404k used, 8006972k free, 432344k cached

PID   USER PR NI VIRT  RES  SHR S %CPU %MEM TIME+   COMMAND
24255 mapr 20 0  2711m 247m 38m S 50.2 3.1  0:09.50 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/bin/java
24256 mapr 20 0  2712m 246m 38m S 49.5 3.1  0:09.42 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/bin/java
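While the job is running, you can also look at the cgroups the NodeManager creates under the /hadoop-yarn hierarchy; the exact container IDs and quota values will differ on your cluster:
# Each running container gets its own cgroup under the YARN hierarchy
ls -d /mycgroup/cpu/hadoop-yarn/container_*
# The NM-level hard limit comes from the CFS period/quota written on the hadoop-yarn cgroup
cat /mycgroup/cpu/hadoop-yarn/cpu.cfs_period_us
cat /mycgroup/cpu/hadoop-yarn/cpu.cfs_quota_us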
Here are two pieces of evidence that cgroups are taking effect:
a. The total CPU utilization should be around 25% since there are no other CPU-consuming processes running except YARN.
b. As we know, the total CPU utilization for YARN is limited to 25%, which is 1 CPU core in this case (4 cores x 25% = 1 core). There are 2 YARN containers running in total, so each of them gets about 0.5 CPU core (roughly 50% in top), as shown above.
Refer:
http://maprdocs.mapr.com/home/AdministratorGuide/c-yarn-c-groups.html
http://www.linux-admins.net/2012/07/setting-up-linux-cgroups-control-groups.html
Can I limit the usage of CPU cores for Spark jobs using this YARN cgroups setup? Our cluster is a multi-tenant cluster where we occasionally see users consuming 100% of the CPU. Since we don't have a final property that can be set on the Spark end to control the number of executor cores used, we are seeing this issue.
The hard limit for CPU usage is at the NM level, not at the YARN job level.
One thing you can try (see the sketch after this list) is:
1. Use Node Labels to mark a certain number of NMs to be used only for Spark-on-YARN jobs.
2. Then limit the NM level CPU utilization for those NMs.
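A minimal sketch of step 1 using the standard Hadoop node-label commands; the label name "spark" and the hostname are assumptions, and yarn.node-labels.enabled must be turned on (plus queues mapped to the label) for this to take effect:
# Add a cluster-level node label (requires yarn.node-labels.enabled=true on the RM)
yarn rmadmin -addToClusterNodeLabels "spark"
# Assign the label to the NodeManager(s) reserved for Spark on YARN (hostname is a placeholder)
yarn rmadmin -replaceLabelsOnNode "nm-host1.example.com=spark"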