Goal:
YARN parameters such as mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores cannot hard-limit CPU utilization; they are only resource requests used for scheduling. This article explains how to configure YARN to use Control Groups (cgroups) when you want to limit and monitor the CPU resources that are available to YARN containers on a node.
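For contrast, here is a minimal sketch of how those vcore parameters are usually passed per job; the jar path and values are placeholders, and the request only influences scheduling, it does not cap actual CPU usage:
# vcores are only a scheduling hint, not a hard CPU limit
# (jar path and values below are illustrative assumptions)
hadoop jar hadoop-mapreduce-examples.jar pi \
  -Dmapreduce.map.cpu.vcores=2 \
  -Dmapreduce.reduce.cpu.vcores=2 \
  10 1000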
Env:
MapR 5.1 with Hadoop 2.7.0
CentOS 6.5 or CentOS 7.1
Solution:
For example: each node has 4 CPU cores, and I want all YARN applications together to use only 1 CPU core (25%).
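As a quick sanity check, you can confirm the number of CPU cores on each node first; either command below works on CentOS 6/7:
# Count the CPU cores on this node
nproc
grep -c ^processor /proc/cpuinfo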
1.a Install the libcgroup package on all nodes (CentOS 6 only)
yum install libcgroup
Then make sure the cgconfig service is running:
# service cgconfig status
Running
You can see that the virtual file system /cgroup below contains all subsystems:
# ls -altr /cgroup/
total 8
dr-xr-xr-x. 27 root root 4096 Mar 10 10:23 ..
drwxr-xr-x   3 root root    0 Mar 10 10:23 cpuset
drwxr-xr-x   3 root root    0 Mar 10 10:23 cpu
drwxr-xr-x   3 root root    0 Mar 10 10:23 cpuacct
drwxr-xr-x   3 root root    0 Mar 10 10:23 memory
drwxr-xr-x   3 root root    0 Mar 10 10:23 devices
drwxr-xr-x   3 root root    0 Mar 10 10:23 freezer
drwxr-xr-x   2 root root    0 Mar 10 10:23 net_cls
drwxr-xr-x   3 root root    0 Mar 10 10:23 blkio
drwxr-xr-x. 10 root root 4096 Jul 12 12:27 .
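If cgconfig is not running, a minimal sketch for starting it now and enabling it at boot on CentOS 6 (assuming the stock init scripts) is:
# Start the cgconfig service and enable it on boot (CentOS 6)
service cgconfig start
chkconfig cgconfig on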
1.b Unmount the cpu cgroup (CentOS 7 only)
On CentOS 7 we do not need to install libcgroup, because cgroups are already mounted:
$ mount -v|grep -i cgr
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
However, we need to unmount the cpu cgroup, otherwise we cannot mount the cpu cgroup under "/mycgroup" in the following steps:
umount /sys/fs/cgroup/cpu,cpuacct
After that, the remaining steps are the same for CentOS 7 as for CentOS 6.
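Before continuing, you can verify that the cpu controller is no longer mounted (this check is just a suggestion, not from the original steps):
# Should return no output once the cpu,cpuacct cgroup has been unmounted
mount -v | grep "cpu,cpuacct"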
2. Create a mount point for Cgroup
mkdir -p /mycgroup/cpu
chown mapr:mapr /mycgroup/cpu
Note: We change the ownership to "mapr" because the RM and NM are started by the "mapr" user in my lab.
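A quick check that the mount point exists with the expected ownership (the output shown is what you would expect, not captured from a real node):
# Verify the mount point and its ownership
ls -ld /mycgroup/cpu
# Expected output similar to:
# drwxr-xr-x 2 mapr mapr 4096 ... /mycgroup/cpu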
3. Add the YARN configuration on all RM and NM nodes
For example, put the following in yarn-site.xml:
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>mapr</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/hadoop-yarn</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
  <value>/mycgroup</value>
</property>
<property>
  <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
  <value>25</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
  <value>true</value>
</property>
Note:
a. yarn.nodemanager.linux-container-executor.group should match the yarn.nodemanager.linux-container-executor.group setting in container-executor.cfg (see the example after these notes). The default value is "mapr".
b. yarn.nodemanager.resource.percentage-physical-cpu-limit is set to 25 in this example, which means all YARN jobs/containers together can only use 25% of the total CPU on this node. Since this node has 4 CPU cores, 1 CPU core is the hard limit for YARN.
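For reference, a minimal sketch of what container-executor.cfg might contain; the file location and the extra keys shown here are assumptions and may differ in your MapR/Hadoop installation:
# container-executor.cfg (path varies by distribution; values are illustrative)
yarn.nodemanager.linux-container-executor.group=mapr
banned.users=bin
min.user.id=500
allowed.system.users=mapr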
4. Restart RM and NM
maprcli node services -name resourcemanager -action restart -filter csvc=="resourcemanager"
maprcli node services -name nodemanager -action restart -filter csvc=="nodemanager"
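After the NodeManager comes back up, you can confirm that it mounted the cpu cgroup under /mycgroup and created the /hadoop-yarn hierarchy (the exact file list may vary by kernel version):
# The cpu controller should now be mounted under /mycgroup/cpu by the NM
mount -v | grep mycgroup
# The YARN hierarchy and its CPU control files should exist
ls /mycgroup/cpu/hadoop-yarn/
# Expect entries such as cpu.shares, cpu.cfs_period_us and cpu.cfs_quota_us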
5. Test by running a large job
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar pi 10 50000000000000
Monitor the total CPU utilization of all YARN containers on a single node using the "top" command:
Tasks: 201 total, 1 running, 200 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.9%us, 0.4%sy, 0.0%ni, 73.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8062400k total, 6902780k used, 1159620k free, 151708k buffers
Swap: 8208376k total, 201404k used, 8006972k free, 432344k cached

PID   USER PR NI VIRT  RES  SHR S %CPU %MEM TIME+   COMMAND
24255 mapr 20 0  2711m 247m 38m S 50.2 3.1  0:09.50 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/bin/java
24256 mapr 20 0  2712m 246m 38m S 49.5 3.1  0:09.42 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/bin/java
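While the job is running, you can also look at the cgroups the NodeManager creates under the /hadoop-yarn hierarchy; the exact container IDs and quota values will differ on your cluster:
# Each running container gets its own cgroup under the YARN hierarchy
ls -d /mycgroup/cpu/hadoop-yarn/container_*
# The NM-level hard limit comes from the CFS period/quota written on the hadoop-yarn cgroup
cat /mycgroup/cpu/hadoop-yarn/cpu.cfs_period_us
cat /mycgroup/cpu/hadoop-yarn/cpu.cfs_quota_us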
Here are two pieces of evidence that cgroups are taking effect:
a. The total CPU utilization should be around 25% since there are no other CPU-consuming processes running except YARN.
b. As we know, the total CPU utilization for YARN is limited to 25%, which is 1 CPU core in this case (4 cores x 25% = 1 core). There are 2 YARN containers running in total, so each of them gets about 0.5 CPU core (roughly 50% in top), as shown above.
Refer:
http://maprdocs.mapr.com/home/AdministratorGuide/c-yarn-c-groups.html
http://www.linux-admins.net/2012/07/setting-up-linux-cgroups-control-groups.html
Can I limit the usage of CPU cores for Spark jobs using this YARN cgroups setup? Our cluster is a multi-tenant cluster where we occasionally see users consuming 100% of the CPU. Since we don't have a final property that can be set on the Spark end to control the number of executor cores used, we are seeing this issue.
The hard limit for CPU usage is at the NM level, not at the YARN job level.
One thing you can try (see the sketch after this list) is:
1. Use Node Labels to mark a certain number of NMs to be used only for Spark-on-YARN jobs.
2. Then limit the NM level CPU utilization for those NMs.
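A minimal sketch of step 1 using the standard Hadoop node-label commands; the label name "spark" and the hostname are assumptions, and yarn.node-labels.enabled must be turned on (plus queues mapped to the label) for this to take effect:
# Add a cluster-level node label (requires yarn.node-labels.enabled=true on the RM)
yarn rmadmin -addToClusterNodeLabels "spark"
# Assign the label to the NodeManager(s) reserved for Spark on YARN (hostname is a placeholder)
yarn rmadmin -replaceLabelsOnNode "nm-host1.example.com=spark"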