mapred-site.xml
<property>
  <name>mapred.fairscheduler.smalljob.schedule.enable</name>
  <value>true</value>
  <description>Enable small job fast scheduling inside the fair scheduler. TaskTrackers reserve a slot, called an ephemeral slot, which is used for small jobs when the cluster is busy.</description>
</property>

<!-- Small job definition. If a job exceeds any of the following limits, it is not
     considered a small job and is moved out of the small job pool. -->

<property>
  <name>mapred.fairscheduler.smalljob.max.maps</name>
  <value>10</value>
  <description>Small job definition. Maximum number of maps allowed in a small job.</description>
</property>

<property>
  <name>mapred.fairscheduler.smalljob.max.reducers</name>
  <value>10</value>
  <description>Small job definition. Maximum number of reducers allowed in a small job.</description>
</property>

<property>
  <name>mapred.fairscheduler.smalljob.max.inputsize</name>
  <value>10737418240</value>
  <description>Small job definition. Maximum input size in bytes allowed for a small job. Default is 10 GB.</description>
</property>

<property>
  <name>mapred.fairscheduler.smalljob.max.reducer.inputsize</name>
  <value>1073741824</value>
  <description>Small job definition. Maximum estimated input size per reducer allowed in a small job. Default is 1 GB per reducer.</description>
</property>

<property>
  <name>mapred.cluster.ephemeral.tasks.memory.limit.mb</name>
  <value>200</value>
  <description>Maximum memory in MB reserved for an ephemeral slot. Default is 200 MB. This value must be the same on the JobTracker and all TaskTracker nodes.</description>
</property>
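Taken together, these limits define what counts as a small job: a job must stay within all of them to remain in the small job pool. As a rough illustration (not the actual fair-scheduler code), a check against the default limits might look like:

```python
# Illustrative check of the small-job definition (not MapR's scheduler code).
# A job must satisfy ALL limits to stay in the small-job pool; the defaults
# mirror the property values above.
def is_small_job(num_maps, num_reducers, input_size_bytes,
                 max_maps=10, max_reducers=10,
                 max_input=10 * 1024**3,          # 10 GB total input
                 max_reducer_input=1 * 1024**3):  # 1 GB estimated per reducer
    if num_maps > max_maps or num_reducers > max_reducers:
        return False
    if input_size_bytes > max_input:
        return False
    # Estimated per-reducer input: total input split across the reducers.
    if num_reducers and input_size_bytes / num_reducers > max_reducer_input:
        return False
    return True

print(is_small_job(8, 4, 2 * 1024**3))    # within all limits -> True
print(is_small_job(8, 4, 20 * 1024**3))   # 20 GB input exceeds 10 GB -> False
```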
Secured TaskTracker
<property>
  <name>mapred.tasktracker.task-controller.config.overwrite</name>
  <value>true</value>
  <description>The LinuxTaskController needs a config file at HADOOP_HOME/conf/taskcontroller.cfg with the following parameters:
    mapred.local.dir = local dirs used by the TaskTracker, taken from mapred-site.xml
    hadoop.log.dir = Hadoop log dir, taken from the system properties of the TaskTracker process
    mapreduce.tasktracker.group = groups allowed to run the TaskTracker; see 'mapreduce.tasktracker.group'
    min.user.id = do not allow any user below this UID to launch a task
    banned.users = users who are not allowed to launch any tasks
  If set to true, the TaskTracker always overwrites the config file with the default values min.user.id = -1 (check disabled), banned.users = bin, and mapreduce.tasktracker.group = root. Set this to false when using a customized config, then restart the TaskTracker.</description>
</property>

To disallow root:
1. Edit mapred-site.xml and set mapred.tasktracker.task-controller.config.overwrite = false on all TaskTracker nodes.
2. Edit taskcontroller.cfg and set min.user.id=0 on all TaskTracker nodes.
3. Restart all TaskTrackers.
To disallow all superusers:
1. Edit mapred-site.xml and set mapred.tasktracker.task-controller.config.overwrite = false on all TaskTracker nodes.
2. Edit taskcontroller.cfg and set min.user.id=1000 on all TaskTracker nodes.
3. Restart all TaskTrackers.
To disallow specific users:
1. Edit mapred-site.xml and set mapred.tasktracker.task-controller.config.overwrite = false on all TaskTracker nodes.
2. Edit taskcontroller.cfg and add the banned.users parameter on all TaskTracker nodes, set to a comma-separated list of usernames. Example: banned.users=foo,bar
3. Restart all TaskTrackers.
To remove all user restrictions and run all jobs as root:
1. Edit mapred-site.xml and set mapred.task.tracker.task-controller = org.apache.hadoop.mapred.DefaultTaskController on all TaskTracker nodes.
2. Restart all TaskTrackers.
Standalone Operation
Input=local, output=mfs:
hadoop jar hadoop-0.20.2-dev-examples.jar grep -Dmapred.job.tracker=local file:///opt/mapr/hadoop/hadoop-0.20.2/input /output 'dfs[a-z.]+'

Input=local, output=local (set mapred.job.tracker=local in mapred-site.xml):
hadoop jar hadoop-0.20.2-dev-examples.jar grep -Dmapred.job.tracker=local file:///opt/mapr/hadoop/hadoop-0.20.2/input file:///opt/mapr/hadoop/hadoop-0.20.2/output 'dfs[a-z.]+'

Input=mfs, output=mfs:
hadoop jar hadoop-0.20.2-dev-examples.jar grep /input /output 'dfs[a-z.]+'
Memory for Services
/opt/mapr/conf/warden.conf:
service.command.tt.heapsize.percent=2   # Percentage of memory reserved for the TaskTracker heap.
service.command.tt.heapsize.max=325     # Maximum heap space (MB) the TaskTracker can use.
service.command.tt.heapsize.min=64      # Minimum heap space (MB) for the TaskTracker.
$ cat /opt/mapr/conf/warden.conf | grep size | grep percent
service.command.jt.heapsize.percent=10
service.command.tt.heapsize.percent=2
service.command.hbmaster.heapsize.percent=4
service.command.hbregion.heapsize.percent=25
service.command.cldb.heapsize.percent=8
service.command.mfs.heapsize.percent=20
service.command.webserver.heapsize.percent=3
service.command.os.heapsize.percent=3
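The warden presumably derives each service's heap from the node's physical memory and these three settings: heapsize.percent of total memory, clamped between heapsize.min and heapsize.max. A sketch of that calculation (the exact warden logic is an assumption here; the values mirror the TaskTracker settings above):

```python
# Sketch of how a service heap could be derived from warden.conf settings:
# take heapsize.percent of total node memory, then clamp to [min, max].
# The exact warden algorithm is an assumption.
def service_heap_mb(total_mem_mb, percent, min_mb, max_mb):
    heap = total_mem_mb * percent / 100.0
    return int(min(max(heap, min_mb), max_mb))

# TaskTracker on a 16 GB node: 2% of 16384 MB = 327.68 MB, capped at 325 MB.
print(service_heap_mb(16 * 1024, 2, 64, 325))  # -> 325
# On a 2 GB node: 2% = 40.96 MB, raised to the 64 MB minimum.
print(service_heap_mb(2 * 1024, 2, 64, 325))   # -> 64
```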
MapReduce Memory
/opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml:

<property>
  <name>mapreduce.tasktracker.reserved.physicalmemory.mb</name>
  <value></value>
  <description>Maximum physical memory the TaskTracker should reserve for MapReduce tasks. If tasks use more than this limit, the task using the most memory is killed. Expert only: set this value only if the TaskTracker should use a specific amount of memory for MapReduce tasks; in the MapR distribution, the warden calculates this number based on the services configured on the node. Setting mapreduce.tasktracker.reserved.physicalmemory.mb to -1 disables physical memory accounting and task management.</description>
</property>
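The enforcement described here can be pictured as follows. This is an illustrative sketch, not the actual TaskTracker code: when the tasks' combined physical memory exceeds the reserved limit, the task using the most memory is the one chosen to be killed.

```python
# Sketch of the enforcement described above (not the actual TaskTracker
# code): when total task memory exceeds the reserved limit, select the
# task using the most physical memory for killing.
def task_to_kill(task_rss_mb, reserved_mb):
    """task_rss_mb: dict mapping task attempt id -> physical memory in MB."""
    if sum(task_rss_mb.values()) <= reserved_mb:
        return None  # within budget, nothing to kill
    return max(task_rss_mb, key=task_rss_mb.get)

tasks = {"attempt_1": 400, "attempt_2": 900, "attempt_3": 300}
print(task_to_kill(tasks, 2000))  # 1600 MB used, under the limit -> None
print(task_to_kill(tasks, 1200))  # over the limit -> attempt_2
```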
OOM killer
/opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml:
mapred.child.oom_adj — OOM-killer adjustment for child tasks (Linux-specific). The kernel accepts oom_adj values from -17 to +15, but only increasing the adjustment is allowed here, so valid settings are 0 to 15.
Map tasks Memory
/opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml:
io.sort.mb — buffer used to hold map outputs in memory before the final map outputs are written. Setting this value too low can cause spills. If left empty, it defaults to 50% of the map task's heap size. If the average input to a map task is MapIn bytes, a typical value for io.sort.mb is 1.25 times MapIn.
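The 1.25x rule of thumb can be applied directly. A hypothetical sizing helper (io.sort.mb is specified in MB, so the input size is converted from bytes; the rounding is an assumption):

```python
import math

# Rule of thumb from the text: io.sort.mb ~= 1.25 * average map input.
# io.sort.mb is configured in MB, so convert the input size from bytes.
def suggest_io_sort_mb(avg_map_input_bytes):
    return math.ceil(1.25 * avg_map_input_bytes / (1024 * 1024))

# Average map input of 128 MB -> a 160 MB sort buffer avoids most spills.
print(suggest_io_sort_mb(128 * 1024 * 1024))  # -> 160
```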
Reduce tasks Memory
mapred.reduce.child.java.opts — Java opts for the reduce tasks. The default heap size (-Xmx) is determined by the memory reserved for MapReduce at the TaskTracker.
A reduce task is given more memory than a map task:
Default memory for a reduce task = (total memory reserved for MapReduce) * (2 * #reduceslots / (#mapslots + 2 * #reduceslots))
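Plugging numbers into that formula, with hypothetical figures of 4 map slots, 2 reduce slots, and 4096 MB reserved for MapReduce on the node:

```python
# Worked example of the default reduce-task memory formula from the text.
# Reduce slots are weighted twice as heavily as map slots, which is how a
# reduce task ends up with more memory than a map task.
def reduce_task_memory_mb(total_mb, map_slots, reduce_slots):
    return total_mb * (2 * reduce_slots) / (map_slots + 2 * reduce_slots)

# Hypothetical node: 4096 MB reserved, 4 map slots, 2 reduce slots.
print(reduce_task_memory_mb(4096, 4, 2))  # 4096 * (4 / 8) -> 2048.0
```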
Tasks number
mapred.tasktracker.map.tasks.maximum — default: (CPUS > 2) ? (CPUS * 0.75) : 1
(at least one map slot, up to 0.75 times the number of CPUs)
mapred.tasktracker.reduce.tasks.maximum — default: (CPUS > 2) ? (CPUS * 0.50) : 1
(at least one reduce slot, up to 0.50 times the number of CPUs)
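The two defaults, expressed as code (these mirror the ternary expressions above; truncating the fractional result to an integer is an assumption):

```python
# Default slot counts per TaskTracker, per the formulas in the text.
def default_map_slots(cpus):
    # (CPUS > 2) ? (CPUS * 0.75) : 1 -- at least one map slot.
    return int(cpus * 0.75) if cpus > 2 else 1

def default_reduce_slots(cpus):
    # (CPUS > 2) ? (CPUS * 0.50) : 1 -- at least one reduce slot.
    return int(cpus * 0.50) if cpus > 2 else 1

for cpus in (1, 2, 4, 8):
    print(cpus, default_map_slots(cpus), default_reduce_slots(cpus))
# e.g. 8 CPUs -> 6 map slots and 4 reduce slots.
```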
Variables in the formulas:
CPUS - number of CPUs present on the node
DISKS - number of disks present on the node
MEM - memory reserved for MapReduce tasks
mapreduce.tasktracker.prefetch.maptasks — how many map tasks should be scheduled in advance on a TaskTracker, expressed as a fraction of map slots. The default is 1.0, which means the number of over-scheduled tasks equals the total number of map slots on the TaskTracker.
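So the number of map tasks scheduled ahead of time scales with the slot count. A small illustration (truncation to an integer is an assumption):

```python
# Prefetched (over-scheduled) map tasks = prefetch factor * total map slots
# on the TaskTracker. The default factor of 1.0 schedules one extra task
# per map slot.
def prefetched_map_tasks(map_slots, prefetch_factor=1.0):
    return int(map_slots * prefetch_factor)

print(prefetched_map_tasks(6))       # default 1.0 -> 6 extra tasks
print(prefetched_map_tasks(6, 0.5))  # half the slots -> 3
```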