Friday, November 4, 2016

How to troubleshoot a Hive local task in a separate JVM process

Goal:

How to run a Hive local task in a separate JVM process so that it can be troubleshot on its own.
This is especially useful when troubleshooting a Map Join's local task: we may want to run it in a separate JVM process instead of inside the Hive CLI process.

Env:

Hive 1.2

Solution:

Starting from Hive 0.14, the parameter hive.exec.submit.local.task.via.child determines whether local tasks (typically the Map Join hashtable generation phase) run in a separate JVM (true is recommended) or not.
One benefit is avoiding an OutOfMemory error in the Hive CLI process when Map Join generates the hashtable, because the hashtable is then built in the child JVM's own heap.
Another benefit is that the local task is isolated for easier troubleshooting; for example, you can run "jstack" against the child process to get the stack trace of the local task.
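To try this for a single session without editing hive-site.xml, the property can be passed on the Hive CLI command line with "--hiveconf", or set inside the session with "set hive.exec.submit.local.task.via.child=true;". The Python sketch below only builds such a command line. It assumes the "hive" launcher is on PATH; the helper name hive_cli_command and the example tables big_t and small_t are hypothetical.

```python
import shlex
from typing import List

# Property name from this post; "true" makes Hive run local tasks
# (e.g. the Map Join hashtable build) in a child JVM.
LOCAL_TASK_PROP = "hive.exec.submit.local.task.via.child"


def hive_cli_command(query: str, via_child: bool = True) -> List[str]:
    """Build a Hive CLI argv that sets the property for this session only.

    Hypothetical helper. "--hiveconf key=value" and "-e <query>" are
    standard Hive CLI options; the `hive` launcher is assumed on PATH.
    """
    value = "true" if via_child else "false"
    return ["hive", "--hiveconf", f"{LOCAL_TASK_PROP}={value}", "-e", query]


if __name__ == "__main__":
    # Hypothetical tables; substitute your own Map Join query.
    cmd = hive_cli_command(
        "SELECT a.id FROM big_t a JOIN small_t b ON a.id = b.id"
    )
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the setting per invocation keeps the cluster-wide default untouched, which is convenient when the goal is only to reproduce and inspect one problematic query.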

For example, when this parameter is false and we run a Map Join query in Hive CLI, everything runs inside the Hive CLI JVM.
If we set hive.exec.submit.local.task.via.child=true, then while the Map Join is in progress, "ps -ef|grep hive" shows an extra JVM, pid 14247 (org.apache.hadoop.hive.ql.exec.mr.ExecDriver), whose parent is the Hive CLI process, pid 11527 (org.apache.hadoop.hive.cli.CliDriver):
mapr     14247 11527 99 14:42 pts/0    00:00:08 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/bin/java -Xmx1000m -Dhiveserver2.auth=NONE -Dhadoop.login=simple -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.sasl.clientconfig=Client_simple -Dzookeeper.saslprovider=com.mapr.security.simplesasl.SimpleSaslProvider -Dmapr.library.flatclass -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 -Dhadoop.id.str=mapr -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.sasl.clientconfig=Client_simple -Dzookeeper.saslprovider=com.mapr.security.simplesasl.SimpleSaslProvider -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 -Dhadoop.id.str=mapr -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Xmx512m -Dhadoop.security.logger=INFO,NullAppender -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.sasl.clientconfig=Client_simple -Dzookeeper.saslprovider=com.mapr.security.simplesasl.SimpleSaslProvider org.apache.hadoop.util.RunJar /opt/mapr/hive/hive-1.2/lib/hive-exec-1.2.0-mapr-1609.jar org.apache.hadoop.hive.ql.exec.mr.ExecDriver -localtask -plan file:/tmp/mapr/658bf551-921b-4531-a270-5f84d3ffb978/hive_2016-11-04_14-42-07_338_4188192754533390017-1/-local-10006/plan.xml -jobconffile file:/tmp/mapr/658bf551-921b-4531-a270-5f84d3ffb978/hive_2016-11-04_14-42-07_338_4188192754533390017-1/-local-10007/jobconf.xml
mapr     11527  9387  7 14:39 pts/0    00:00:12 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/bin/java -Xmx256m -Dhiveserver2.auth=NONE -Dhadoop.login=simple -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.sasl.clientconfig=Client_simple -Dzookeeper.saslprovider=com.mapr.security.simplesasl.SimpleSaslProvider -Dmapr.library.flatclass -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 -Dhadoop.id.str=mapr -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.sasl.clientconfig=Client_simple -Dzookeeper.saslprovider=com.mapr.security.simplesasl.SimpleSaslProvider org.apache.hadoop.util.RunJar /opt/mapr/hive/hive-1.2/lib/hive-cli-1.2.0-mapr-1609.jar org.apache.hadoop.hive.cli.CliDriver

This extra JVM process runs the local task outside of the Hive CLI process.
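Once the child JVM exists, it can be located and dumped with jstack without reading the long ps lines by eye. The Python sketch below is one way to do it, under these assumptions: a Linux "ps -ef" layout (UID, PID, PPID, C, STIME, TTY, TIME, then the command), a JDK "jstack" binary on PATH, and running as the same OS user that owns the JVM ("mapr" in the output above). The function names find_local_task_jvms and jstack_local_tasks are hypothetical helpers, not part of Hive; they match the "org.apache.hadoop.hive.ql.exec.mr.ExecDriver" class together with the "-localtask" flag, as seen in the process list above.

```python
import subprocess
from typing import List, Tuple

# Main class of the child JVM, as seen in the ps output above.
EXEC_DRIVER = "org.apache.hadoop.hive.ql.exec.mr.ExecDriver"


def find_local_task_jvms(ps_output: str) -> List[Tuple[int, int]]:
    """Return (pid, ppid) pairs for Hive local-task child JVMs.

    Assumes `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD...
    """
    found = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a full ps row
        cmd = fields[7:]  # the command and its arguments
        if EXEC_DRIVER in cmd and "-localtask" in cmd:
            found.append((int(fields[1]), int(fields[2])))
    return found


def jstack_local_tasks() -> None:
    """Print a thread dump of each running local-task JVM via jstack."""
    ps = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    for pid, ppid in find_local_task_jvms(ps.stdout):
        print(f"# local task pid={pid} (parent pid={ppid})")
        # jstack must run as the JVM's owner (e.g. "mapr" above).
        subprocess.run(["jstack", str(pid)], check=False)


if __name__ == "__main__":
    jstack_local_tasks()
```

Running it while the Map Join is still building its hashtable prints one thread dump per local-task JVM. If nothing is printed, the local task has either already finished or is running inside the Hive CLI process.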

1 comment:

  1. Will this local task bind to a port? If yes, is that an ephemeral port or a static port? We ran into port binding issues on port 9012 with this Hive property set to true.
    set hive.exec.submit.local.task.via.child=true;
