Goal:
Hive on Tez: how to identify which DAG ID belongs to which Hive query in the same DAG Application Master
Env:
Hive 2.1
Tez 0.8
Root Cause:
One DAG Application Master may run multiple Hive queries. For example, when you run 3 different queries in quick succession in the same Hive shell, the 3 Hive queries may share the same DAG Application Master container.
Eg:
hive> set hive.execution.engine=tez;
hive> select count(*) from firsttable;
Query ID = mapr_20170523134547_2c04d119-5495-401e-b7b7-2ad87f0f5627
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1475192050844_0023)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1              container     SUCCEEDED      0          0        0        0       0       0
Reducer 2 ......   container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 6.47 s
----------------------------------------------------------------------------------------------
OK
0
Time taken: 15.396 seconds, Fetched: 1 row(s)
hive> select count(*) from passwords;
Query ID = mapr_20170523134609_cd1d202d-5608-4dfb-944e-136ee7b97627
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1475192050844_0023)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........   container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 1.19 s
----------------------------------------------------------------------------------------------
OK
29
Time taken: 1.921 seconds, Fetched: 1 row(s)
hive> select count(distinct col0) from passwords;
Query ID = mapr_20170523134619_b8a83051-1577-437d-931d-ea7fbc6be807
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1475192050844_0023)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........   container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   container     SUCCEEDED      1          1        0        0       0       0
Reducer 3 ......   container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 7.24 s
----------------------------------------------------------------------------------------------
OK
29
Time taken: 8.058 seconds, Fetched: 1 row(s)

All 3 Hive queries above share the same DAG Application Master in the same application: application_1475192050844_0023.
Here is how to identify which DAG ID is for which Hive query.
Solution:
In the DAG Application Master container log directory, search for the keywords "Generating DAG graphviz file":

[root@s4 container_e02_1475192050844_0023_01_000001]# grep "Generating DAG graphviz file, dagId=" *
syslog:2017-05-23 13:45:55,437 [INFO] [IPC Server handler 0 on 45496] |app.DAGAppMaster|: Generating DAG graphviz file, dagId=dag_1475192050844_0023_1, filePath=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1475192050844_0023/container_e02_1475192050844_0023_01_000001/dag_1475192050844_0023_1.dot
syslog_dag_1475192050844_0023_1_post:2017-05-23 13:46:09,871 [INFO] [IPC Server handler 0 on 45496] |app.DAGAppMaster|: Generating DAG graphviz file, dagId=dag_1475192050844_0023_2, filePath=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1475192050844_0023/container_e02_1475192050844_0023_01_000001/dag_1475192050844_0023_2.dot
syslog_dag_1475192050844_0023_2_post:2017-05-23 13:46:20,424 [INFO] [IPC Server handler 0 on 45496] |app.DAGAppMaster|: Generating DAG graphviz file, dagId=dag_1475192050844_0023_3, filePath=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1475192050844_0023/container_e02_1475192050844_0023_01_000001/dag_1475192050844_0023_3.dot

Note:
The 1st Hive query's DAG ID is logged in "syslog".
The 2nd Hive query's DAG ID is logged in the "_post" syslog of the 1st DAG (syslog_dag_1475192050844_0023_1_post).
The 3rd Hive query's DAG ID is logged in the "_post" syslog of the 2nd DAG (syslog_dag_1475192050844_0023_2_post).
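If the DAG IDs need to be collected by a script rather than read off the grep output, a minimal sketch along these lines may help. It assumes the log lines have the same "Generating DAG graphviz file, dagId=..." form as shown above; the function name extract_dag_ids and the shortened sample lines (filePath trimmed) are illustrative only and not part of Hive or Tez.

```python
import re

# Matches the DAG ID in DAGAppMaster "Generating DAG graphviz file" log lines.
DAG_ID_RE = re.compile(r"Generating DAG graphviz file, dagId=(dag_\d+_\d+_\d+)")


def extract_dag_ids(lines):
    """Return the DAG IDs found in the given log lines, in order, without duplicates."""
    found = []
    for line in lines:
        match = DAG_ID_RE.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


# Sample lines shortened from the grep output above (the filePath part is trimmed).
sample = [
    "syslog:2017-05-23 13:45:55,437 [INFO] [IPC Server handler 0 on 45496] "
    "|app.DAGAppMaster|: Generating DAG graphviz file, dagId=dag_1475192050844_0023_1,",
    "syslog_dag_1475192050844_0023_1_post:2017-05-23 13:46:09,871 [INFO] [IPC Server handler 0 on 45496] "
    "|app.DAGAppMaster|: Generating DAG graphviz file, dagId=dag_1475192050844_0023_2,",
    "syslog_dag_1475192050844_0023_2_post:2017-05-23 13:46:20,424 [INFO] [IPC Server handler 0 on 45496] "
    "|app.DAGAppMaster|: Generating DAG graphviz file, dagId=dag_1475192050844_0023_3,",
]

for dag_id in extract_dag_ids(sample):
    print(dag_id)
```

The same function could be fed the lines of every file in the container log directory instead of the inline sample.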
Then the first line of each per-DAG syslog file (syslog_dag_*) shows the mapping between DAG IDs and Hive query IDs:
[root@s4 container_e02_1475192050844_0023_01_000001]# head -1 syslog_dag_1475192050844_0023_[1-3]
==> syslog_dag_1475192050844_0023_1 <==
2017-05-23 13:45:55,442 [INFO] [IPC Server handler 0 on 45496] |app.DAGAppMaster|: Running DAG: select count(*) from firsttable(Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=mapr_20170523134547_2c04d119-5495-401e-b7b7-2ad87f0f5627 }

==> syslog_dag_1475192050844_0023_2 <==
2017-05-23 13:46:09,873 [INFO] [IPC Server handler 0 on 45496] |app.DAGAppMaster|: Running DAG: select count(*) from passwords(Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=mapr_20170523134609_cd1d202d-5608-4dfb-944e-136ee7b97627 }

==> syslog_dag_1475192050844_0023_3 <==
2017-05-23 13:46:20,426 [INFO] [IPC Server handler 0 on 45496] |app.DAGAppMaster|: Running DAG: select count(distinct col0) from passwords(Stage-1), callerContext={ context=HIVE, callerType=HIVE_QUERY_ID, callerId=mapr_20170523134619_b8a83051-1577-437d-931d-ea7fbc6be807 }
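The lookup above could also be automated. The following is a rough sketch, assuming every per-DAG syslog file starts with a "Running DAG: ..., callerContext={ ... callerId=... }" line in exactly the format shown in this example; other Hive or Tez versions may log it differently. The names parse_running_dag and map_dags_to_queries are hypothetical helpers, not part of any Hive or Tez tooling.

```python
import re
from pathlib import Path

# Pattern for the "Running DAG" line as it appears in the example output above.
RUNNING_DAG_RE = re.compile(
    r"Running DAG: (?P<query>.*?), callerContext=\{ context=HIVE, "
    r"callerType=HIVE_QUERY_ID, callerId=(?P<query_id>\S+) \}"
)


def parse_running_dag(line):
    """Return (DAG name, Hive query ID) from a 'Running DAG' line, or None."""
    match = RUNNING_DAG_RE.search(line)
    if match is None:
        return None
    return match.group("query"), match.group("query_id")


def map_dags_to_queries(log_dir):
    """Map each DAG ID to (DAG name, Hive query ID) using the per-DAG syslog files."""
    mapping = {}
    for path in sorted(Path(log_dir).glob("syslog_dag_*")):
        if path.name.endswith("_post"):
            continue  # only the per-DAG syslog files are inspected, as with head -1 above
        with path.open(errors="replace") as handle:
            parsed = parse_running_dag(handle.readline())
        if parsed is not None:
            # File name "syslog_dag_<...>" minus the "syslog_" prefix is the DAG ID.
            mapping[path.name[len("syslog_"):]] = parsed
    return mapping
```

Calling map_dags_to_queries on the container log directory would return a dictionary keyed by DAG ID, for example dag_1475192050844_0023_1, with the DAG name and the Hive query ID as the value.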