Goal
Understand the different lock behaviors between DummyTxnManager and DbTxnManager in Hive. The example query is INSERT into a partition table.Env
Hive 2.1MapR 5.2
Test Preparation
1. Create a Hive partition table "testpart" as described in http://www.openkb.info/2014/11/hive-locks-tablepartition-level.html
2. Create 2 child partitions with data for "testpart":
INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords; INSERT OVERWRITE TABLE testpart PARTITION(one = 'c', two = 'd') SELECT * FROM passwords;
3. Hive Metastore has 2 types of lock configurations
a. DummyTxnManagerIt means to set below in hive-site.xml and restart Hive Metastore:
<property> <name>hive.support.concurrency</name> <value>true</value> </property> <property> <name>hive.txn.manager</name> <value>org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager</value> </property>b. DbTxnManager
It means to set below in hive-site.xml and restart Hive Metastore:
<property> <name>hive.support.concurrency</name> <value>true</value> </property> <property> <name>hive.txn.manager</name> <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value> </property>
4. We will test below 2 types of INSERT
a. Dynamic Partition Insertset hive.exec.dynamic.partition.mode=nostrict; INSERT OVERWRITE TABLE testpart PARTITION(one,two) SELECT *,'a','b' FROM passwords;b. Static Partition Insert
INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords;
Solution
1. DummyTxnManager + Dynamic Partition Insert
Here we open 2 Hive CLI sessions, and run one Dynamic Partition Insert in each session.The locks are:
hive> show locks extended; OK default@passwords SHARED LOCK_QUERYID:mapr_20180710102038_18601ec0-9f6d-4305-8446-d6c895fc5013 LOCK_TIME:1531243238410 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one,two) SELECT *,'a','b' FROM passwords default@passwords SHARED LOCK_QUERYID:mapr_20180710102138_a9494859-ff06-4359-895c-00eb8c348ffd LOCK_TIME:1531243299188 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one,two) SELECT *,'a','b' FROM passwords default@testpart SHARED LOCK_QUERYID:mapr_20180710102038_18601ec0-9f6d-4305-8446-d6c895fc5013 LOCK_TIME:1531243238411 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one,two) SELECT *,'a','b' FROM passwords default@testpart SHARED LOCK_QUERYID:mapr_20180710102138_a9494859-ff06-4359-895c-00eb8c348ffd LOCK_TIME:1531243299189 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one,two) SELECT *,'a','b' FROM passwords Time taken: 0.035 seconds, Fetched: 28 row(s)
Both of the Dynamic Partition Inserts(into the same leaf partition) can run concurrently because each of the INSERT only needs:
- SHARED lock on parent table -- "testpart"
2. DummyTxnManager + Static Partition Insert
Here we open 2 Hive CLI sessions, and run one Static Partition Insert in each session.The locks are:
hive> show locks extended; OK default@passwords SHARED LOCK_QUERYID:mapr_20180710104550_8f7fd2d8-7bbc-4e8c-983f-94549ff4cda8 LOCK_TIME:1531244750212 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords default@passwords SHARED LOCK_QUERYID:mapr_20180710104604_92fb819d-8cd1-4a1c-bbc5-60db7c59cc51 LOCK_TIME:1531244764648 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords default@testpart SHARED LOCK_QUERYID:mapr_20180710104604_92fb819d-8cd1-4a1c-bbc5-60db7c59cc51 LOCK_TIME:1531244764648 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords default@testpart SHARED LOCK_QUERYID:mapr_20180710104550_8f7fd2d8-7bbc-4e8c-983f-94549ff4cda8 LOCK_TIME:1531244750212 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords default@testpart@one=a SHARED LOCK_QUERYID:mapr_20180710104604_92fb819d-8cd1-4a1c-bbc5-60db7c59cc51 LOCK_TIME:1531244764648 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords default@testpart@one=a SHARED LOCK_QUERYID:mapr_20180710104550_8f7fd2d8-7bbc-4e8c-983f-94549ff4cda8 LOCK_TIME:1531244750212 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords default@testpart@one=a/two=b EXCLUSIVE LOCK_QUERYID:mapr_20180710104550_8f7fd2d8-7bbc-4e8c-983f-94549ff4cda8 LOCK_TIME:1531244750212 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords Time taken: 0.027 seconds, Fetched: 49 row(s)Only one of the Static Partition Inserts(into the same leaf partition) can run at one time because each of the INSERT needs:
- SHARED lock on parent table -- "testpart"
- SHARED lock on parent partition(s) -- "testpart@one=a"
- EXCLUSIVE lock on the leaf partition -- "testpart@one=a/two=b".
From above output, the 2nd Static Partition Insert's query ID is mapr_20180710104604_92fb819d-8cd1-4a1c-bbc5-60db7c59cc51.
After the 1st Static Partition Insert completed, then 2nd Static Partition Insert started.
At that time, the locks became:
hive> show locks extended; OK default@passwords SHARED LOCK_QUERYID:mapr_20180710104604_92fb819d-8cd1-4a1c-bbc5-60db7c59cc51 LOCK_TIME:1531244764648 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords default@testpart SHARED LOCK_QUERYID:mapr_20180710104604_92fb819d-8cd1-4a1c-bbc5-60db7c59cc51 LOCK_TIME:1531244764648 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords default@testpart@one=a SHARED LOCK_QUERYID:mapr_20180710104604_92fb819d-8cd1-4a1c-bbc5-60db7c59cc51 LOCK_TIME:1531244764648 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords default@testpart@one=a/two=b EXCLUSIVE LOCK_QUERYID:mapr_20180710104604_92fb819d-8cd1-4a1c-bbc5-60db7c59cc51 LOCK_TIME:1531244764648 LOCK_MODE:IMPLICIT LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords Time taken: 0.017 seconds, Fetched: 28 row(s)
Due to this lock behaviors, it means, Static Partition Inserts into different leaf partitions can run concurrently because the EXCLUSIVE lock is at leaf partition level.
Of course, SELECT queries on other leaf partitions(without EXCLUSIVE lock) can also run concurrently with the Static Partition Insert.
I have done tests to prove this theory, but I will not share the complete output here.
3. DbTxnManager + Dynamic Partition Insert
Running one Dynamic Partition Insert in Hive CLI session, and the locks are:hive> show locks extended; OK Lock ID Database Table Partition State Blocked By Type Transaction ID Last Hearbeat Acquired At User Hostname Agent Info 12.1 default passwords NULL ACQUIRED SHARED_READ NULL 1531256068000 1531256068000 mapr s4.poc.com mapr_20180710135425_b4e412b0-ddb9-4c3a-92bc-fda863df71d7 12.2 default testpart NULL ACQUIRED EXCLUSIVE NULL 1531256068000 1531256068000 mapr s4.poc.com mapr_20180710135425_b4e412b0-ddb9-4c3a-92bc-fda863df71d7 Time taken: 0.012 seconds, Fetched: 3 row(s)Only one of the Dynamic Partition Insert can run at one time because each of the INSERT needs:
- EXCLUSIVE lock on parent table -- "testpart".
4. DbTxnManager + Static Partition Insert
Running one Static Partition Insert in Hive CLI session, and the locks are:hive> show locks extended; OK Lock ID Database Table Partition State Blocked By Type Transaction ID Last Hearbeat Acquired At User Hostname Agent Info 13.1 default passwords NULL ACQUIRED SHARED_READ NULL 1531256273000 1531256273000 mapr s4.poc.com mapr_20180710135753_3086d4dc-b157-429e-9604-c8a094468a3e 13.2 default testpart one=a/two=b ACQUIRED EXCLUSIVE NULL 1531256273000 1531256273000 mapr s4.poc.com mapr_20180710135753_3086d4dc-b157-429e-9604-c8a094468a3e Time taken: 0.01 seconds, Fetched: 3 row(s)Only one of the Static Partition Inserts(into the same leaf partition) can run at one time because each of the INSERT needs:
- EXCLUSIVE lock on the leaf partition -- "testpart@one=a/two=b".
See below locks:
hive> show locks extended; OK Lock ID Database Table Partition State Blocked By Type Transaction ID Last Hearbeat Acquired At User Hostname Agent Info 18.1 default passwords NULL ACQUIRED SHARED_READ NULL 1531258035000 1531258035000 mapr s4.poc.com mapr_20180710142715_d40501ee-9267-47de-bf6a-9d18ef4a1df7 19.1 default passwords NULL ACQUIRED SHARED_READ NULL 1531258040000 1531258040000 mapr s4.poc.com mapr_20180710142720_a1be323c-905d-4f4c-b7dc-33634df68a91 18.2 default testpart one=a/two=b ACQUIRED EXCLUSIVE NULL 1531258035000 1531258035000 mapr s4.poc.com mapr_20180710142715_d40501ee-9267-47de-bf6a-9d18ef4a1df7 19.2 default testpart one=c/two=d ACQUIRED EXCLUSIVE NULL 1531258040000 1531258040000 mapr s4.poc.com mapr_20180710142720_a1be323c-905d-4f4c-b7dc-33634df68a91 Time taken: 0.012 seconds, Fetched: 5 row(s)
However due to my test, SELECT on parent table or SELECT on other leaf partition(s) will be blocked.
Here we open 2 Hive CLI sessions. One session is running Static Partition Insert on partition(one=a/two=b). The other session is running SELECT on another partition(one=c/two=d).
The locks are:
hive> show locks extended; OK Lock ID Database Table Partition State Blocked By Type Transaction ID Last Hearbeat Acquired At User Hostname Agent Info 16.1 default passwords NULL ACQUIRED SHARED_READ NULL 1531257690000 1531257690000 mapr s4.poc.com mapr_20180710142129_e99fa6fc-2cfa-4e66-a596-d83e10285802 16.2 default testpart one=a/two=b ACQUIRED EXCLUSIVE NULL 1531257690000 1531257690000 mapr s4.poc.com mapr_20180710142129_e99fa6fc-2cfa-4e66-a596-d83e10285802 17.1 default testpart NULL WAITING 16.2 SHARED_READ NULL 1531257724000 NULL mapr s4.poc.com mapr_20180710142136_21312ed9-f1a1-442a-8545-1942885fbc77 17.2 default testpart one=c/two=d WAITING SHARED_READ NULL 1531257724000 NULL mapr s4.poc.com mapr_20180710142136_21312ed9-f1a1-442a-8545-1942885fbc77 Time taken: 0.006 seconds, Fetched: 5 row(s)As we can see from above, the SELECT on another partition(one=c/two=d) will wait for SHARED lock on the parent table which is blocked by the EXCLUSIVE lock on partition(one=a/two=b).
Key Takeaways
1. DummyTxnManager + Dynamic Partition Insert- SHARED lock on parent table -- "testpart"
- Dynamic Partition Inserts(no matter into the same or different leaf partition)
- SELECT(on parent table or on any partition)
2. DummyTxnManager + Static Partition Insert
- SHARED lock on parent table -- "testpart"
- SHARED lock on parent partition(s) -- "testpart@one=a"
- EXCLUSIVE lock on the leaf partition -- "testpart@one=a/two=b".
- Static Partition Insert into different leaf partitions
- SELECT on other leaf partitions(without EXCLUSIVE lock)
- EXCLUSIVE lock on parent table -- "testpart".
All of them will be blocked.
4. DbTxnManager + Static Partition Insert
- EXCLUSIVE lock on the leaf partition -- "testpart@one=a/two=b".
- Static Partition Insert into different leaf partitions
Is it necessary to use hive transactions and ORC tables to take advantage of the exclusive lock by DbTxnManager + Dynamic Partition Insert ,
ReplyDeleteor could I just use 'DbTxnManager + Dynamic Partition Insert' without hive transactions on parquet tables ?
It is not a requirement to use ORC or parquet format.
DeleteAbove tests are done using text format tables.
"Hive transaction" is a different feature which only works on ORC format.
need a way to figure out which hive query is holding up the locks, but when we enable (DbTxnManager) hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager then the show locks extended on beeline does not show the actual query holding up the lock like it did under DummyTransationManager,
ReplyDeleteso to figure out the query holding up the lock(s) which tables in the mysql metastore should I probe and how would one figure out the actual (HS2 based) query holding up the locks pls
It permits you to really look at the ordering status and upgrade the perceivability of your websites.
ReplyDeleteonohosting.com/