Goal:
As we know, Hive locks can be used to isolate reads and writes.Please refer to my previous blogs for details:
http://www.openkb.info/2014/11/hive-locks-tablepartition-level.html
http://www.openkb.info/2018/07/hive-different-lock-behaviors-between.html
That lock mechanism is maintained by Hive and it could have performance overhead or unexpected lock behavior based on your application logic.
This article shows another way to use snapshot feature of MapR-FS to isolate Hive reads and writes.
Env:
MapR 6.1Hiv 2.3
Solution:
The idea of this solution is straightforward and simple:- When Hive writes such as data loading finishes, take the MapR volume snapshot.
- Create an external Hive table based on that snapshot Hive data as a read-only copy for Hive reads.
- In the meantime, the Hive writes can still happen on the original Hive table.
Below is a simple example:
1. Create a MapR Volume named "testhive" mounted at /testhive
maprcli volume create -name testhive -path /testhive2. Create a Hive table in that volume
CREATE TABLE `testsnapshot`( `id` int ) LOCATION 'maprfs:/testhive/testsnapshot';And Load 500 rows into this table:
insert into testsnapshot select 1 from src; select count(*) from testsnapshot; 5003. Create a snapshot on the volume when no writes are happening
maprcli volume snapshot create -snapshotname mysnapshot -volume testhiveThe snapshot data for that Hive table named "testsnapshot" is:
$ hadoop fs -ls /testhive/.snapshot/mysnapshot/testsnapshot Found 1 items -rwxr-xr-x 3 mapr mapr 1000 2019-03-11 10:34 /testhive/.snapshot/mysnapshot/testsnapshot/000000_04. Create an external table on that snapshot data
CREATE EXTERNAL TABLE `testsnapshot_ext`( `id` int ) LOCATION 'maprfs:/testhive/.snapshot/mysnapshot/testsnapshot';5. In the meantime, write into the original Hive table
insert into testsnapshot select 1 from src; select count(*) from testsnapshot; 1000 select count(*) from testsnapshot_ext; 500
As you can see, the Hive external table "testsnapshot_ext" based on the snapshot data can be a read-only copy for Hive reads.
The Hive writes can still happen in the meantime on the original Hive table "testsnapshot".
This use case works well if you plan to periodically load data into the Hive table, and you do not want to cause inconsistency for Hive reads instead of using Hive locks feature.
No comments:
Post a Comment