Goal:
How to install Thrift and run a sample Hbase Thrift job against the Hbase Thrift Gateway on a MapR cluster. The example job is written in Python, and it simply scans a MapR-DB table.
Env:
Hbase 1.1.1
MapR 5.2
CentOS 6.5
Solution:
Before following this article, please follow the MapR documentation to install and start the Hbase Thrift service.
1. Install Thrift
The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages.
Please follow the Apache Thrift installation instructions to download and install Thrift on CentOS.
After installation, please identify the location of the Thrift Python library source code. For example:
/home/mapr/hao/thrift/thrift/lib/py/src
2. Download the Hbase source code from GitHub based on your Hbase version
Here my Hbase version is 1.1.1.
git clone git@github.com:apache/hbase.git
cd hbase
git checkout remotes/origin/branch-1.1
Please identify the location of the Hbase.thrift file after downloading the Hbase source code:
./hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
3. Generate the bindings for the Python language
Copy the Hbase.thrift file identified in step #2 to the current working directory and run the commands below:
thrift -gen py ./Hbase.thrift
mv gen-py/* .
rm -rf gen-py/
Copy the Thrift source code identified in step #1 here:
mkdir thrift
cp -rp /home/mapr/hao/thrift/thrift/lib/py/src/* ./thrift/
4. Install the needed Python library
You may skip this step if the needed Python library is already installed.
yum install python-pip
pip install six
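Before writing the job, it can help to confirm that the modules prepared in steps 3 and 4 can actually be imported. The following is only a minimal sketch: the helper name missing_modules is hypothetical and is not part of Thrift or Hbase. It uses the built-in __import__ so that it should also work on older Python 2 interpreters.

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            # The module is not installed or not on the import path.
            missing.append(name)
    return missing
```

Run from the working directory that holds the ./hbase and ./thrift directories, missing_modules(["six", "thrift", "hbase"]) should return an empty list when everything is in place.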
5. Create a sample Hbase thrift job named "test.py" in Python
from thrift.transport import TSocket
from thrift.protocol import TBinaryProtocol
from thrift.transport import TTransport
from hbase import Hbase

# host is where Hbase thrift service is running.
host = "localhost"
# port is Hbase thrift service default port -- 9090.
port = 9090
# tablename is the MapR-DB table which this sample job scans.
tablename = "/user/mapr/maprdb_sample_table"
# numRows is the number of rows that "scannerGetList" retrieves from the scanner at once.
numRows = 5
# columnName is the column which will be printed out later.
columnName = "cf:mycolumn"

# Connect to HBase Thrift server
transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)

# Create and open the client connection
client = Hbase.Client(protocol)
transport.open()

# Scan the MapR-DB table
scan = Hbase.TScan(startRow="111111", stopRow="22222")
scannerId = client.scannerOpenWithScan(tablename, scan, None)
rowList = client.scannerGetList(scannerId, numRows)
while rowList:
    for row in rowList:
        cell = row.columns.get(columnName)
        # Skip rows that do not contain the requested column.
        if cell is None:
            continue
        message = cell.value
        rowKey = row.row
        print("rowKey = " + rowKey + ", columnValue = " + message)
    rowList = client.scannerGetList(scannerId, numRows)
client.scannerClose(scannerId)

# Close the client connection
transport.close()
The above sample job scans a MapR-DB table named "/user/mapr/maprdb_sample_table" and prints out the row key and the value of column "cf:mycolumn".
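If the loop in the sample job raises an exception, the scanner is never closed on the gateway. One possible way to structure the scan, sketched below, wraps the same scannerOpenWithScan, scannerGetList and scannerClose calls in a generator with try/finally. The function name iter_rows and its parameters are hypothetical and not part of the generated bindings; client is assumed to be the Hbase.Client object from the sample job.

```python
def iter_rows(client, tablename, scan, batch_size=5):
    """Yield rows from a scan, fetching batch_size rows per call."""
    scanner_id = client.scannerOpenWithScan(tablename, scan, None)
    try:
        while True:
            rows = client.scannerGetList(scanner_id, batch_size)
            if not rows:
                # An empty list means the scanner is exhausted.
                break
            for row in rows:
                yield row
    finally:
        # Runs when the generator is exhausted, closed, or an error occurs.
        client.scannerClose(scanner_id)
```

In test.py, the while loop could then be replaced by "for row in iter_rows(client, tablename, scan, numRows):" followed by the same per-row printing logic.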
6. Execute the Thrift job in Python
python ./test.py
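If the job fails with a transport or connection error, a reasonable first check is whether anything is listening on the Thrift gateway host and port. The sketch below uses only the Python standard library; the function name port_is_open is hypothetical, and "localhost" and 9090 are simply the values assumed in the sample job.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, int(port)), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, host unreachable, name lookup failure or timeout.
        return False
    sock.close()
    return True
```

For example, port_is_open("localhost", 9090) returning False suggests that the Hbase Thrift service is not running on that host or is listening on a different port. A True result only shows that the port accepts TCP connections, not that the service behind it is the Thrift gateway.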