Goal:
This article records common commands and known issues for HBase replication.
Solution:
1. Add the target as peer
hbase shell> add_peer "us_east","hostname.of.zookeeper:5181:/path-to-hbase"
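Depending on the HBase version, add_peer also accepts a named CLUSTER_KEY plus an optional TABLE_CFS map to restrict the peer to specific tables or column families. A minimal sketch, reusing the placeholder cluster key above:
hbase shell> add_peer "us_east", CLUSTER_KEY => "hostname.of.zookeeper:5181:/path-to-hbase", TABLE_CFS => { "t1" => [] }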
2. Enable and Disable table replication
hbase shell> enable_table_replication "t1"
hbase shell> disable_table_replication "t1"
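enable_table_replication sets REPLICATION_SCOPE to 1 on the table's column families; the same effect can be achieved per column family with alter (cf1 below is a placeholder family name):
hbase shell> alter "t1", {NAME => "cf1", REPLICATION_SCOPE => 1}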
3. Copy table from source to target
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=hostname.of.zookeeper:5181:/path-to-hbase t1
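CopyTable also accepts a time window and a column-family filter, which is useful for catching up only a range of edits; a sketch with placeholder epoch-millisecond timestamps and family name:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1600000000000 --endtime=1600003600000 --families=cf1 --peer.adr=hostname.of.zookeeper:5181:/path-to-hbase t1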
4. Remove target as peer
hbase shell> remove_peer "us_east"
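If the goal is only to pause shipping, disable_peer keeps the peer (and continues to queue WALs for it), whereas remove_peer deletes the peer and its queues:
hbase shell> disable_peer "us_east"
hbase shell> remove_peer "us_east"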
5. List all peers
hbase shell> list_peers
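Depending on the HBase version, the shell can also show a single peer's configuration and its table/CF mapping:
hbase shell> show_peer_tableCFs "us_east"
hbase shell> get_peer_config "us_east"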
6. Verify the rows between the source and target tables
hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication peer1 table1
Compare the GOODROWS and BADROWS counters in the job output.
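VerifyReplication takes the same kind of time-window and family filters as CopyTable, which keeps the comparison job small; a sketch with placeholder timestamps and family name:
hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --starttime=1600000000000 --endtime=1600003600000 --families=cf1 peer1 table1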
7. Monitor Replication Status
# Prints the status of each source and its sinks, sorted by hostname.
hbase shell> status 'replication'
# Prints the status for each replication source, sorted by hostname.
hbase shell> status 'replication', 'source'
# Prints the status for each replication sink, sorted by hostname.
hbase shell> status 'replication', 'sink'
8. HBase Replication Metrics
Metric | Description |
---|---|
source.sizeOfLogQueue | Number of WALs to process (excludes the one being processed) at the replication source. |
source.shippedOps | Number of mutations shipped. |
source.logEditsRead | Number of mutations read from WALs at the replication source. |
source.ageOfLastShippedOp | Age of the last batch shipped by the replication source. |
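These metrics are exposed through the RegionServer's JMX servlet; one quick way to pull them is with curl (the hostname and the default info port 16030 below are placeholders for your RegionServer):
curl "http://regionserver.host:16030/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=Replication"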
9. Procedure for replicating an existing table from cluster A to cluster B
on cluster A:
hbase shell> add_peer "B","hostname.of.zookeeper:5181:/path-to-hbase"
hbase shell> enable_table_replication "t1"
hbase shell> disable_peer 'B'
Then use CopyTable, Export/Import, or ExportSnapshot to copy table "t1" from A to B (see the ExportSnapshot sketch after this step).
hbase shell> enable_peer 'B'
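As an example of the copy step above, the ExportSnapshot route could look like the following (the NameNode address of cluster B and the mapper count are placeholder values):
on cluster A:
hbase shell> snapshot "t1", "t1_snapshot"
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot t1_snapshot -copy-to hdfs://namenode.of.B:8020/hbase -mappers 16
on cluster B:
hbase shell> clone_snapshot "t1_snapshot", "t1"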
10. HBase replication-related parameters
<property>
<name>hbase.replication</name>
<value>true</value>
<description>Allow HBase tables to be replicated.</description>
</property>
<property>
<name>replication.source.nb.capacity</name>
<value>25000</value>
<description>Maximum number of edits (data records) shipped to the sink per batch; the default is 25000.</description>
</property>
<property>
<name>replication.source.ratio</name>
<value>0.1</value>
<description>The fraction of RegionServers in the peer cluster that are selected as potential replication sinks; the default is 0.1.</description>
</property>
<property>
<name>replication.source.size.capacity</name>
<value>67108864</value>
<description>Maximum total size of the edits shipped to the sink per batch; the default is 64 MB (67108864 bytes).</description>
</property>
<property>
<name>replication.sleep.before.failover</name>
<value>2000</value>
<description>How long (in milliseconds) to sleep before transferring the replication queues of a dead RegionServer to another RegionServer; the default is 2000 (2 seconds).</description>
</property>
<property>
<name>replication.executor.workers</name>
<value>1</value>
<description>The number of worker threads used for replication; the default is 1.</description>
</property>
Known Issues
1. HBASE-18111
The cluster connection is aborted when the ZooKeeperWatcher receives an AuthFailed event; the HBaseInterClusterReplicationEndpoint's replicate() method then gets stuck in a while loop.
One symptom is that a jstack on the RegionServer shows:
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
This is fixed in 1.3.3, 1.4.0, and 2.0.0.
2. HBASE-24359
Replication gets stuck after a column family is deleted from both the source and the sink while the source still has outstanding edits for it; those edits can no longer be shipped or discarded, so all replication backs up behind them.
The fix introduces a new config, hbase.replication.drop.on.deleted.columnfamily, which defaults to false. When set to true, replication drops edits for column families that have been deleted from both the replication source and target.
This is fixed in 2.3.0 and 3.0.0.
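On releases that carry the fix, the drop behavior described above is enabled in hbase-site.xml; a sketch (the description text is paraphrased from this article, not the shipped default):
<property>
<name>hbase.replication.drop.on.deleted.columnfamily</name>
<value>true</value>
<description>Drop edits for column families that have been deleted from both the replication source and target.</description>
</property>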
References
https://blog.cloudera.com/what-are-hbase-znodes/
https://blog.cloudera.com/apache-hbase-replication-overview/
https://blog.cloudera.com/online-apache-hbase-backups-with-copytable/
https://blog.cloudera.com/introduction-to-apache-hbase-snapshots/