Theory:
mapr exportstream and mapr importstream are used together to export data from MapR streams into binary sequence files, and then import the data from those files into other MapR streams. Together, the two tools can be used for cold backup/restore.
After that, mapr diffstreams can be used to check the differences between the 2 streams.
mapr formatresult can be used to parse a sequence file generated by mapr diffstreams.
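The workflow described above can be sketched as a dry-run script that only prints the commands in the order they would be run. This is a minimal sketch, not a definitive procedure: print_backup_plan is a hypothetical helper name, and the flags are the ones used in the experiment in this post.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands of a cold backup/restore cycle
# without executing them. print_backup_plan is a hypothetical helper.
print_backup_plan() {
  src="$1"; backup_dir="$2"; clone="$3"; diff_dir="$4"
  echo "mapr exportstream -src $src -dst $backup_dir"       # export to sequence files
  echo "maprcli stream create -path $clone"                 # create the target stream
  echo "mapr importstream -src $backup_dir -dst $clone"     # restore from the backup
  echo "mapr diffstreams -src $src -dst $clone -outdir $diff_dir"  # verify
}

print_backup_plan /stream/s1 /tmp/backup_s1 /stream/s1_clone /tmp/diff
```

Printing the plan first makes it easy to review paths before anything touches the cluster.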
Experiment:
1. Backup stream /stream/s1 to MFS location /tmp/backup_s1 -- exportstream
mapr exportstream -src /stream/s1 -dst /tmp/backup_s1
The physical size of stream "/stream/s1" is about 163MB:
# maprcli stream topic list -path /stream/s1
topic  partitions  logicalsize  consumers  maxlag  physicalsize
info   1           371326976    1          858571  171065344
The backup size is even smaller -- 100MB.
[root@v5 backup_s1]# ls -altr
total 101620
drwxrwxrwx 3 mapr root         1 Apr 26 07:09 ..
drwxr--r-- 2 root root         3 Apr 26 07:09 .
-rw-r--r-- 1 root root       563 Apr 26 14:39 part0
-rw-r--r-- 1 root root       175 Apr 26 14:39 part2
-rw-r--r-- 1 root root 104055345 Apr 26 14:39 part1
[root@v5 backup_s1]# pwd
/mapr/my2.cluster.com/tmp/backup_s1
This means the compression ratio of the backup could be better than that of the source stream itself.
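To put a number on the size comparison, the backup can be expressed as a percentage of the stream's physical size. The sketch below uses the byte counts from the listings above; size_ratio is a hypothetical helper, not a MapR tool.

```shell
#!/usr/bin/env bash
# size_ratio (hypothetical helper): prints the backup size as a
# percentage of the stream's physical size, to one decimal place.
size_ratio() {
  awk -v b="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * b / p }'
}

# part0 + part1 + part2 from the ls listing, in bytes
backup_bytes=$((563 + 104055345 + 175))
# physicalsize reported by "maprcli stream topic list"
physical_bytes=171065344

size_ratio "$backup_bytes" "$physical_bytes"   # prints 60.8
```

In this run the backup takes roughly 61% of the space the source stream occupies on disk.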
2. Restore the backup to stream /stream/s1_clone -- importstream
maprcli stream create -path /stream/s1_clone
mapr importstream -src /tmp/backup_s1 -dst /stream/s1_clone
The target stream size is similar to that of the source stream.
# maprcli stream topic list -path /stream/s1_clone
topic  partitions  logicalsize  consumers  maxlag     physicalsize
info   1           371957760    1          440290179  172695552
3. Compare the # of rows between them -- streamanalyzer
# mapr streamanalyzer -path /stream/s1 -topics info
Total number of messages: 16002707
# mapr streamanalyzer -path /stream/s1_clone -topics info
Total number of messages: 14502707
Here we can see a difference of 1500000 messages.
This could be due to "dead" messages in the source stream that have passed their TTL (7 days by default).
This means that "mapr streamanalyzer" may also count the "dead" messages.
Interestingly, when I re-ran "mapr streamanalyzer" one day later, the counts of the 2 streams matched.
# mapr streamanalyzer -path /stream/s1 -topics info
Total number of messages: 14502707
# mapr streamanalyzer -path /stream/s1_clone -topics info
Total number of messages: 14502707
Note: This difference arose because the NTP service was not in sync among my nodes.
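The count comparison in this step can also be scripted. The sketch below assumes the streamanalyzer output contains a line of the form "Total number of messages: N", as in the runs above; extract_count is a hypothetical helper, and the sample lines are fed in with echo so the snippet runs without a cluster.

```shell
#!/usr/bin/env bash
# extract_count (hypothetical helper): reads streamanalyzer output on
# stdin and prints the number from "Total number of messages: N".
extract_count() {
  awk -F': *' '/Total number of messages/ { print $2; exit }'
}

# On a cluster these would be, for example:
#   mapr streamanalyzer -path /stream/s1 -topics info | extract_count
src_count=$(echo "Total number of messages: 16002707" | extract_count)
dst_count=$(echo "Total number of messages: 14502707" | extract_count)

echo "difference: $((src_count - dst_count))"   # prints difference: 1500000
```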
4. Check the differences -- diffstreams
To prove that the message difference in step #3 is due to "dead" messages, we can use "mapr diffstreams" to check the differences. The output will be stored in the MFS directory "/tmp/diff".
# mapr diffstreams -src /stream/s1 -dst /stream/s1_clone -outdir /tmp/diff
tables '/stream/s1', and '/stream/s1_clone' didn't match
Number of rows processed in '/stream/s1' : 250919
Number of rows processed in '/stream/s1_clone' : 227480
Mismatch row count in '/stream/s1' : 23439
Mismatch row count in '/stream/s1_clone' : 0
Rows with mismatch are stored in /tmp/diff
Interestingly, the mismatched "row" count is 23439 instead of 1500000.
Why? No hurry, I will explain and dig deeper in the next step.
5. Parse the output file generated by diffstreams -- formatresult
The output of diffstreams is a sequence file, so we need to use "mapr formatresult" to parse it into a readable text format.
# mapr formatresult -indir /tmp/diff/OpsForDstTable -outdir /tmp/diff/OpsForDstTable_parse
Successfully created files in /tmp/diff/OpsForDstTable_parse
The row count of this output file is 23439, which matches the row count in step #4.
# wc -l opsfordst_1.diff.txt
23439 opsfordst_1.diff.txt
# pwd
/mapr/my2.cluster.com/tmp/diff/OpsForDstTable_parse
However, the actual count of mismatched messages is 1500000, because each row in that output file contains multiple messages.
Because the value of each message is stored in binary format in that output file, one easy way to count the messages is to count the occurrences of the word "binary" in that file.
This can be done in "vim" by typing ":%s/binary//gn".
It will then show you the real count of mismatched messages.
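For files too large to open comfortably in vim, the same count can be obtained non-interactively. This is a minimal sketch: count_word is a hypothetical helper, and the sample file below is a made-up stand-in, not the real layout of opsfordst_1.diff.txt.

```shell
#!/usr/bin/env bash
# count_word (hypothetical helper): counts every occurrence of a word
# in a file. grep -o prints each match on its own line, so wc -l
# counts matches rather than matching lines.
count_word() {
  grep -o -- "$1" "$2" | wc -l | tr -d ' '
}

# Made-up sample: two rows holding three "binary" values in total.
sample=$(mktemp)
printf 'row1 binary binary\nrow2 binary\n' > "$sample"
count_word binary "$sample"   # prints 3
rm -f "$sample"
```

Against the real output this would be run as count_word binary opsfordst_1.diff.txt. Unlike "wc -l", it counts several matches on one line separately, which is what this step needs.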
6. Prove the mismatched messages are "dead" messages
From the output in step #5, you can get the epoch timestamps of the mismatched messages. Take 1461080075789.0 for example: converted to human-readable time, it is April 19, 2016 at 8:34:35 AM PDT, which is 7 days before I ran these tests.
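The timestamp conversion can be done on the command line. The sketch below assumes GNU date (the "-d @seconds" form) and treats the value as epoch milliseconds; epoch_ms_to_utc is a hypothetical helper.

```shell
#!/usr/bin/env bash
# epoch_ms_to_utc (hypothetical helper): converts an epoch timestamp in
# milliseconds, such as 1461080075789.0, to a UTC date string.
# Assumes GNU date.
epoch_ms_to_utc() {
  ms="${1%%.*}"                      # drop the fractional part (".0")
  date -u -d "@$((ms / 1000))" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_ms_to_utc 1461080075789.0      # prints 2016-04-19 15:34:35 UTC
```

15:34:35 UTC is 8:34:35 AM PDT, which is 7 days before the April 26 dates in the backup listing.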