All tests below are done in MapR 4.0.1.
Theory
1. Auto split is enabled by default. Check using "maprcli table info", for example:
# maprcli table info -path /maprtable -json|grep -i autosplit
"autosplit":true,
2. The "regionsizemb" table attribute controls the average size of the regions into which MapR-DB tries to split the table as the table grows. Check using "maprcli table info", for example:
# maprcli table info -path /maprtable -json|grep -i regionsizemb
"regionsizemb":4096,
According to http://doc.mapr.com/display/MapR/table+edit :
If autosplit is set to true, MapR-DB splits a region when the size of the region exceeds the average value by 50%. For example, if the average value is 4096 MB, MapR-DB splits a region that is larger than 6144 MB.
Note that while a table has fewer than 4 regions, MapR-DB ignores the regionsizemb parameter and aggressively distributes the table data.
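The two rules above can be written down as a small Python sketch. The function names and constants are illustrative assumptions derived from the documentation quote and the note; this is not MapR-DB's actual implementation, and the exact behavior below 4 regions is only described as "aggressive".

```python
# Assumption: split at 150% of regionsizemb, per the quoted documentation.
SPLIT_FACTOR = 1.5
# Assumption: with fewer than 4 regions, regionsizemb is ignored.
MIN_REGIONS = 4


def split_threshold_mb(regionsizemb):
    """Region size in MB above which an auto split is expected."""
    return regionsizemb * SPLIT_FACTOR


def size_rule_applies(region_count):
    """True once the table has enough regions for regionsizemb to matter."""
    return region_count >= MIN_REGIONS


if __name__ == "__main__":
    for size in (512, 4096, 8192):
        print(size, "->", split_threshold_mb(size), "MB")
```

For regionsizemb=4096 this gives 6144 MB, which matches the example in the documentation.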
Lab
To verify the theory above, first create a table "/maprtable" and bulk load 76733449 rows using Spark, following the steps here.
1. Disable auto split manually and merge the regions into one region of about 4.6GB.
# maprcli table edit -autosplit false -path /maprtable
# maprcli table region list -path /maprtable
numberofrows  fid               secondarynodes  primarynode  numberofrowswithdelete  startkey          logicalsize  lastheartbeat  endkey            physicalsize
76733449      2115.523.263486   yarn-92         yarn-94      0                       -INFINITY         4899782656   0              INFINITY          4964327424
2. Set region size to 512MB.
# maprcli table edit -regionsizemb 512 -path /maprtable
3. Enable auto split; the average region size then becomes about 512MB.
# maprcli table edit -autosplit true -path /maprtable
# maprcli table region list -path /maprtable
numberofrows  fid               secondarynodes  primarynode  numberofrowswithdelete  startkey          logicalsize  lastheartbeat  endkey            physicalsize
9288340       2189.761.132748   yarn-94         yarn-92      0                       -INFINITY         551845888    0              \x00Q\x03S        565313536
8218724       2191.465.132312   yarn-94         yarn-92      0                       \x00Q\x03S        538714112    0              \x00\xAC\x89\xDE  553336832
7547911       2192.1708.134654  yarn-94         yarn-92      0                       \x00\xAC\x89\xDE  486211584    0              \x01\x1E\xF2C     490905600
7628796       2193.34.131220    yarn-94         yarn-92      0                       \x01\x1E\xF2C     489406464    0              \x01\x91\xC7\xCA  494075904
8536673       2194.34.131186    yarn-94         yarn-92      0                       \x01\x91\xC7\xCA  547258368    0              \x02\x12-j        552493056
8650526       2195.723.132698   yarn-94         yarn-92      0                       \x02\x12-j        557539328    0              \x02\x95v\xA6     562798592
8927659       2196.569.132256   yarn-94         yarn-92      0                       \x02\x95v\xA6     573784064    0              \x03\x1CB\xAF     579248128
8973834       2116.322.263128   yarn-92         yarn-94      0                       \x03\x1CB\xAF     578256896    0              \x03\xA4jS        583835648
8960986       2190.720.133176   yarn-94         yarn-92      0                       \x03\xA4jS        576765952    0              INFINITY          582320128
4. Disable auto split and merge the regions into one region again.
# maprcli table edit -autosplit false -path /maprtable
# maprcli table region list -path /maprtable
numberofrows  fid               secondarynodes  primarynode  numberofrowswithdelete  startkey          logicalsize  lastheartbeat  endkey            physicalsize
76733449      2198.945.133124   yarn-92         yarn-94      0                       -INFINITY         4899782656   0              INFINITY          4964327424
5. Set region size to 8GB.
# maprcli table edit -regionsizemb 8192 -path /maprtable
6. Enable auto split; MapR-DB still aggressively splits the table into 4 regions.
# maprcli table edit -autosplit true -path /maprtable
# maprcli table region list -path /maprtable
numberofrows  fid               secondarynodes  primarynode  numberofrowswithdelete  startkey          logicalsize  lastheartbeat  endkey            physicalsize
38305525      2190.721.133178   yarn-94         yarn-92      0                       -INFINITY         2426093568   0              \x01\xE6&R        2466988032
17322521      2192.1709.134656  yarn-94         yarn-92      0                       \x01\xE6&R        1114742784   0              \x02\xECP\xBD     1125318656
16298948      2198.945.133124   yarn-92         yarn-94      0                       \x02\xECP\xBD     1050279936   0              \x03\xE3\x9DM     1060331520
4806455       2191.466.132314   yarn-94         yarn-92      0                       \x03\xE3\x9DM     308666368    0              INFINITY          311689216
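To compare the region listings from steps 3 and 6 with "regionsizemb", the logicalsize column can be converted to MB and summarized. The sketch below is only an illustration: the helper name summarize is hypothetical, the values are copied by hand from the listings above, and it assumes logicalsize is reported in bytes with 1 MB = 1024 * 1024 bytes.

```python
MB = 1024 * 1024  # assumption: logicalsize is in bytes

# logicalsize values copied from the step 3 listing (regionsizemb=512)
logical_512 = [551845888, 538714112, 486211584, 489406464, 547258368,
               557539328, 573784064, 578256896, 576765952]

# logicalsize values copied from the step 6 listing (regionsizemb=8192)
logical_8192 = [2426093568, 1114742784, 1050279936, 308666368]


def summarize(sizes_bytes):
    """Return region count plus total, average and largest size in MB."""
    sizes_mb = [s / MB for s in sizes_bytes]
    return {
        "regions": len(sizes_mb),
        "total_mb": sum(sizes_mb),
        "avg_mb": sum(sizes_mb) / len(sizes_mb),
        "max_mb": max(sizes_mb),
    }


if __name__ == "__main__":
    for label, sizes in (("512", logical_512), ("8192", logical_8192)):
        print(label, summarize(sizes))
```

With regionsizemb=512 the average comes out at about 519 MB and no region exceeds the 768 MB split threshold. With regionsizemb=8192 every region is far below the 12288 MB threshold, so the 4 regions can only come from the aggressive initial splitting.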
Conclusion
1. MapR-DB splits a region once its size reaches 150% of the "regionsizemb" table attribute.
2. MapR-DB aggressively splits a table into at least 4 regions.