An RA thread is shown as "mapr::fs::RAThread::runner" in the output of pstack <impalad_pid>.
When Impala reads a Parquet table, the number of RA threads is controlled by the following factors:
- Number of columns of the Parquet table
- Table Size
- Number of Impalad processes
Env:
MapR 4.0.1 + Impala 1.4.1
3 Impalad processes
Preparation:
1. Create several Parquet tables of similar size but with different numbers of columns.
Table Name | Table Size | Number of Columns |
---|---|---|
parquet_table_passwords_5 | 1.0 G | 5 |
parquet_table_passwords_10 | 1.0 G | 10 |
parquet_table_passwords_20 | 1.0 G | 20 |
parquet_table_passwords_20_half | 526.5 M | 20 |
Sample DDL:
CREATE TABLE default.parquet_table_passwords_5( col0 STRING, col1 STRING, col2 STRING, col3 STRING, col4 STRING ) STORED AS PARQUET ;
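The other tables follow the same pattern with more columns, so the DDL can be generated instead of typed by hand. Below is a minimal sketch, assuming every column is a STRING named col0, col1, and so on as in the sample; gen_parquet_ddl is a hypothetical helper name, not something used in the original test.

```shell
#!/bin/bash
# gen_parquet_ddl: hypothetical helper (not part of the original setup).
# Usage: gen_parquet_ddl <table_name> <number_of_columns>
# Assumes all columns are STRING and named col0..col<N-1>, as in the sample DDL.
gen_parquet_ddl() {
  local table ncols cols i
  table=$1
  ncols=$2
  cols=""
  i=0
  while [ "$i" -lt "$ncols" ]; do
    # Put ", " between column definitions, but not before the first one.
    if [ -n "$cols" ]; then
      cols="$cols, "
    fi
    cols="${cols}col${i} STRING"
    i=$((i + 1))
  done
  echo "CREATE TABLE default.${table}( ${cols} ) STORED AS PARQUET ;"
}

# Reproduces the sample DDL for the 5-column table.
gen_parquet_ddl parquet_table_passwords_5 5
```

Calling it with 10 or 20 as the second argument gives the statements for the wider tables.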
2. Start a monitoring script to count the number of RA threads spawned by each impalad process every minute.
Please see [this article] first to understand how to switch the file client. If the MapR C++ File Client is used (the default), use the script below:
# cat monitorRA.sh
#!/bin/bash
while [ true ]; do
  date
  clush -a "pstack \`pgrep impalad\`|grep mapr::fs::RAThread::runner|wc -l"
  sleep 60
done
Note: pstack actually calls gdb, which blocks the process, so be careful when using it in production.
If the Hadoop Java File Client is used, run the script below as the user who started impalad:
# cat monitor_ra_java.sh
#!/bin/bash
while [ true ]; do
  date
  clush -a "jstack \`pgrep impalad\` |grep \"MapR RA\"|wc -l"
  sleep 10
done
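The counting step in the C++ client script is only a line filter, so it can be split from the capture step. The sketch below wraps it in a function that reads a stack dump from stdin, which lets you capture the dump once and count it offline. count_ra_threads is a hypothetical name, and the sample lines are made up for illustration; they are not real pstack output.

```shell
#!/bin/bash
# count_ra_threads: hypothetical helper that counts the lines of a stack dump
# (read from stdin) that mention the C++ file client's RA thread function.
# grep -c counts matching lines, the same number that "grep ... | wc -l" prints.
count_ra_threads() {
  grep -c 'mapr::fs::RAThread::runner'
}

# Made-up input standing in for a captured stack dump (illustration only).
printf '%s\n' \
  'frame mapr::fs::RAThread::runner' \
  'frame some::other::function' \
  'frame mapr::fs::RAThread::runner' | count_ra_threads
```

On a real node the input would come from something like pstack <impalad_pid> redirected to a file first. Note that grep -c exits with a non-zero status when it finds no matches, which matters if the caller uses set -e.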
Lab Tests
All of the tests below run a single "select * from <table_name> limit 100000" query repeatedly, one at a time, and count the RA threads in each impalad process. Here are the test results that demonstrate each factor.
Factor 1. Number of columns of the Parquet table
Table Name | Number of RA Threads | Number of Columns |
---|---|---|
parquet_table_passwords_5 | Node1: 40, Node2: 40, Node3: 40 | 5 |
parquet_table_passwords_10 | Node1: 80, Node2: 80, Node3: 80 | 10 |
parquet_table_passwords_20 | Node1: 120, Node2: 160, Node3: 200 | 20 |
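Dividing the cluster-wide total for each table by its column count gives a quick check of Factor 1. The sketch below does that arithmetic with the per-node counts from the Factor 1 results; sum_counts and report are hypothetical helper names. In these three runs the ratio comes out to 24 RA threads per column, which is an observation from this lab only, not a documented formula.

```shell
#!/bin/bash
# sum_counts: hypothetical helper that adds up its integer arguments.
sum_counts() {
  local total n
  total=0
  for n in "$@"; do
    total=$((total + n))
  done
  echo "$total"
}

# report <number_of_columns> <count_on_node1> [<count_on_node2> ...]
# Prints the cluster-wide total and the total divided by the column count.
report() {
  local columns total
  columns=$1
  shift
  total=$(sum_counts "$@")
  echo "$columns columns: total $total, per column $((total / columns))"
}

# Per-node counts copied from the Factor 1 results.
report 5 40 40 40
report 10 80 80 80
report 20 120 160 200
```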
Factor 2. Table Size
Table Name | Number of RA Threads | Table Size |
---|---|---|
parquet_table_passwords_20 | Node1: 120, Node2: 160, Node3: 200 | 1.0 G |
parquet_table_passwords_20_half | Node1: 80, Node2: 80, Node3: 80 | 526.5 M |
Factor 3. Number of Impalad processes
During this test, we stopped one and two impalad processes separately.
Table Name | Number of RA Threads | Number of Impalad |
---|---|---|
parquet_table_passwords_20 | Node1: 120, Node2: 160, Node3: 200 | 3 |
parquet_table_passwords_20 | Node1: 200, Node2: 280 | 2 |
parquet_table_passwords_20 | Node1: 480 | 1 |
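The per-node numbers for Factor 3 differ from run to run, but their sum does not. The sketch below adds up the counts for each run; check_totals is a hypothetical helper, and its input format (number of impalad processes followed by the per-node counts) is made up for this example.

```shell
#!/bin/bash
# check_totals: hypothetical helper. Reads one test run per line from stdin in
# the form "<number of impalad> <count on each running node...>" and prints
# the cluster-wide total for each run.
check_totals() {
  local impalads counts total n
  while read -r impalads counts; do
    total=0
    # $counts is left unquoted on purpose so it splits into one word per node.
    for n in $counts; do
      total=$((total + n))
    done
    echo "$impalads impalad: total $total"
  done
}

# Per-node counts copied from the Factor 3 results.
check_totals <<'EOF'
3 120 160 200
2 200 280
1 480
EOF
```

Each run sums to 480, so stopping impalad processes moved the RA threads onto the remaining processes rather than reducing their number.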
Conclusion
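1. The more columns a Parquet table has, the more RA threads are spawned.
2. The larger the table is, the more RA threads are spawned.
3. The total number of RA threads across all impalad processes does not change when the number of impalad processes changes (480 in each case above).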