Env: CDH5 with CM5.
1. Configuring "Dynamic Resource Pools" using CM5 is actually modifying llama-site.xml and fair-scheduler.xml.
For example, if I create a resource pool named "impalapool" with 2 subpools named "subimpala" and "anotherimpala".The llama-site.xml is:
<?xml version="1.0" encoding="UTF-8"?> <!--Autogenerated by Cloudera Manager--> <configuration> <property> <name>llama.am.throttling.maximum.placed.reservations.root.impalapool</name> <value>100</value> </property> <property> <name>llama.am.throttling.maximum.queued.reservations.root.impalapool</name> <value>200</value> </property> <property> <name>llama.am.throttling.maximum.placed.reservations.root.impalapool.subimpala</name> <value>50</value> </property> <property> <name>llama.am.throttling.maximum.queued.reservations.root.impalapool.subimpala</name> <value>100</value> </property> <property> <name>llama.am.throttling.maximum.placed.reservations.root.impalapool.anotherimpala</name> <value>50</value> </property> <property> <name>llama.am.throttling.maximum.queued.reservations.root.impalapool.anotherimpala</name> <value>100</value> </property> </configuration>The fair-scheduler.xml is:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <allocations> <queue name="root"> <weight>1.0</weight> <schedulingPolicy>drf</schedulingPolicy> <aclSubmitApps>*</aclSubmitApps> <aclAdministerApps>*</aclAdministerApps> <queue name="default"> <weight>1.0</weight> <schedulingPolicy>drf</schedulingPolicy> <aclSubmitApps>*</aclSubmitApps> <aclAdministerApps>*</aclAdministerApps> </queue> <queue name="impalapool"> <maxResources>100 mb, 0 vcores</maxResources> <weight>1.0</weight> <schedulingPolicy>drf</schedulingPolicy> <aclSubmitApps>*</aclSubmitApps> <aclAdministerApps>*</aclAdministerApps> <queue name="subimpala"> <maxResources>50 mb, 0 vcores</maxResources> <weight>1.0</weight> <schedulingPolicy>drf</schedulingPolicy> <aclSubmitApps>*</aclSubmitApps> <aclAdministerApps>*</aclAdministerApps> </queue> <queue name="anotherimpala"> <maxResources>50 mb, 0 vcores</maxResources> <weight>1.0</weight> <schedulingPolicy>drf</schedulingPolicy> <aclSubmitApps>*</aclSubmitApps> <aclAdministerApps>*</aclAdministerApps> </queue> </queue> </queue> </allocations>We can see that llama-site.xml controls the queue limit, and fair-scheduler.xml controls the memory limit.
2. MEM_LIMIT can be used to override the memory estimate, but it is a per-node limit.
If I run a query in a pool with a 100 MB memory limit, it fails if the estimate is more than 100 MB.

[hdm.xxx.com:21000] > select count(*) from tab1;
Query: select count(*) from tab1
ERROR: Rejected query id=d142d0c9cf868bea:3a7cebec17383bb1 from pool root.impalapool : request memory estimate 168.00 MB, greater than pool limit 100.00 MB
I have 4 Impala nodes in total in this lab, so if I set MEM_LIMIT to 50000000 (bytes), the estimated memory size will be about 50000000*4/1024/1024.0 = 190.73 MB.

[hdm.xxx.com:21000] > set mem_limit=50000000;
MEM_LIMIT set to 50000000
[hdm.xxx.com:21000] > select count(*) from tab1;
Query: select count(*) from tab1
ERROR: Rejected query id=d44817ab0f9e2627:d99fbd3d8804fea9 from pool root.impalapool : request memory estimate 190.73 MB, greater than pool limit 100.00 MB
So we have to set MEM_LIMIT below 25 MB per node, so that the total memory estimate stays below the resource pool limit.

[hdm.xxx.com:21000] > set mem_limit=25000000;
MEM_LIMIT set to 25000000
[hdm.xxx.com:21000] > select count(*) from tab1;
Query: select count(*) from tab1
+----------+
| count(*) |
+----------+
| 5        |
+----------+
Returned 1 row(s) in 0.33s
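To make the arithmetic explicit (assuming the pool limit is interpreted in binary megabytes, which matches the estimates reported above):

100 MB * 1024 * 1024 / 4 nodes = 26,214,400 bytes per node

so any per-node MEM_LIMIT of roughly 26 MB or less, such as the 25000000 bytes used here, keeps the cluster-wide estimate of MEM_LIMIT * 4 under the 100 MB pool limit.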
3. When a memory estimate from planning is not available, Impala falls back to a default memory estimate.
For example, the simple SQL below gets a memory estimate of 16 GB, which means 4 GB per node.

[hdm.xxx.com:21000] > select 1/2;
Query: select 1/2
ERROR: Rejected query id=f747114f9c089122:5ff736ab9cebd6ba from pool root.impalapool : request memory estimate 16.00 GB, greater than pool limit 100.00 MB
However, the actual memory needed per node is somewhere between 8 KB and 9 KB:

[hdm.xxx.com:21000] > set mem_limit=8000;
MEM_LIMIT set to 8000
[hdm.xxx.com:21000] > select 1/2;
Query: select 1/2
ERROR: Memory limit exceeded
ERROR: Invalid query handle
[hdm.xxx.com:21000] > set mem_limit=9000;
MEM_LIMIT set to 9000
[hdm.xxx.com:21000] > select 1/2;
Query: select 1/2
+-----------+
| 1.0 / 2.0 |
+-----------+
| 0.5       |
+-----------+
Returned 1 row(s) in 0.13s

However, you can set "-rm_default_memory" in "Impala Command Line Argument Advanced Configuration Snippet (Safety Valve)" in Cloudera Manager to override the default memory estimate, then restart the Impala service.
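For example (this is just the flag syntax; 20000000 bytes per node is the value used in the test in point 4), the safety valve entry would look like:

-rm_default_memory=20000000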
4. "-rm_default_memory" is per-node option, like MEM_LIMIT.
After setting -rm_default_memory=20000000 in CM5 and restarting impala, below are test results for SQL "select 1".I tried set "mem_limit" at impala-shell several times, and check the logs to get the "cluster_mem_estimate".
Here are the results:
| MEM_LIMIT | cluster_mem_estimate |
|-----------|----------------------|
| unset     | 76.29 MB             |
| 5000000   | 19.07 MB             |
| 10000000  | 38.15 MB             |
| 20000000  | 76.29 MB             |
So we can see that, for SQL statements where a memory estimate from planning is not available, "-rm_default_memory" overrides the built-in default estimate.
A session-level MEM_LIMIT then overrides "-rm_default_memory".
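The table is consistent with both "-rm_default_memory" and MEM_LIMIT being per-node values multiplied by the 4 nodes in this cluster:

unset:    20000000 * 4 / 1024 / 1024.0 = 76.29 MB (falls back to -rm_default_memory)
5000000:  5000000 * 4 / 1024 / 1024.0 = 19.07 MB
10000000: 10000000 * 4 / 1024 / 1024.0 = 38.15 MB
20000000: 20000000 * 4 / 1024 / 1024.0 = 76.29 MB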
5. Specify the full path of REQUEST_POOL.
Remember to specify the full path of REQUEST_POOL, like "set request_pool=parentpool.childpool;", not like "set request_pool=childpool;". impala-shell does not validate the pool name, so if there is a typo or the full path is missing, the query ends up in a pool that just inherits the default pool configuration.
Correct:
[hdm.xxx.com:21000] > set request_pool=impalapool.smallpool;
REQUEST_POOL set to impalapool.smallpool
[hdm.xxx.com:21000] > select 1;
Query: select 1
ERROR: Rejected query id=e946ed1f1fd469b0:269164ecf6f29f9e from pool root.impalapool.smallpool : request memory estimate 76.29 MB, greater than pool limit 20.00 MB

From the log file:
Schedule for id=e946ed1f1fd469b0:269164ecf6f29f9e in pool_name=root.impalapool.smallpool PoolConfig(max_requests=10 max_queued=20 mem_limit=20.00 MB) query cluster_mem_estimate=76.29 MB
Wrong:

[hdm.xxx.com:21000] > set request_pool=smallpool;
REQUEST_POOL set to smallpool
[hdm.xxx.com:21000] > select 1;
Query: select 1
+---+
| 1 |
+---+
| 1 |
+---+
Returned 1 row(s) in 0.13s

From the log file:
Schedule for id=2b4297c577033dd3:589537e806c57783 in pool_name=root.smallpool PoolConfig(max_requests=20 max_queued=50 mem_limit=-1.00 B) query cluster_mem_estimate=76.29 MB
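The same rule applies to the pools from point 1 (shown as an illustration, assuming that configuration is active):

[hdm.xxx.com:21000] > set request_pool=impalapool.anotherimpala;
REQUEST_POOL set to impalapool.anotherimpala

whereas "set request_pool=anotherimpala;" would, as in the "Wrong" case above, run the query in a pool named root.anotherimpala with default settings (mem_limit=-1) instead of the intended root.impalapool.anotherimpala.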