Wednesday, September 2, 2015

Hive CLI fails with error "Metastore contains multiple versions"

Env:

Hive 0.13

Symptom:

Hive CLI fails with error "Metastore contains multiple versions".

Below is a sample stacktrace:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2511)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2523)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
... 7 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
... 12 more
Caused by: MetaException(message:Metastore contains multiple versions)
at org.apache.hadoop.hive.metastore.ObjectStore.getMSchemaVersion(ObjectStore.java:6373)
at org.apache.hadoop.hive.metastore.ObjectStore.getMetaStoreSchemaVersion(ObjectStore.java:6335)
at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6294)
at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6282)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
at com.sun.proxy.$Proxy9.verifySchema(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:476)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171)

Root Cause:

There are 2 scenarios where Hive metastore will modify "VERSION" table.

1. hive.metastore.schema.verification=false AND get "NULL" when fetching the schema version from backend database due to some reason.

Hive will insert a new row in VERSION table in this case:
INSERT INTO `VERSION` (`VER_ID`,`VERSION_COMMENT`,`SCHEMA_VERSION`) VALUES (16,'Set by MetaStore','0.13.0')

2. hive.metastore.schema.verification=true AND metastore schema version is the same as Hive distribution.

Hive will update the existing row in VERSION table in this case:
UPDATE `VERSION` SET `SCHEMA_VERSION`='0.13.0',`VERSION_COMMENT`='Set by MetaStore' WHERE `VER_ID`=16

Actually only scenario 1 will add more rows into VERSION table.
Setting hive.metastore.schema.verification=true can avoid it;

Scenario 2 is actually a different bug HIVE-9749, the "if-else" logic is wrong in Hive 0.13 code and it is fixed in Hive 1.2.
The correct behavior of scenario 2 should be:  Hive should update VERSION table only when hive.metastore.schema.verification=false AND metastore schema version is different than Hive distribution.
But anyway, scenario 2 will never insert duplicate rows into VERSION table.

See the Hive 0.13 code of metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
private synchronized void checkSchema() throws MetaException {
  // recheck if it got verified by another thread while we were waiting
  if (isSchemaVerified.get()) {
    return;
  }
 
  boolean strictValidation =
    HiveConf.getBoolVar(getConf(), HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION);
  // read the schema version stored in metastore db
  String schemaVer = getMetaStoreSchemaVersion();
  if (schemaVer == null) {
    // metastore has no schema version information
    if (strictValidation) {
          throw new MetaException("Version information not found in metastore. ");
        } else {
          LOG.warn("Version information not found in metastore. "
              + HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION.toString() +
              " is not enabled so recording the schema version " +
              MetaStoreSchemaInfo.getHiveSchemaVersion());
          setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
              "Set by MetaStore");
      }
  } else {
    // metastore schema version is different than Hive distribution needs
    if (strictValidation) {
      if (!schemaVer.equalsIgnoreCase(MetaStoreSchemaInfo.getHiveSchemaVersion())) {
        throw new MetaException("Hive Schema version "
            + MetaStoreSchemaInfo.getHiveSchemaVersion() +
            " does not match metastore's schema version " + schemaVer +
            " Metastore is not upgraded or corrupt");
      } else {
        LOG.warn("Metastore version was " + schemaVer + " " +
            HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION.toString() +
            " is not enabled so recording the new schema version " +
            MetaStoreSchemaInfo.getHiveSchemaVersion());
        setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
            "Set by MetaStore");
      }
    }
  }
  isSchemaVerified.set(true);
  return;
}
See the Hive 1.2 code of metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
private synchronized void checkSchema() throws MetaException {
  // recheck if it got verified by another thread while we were waiting
  if (isSchemaVerified.get()) {
    return;
  }
 
  boolean strictValidation =
    HiveConf.getBoolVar(getConf(), HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION);
  // read the schema version stored in metastore db
  String schemaVer = getMetaStoreSchemaVersion();
  if (schemaVer == null) {
    if (strictValidation) {
      throw new MetaException("Version information not found in metastore. ");
    } else {
      LOG.warn("Version information not found in metastore. "
          + HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION.toString() +
          " is not enabled so recording the schema version " +
          MetaStoreSchemaInfo.getHiveSchemaVersion());
      setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
        "Set by MetaStore " + USER + "@" + HOSTNAME);
    }
  } else {
    // metastore schema version is different than Hive distribution needs
    if (schemaVer.equalsIgnoreCase(MetaStoreSchemaInfo.getHiveSchemaVersion())) {
      LOG.debug("Found expected HMS version of " + schemaVer);
    } else {
      if (strictValidation) {
        throw new MetaException("Hive Schema version "
            + MetaStoreSchemaInfo.getHiveSchemaVersion() +
            " does not match metastore's schema version " + schemaVer +
            " Metastore is not upgraded or corrupt");
      } else {
        LOG.error("Version information found in metastore differs " + schemaVer +
            " from expected schema version " + MetaStoreSchemaInfo.getHiveSchemaVersion() +
            ". Schema verififcation is disabled " +
            HiveConf.ConfVars.METASTORE_SCHEMA_VERIFICATION + " so setting version.");
        setMetaStoreSchemaVersion(MetaStoreSchemaInfo.getHiveSchemaVersion(),
          "Set by MetaStore " + USER + "@" + HOSTNAME);
      }
    }
  }
  isSchemaVerified.set(true);
  return;
}
So from above codes, Hive 1.2 just switched the two if-clauses for scenario 2.
Although scenario 2 is also buggy, but it is not impacting this multiple versions issue.

Solution:

1. To fix this issue and bring Hive back online, we need to delete the bad rows from "VERSION" table in backend database.
For example, if current "VERSION" looks as below:
1
2
3
4
5
6
7
8
9
10
11
12
13
mysql> select * from version;
+--------+----------------+------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT |
+--------+----------------+------------------+
| 1 | 0.13.0 | Set by MetaStore |
| 6 | 0.13.0 | Set by MetaStore |
| 11 | 0.13.0 | Set by MetaStore |
| 16 | 0.13.0 | Set by MetaStore |
| 21 | 0.13.0 | Set by MetaStore |
| 26 | 0.13.0 | Set by MetaStore |
| 31 | 0.13.0 | Set by MetaStore |
+--------+----------------+------------------+
7 rows in set (0.01 sec)
The fix is :
1
delete from version where ver_id<>1;
2. To avoid Hive metastore implicitly inserting into VERSION table, please set hive.metastore.schema.verification to true in hive-site.xml on all the nodes which will connect to this metastore database.


5 comments:

  1. Replies
    1. Good to know that this helps. Feel free to share the issue and resolution under this topic which may help others:)

      Delete
  2. Hi

    Can you throw some light on why insert overwrite runs slow, when I am loading data into a partitioned orc table from text table in hive using dynamic partitioning.

    text table: table_text
    orc table with partitions: table_orc

    Command: INSERT OVERWRITE table table_orc partitioned by (country)
    SELECT * FROM table_text;

    Number of partitions are 1000.

    The mapreduce job ran fine, but process is stucked at "Loading data to table default.table_orc (country=null)"

    I read in one of the links, that partition needs to be added to the metastore and this is an expensive operation to perform. Is it right understanding.

    Link: http://grokbase.com/t/hive/user/1549j1x41h/hive-slow-loading-data-process-with-parquet-over-30k-partitions

    ReplyDelete
  3. Thank you so much, it really worked:-)

    ReplyDelete

Popular Posts