SQL on Hadoop – A Common Tool Comparison
There are many different methods and tools for interacting with and querying data in Hadoop. The most widely used ones allow SQL-based querying of the data. This article summarises MapR's comparison of the most common SQL-on-Hadoop technologies available today.
SQL Mode | Hive | Drill | Impala | Presto | Spark/Shark |
Query mode | Batch | Interactive | Interactive | Interactive | In-memory/streaming |
ANSI SQL Completeness | Hive | Drill | Impala | Presto | Spark/Shark |
SELECT query | Medium | Medium | Medium | Medium | Medium |
DDL/DML | Medium | Low | Low | Medium | |
Packaged Analytic functions | Low | Low | | | |
UDFs/Custom functions | High | Low | Low | High | |
Client Access | Hive | Drill | Impala | Presto | Spark/Shark |
Shell | Yes | Yes | Yes | Yes | Yes |
JDBC | Yes | Yes | Yes | Yes | Yes |
ODBC | Yes | Yes | Yes | Yes | |
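Because every engine in the table offers JDBC access, client code can use the standard java.sql API and only the connection URL changes. The sketch below is illustrative, not taken from MapR's comparison: it assumes default ports, no authentication and a placeholder host (node1.example.com), and the class and method names are hypothetical. Impala is reached through the Hive JDBC driver on its HiveServer2-compatible port, and the Spark entry assumes the Spark SQL Thrift server rather than Shark. Check the URL formats against each project's driver documentation for the version you run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SqlOnHadoopJdbc {

    // Hypothetical helper: builds a JDBC URL per engine, ASSUMING default
    // ports and no authentication. Adjust host, port and database for your cluster.
    static String jdbcUrl(String engine, String host) {
        switch (engine) {
            case "hive":
                // HiveServer2, default port 10000
                return "jdbc:hive2://" + host + ":10000/default";
            case "impala":
                // Impala's HiveServer2-compatible port, used with the Hive JDBC driver
                return "jdbc:hive2://" + host + ":21050/;auth=noSasl";
            case "drill":
                // Direct connection to one Drillbit, default user port 31010
                return "jdbc:drill:drillbit=" + host + ":31010";
            case "presto":
                // Presto coordinator, default port 8080, 'hive' catalog, 'default' schema
                return "jdbc:presto://" + host + ":8080/hive/default";
            case "spark":
                // Spark SQL Thrift server speaks the HiveServer2 protocol
                return "jdbc:hive2://" + host + ":10000/default";
            default:
                throw new IllegalArgumentException("Unknown engine: " + engine);
        }
    }

    // Runs a query through plain java.sql; the same code works for every engine.
    // The matching JDBC driver JAR must be on the classpath. Presto's driver
    // requires a user name, so one is always passed; a null password is not sent.
    static void runQuery(String url, String user, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, null);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) {
        String[] engines = {"hive", "impala", "drill", "presto", "spark"};
        for (String engine : engines) {
            System.out.println(engine + " -> " + jdbcUrl(engine, "node1.example.com"));
        }
        // With a live cluster and driver, for example:
        // runQuery(jdbcUrl("presto", "node1.example.com"), "analyst", "SELECT 1");
    }
}
```

Running `main` only prints the URLs; calling `runQuery` needs a reachable cluster and the engine's JDBC driver on the classpath.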
Common File Format Support | Hive | Drill | Impala | Presto | Spark/Shark |
Text | Yes | Yes | Yes | Yes | |
CSV | Yes | Yes | Yes | | |
Sequence | Yes | Yes | Yes | Yes | |
RC | Yes | Yes | Yes | Yes | |
ORC | Yes | | | | |
Parquet | Yes | Yes | Yes | | |
Avro | Yes | Yes | Yes | | |
JSON | Yes | Yes | Yes | | |
Compression | Yes | Yes | Yes | | |
Hive SerDe | Yes | Yes | Yes | | |
Data Sources | Hive | Drill | Impala | Presto | Spark/Shark |
Files | Yes | Yes | Yes | Yes | Yes |
HBase | Yes | Yes | Yes | Yes | |
Non-Hadoop data sources | Yes | Yes | | | |
Data Types | Hive | Drill | Impala | Presto | Spark/Shark |
Relational | Yes | Yes | Yes | Yes | Yes |
Complex | Yes | Yes | Yes | Yes | |
Metadata | Hive | Drill | Impala | Presto | Spark/Shark |
Hive Metadata Store | Yes | Yes | Yes | Yes | Yes |
The information in the tables above was summarised from the comparison published on MapR’s website.