Apache Impala is a Top-Level Project
November 28, 2017
Apache Impala has graduated from the Apache Incubator to become a
Top-Level Project (TLP), signifying that the project's community and
products have been well-governed under the ASF's meritocratic process.
Apache Impala is a modern,
high-performance analytic database for Apache Hadoop. The massively
parallel processing (MPP) SQL query engine allows for analytical queries
on data stored on-premises (in HDFS or Apache Kudu) or in cloud object
storage via SQL or business intelligence tools without having to migrate
data sets into specialized systems or proprietary formats.
"The Impala project has grown a lot since we entered incubation in
December 2015," said Jim Apple, Vice President of Apache Impala. "With
the help of our mentors and the Incubator, we have grown as a community
and adopted the Apache Way, all while the Impala contributors have
helped make Impala more stable and performant."
In addition to using the same unified storage platform as other Hadoop
components, Impala also uses the same metadata, SQL syntax (Apache Hive
SQL), ODBC driver, and user interface (Impala query UI in Hue) as Hive.
This provides a familiar and unified platform for real-time or
batch-oriented queries. Key benefits include:
• A familiar SQL interface that data scientists and analysts already
know;
• The ability to query high volumes
of data (Big Data) in Apache Hadoop;
• Distributed queries in a cluster
environment, for convenient scaling and to make use of cost-effective
commodity hardware;
• The ability to share data files
between different components with no copy or export/import step; for
example, to write with Apache Pig, transform with Hive and query with
Impala. Impala can read from and write to Hive tables, enabling simple
data interchange using Impala for analytics on Hive-produced data; and
• A single system for big data
processing and analytics, so customers can avoid costly modeling and ETL
just for analytics.
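The data-sharing point above can be illustrated with a minimal Python sketch. It assumes a DB-API 2.0 cursor connected to Impala (for example, one obtained from a client library such as impyla) and a hypothetical table, `sales.orders_cleaned`, that was produced by a Hive job. The function name and the identifier check are illustrative only and are not part of Impala. `INVALIDATE METADATA` is the Impala statement that reloads metadata for a table created outside Impala.

```python
from typing import Any, List, Protocol, Sequence


class Cursor(Protocol):
    """Minimal DB-API 2.0 cursor surface used by this sketch."""

    def execute(self, operation: str) -> Any: ...

    def fetchall(self) -> Sequence[Sequence[Any]]: ...


def query_hive_table(cursor: Cursor, table: str, limit: int = 10) -> List[tuple]:
    """Query a Hive-produced table through Impala, with no copy or export step."""
    # Illustrative guard: accept only simple [db.]table identifiers.
    if not all(part.isidentifier() for part in table.split(".")):
        raise ValueError(f"unexpected table name: {table!r}")
    # Ask Impala to reload metadata for a table that was created in Hive.
    cursor.execute(f"INVALIDATE METADATA {table}")
    # Read the same underlying data files that Hive wrote.
    cursor.execute(f"SELECT * FROM {table} LIMIT {int(limit)}")
    return [tuple(row) for row in cursor.fetchall()]
```

A caller would pass in a cursor from an open Impala connection and the name of the Hive-produced table. Whether a metadata refresh is needed on every call depends on the deployment, so treat that step as an assumption of this sketch.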
Impala was inspired by Google's F1 database, which also separates query
processing from storage management. It was originally released in 2012
and entered the Apache Incubator in December 2015. The project has had
four releases during its incubation process.
"In 2011, we started development of Impala in order to make
state-of-the-art SQL analytics available to the user community as
open-source technology," said Marcel Kornacker, original founder of the
Impala project. "The graduation to an Apache Top-Level Project is a
recognition of the exceptional developer community that stands behind
this project."
Apache Impala is deployed across a number of industries such as
financial services, healthcare, and telecommunications, and is in use at
companies that include Caterpillar, Cox Automotive, Jobrapido, Marketing
Associates, the New York Stock Exchange, phData, and Quest Diagnostics.
In addition, Impala is shipped by Cloudera, MapR, and Oracle.
"Apache Impala is our interactive SQL tool of choice. Over 30 phData
customers have it deployed to production," said Brock Noland, Chief
Architect at phData. "Combined with Apache Kudu for real-time storage,
Impala has made architecting IoT and Data Warehousing use-cases dead
simple. We can deploy more production use-cases with fewer people,
delivering increased value to our customers. We're excited to see Impala
graduate to a top-level project and look forward to contributing to its
[...]"
"We use Apache Impala to boost performance of our SQL queries against our
data lake," said Matteo Coloberti, Head of Analytics at Jobrapido.
"Impala is an incredible service that gives us impressive performance on
[...]"
"We used to distribute Microsoft Excel reports to clients every one or
two days but now they can search on their own by customer, sales deal,
or even service type," said Andy Frey, CTO of Marketing Associates.
"Apache Impala is used to query millions of rows to identify specific
records that match the clients' criteria. We've even given clients a
'Query Hadoop' option that allows them to create simple SQL statements
and query Hadoop directly via Impala. We're able to offer a faster,
richer, and more accurate selection of services without the labor or
latency concerns that we used to have."
"The Apache Impala community is growing, and we welcome new contributors
to join in our efforts in our code, documentation, issue tracker, and
discussion forums," added Apple.