Amazon Managed Workflows for Apache Airflow GA
November 30, 2020
Amazon Managed Workflows for Apache Airflow (MWAA) is a new managed service
that makes it easy for data engineers to execute data processing
workflows in the cloud. Apache Airflow is a popular open-source tool
that helps customers author, schedule, and monitor workflows. With
Amazon MWAA, customers can use the same familiar Airflow platform as
they do today to manage their workflows, and enjoy improved scalability,
availability, and security without the burden of having to build, scale,
and manage the underlying infrastructure. Amazon MWAA scales workflow
execution capacity based on customer needs, and integrates with AWS
security services to provide secure access to customers’ data. There are
no up-front investments required to use Amazon MWAA and customers only
pay for what they use.
Today, customers are using analytics and machine learning to derive
insights from massive amounts of data. To effectively use this data,
customers often need to first build a workflow that defines a series of
sequential tasks to prepare and process the data. Tens of thousands of
customers use AWS Step Functions to visually build and run
cost-effective and scalable event-driven workflows that execute tasks
across multiple AWS services. Other customers prefer Apache Airflow for
workflow orchestration because it has an active open-source community,
a large library of pre-built integrations with third-party data
processing tools like Apache Spark and Hadoop, and the ability to use
Python scripts to create workflows. However, using Apache Airflow
requires data engineers to install, maintain, scale, and secure the
Apache Airflow environments, which adds cost and operational complexity.
Furthermore, to support role-based authentication for secure access,
Apache Airflow often requires a manual, iterative, and error-prone
combination of configuration changes, command-line interface (CLI)
commands, and, in some cases, edits to the Apache Airflow code.
Customers also must integrate and configure additional tools to alert
on issues like system downtime, workflow errors, and task
execution delays. While customers really enjoy the pre-built
integrations and familiar Python programming language of Apache Airflow,
they want it without the added operational cost and complexity.
Amazon MWAA makes it easy for customers to build and execute Apache
Airflow workflows in AWS. Amazon MWAA manages the provisioning and
ongoing maintenance of Apache Airflow so customers no longer need to
worry about patching, scaling, or securing self-managed Apache Airflow
implementations. With Amazon MWAA, compute resources that execute tasks
are scaled on demand, providing consistent performance for users.
Customer data is secure by default as workloads run in customers’ own
isolated and secure cloud environments using Amazon Virtual Private
Cloud (Amazon VPC), with stored data encrypted using AWS Key Management
Service (AWS KMS). Amazon MWAA makes it easy for customers to combine
data using any of Apache Airflow’s integrations, including AWS services
and popular third-party tools like Apache Hadoop, Presto, Hive, and
Spark, to automate data processing, machine learning pipelines, and
software development and operations. Customers can provide role-based
access to Apache Airflow’s user interface easily and securely via AWS
Identity and Access Management (IAM), providing users Single Sign-On (SSO)
access for scheduling and viewing their workflow executions. Amazon MWAA
automatically sends Apache Airflow system metrics and logs to AWS’s
monitoring service, Amazon CloudWatch, making it easy for customers to
view task execution delays and workflow errors across one or more
environments without third-party tools. With Amazon MWAA, data engineers
get the extensibility of Apache Airflow with the scalability,
availability, and security of AWS.
“Customers have told us they really like Apache Airflow because it
speeds the development of their data processing and machine learning
workflows, but they want it without the burden of scaling, operating,
and securing servers,” said Jesse Dougherty, Vice President, Application
Integration, AWS. “With Amazon MWAA, customers can use the same Apache
Airflow platform as they do today with the scalability, availability,
and security of AWS.”
Customers can launch a new Amazon MWAA environment from the AWS
Management Console, CLI, AWS CloudFormation, or AWS SDK, and start
running in minutes. Amazon MWAA is available today in US East (Northern
Virginia), US West (Oregon), US East (Ohio), Asia Pacific (Singapore),
Asia Pacific (Tokyo), Asia Pacific (Sydney), Europe (Ireland), Europe
(Frankfurt), and Europe (Stockholm), with more regions to come.
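As a rough illustration of the SDK route mentioned above, the following sketch uses the AWS SDK for Python (boto3) and its "mwaa" client. It is not taken from the announcement: the helper function names are hypothetical, every ARN, subnet ID, and security group ID is a placeholder, and it assumes the S3 bucket holding the DAG files, the IAM execution role, and the VPC networking already exist.

```python
from typing import Any, Dict, List


def build_mwaa_environment_request(
    name: str,
    execution_role_arn: str,
    source_bucket_arn: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
    dag_s3_path: str = "dags",
) -> Dict[str, Any]:
    """Assemble keyword arguments for the MWAA CreateEnvironment call.

    Hypothetical helper: it only builds a dictionary, so it can be
    inspected or tested without AWS credentials.
    """
    return {
        "Name": name,
        "ExecutionRoleArn": execution_role_arn,
        "SourceBucketArn": source_bucket_arn,
        # Path inside the bucket where the Airflow DAG files are stored.
        "DagS3Path": dag_s3_path,
        "NetworkConfiguration": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }


def create_environment(request: Dict[str, Any], region_name: str = "us-east-1") -> str:
    """Send the request to Amazon MWAA and return the new environment's ARN.

    Requires boto3 and valid AWS credentials; imported lazily so the
    builder above stays usable without either.
    """
    import boto3

    client = boto3.client("mwaa", region_name=region_name)
    response = client.create_environment(**request)
    return response["Arn"]


# Placeholder values for illustration only; replace with real resources.
example_request = build_mwaa_environment_request(
    name="example-environment",
    execution_role_arn="arn:aws:iam::123456789012:role/example-mwaa-role",
    source_bucket_arn="arn:aws:s3:::example-mwaa-bucket",
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    security_group_ids=["sg-cccc3333"],
)
```

Passing example_request to create_environment would submit the request in the chosen region. Building the request separately from sending it keeps the configuration easy to review before anything is provisioned.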
The Pokémon Company International, a subsidiary of The Pokémon Company
in Japan, manages the property outside of Asia and is responsible for
brand management, licensing, marketing, the Pokémon Trading Card Game,
the animated TV series, home entertainment, and the official Pokémon
website. “Amazon Managed Workflows for Apache Airflow meshes with our
security policy by providing single sign-on controlled access through
IAM roles and the ability to restrict access to our Amazon Virtual
Private Cloud,” said Eric Smith, Data Platform Engineer at The Pokémon
Company International. “With Amazon MWAA, we can focus on building
reliable data pipelines that achieve business goals rather than patching
and securing instances.”
Detroit-based Rocket Mortgage, the nation’s largest mortgage lender, enables the
American Dream of homeownership and financial freedom through an
industry-leading, digital-driven client experience. “Amazon Managed
Workflows for Apache Airflow has helped us grow and scale our data
science and machine learning workflows with significantly less
infrastructure overhead,” said Dan Jones, Senior Vice President of Data
Intelligence for Rocket Mortgage. “With this new service, our technology
teams are able to deliver best-in-class, data-driven solutions faster
than ever before.”
GoDaddy is the company that empowers everyday entrepreneurs. With more
than 20 million customers worldwide, GoDaddy is the place people come to
name their ideas, build a professional website, attract customers, and
manage their work. “Amazon Managed Workflows for Apache Airflow solves
one of the biggest operational overheads with orchestration,” said
Jeremy Zogg, Senior Director of Engineering at GoDaddy. “We have spent a
lot of hours setting up, configuring, scaling, and monitoring our
on-premises Apache Airflow instances. This was our top challenge for our
workflow deployments and we’re excited to migrate and concentrate on
what we do best: harnessing the power of data to drive great outcomes
for our customers and business.”