AWS Glue DataBrew GA

November 13, 2020

AWS Glue DataBrew is a new visual data preparation tool that enables customers to clean and normalize data without writing code. Since 2016, data engineers have used AWS Glue to create, run, and monitor extract, transform, and load (ETL) jobs. AWS Glue provides both code-based and visual interfaces, and has dramatically simplified extracting, orchestrating, and loading data in the cloud for customers. Data analysts and data scientists have wanted an easier way to clean and transform this data, and that’s what DataBrew delivers: a service that allows data exploration and experimentation directly from AWS data lakes, data warehouses, and databases without writing code. AWS Glue DataBrew offers customers over 250 pre-built transformations to automate data preparation tasks (e.g., filtering anomalies, standardizing formats, and correcting invalid values) that would otherwise require days or weeks of writing hand-coded transformations. Once the data is prepared, customers can immediately start using it with AWS and third-party analytics and machine learning services to query the data and train machine learning models. There are no upfront commitments or costs to use AWS Glue DataBrew, and customers pay only for creating and running transformations on datasets.

Preparing data for analytics and machine learning involves several necessary and time-consuming tasks, including data extraction, cleaning, normalization, loading, and the orchestration of ETL workflows at scale. For extracting, orchestrating, and loading data at scale, data engineers and ETL developers skilled in SQL or programming languages like Python or Scala can use AWS Glue. ETL developers often prefer the visual interfaces common in modern ETL tools over writing SQL, Python, or Scala, so AWS recently introduced AWS Glue Studio, a new visual interface to help author, run, and monitor ETL jobs without having to write any code. Once the data has been reliably moved, the underlying data still needs to be cleaned and normalized by data analysts and data scientists who operate in the lines of business and understand the context of the data. To clean and normalize the data, data analysts and data scientists must either work with small batches of the data in Excel or Jupyter Notebooks, which cannot accommodate large data sets, or rely on scarce data engineers and ETL developers to write custom code to perform cleaning and normalization transformations. In an effort to spot anomalies in the data, highly skilled data engineers and ETL developers spend days or weeks writing custom workflows to pull data from different sources, then pivot, transpose, and slice the data multiple times, before they can iterate with data analysts or data scientists to identify and fix data quality issues. After they have developed these transformations, data engineers and ETL developers still need to schedule the custom workflows to run on an ongoing basis, so new incoming data can automatically be cleaned and normalized. Each time a data analyst or data scientist wants to change or add a transformation, the data engineers and ETL developers need to extract, load, clean, normalize, and orchestrate the data preparation tasks all over again. This iterative process can take several weeks to months to complete. As a result, customers spend as much as 80% of their time cleaning and normalizing data instead of actually analyzing the data and extracting value from it.
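For a sense of what the hand-coded approach described above can look like, here is a minimal sketch in plain Python. It is illustrative only and does not come from AWS: the record fields (`order_date`, `amount`), the accepted date formats, and the anomaly threshold (`max_ratio`) are all hypothetical assumptions. It standardizes dates, drops rows with invalid values, and filters out one kind of anomaly.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records with mixed date formats and a bad amount.
RAW = [
    {"order_date": "2020-11-01", "amount": "19.99"},
    {"order_date": "11/02/2020", "amount": "21.50"},
    {"order_date": "2020-11-03", "amount": "n/a"},
    {"order_date": "2020-11-04", "amount": "20.25"},
    {"order_date": "2020-11-05", "amount": "9999.00"},
]

# Assumed input formats; real data would likely need more.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def standardize_date(text):
    """Return the date as ISO 8601 text, or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Return the amount as a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def clean(records, max_ratio=10.0):
    """Standardize dates, drop invalid rows, then drop amounts that
    exceed max_ratio times the median (a deliberately simple rule)."""
    parsed = []
    for record in records:
        date = standardize_date(record["order_date"])
        amount = parse_amount(record["amount"])
        if date is None or amount is None:
            continue  # invalid value: skip the row
        parsed.append({"order_date": date, "amount": amount})
    if not parsed:
        return []
    typical = median(row["amount"] for row in parsed)
    return [row for row in parsed if row["amount"] <= typical * max_ratio]


cleaned = clean(RAW)
```

Even this small example hard-codes decisions (which formats to accept, what counts as an anomaly) that an analyst would likely want to revisit, which is where the back-and-forth with engineers comes from.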

AWS Glue DataBrew is a visual data preparation tool for AWS Glue that allows data analysts and data scientists to clean and transform data with an interactive, point-and-click visual interface, without writing any code. With AWS Glue DataBrew, end users can easily access and visually explore any amount of data across their organization directly from their Amazon Simple Storage Service (S3) data lake, Amazon Redshift data warehouse, and Amazon Aurora and Amazon Relational Database Service (RDS) databases. Customers can choose from over 250 built-in functions to combine, pivot, and transpose the data without writing code. AWS Glue DataBrew recommends data cleaning and normalization steps like filtering anomalies, normalizing data to standard date and time values, generating aggregates for analyses, and correcting invalid, misclassified, or duplicate data. For complex tasks like converting words to a common base or root word (e.g., converting “yearly” and “yearlong” to “year”), AWS Glue DataBrew also provides transformations that use advanced machine learning techniques like natural language processing (NLP). Users can then save these cleaning and normalization steps into a workflow (called a recipe) and apply them automatically to future incoming data. If changes need to be made to the workflow, data analysts and data scientists simply update the cleaning and normalization steps in the recipe, and they are automatically applied to new data as it arrives. AWS Glue DataBrew publishes the prepared data to Amazon S3, which makes it easy for customers to immediately use it in analytics and machine learning. AWS Glue DataBrew is serverless and fully managed, so customers never need to configure, provision, or manage any compute resources.
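The recipe idea can be illustrated with a small conceptual sketch in plain Python. This is not the DataBrew API or its recipe format: every function name, the `period` column, and the suffix list are hypothetical. The suffix-stripping step is only a crude stand-in for the NLP-based stemming mentioned above. The point is that a recipe is an ordered list of steps that can be replayed unchanged on each new batch of data.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def trim_whitespace(rows: List[Row]) -> List[Row]:
    """Strip leading and trailing whitespace from every value."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def lowercase_column(column: str) -> Step:
    """Build a step that lowercases one column."""
    def step(rows: List[Row]) -> List[Row]:
        return [{**row, column: row[column].lower()} for row in rows]
    return step


def strip_suffixes(column: str, suffixes=("long", "ly")) -> Step:
    """Build a step that removes the first matching suffix.
    A toy substitute for real stemming, for illustration only."""
    def step(rows: List[Row]) -> List[Row]:
        out = []
        for row in rows:
            word = row[column]
            for suffix in suffixes:
                if word.endswith(suffix) and len(word) > len(suffix):
                    word = word[: -len(suffix)]
                    break
            out.append({**row, column: word})
        return out
    return step


def remove_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# The "recipe": an ordered, reusable list of steps.
RECIPE: List[Step] = [
    trim_whitespace,
    lowercase_column("period"),
    strip_suffixes("period"),
    remove_duplicates,
]


def apply_recipe(recipe: List[Step], rows: List[Row]) -> List[Row]:
    """Run each step in order, feeding its output to the next."""
    for step in recipe:
        rows = step(rows)
    return rows


first_batch = [{"period": " Yearly "}, {"period": "yearlong"}, {"period": "year"}]
later_batch = [{"period": "Monthly"}, {"period": "monthly "}]

print(apply_recipe(RECIPE, first_batch))  # [{'period': 'year'}]
print(apply_recipe(RECIPE, later_batch))  # [{'period': 'month'}]
```

Changing the workflow in this model means editing `RECIPE` once; every later batch picks up the change, which mirrors the update-the-recipe behavior described above.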

“AWS customers are using data for analytics and machine learning at an unprecedented pace. However, these customers regularly tell us that their teams spend too much time on the undifferentiated, repetitive, and mundane tasks associated with data preparation,” said Raju Gulabani, VP of Database and Analytics, AWS. “Customers love the scalability and flexibility of code-based data preparation services like AWS Glue, but they could also benefit from allowing business users, data analysts, and data scientists to visually explore and experiment with data independently, without writing code. AWS Glue DataBrew features an easy-to-use visual interface that helps data analysts and data scientists of all technical levels understand, combine, clean, and transform data.”

AWS Glue DataBrew is generally available today in US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), EU (Frankfurt), Asia Pacific (Sydney), and Asia Pacific (Tokyo), with availability in additional regions coming soon.
