Migrating from AWS Glue to BigQuery for ETL

#artificialintelligence

Our journey with AWS Glue became a struggle once we started to dig deeper into its streaming functionality. The orchestration of so many layers added a huge overhead that we weren't expecting, and whilst most of that is handled within the AWS suite of products, there are just too many benefits to switching our pipelines over to GCP and BigQuery to be ignored. Next steps are to finalise our deployment by using Cloud Composer (Airflow) to orchestrate the creation of each of the tables and provide a monitoring dashboard to help us detect failures and act on them. I will say that AWS got in touch with me after my previous article and I got on a call with the AWS Glue product team; in their words I had "hit pretty much every sharp edge possible" (seems to be a running theme with me -- perhaps I should switch careers to QA engineer?) ...


Extend Amazon SageMaker Pipelines to include custom steps using callback steps

#artificialintelligence

Launched at AWS re:Invent 2020, Amazon SageMaker Pipelines is the first purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning (ML). With Pipelines, you can create, automate, and manage end-to-end ML workflows at scale. You can extend your pipelines to include steps for tasks performed outside of Amazon SageMaker by taking advantage of custom callback steps. This feature lets you include tasks that are performed using other AWS services, third parties, or tasks run outside AWS. Before the launch of this feature, steps within a pipeline were limited to the supported native SageMaker steps.
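
As a rough illustration of how an external task might report back to a callback step, here is a minimal sketch. It is not taken from the excerpt: it assumes the callback step delivers a JSON message containing a `token` and an `arguments` object (for example through a queue), and that the worker answers with the SageMaker client calls `send_pipeline_execution_step_success` or `send_pipeline_execution_step_failure`. The names `handle_callback_message`, `run_task`, and the `result` output parameter are hypothetical.

```python
import json


def handle_callback_message(message_body, sagemaker_client, run_task):
    """Run an external task for one callback message and report the outcome.

    Assumptions (not from the excerpt): the message body is JSON with a
    "token" key and an optional "arguments" object, and `sagemaker_client`
    behaves like boto3.client("sagemaker").
    """
    payload = json.loads(message_body)
    token = payload["token"]
    arguments = payload.get("arguments", {})

    try:
        # run_task is a hypothetical callable doing the work outside SageMaker.
        result = run_task(arguments)
    except Exception as exc:
        # Tell the pipeline the step failed so the execution does not hang.
        sagemaker_client.send_pipeline_execution_step_failure(
            CallbackToken=token,
            FailureReason=str(exc),
        )
        return

    # Tell the pipeline the step succeeded and pass back one output value.
    sagemaker_client.send_pipeline_execution_step_success(
        CallbackToken=token,
        OutputParameters=[{"Name": "result", "Value": str(result)}],
    )
```

Passing the client in as a parameter keeps the handler easy to exercise with a stub before wiring it to a real queue consumer.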


Spark MLlib on AWS Glue

#artificialintelligence

AWS pushes SageMaker as its machine learning platform. However, Spark's MLlib is a comprehensive library that runs distributed ML natively on AWS Glue -- and provides a viable alternative to AWS's primary ML platform. One of the big benefits of SageMaker is that it easily supports experimentation via its Jupyter Notebooks. But operationalising your SageMaker ML can be difficult, particularly if you need to include ETL processing at the start of your pipeline. In this situation, Apache Spark's MLlib running on AWS Glue can be a good option -- by its very nature, it is immediately operationalised, integrated with ETL pre-processing and ready to be used in production for an end-to-end machine learning pipeline.


Setting up Amazon Personalize with AWS Glue

#artificialintelligence

Data can be used in a variety of ways to satisfy the needs of different business units, such as marketing, sales, or product. In this post, we focus on using data to create personalized recommendations to improve end-user engagement. Most ecommerce applications consume a huge amount of customer data that can be used to provide personalized recommendations; however, that data may not be cleaned or in the right format to provide those valuable insights. The goal of this post is to demonstrate how to use AWS Glue to extract, transform, and load your JSON data into a cleaned CSV format. We then show you how to run a recommendation engine powered by Amazon Personalize on your user interaction data to provide a tailored experience for your customers.
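
To make the JSON-to-cleaned-CSV step concrete, here is a minimal local sketch in plain Python. It is a stand-in for the transformation logic only, not an AWS Glue job. The input field names (`userId`, `itemId`, `eventTime`) and the sample records are hypothetical. The output columns `USER_ID`, `ITEM_ID`, and `TIMESTAMP` follow the column names Amazon Personalize expects for an interactions dataset.

```python
import csv
import io
import json

# Hypothetical raw events, one JSON object per line.
RAW_EVENTS = """\
{"userId": " u1 ", "itemId": "i9", "eventTime": 1609459200, "type": "click"}
{"userId": "u2", "itemId": null, "eventTime": 1609459260, "type": "click"}
{"userId": "u3", "itemId": "i4", "eventTime": "1609459320", "type": "purchase"}
"""


def json_lines_to_interactions_csv(raw):
    """Convert JSON-lines events into a cleaned interactions CSV string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["USER_ID", "ITEM_ID", "TIMESTAMP"])
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        user = str(event.get("userId") or "").strip()
        item = str(event.get("itemId") or "").strip()
        ts = event.get("eventTime")
        if not user or not item or ts is None:
            continue  # drop records missing a required field
        # Normalise the timestamp to an integer Unix epoch value.
        writer.writerow([user, item, int(ts)])
    return out.getvalue()


CSV_TEXT = json_lines_to_interactions_csv(RAW_EVENTS)
print(CSV_TEXT)
```

In a real pipeline the same cleaning rules (trim whitespace, drop incomplete records, normalise timestamps) would be expressed in the Glue job's transform stage and the result written to S3 for Personalize to import.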


AWS: Your complete guide to Amazon Web Services & features

#artificialintelligence

In the current age of cloud computing, there is now a multitude of mature services available -- offering security, scalability, and reliability for many business computing needs. What was once a colossal undertaking to build a data center, install server racks, and design storage arrays has given way to an entire marketplace of services that are always just a click away. One leader in that marketplace is Amazon Web Services, which consists of 175 products and services in a vast catalog that provides cloud storage, compute power, app deployment, user account management, data warehousing, tools for managing and controlling Internet of Things devices, and just about anything you can think of that a business needs. AWS really grew in popularity and capability over the last decade. One reason is that AWS is so reliable and secure.



The Seven Design Principles of an AI-Ready Data Architecture

#artificialintelligence

AI is having a big impact on organizations of all sizes, across all industries. But if you don't have the proper data architecture in place to support AI and machine learning, you're likely to be disappointed in the results you're seeing. Here are seven principles to consider for an AI-ready data architecture. Artificial intelligence (AI) is all about data, all the time. Does your IT team's architecture enable computations to be performed on demand?


AWS Hopes Macie Machine Learning Tool Will Stem Cloud Data Loss

#artificialintelligence

Amazon has unveiled a machine learning-based tool aimed at securing sensitive data held in the cloud, after a number of high-profile data leaks involving customers of Amazon Web Services (AWS). The tool, called Macie, was announced at the AWS New York Summit event along with an automated extract, transform and load (ETL) service and a unified repository of AWS' data migration tools. The announcement follows several data breaches in which major companies were found to have stored sensitive data on AWS Simple Storage Service (S3) in a way that left it publicly accessible. Last month it was disclosed that Verizon had exposed data on about 6 million customers in this way, and similar incidents have affected voter information held by the Republican National Committee (RNC) and customer data exposed by wrestling entertainment company WWE. The RNC breach, disclosed in June, affected more than 198 million people, or about 61 percent of the US population, and was the country's largest-ever voter data exposure.