metaflow
Reasonable Scale Machine Learning with Open-Source Metaflow
Tagliabue, Jacopo; Bowne-Anderson, Hugo; Tuulos, Ville; Goyal, Savin; Cledat, Romain; Berg, David
As Machine Learning (ML) gains adoption across industries and new use cases, practitioners increasingly realize the challenges around effectively developing and iterating on ML systems: reproducibility, debugging, scalability, and documentation are elusive goals for real-world pipelines outside tech-first companies. In this paper, we review the nature of ML-oriented workloads and argue that re-purposing existing tools won't solve the current productivity issues, as ML peculiarities warrant specialized development tooling. We then introduce Metaflow, an open-source framework for ML projects explicitly designed to boost the productivity of data practitioners by abstracting away the execution of ML code from the definition of the business logic. We show how our design addresses the main challenges in ML operations (MLOps), and document through examples, interviews and use cases its practical impact on the field.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > Hawaii > Honolulu County > Kailua (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Information Technology > Services (1.00)
- Media (0.71)
- Government (0.68)
- Health & Medicine > Therapeutic Area > Endocrinology (0.46)
Machine Learning with the Modern Data Stack: A Case Study
A lot has already been said about the modern data stack (MDS), but the situation is significantly more scattered on the machine learning side of the fence: once data is properly transformed, how is it consumed downstream to produce business value? This post is intended for anybody wanting to bridge the gap between working with data and actually delivering business value using machine learning. The MDS has been consolidated as a series of best practices around data collection, storage and transformation. At the end of the day, ingesting and transforming data is not (for most companies) an end in itself: while tech giants figured out a while ago how to "get models in production", most companies still struggle to productionize a model in less than 3 months.
The Post-Modern Stack
If you followed us closely, Episode 4 brought us to the limit of Data Land and to the start of ML Land: it is now time to close the circle, and take those nicely transformed data rows into a machine learning model serving predictions to users. Clone the repo, check the video, buckle up, and join us for one last trip together. The modern data stack (MDS) has been consolidating a number of best practices around data collection, storage and transformation. The web is full of examples (including our own!) of how to set up the MDS. However, they may leave you wondering what happens "on the ML side": once data is pre-aggregated and features pre-computed, how is that consumed downstream to produce business value? This post sets out to answer this question by proposing a lightweight toolchain that leverages Metaflow as the backbone for ML operations: the community reaction to the "Bigger boat" repo has been overwhelmingly positive, but we thought we should also put forward a low-touch alternative for teams that want a quicker start.
Integrating Pythonic visual reports into ML pipelines
These cards make it extremely easy to attach custom visual reports in every workflow, without having to install any additional tooling or infrastructure. The feature, which is developed together with Metaflow users at Coveo, is motivated by the ubiquitous needs of modern, data-centric ML workflows. If you'd like to get started right away, you can go straight to the documentation. Or watch the video below for a quick tour (without sound)! Model documentation needs to be about the entire ML pipeline.
GitHub - Netflix/metaflow: Build and manage real-life data science projects with ease!
Metaflow is a human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning. For more information, see Metaflow's website and documentation. Getting up and running with Metaflow is easy. Please see our contribution guide for more details.
- Media > Television (0.66)
- Media > Film (0.66)
- Information Technology > Services (0.66)
Starting your data science project with Metaflow? The MNIST use-case.
The priority of data scientists lies in picking the right features and in building and deploying their models. They would rather not be bothered by other aspects such as model versioning, job scheduling, flow architecture, and compute resource management, all of which are needed to operationalize data science successfully. Metaflow is an open-source tool by Netflix for managing data science workflows. It aims to boost the productivity of data scientists by allowing them to focus on actual data science work and by facilitating faster productionization of their deliverables. If you are familiar with Airflow or Luigi, you will understand the function of Metaflow. It allows you to run your data science process in steps: each step is a node in the process, and the nodes are connected as a graph.
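The steps-as-graph idea described in this snippet can be sketched in plain Python. This is a minimal illustration using only the standard library (`graphlib`, Python 3.9+), not Metaflow's own API: the step names (`start`, `clean`, `train`, `end`), the shared `state` dictionary, and the `run_flow` helper are all hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each one reads from and writes to a shared state dict.
def start(state):
    state["raw"] = [3, 1, 2]

def clean(state):
    state["clean"] = sorted(state["raw"])

def train(state):
    # Stand-in for model training: the "model" is just the mean of the data.
    state["model"] = sum(state["clean"]) / len(state["clean"])

def end(state):
    state["report"] = f"mean={state['model']:.1f}"

# Nodes of the graph: step name -> callable.
STEPS = {"start": start, "clean": clean, "train": train, "end": end}

# Edges of the graph: step name -> set of steps it depends on.
DEPENDS_ON = {
    "start": set(),
    "clean": {"start"},
    "train": {"clean"},
    "end": {"train"},
}

def run_flow(steps, depends_on):
    """Run every step once, in an order that respects the dependencies."""
    state = {}
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        steps[name](state)
    return order, state

order, state = run_flow(STEPS, DEPENDS_ON)
print(order)
print(state["report"])
```

A workflow framework adds what this sketch leaves out, such as persisting the state between steps, versioning runs, and dispatching steps to remote compute; the sketch only shows why expressing a process as nodes and edges lets a tool decide the execution order.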
10 MLops platforms to manage the machine learning lifecycle
For most professional software developers, using application lifecycle management (ALM) is a given. Data scientists, many of whom do not have a software development background, often have not used lifecycle management for their machine learning models. That's a problem that's much easier to fix now than it was a few years ago, thanks to the advent of "MLops" environments and frameworks that support machine learning lifecycle management. The easy answer to what machine learning lifecycle management is would be that it is the same as ALM, but that would also be wrong. That's because the lifecycle of a machine learning model is different from the software development lifecycle (SDLC) in a number of ways.
- Education (0.96)
- Information Technology > Services (0.48)
- Information Technology > Software (0.37)
Unbundling Data Science Workflows with Metaflow and AWS Step Functions
We believe that problems are solved by people, not by tools. Following our human-centric, usability-driven approach, data scientists shouldn't have to care about the lower layers of the stack -- they should just work -- but we believe that there is no benefit in trying to pretend that the stack doesn't exist, which would be problematic especially when things fail. This article focuses on the job scheduler layer and the two layers that surround it: The architecture layer that defines the structure of the user's code, and the compute layer that defines how the code is executed. Since the initial open-source release of Metaflow, we have heard questions about how Metaflow compares to other workflow schedulers or how Metaflow workflows should be executed in production. The answer to both of these questions is the same: Metaflow is designed to be used in conjunction with a production-grade job scheduler. Today, we are releasing the first open-source integration with such a scheduler, AWS Step Functions, which you can use to execute your Metaflow workflows in a scalable and highly-available manner. Before going into details about AWS Step Functions, we want to highlight the role of the job scheduling layer in the Metaflow stack.
- Information Technology > Data Science (0.95)
- Information Technology > Artificial Intelligence (0.70)
Listen to Metaflow: Netflix Machine Learning Platform with Savin Goyal
"Which brings us to AWS. Netflix took the, at the time, unconventional decision to go all in on AWS many years ago at this point, and that's treated like the whole idea around blessed programming languages, where you make a strong decision within an organization to restrict the number of programming languages, and that constraint ends up helping the organization make decisions more quickly and allows for engineering mobility and so on. This has been the case with AWS: Netflix strongly moved onto AWS and continues to do that. That extends to Metaflow. Metaflow is an open source framework, but it has a tight coupling with AWS. So why is the tight coupling to AWS useful for a machine learning framework? So, I won't say that we are tightly coupled to AWS. When we were open sourcing Metaflow, at that point in time, because we had a good amount of operational expertise with AWS, we chose indicating the details are ready for this cloud integration, ...
- Information Technology > Services (0.73)
- Media > Television (0.63)
- Media > Film (0.63)
How Data Science is Boosting Netflix
Considering how long Netflix has been in the streaming business, it has stacked up heaps of data about its viewers, such as their age, gender, location, their taste in media, to name a few. By gathering information across every customer interaction, Netflix can dive right into the minds of its viewers and get an idea of what they might like to watch next even before they finish a show or movie. We have data that suggests there is different viewing behavior depending on the day of the week, the time of day, the device, and sometimes even the location. Netflix has a massive user base of more than 140 million subscribers. Over time, Netflix has deployed several algorithms and mechanisms that make use of this data and generate critical insights that help steer the company in the right direction.
- Media > Television (1.00)
- Media > Film (1.00)
- Information Technology > Services (1.00)