
Machine learning system



A Metrics-Oriented Architectural Model to Characterize Complexity on Machine Learning-Enabled Systems

Ferreira, Renato Cordeiro

arXiv.org Artificial Intelligence

How can the complexity of ML-enabled systems be managed effectively? The goal of this research is to investigate how complexity affects ML-Enabled Systems (MLES). To address this question, this research aims to introduce a metrics-based architectural model that characterizes the complexity of MLES. The model is intended to support architectural decisions by providing a guideline for the inception and growth of these systems. This paper presents the first step toward creating the metrics-based architectural model: an extension of a reference architecture that can describe MLES in order to collect their metrics.


Hidden Technical Debt in Machine Learning Systems

Neural Information Processing Systems

Machine learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. This paper argues it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns.


A Conceptual Framework for Ethical Evaluation of Machine Learning Systems

Gupta, Neha R., Hullman, Jessica, Subramonyam, Hari

arXiv.org Artificial Intelligence

Research in Responsible AI has developed a range of principles and practices to ensure that machine learning systems are used in a manner that is ethical and aligned with human values. However, a critical yet often neglected aspect of ethical ML is the set of ethical implications that arise when designing evaluations of ML systems. For instance, teams may have to balance highly informative tests that ensure downstream product safety against potential fairness harms inherent to the implemented testing procedures. We conceptualize ethics-related concerns in standard ML evaluation techniques. Specifically, we present a utility framework that characterizes the key trade-off in ethical evaluation as balancing information gain against potential ethical harms. The framework then serves as a tool for characterizing the challenges teams face and for systematically disentangling the competing considerations that teams seek to balance. Differentiating between the types of issues encountered in evaluation allows us to highlight best practices from analogous domains, such as clinical trials and automotive crash testing, which navigate these issues in ways that can offer inspiration for improving evaluation processes in ML. Our analysis underscores the critical need for development teams to deliberately assess and manage the ethical complexities that arise during the evaluation of ML systems, and for the industry to move toward designing institutional policies that support ethical evaluations.


Toward Cross-Layer Energy Optimizations in Machine Learning Systems

Chung, Jae-Won, Chowdhury, Mosharaf

arXiv.org Artificial Intelligence

The enormous energy consumption of machine learning (ML) and generative AI workloads shows no sign of waning, taking a toll on operating costs, power delivery, and environmental sustainability. Despite a long line of research on energy-efficient hardware, we found that software plays a critical role in ML energy optimization through two recent works: Zeus and Perseus. This is especially true for large language models (LLMs) because their model sizes and, therefore, energy demands are growing faster than hardware efficiency improvements. Therefore, we advocate for a cross-layer approach for energy optimizations in ML systems, where hardware provides architectural support that pushes energy-efficient software further, while software leverages and abstracts the hardware to develop techniques that bring hardware-agnostic energy-efficiency gains.
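As an illustration of the kind of software-side energy optimization the abstract refers to, the sketch below picks a GPU power limit by minimizing a weighted energy/time cost. The cost function, the `eta` weighting knob, and all measurement numbers are assumptions made for illustration; they are not taken from the paper or from Zeus or Perseus.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    power_limit_w: float  # GPU power cap in watts (hypothetical profile)
    time_s: float         # time to reach the training target
    energy_j: float       # energy consumed to reach the training target


def cost(m: Measurement, eta: float, max_power_w: float) -> float:
    """Weighted cost: eta=1 cares only about energy, eta=0 only about time.

    Time is multiplied by the maximum power so both terms are in joules.
    """
    return eta * m.energy_j + (1 - eta) * max_power_w * m.time_s


def pick_power_limit(measurements, eta: float, max_power_w: float) -> Measurement:
    """Return the profiled configuration with the lowest weighted cost."""
    return min(measurements, key=lambda m: cost(m, eta, max_power_w))


# Hypothetical profile: lower caps save energy but slow training down.
profile = [
    Measurement(300, 100, 30000),
    Measurement(250, 110, 27500),
    Measurement(200, 130, 26000),
]
print(pick_power_limit(profile, eta=1.0, max_power_w=300).power_limit_w)
```

With `eta=1.0` the lowest-energy cap wins; with `eta=0.0` the fastest one does. The point of the sketch is that the choice is a software policy sitting on top of a hardware knob, which is the layering the abstract argues for.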


Towards MLOps: A DevOps Tools Recommender System for Machine Learning System

Shah, Pir Sami Ullah, Ahmad, Naveed, Beg, Mirza Omer

arXiv.org Artificial Intelligence

Applying DevOps practices to machine learning systems is termed MLOps; unlike traditional systems, which evolve with their requirements, machine learning systems evolve with new data. The objective of MLOps is to connect different open-source tools into a pipeline that can automatically construct a dataset, train the machine learning model, deploy the model to production, and store different versions of the model and dataset. A benefit of MLOps is fast delivery of newly trained models to production so that results stay accurate. Furthermore, MLOps practice affects the overall quality of software products and depends entirely on open-source tools; selecting relevant open-source tools is considered challenging, so a generalized method for selecting appropriate tools is desirable. In this paper, we present a framework for a recommendation system that processes the contextual information of a machine learning project (e.g., nature of the data, type of the data) and recommends a relevant toolchain (tech stack) for operationalizing machine learning systems. To check the applicability of the proposed framework, four approaches (rule-based, random forest, decision trees, and k-nearest neighbors) were investigated by measuring precision, recall, and F-score; random forest outperformed the other approaches with the highest F-score, 0.66.
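The abstract reports precision, recall, and F-score for the toolchain recommender but does not say how scores are averaged across toolchains. The sketch below assumes per-class scores that are macro-averaged; the toolchain labels `"A"`, `"B"`, `"C"` are hypothetical placeholders.

```python
def per_class_prf(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating `label` as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_prf(y_true, y_pred, c)[2] for c in labels) / len(labels)


# Hypothetical recommended toolchains vs. the toolchains experts would pick.
truth = ["A", "A", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "C"]
print(round(macro_f1(truth, predicted), 3))
```

Macro averaging weights every toolchain equally, so a rarely recommended stack counts as much as a common one; a weighted or micro average would give different numbers from the same predictions.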


Formal and Practical Elements for the Certification of Machine Learning Systems

Durand, Jean-Guillaume, Dubois, Arthur, Moss, Robert J.

arXiv.org Artificial Intelligence

Over the past decade, machine learning has demonstrated impressive results, often surpassing human capabilities in sensing tasks relevant to autonomous flight. Unlike traditional aerospace software, the parameters of machine learning models are not hand-coded nor derived from physics but learned from data. They are automatically adjusted during a training phase, and their values do not usually correspond to physical requirements. As a result, requirements cannot be directly traced to lines of code, hindering the current bottom-up aerospace certification paradigm. This paper attempts to address this gap by 1) demystifying the inner workings and processes to build machine learning models, 2) formally establishing theoretical guarantees given by those processes, and 3) complementing these formal elements with practical considerations to develop a complete certification argument for safety-critical machine learning systems. Based on a scalable statistical verifier, our proposed framework is model-agnostic and tool-independent, making it adaptable to many use cases in the industry. We demonstrate results on a widespread application in autonomous flight: vision-based landing.
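The abstract does not describe the internals of its statistical verifier. As a generic illustration of statistical verification only (not the paper's method), the classic zero-failure bound gives how many independent test cases must pass before one can claim, with confidence `1 - delta`, that the failure probability is below `epsilon`. The function name is hypothetical.

```python
import math


def zero_failure_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n such that (1 - epsilon) ** n <= delta.

    If the true failure probability were at least epsilon, observing zero
    failures in n independent trials would happen with probability at most
    (1 - epsilon) ** n. Requiring that to be <= delta gives the bound.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(delta) / math.log(1.0 - epsilon))


# e.g. failure rate below 1% with 95% confidence
print(zero_failure_sample_size(0.01, 0.05))
```

The required sample size grows roughly like `ln(1/delta) / epsilon`, which is why safety-critical targets with very small `epsilon` call for scalable verifiers rather than exhaustive testing.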


Best Practices for Machine Learning Systems: An Industrial Framework for Analysis and Optimization

Chouliaras, Georgios Christos, Kiełczewski, Kornel, Beka, Amit, Konopnicki, David, Bernardi, Lucas

arXiv.org Artificial Intelligence

In the last few years, the Machine Learning (ML) and Artificial Intelligence community has developed an increasing interest in Software Engineering (SE) for ML Systems, leading to a proliferation of best practices, rules, and guidelines aimed at improving the quality of the software of ML Systems. However, understanding their impact on overall quality has received less attention. Practices are usually presented in a prescriptive manner, without an explicit connection to their overall contribution to software quality. Based on the observation that different practices influence different aspects of software quality, and that a single quality aspect might be addressed by several practices, we propose a framework to analyse sets of best practices with a focus on quality impact and prioritization of their implementation. We first introduce a hierarchical Software Quality Model (SQM) specifically tailored for ML Systems. Relying on expert knowledge, the connection between individual practices and software quality aspects is explicitly elicited for a large set of well-established practices. Applying set-function optimization techniques, we can answer questions such as which set of practices maximizes SQM coverage, which practices are the most important, and which practices should be implemented to improve specific quality aspects, among others. We illustrate the usage of our framework by analyzing well-known sets of practices.
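One standard set-function technique for a question like "which k practices maximize SQM coverage" is greedy maximum coverage, which is guaranteed to reach at least a (1 - 1/e) fraction of the optimal coverage for coverage functions. Whether the paper uses this exact algorithm is an assumption; the practice names and quality aspects below are hypothetical.

```python
def greedy_max_coverage(practices: dict[str, set[str]], k: int) -> list[str]:
    """Pick up to k practices, each time taking the one that covers the most
    not-yet-covered quality aspects (ties go to the earliest practice)."""
    chosen: list[str] = []
    covered: set[str] = set()
    for _ in range(k):
        candidates = [p for p in practices if p not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda p: len(practices[p] - covered))
        gain = practices[best] - covered
        if not gain:
            break  # no remaining practice adds a new quality aspect
        chosen.append(best)
        covered |= gain
    return chosen


# Hypothetical mapping from practices to the quality aspects they address.
practices = {
    "feature_unit_tests": {"correctness", "maintainability"},
    "drift_monitoring": {"correctness", "monitoring"},
    "dataset_versioning": {"reproducibility", "traceability"},
    "model_documentation": {"understandability"},
}
print(greedy_max_coverage(practices, k=2))
```

Note how `drift_monitoring` loses to `dataset_versioning` in the second round: its `correctness` aspect is already covered, so its marginal gain is smaller. Reasoning about marginal rather than standalone impact is what distinguishes set-function optimization from ranking practices one by one.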


Machine Learning Systems are Bloated and Vulnerable

Zhang, Huaifeng, Ahmed, Fahmi Abdulqadir, Fatih, Dyako, Kitessa, Akayou, Alhanahnah, Mohannad, Leitner, Philipp, Ali-Eldin, Ahmed

arXiv.org Artificial Intelligence

Today's software is bloated with both code and features that are not used by most users. This bloat is prevalent across the entire software stack, from the operating system all the way to software backends, frontends, and web pages. In this paper, we focus on analyzing and quantifying bloat in machine learning containers. We develop MMLB, a framework to analyze bloat in machine learning containers, measuring the amount of bloat that exists at the container and package levels. Our tool quantifies the sources of bloat and integrates with vulnerability analysis tools to evaluate the impact of bloat on container vulnerabilities. Through experimentation with 15 machine learning containers from TensorFlow, PyTorch, and NVIDIA, we show that bloat is a significant issue, accounting for up to 80% of the container size in some cases. Our results demonstrate that bloat significantly increases container provisioning time, by up to 370%, and exacerbates vulnerabilities by up to 99%.
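The abstract does not say how MMLB decides which files count as bloat. Assuming, for illustration only, that bloat is the share of bytes belonging to files never touched while running a representative workload, the container-level and package-level measures could be sketched as follows. All paths, package names, and sizes are hypothetical.

```python
def bloat_fraction(file_sizes: dict[str, int], used_files: set[str]) -> float:
    """Fraction of bytes belonging to files the workload never used."""
    total = sum(file_sizes.values())
    if total == 0:
        return 0.0
    unused = sum(size for path, size in file_sizes.items() if path not in used_files)
    return unused / total


def package_bloat(files_by_package: dict[str, dict[str, int]],
                  used_files: set[str]) -> dict[str, float]:
    """Apply the same measure per package instead of per container."""
    return {pkg: bloat_fraction(files, used_files)
            for pkg, files in files_by_package.items()}


# Hypothetical container contents (bytes) and files observed in use.
container = {"/usr/lib/libcudnn.so": 800, "/app/train.py": 50, "/usr/bin/python3": 150}
used = {"/app/train.py", "/usr/bin/python3"}
print(bloat_fraction(container, used))
```

Measuring by bytes matches the "percentage of container size" framing in the abstract; counting files or packages instead would yield a different, equally defensible number.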


SAI #21: What is Continuous Training (CT) in Machine Learning Systems?


The ML Metadata Store tracks any metadata related to ML artifact creation, along with the performance of the ML model, so that experiments become reproducible and comparable with each other. The Model Registry could, and in some cases should, be treated as part of the ML Metadata Store.
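A minimal in-memory sketch of an ML Metadata Store that also hosts the Model Registry. Every class and method name here is hypothetical; production tools such as MLflow Tracking and the MLflow Model Registry persist the same kind of information durably.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    run_id: str
    params: dict       # hyperparameters used for this run
    metrics: dict      # evaluation results, e.g. {"accuracy": 0.87}
    artifacts: dict    # artifact name -> storage location
    created_at: float  # Unix timestamp


@dataclass
class MetadataStore:
    runs: dict = field(default_factory=dict)      # run_id -> RunRecord
    registry: dict = field(default_factory=dict)  # model name -> [run_id per version]

    def log_run(self, params: dict, metrics: dict, artifacts: dict) -> str:
        """Record the metadata of one training run and return its id."""
        run = RunRecord(uuid.uuid4().hex, dict(params), dict(metrics),
                        dict(artifacts), time.time())
        self.runs[run.run_id] = run
        return run.run_id

    def compare(self, metric: str) -> list:
        """Return run ids ordered best-first by `metric` (higher is better)."""
        ranked = sorted(self.runs.values(),
                        key=lambda r: r.metrics.get(metric, float("-inf")),
                        reverse=True)
        return [r.run_id for r in ranked]

    def register_model(self, name: str, run_id: str) -> int:
        """Register a logged run as a new version of `name`; return the version."""
        if run_id not in self.runs:
            raise KeyError(f"unknown run: {run_id}")
        versions = self.registry.setdefault(name, [])
        versions.append(run_id)
        return len(versions)
```

Because each registered version is just a pointer to a logged run, the registry inherits that run's parameters, metrics, and artifacts. This is one concrete reading of why the Model Registry can be treated as part of the Metadata Store.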