ML Systems
Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging
Pan, Yi, Qian, Wenbo, Xie, Dedong, Hu, Ruiyan, Hu, Yigong, Kasikci, Baris
The training and deployment of machine learning (ML) models have become extremely energy-intensive. While existing optimization efforts focus primarily on hardware energy efficiency, a significant but overlooked source of inefficiency is software energy waste caused by poor software design. This often includes redundant or poorly designed operations that consume more energy without improving performance. These inefficiencies arise in widely used ML frameworks and applications, yet developers often lack the visibility and tools to detect and diagnose them. We propose differential energy debugging, a novel approach that leverages the observation that competing ML systems often implement similar functionality with vastly different energy consumption. Building on this insight, we design and implement Magneton, an energy profiler that compares energy consumption between similar ML systems at the operator level and automatically pinpoints code regions and configuration choices responsible for excessive energy use. Applied to 9 popular ML systems spanning LLM inference, general ML frameworks, and image generation, Magneton detects and diagnoses 16 known cases of software energy inefficiency and further discovers 8 previously unknown cases, 7 of which have been confirmed by developers.
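The core of differential energy debugging is comparing per-operator energy between two systems that implement the same functionality. A minimal sketch of that comparison step, with hypothetical operator names and joule figures (not Magneton's actual data or interface):

```python
# Differential energy comparison: given per-operator energy measurements
# (in joules) from two systems running the same workload, flag operators
# where one system consumes disproportionately more energy.

def diff_energy(profile_a, profile_b, ratio_threshold=2.0):
    """Return operators where system A uses >= ratio_threshold x the energy of B."""
    suspects = []
    for op in profile_a.keys() & profile_b.keys():
        a, b = profile_a[op], profile_b[op]
        if b > 0 and a / b >= ratio_threshold:
            suspects.append((op, round(a / b, 2)))
    # Largest ratios first: these are the most likely energy bugs.
    return sorted(suspects, key=lambda t: -t[1])

profile_a = {"attention": 12.4, "layernorm": 0.9, "sampling": 6.0}
profile_b = {"attention": 11.8, "layernorm": 0.8, "sampling": 1.5}
print(diff_energy(profile_a, profile_b))  # sampling stands out at 4.0x
```

The pinpointing value comes from the operator-level granularity: a whole-system energy gap says little, but a single operator with a 4x ratio points directly at a code region to inspect.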
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
Surface Reading LLMs: Synthetic Text and its Styles
Despite a potential plateau in ML advancement, the societal impact of large language models lies not in approaching superintelligence but in generating text surfaces indistinguishable from human writing. While Critical AI Studies provides essential material and socio-technical critique, it risks overlooking how LLMs phenomenologically reshape meaning-making. This paper proposes a semiotics of "surface integrity" as attending to the immediate plane where LLMs inscribe themselves into human communication. I distinguish three knowledge interests in ML research (epistemology, epistēmē, and epistemics) and argue for integrating surface-level stylistic analysis alongside depth-oriented critique. Through two case studies examining stylistic markers of synthetic text, I argue how attending to style as a semiotic phenomenon reveals LLMs as cultural machines that transform the conditions of meaning emergence and circulation in contemporary discourse, independent of questions about machine consciousness.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.14)
- (12 more...)
Point-level Uncertainty Evaluation of Mobile Laser Scanning Point Clouds
Xu, Ziyang, Wysocki, Olaf, Holst, Christoph
Yet, despite this progress, the point clouds acquired by MLS systems operating in real-world environments inevitably contain uncertainty arising from various error sources during acquisition and processing. Although MLS systems have advanced rapidly in both data collection and post-processing, research on uncertainty evaluation has received comparatively less attention and remains underdeveloped (Xu et al., 2025b). From a user's perspective, the quality of point clouds from MLS systems is a critical concern. As the foundational input for many downstream tasks, inadequately assessed MLS point clouds can compromise high-precision applications such as navigation and change analysis. This not only undermines reliability but also results in substantial waste of time and resources, which is unacceptable in real-world applications. There is a clear need for automated and reliable solutions for uncertainty evaluation. In MLS systems, four main categories of error sources contribute to uncertainty: instrumental errors, atmospheric errors, object- and geometry-related errors, and trajectory estimation errors (Habib et al., 2009; Schenk, 2001). Considering the characteristics of these error sources, existing uncertainty evaluation methods can be broadly divided into two categories: forward modeling and backward modeling (Shi et al., 2021). The core idea of forward modeling is grounded in variance-covariance propagation, which involves detailed theoretical analysis of MLS system errors.
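Forward modeling via variance-covariance propagation can be illustrated on a toy 2D case: a point measured in polar form (range, angle) has coordinate covariance Sigma_p = J Sigma_m J^T, where J is the Jacobian of the polar-to-Cartesian mapping. The numbers below are hypothetical, not from the paper:

```python
import numpy as np

# First-order variance-covariance propagation for a 2D polar measurement:
# (x, y) = (r cos(theta), r sin(theta)), so Sigma_p = J Sigma_m J^T with
# Sigma_m = diag(var_r, var_theta).

def propagate(r, theta, var_r, var_theta):
    # Jacobian of (x, y) with respect to (r, theta)
    J = np.array([[np.cos(theta), -r * np.sin(theta)],
                  [np.sin(theta),  r * np.cos(theta)]])
    sigma_m = np.diag([var_r, var_theta])
    return J @ sigma_m @ J.T

cov = propagate(r=10.0, theta=0.0, var_r=1e-4, var_theta=1e-6)
# At theta = 0, x-variance comes from the range and y-variance from the
# angle, scaled by r^2 -- angular error grows with distance.
print(np.diag(cov))
```

Real MLS forward models chain many such Jacobians (scanner, boresight, lever arm, trajectory), but each link follows this same propagation rule.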
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- Asia > Taiwan (0.04)
On the Societal Impact of Machine Learning
This PhD thesis investigates the societal impact of machine learning (ML). ML increasingly informs consequential decisions and recommendations, significantly affecting many aspects of our lives. As these data-driven systems are often developed without explicit fairness considerations, they carry the risk of discriminatory effects. The contributions in this thesis enable more appropriate measurement of fairness in ML systems, systematic decomposition of ML systems to anticipate bias dynamics, and effective interventions that reduce algorithmic discrimination while maintaining system utility. I conclude by discussing ongoing challenges and future research directions as ML systems, including generative artificial intelligence, become increasingly integrated into society. This work offers a foundation for ensuring that ML's societal impact aligns with broader social values.
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Europe > Austria (0.04)
- (11 more...)
- Social Sector (1.00)
- Law (1.00)
- Health & Medicine (1.00)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
Phantora: Maximizing Code Reuse in Simulation-based Machine Learning System Performance Estimation
Qin, Jianxing, Chen, Jingrong, Kong, Xinhao, Wu, Yongji, Yuan, Tianjun, Luo, Liang, Wang, Zhaodong, Zhang, Ying, Chen, Tingjun, Lebeck, Alvin R., Zhuo, Danyang
Modern machine learning (ML) training workloads place substantial demands on both computational and communication resources. Consequently, accurate performance estimation has become increasingly critical for guiding system design decisions, such as the selection of parallelization strategies, cluster configurations, and hardware provisioning. Existing simulation-based performance estimation requires reimplementing the ML framework in a simulator, which demands significant manual effort and is hard to maintain as ML frameworks evolve rapidly. This paper introduces Phantora, a hybrid GPU cluster simulator designed for performance estimation of ML training workloads. Phantora executes unmodified ML frameworks as is within a distributed, containerized environment. Each container emulates the behavior of a GPU server in a large-scale cluster, while Phantora intercepts and simulates GPU- and communication-related operations to provide high-fidelity performance estimation. We call this approach hybrid simulation of ML systems, in contrast to traditional methods that simulate static workloads. The primary advantage of hybrid simulation is that it allows direct reuse of ML framework source code in simulation, avoiding the need for reimplementation. Our evaluation shows that Phantora provides accuracy comparable to static workload simulation while supporting three state-of-the-art LLM training frameworks out-of-the-box. In addition, Phantora operates on a single GPU, eliminating the need for the resource-intensive trace collection and workload extraction steps required by traditional trace-based simulators. Phantora is open-sourced at https://github.com/QDelta/Phantora.
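The key idea of hybrid simulation, running unmodified framework code while intercepting GPU operations and charging modeled time to a virtual clock, can be sketched in miniature. The names here (VirtualClock, simulated_matmul) and the cost model are illustrative; Phantora's actual interception operates at a much lower level:

```python
# Toy hybrid simulation: the "training loop" runs as real, unmodified
# Python, but the GPU operator is swapped for a stub that advances a
# virtual clock by a modeled execution time instead of computing.

class VirtualClock:
    def __init__(self):
        self.now = 0.0
    def advance(self, seconds):
        self.now += seconds

clock = VirtualClock()

def simulated_matmul(m, n, k, flops_per_sec=1e12):
    # Charge the modeled time for a (m x k) @ (k x n) multiply: 2*m*n*k FLOPs.
    clock.advance(2 * m * n * k / flops_per_sec)
    return ("tensor", m, n)  # placeholder carrying only shape information

# Framework-side code is reused as-is; only the operator implementation differs.
for _ in range(3):
    simulated_matmul(1024, 1024, 1024)
print(f"estimated GPU time: {clock.now:.6f}s")
```

Because the control flow (loops, scheduling, framework logic) is the real code, the simulator never has to reimplement or re-extract the workload; only the operator cost model needs maintaining.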
- North America > United States > Wisconsin (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > United States > Colorado > Broomfield County > Broomfield (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.46)
Understanding Practitioners' Perspectives on Monitoring Machine Learning Systems
Naveed, Hira, Grundy, John, Arora, Chetan, Khalajzadeh, Hourieh, Haggag, Omar
Given the inherent non-deterministic nature of machine learning (ML) systems, their behavior in production environments can lead to unforeseen and potentially dangerous outcomes. For timely detection of unwanted behavior and to prevent organizations from financial and reputational damage, monitoring these systems is essential. This paper explores the strategies, challenges, and improvement opportunities for monitoring ML systems from the practitioners' perspective. We conducted a global survey of 91 ML practitioners to collect diverse insights into current monitoring practices for ML systems. We aim to complement existing research through our qualitative and quantitative analyses, focusing on prevalent runtime issues, industrial monitoring and mitigation practices, key challenges, and desired enhancements in future monitoring tools. Our findings reveal that practitioners frequently struggle with runtime issues related to declining model performance, latency overruns, and security violations. While most prefer automated monitoring for its increased efficiency, many still rely on manual approaches due to the complexity or lack of appropriate automation solutions. Practitioners report that the initial setup and configuration of monitoring tools is often complicated and challenging, particularly when integrating with ML systems and setting alert thresholds. Moreover, practitioners find that monitoring adds extra workload, strains resources, and causes alert fatigue. The desired improvements from the practitioners' perspective are: automated generation and deployment of monitors, improved support for performance and fairness monitoring, and recommendations for resolving runtime issues. These insights offer valuable guidance for the future development of ML monitoring tools that are better aligned with practitioners' needs.
Machine Learning (ML) systems are increasingly employed across various domains, including social media, e-commerce, and engineering; even critical domains such as finance, healthcare, and autonomous vehicles now leverage ML to automate and enhance their services. Generative AI and Large Language Models (LLMs) have further boosted ML adoption by creating several new use cases [1], [2]. A typical ML system lifecycle begins by gathering requirements and preparing data, followed by the development of the ML component (experimentation, model training, and evaluation) and other traditional software components [3]. After development, the next step is integration and system testing. Once quality assurance is completed, the ML system is deployed to a production environment.
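The runtime issues practitioners report (declining performance, latency overruns) are typically caught with threshold-based monitors over sliding windows. A minimal sketch, with illustrative thresholds and metric choices not drawn from the survey itself:

```python
from collections import deque

# A simple production monitor: keep a sliding window of per-request
# latency and correctness, and raise alerts when a p95 latency bound or
# a minimum accuracy bound is violated.

class Monitor:
    def __init__(self, window=100, max_p95_latency_ms=200.0, min_accuracy=0.9):
        self.latencies = deque(maxlen=window)
        self.correct = deque(maxlen=window)
        self.max_p95 = max_p95_latency_ms
        self.min_acc = min_accuracy

    def record(self, latency_ms, was_correct):
        self.latencies.append(latency_ms)
        self.correct.append(was_correct)

    def alerts(self):
        out = []
        if self.latencies:
            p95 = sorted(self.latencies)[int(0.95 * (len(self.latencies) - 1))]
            if p95 > self.max_p95:
                out.append(f"latency p95 {p95:.0f}ms exceeds {self.max_p95:.0f}ms")
        if self.correct and sum(self.correct) / len(self.correct) < self.min_acc:
            out.append("accuracy below threshold")
        return out

m = Monitor()
for _ in range(50):
    m.record(latency_ms=300.0, was_correct=True)
print(m.alerts())  # one latency alert; accuracy is fine
```

Even this tiny example surfaces the pain points the survey identifies: the thresholds (200 ms, 0.9) must be configured by hand, and every monitor added is another source of potential alert fatigue.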
- Oceania > Australia (0.05)
- North America > United States (0.04)
- Asia > Pakistan (0.04)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Overview (1.00)
- Research Report > Experimental Study (0.93)
Philosophy-informed Machine Learning
A deep dive into the open literature shows that there are three fundamental limitations to current ML approaches, namely blackbox brittleness (which renders models uninterpretable and unreliable under distribution shift [2]), causal blindness (which conflates correlation with causation [3]), and alignment failures (which produce systems optimizing objectives misaligned with human values [4]). These deficiencies stem from a profound philosophical poverty in how ML conceptualizes knowledge, reasoning, and values. The first fundamental limitation, blackbox brittleness, manifests when trained models fail on seemingly trivial variations of their training distribution. For example, a vision model that accurately identifies stop signs under normal conditions might misclassify them entirely when small adversarial perturbations are applied [5]. Not surprisingly, the same brittleness extends beyond adversarial examples to everyday distribution shifts (e.g., natural language processing models exhibit performance degradation when processing text from different cultural contexts) [6].
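The adversarial-perturbation brittleness described above is easiest to see on a linear classifier, where an FGSM-style perturbation x' = x - eps * sign(w) pushes the input directly against the weight vector. The model and numbers below are made up for illustration:

```python
import numpy as np

# Toy adversarial example on a linear classifier: a small per-feature
# shift against the sign of the weights flips the predicted class.

w = np.array([2.0, -1.0, 0.5])   # "trained" weights of a linear model
x = np.array([0.3, -0.2, 0.4])   # input correctly scored as class +1

score = w @ x                    # positive -> class +1

eps = 0.4
x_adv = x - eps * np.sign(w)     # FGSM-style perturbation, bounded per feature
adv_score = w @ x_adv            # score drops by eps * ||w||_1 and goes negative

print(score, adv_score)          # the sign flips: the prediction changes
```

The worst-case score drop is exactly eps times the L1 norm of the weights, so even a visually tiny eps suffices whenever the model's margin is small relative to its weight mass.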
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New Jersey (0.04)
- Research Report (0.64)
- Overview (0.46)
- Law (0.67)
- Health & Medicine > Therapeutic Area (0.32)
Monitoring Machine Learning Systems: A Multivocal Literature Review
Naveed, Hira, Barnett, Scott, Arora, Chetan, Grundy, John, Khalajzadeh, Hourieh, Haggag, Omar
Context: Dynamic production environments make it challenging to maintain reliable machine learning (ML) systems. Runtime issues, such as changes in data patterns or operating contexts, that degrade model performance are a common occurrence in production settings. Monitoring enables early detection and mitigation of these runtime issues, helping maintain users' trust and prevent unwanted consequences for organizations. Aim: This study aims to provide a comprehensive overview of the ML monitoring literature. Method: We conducted a multivocal literature review (MLR) following the well established guidelines by Garousi to investigate various aspects of ML monitoring approaches in 136 papers. Results: We analyzed selected studies based on four key areas: (1) the motivations, goals, and context; (2) the monitored aspects, specific techniques, metrics, and tools; (3) the contributions and benefits; and (4) the current limitations. We also discuss several insights found in the studies, their implications, and recommendations for future research and practice. Conclusion: Our MLR identifies and summarizes ML monitoring practices and gaps, emphasizing similarities and disconnects between formal and gray literature. Our study is valuable for both academics and practitioners, as it helps select appropriate solutions, highlights limitations in current approaches, and provides future directions for research and tool development.
- Oceania > Australia (0.14)
- Europe > France (0.04)
- Europe > Austria > Upper Austria > Linz (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Information Technology > Services (1.00)
- Information Technology > Security & Privacy (1.00)
- Education (1.00)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- (2 more...)
Data Requirement Goal Modeling for Machine Learning Systems
Yamani, Asma, AlAmoudi, Nadeen, Albilali, Salma, Baslyman, Malak, Hassine, Jameleddine
Machine Learning (ML) has been integrated into various software systems. Two main components are essential for training an ML model: the training data and the ML algorithm. Given the critical role of data in ML system development, it has become increasingly important to assess the quality of data attributes and ensure that the data meets specific requirements before its utilization. This work proposes an approach to guide non-experts in identifying data requirements for ML systems using goal modeling. In this approach, we first develop the Data Requirement Goal Model (DRGM) by surveying the white literature to identify and categorize the issues and challenges faced by data scientists and requirement engineers working on ML-related projects. An initial DRGM was built to accommodate common tasks that would generalize across projects. Then, based on insights from both white and gray literature, a customization mechanism is built to help adjust the tasks, KPIs, and the relative importance of goals within the DRGM. The generated model can aid its users in evaluating different datasets using GRL evaluation strategies. We then validate the approach through two illustrative examples based on real-world projects. The results show that the data requirements identified by the proposed approach align with those of real-world projects, demonstrating the practicality and effectiveness of the proposed framework. The proposed dataset selection customization mechanism and the proposed DRGM are helpful in guiding non-experts in identifying the data requirements for machine learning systems tailored to a specific ML problem. This approach also aids in evaluating different dataset alternatives to choose the optimum dataset for the problem. For future work, we recommend implementing tool support to generate the DRGM based on a chatbot interface.
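A GRL-style quantitative evaluation, of the kind the DRGM uses to compare datasets, computes a goal's satisfaction as the weighted sum of its children's satisfaction. The goal names, weights, and scores below are hypothetical, not taken from the actual DRGM:

```python
# Quantitative goal-tree evaluation in the GRL spirit: leaf goals carry
# measured satisfaction scores (0-100); each parent's satisfaction is the
# weighted sum of its children's.

def evaluate(goal, satisfaction):
    """Recursively compute the satisfaction (0-100) of a weighted goal tree."""
    if "children" not in goal:
        return satisfaction[goal["name"]]
    return sum(weight * evaluate(child, satisfaction)
               for weight, child in goal["children"])

data_quality = {
    "name": "Data Quality",
    "children": [
        (0.5, {"name": "Completeness"}),
        (0.3, {"name": "Label Accuracy"}),
        (0.2, {"name": "Representativeness"}),
    ],
}

# Leaf scores for one candidate dataset on a 0-100 scale.
scores = {"Completeness": 80, "Label Accuracy": 90, "Representativeness": 60}
print(evaluate(data_quality, scores))
```

Running the same tree against each candidate dataset's leaf scores yields a single comparable number per dataset, which is how an evaluation strategy supports choosing among alternatives; customization amounts to changing the weights and the set of leaf goals.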
- Asia > Middle East > Saudi Arabia > Eastern Province > Dhahran (0.14)
- North America > United States (0.04)
- Asia > Middle East > Saudi Arabia > Najran Province > Najran (0.04)
- (2 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.34)
- Health & Medicine > Therapeutic Area > Immunology (0.34)