

A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics

Neural Information Processing Systems

Machine learning methods have had a groundbreaking impact in many application domains, but their deployment on real robotic platforms remains limited. Despite the many challenges associated with combining machine learning technology with robotics, robot learning remains one of the most promising directions for enhancing the capabilities of robots.


Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing

Neural Information Processing Systems

Tuning hyperparameters is a crucial but arduous part of the machine learning pipeline. Hyperparameter optimization is even more challenging in federated learning, where models are learned over a distributed network of heterogeneous devices; here, the need to keep data on device and perform local training makes it difficult to efficiently train and evaluate configurations. In this work, we investigate the problem of federated hyperparameter tuning. We first identify key challenges and show how standard approaches may be adapted to form baselines for the federated setting. Then, by making a novel connection to the neural architecture search technique of weight-sharing, we introduce a new method, FedEx, to accelerate federated hyperparameter tuning that is applicable to widely-used federated optimization methods such as FedAvg and recent variants. Theoretically, we show that a FedEx variant correctly tunes the on-device learning rate in the setting of online convex optimization across devices. Empirically, we show that FedEx can outperform natural baselines for federated hyperparameter tuning by several percentage points on the Shakespeare, FEMNIST, and CIFAR-10 benchmarks--obtaining higher accuracy using the same training budget.
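The weight-sharing connection can be illustrated with a toy sketch: maintain a categorical distribution over candidate configurations, sample one per round, and update the distribution with an exponentiated-gradient step on an importance-weighted loss estimate. This is a minimal illustration of the idea, not the paper's implementation; the candidate learning rates and the stand-in validation loss are invented for the example.

```python
import numpy as np

# Minimal sketch of exponentiated-gradient tuning over a categorical
# distribution of hyperparameter configurations, in the spirit of FedEx.
# The configs and the toy "validation loss" below are illustrative only.

rng = np.random.default_rng(0)
configs = [0.01, 0.05, 0.1, 0.5]              # candidate on-device learning rates
theta = np.ones(len(configs)) / len(configs)  # categorical distribution over configs
step = 1.0                                    # exponentiated-gradient step size

def fake_val_loss(lr):
    # stand-in for a federated evaluation round; minimized at lr = 0.1
    return (np.log10(lr) + 1.0) ** 2

for _ in range(200):
    i = rng.choice(len(configs), p=theta)  # sample a config to run this round
    loss = fake_val_loss(configs[i])
    grad = np.zeros(len(configs))
    grad[i] = loss / theta[i]              # importance-weighted gradient estimate
    theta *= np.exp(-step * grad)          # exponentiated-gradient update
    theta /= theta.sum()                   # renormalize to a distribution

best = configs[int(np.argmax(theta))]
```

Over rounds, mass concentrates on low-loss configurations; the same sampled-configuration trick is what lets a single sequence of training rounds evaluate many hyperparameter settings, analogous to weight-sharing in architecture search.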


Computing in the Arab World: Innovations, Challenges, and Advances amidst a Rich Mosaic of Scientific Activity

Communications of the ACM

It is with great pleasure that we present this Communications of the ACM Regional Special Section on the Arab World. In this second edition, we highlight some of the region's exciting, innovative, and socially relevant advances in computing and its applications. The Arab world is home to a rich mosaic of cultures, histories, and geographies, stretching from the Atlantic Ocean to the Gulf.


Most Influential Subset Selection: Challenges, Promises, and Beyond

Neural Information Processing Systems

How can we attribute the behaviors of machine learning models to their training data? While the classic influence function sheds light on the impact of individual samples, it often fails to capture the more complex and pronounced collective influence of a set of samples. To tackle this challenge, we study the Most Influential Subset Selection (MISS) problem, which aims to identify a subset of training samples with the greatest collective influence. We conduct a comprehensive analysis of the prevailing approaches in MISS, elucidating their strengths and weaknesses. Our findings reveal that influence-based greedy heuristics, a dominant class of algorithms in MISS, can provably fail even in linear regression.
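The greedy heuristic the abstract analyzes can be sketched concretely for linear regression: score each training point by its individual leave-one-out influence on a test prediction, then pick the k highest-scoring points as the "most influential subset." This brute-force refitting version is a simplified illustration on toy data, not the paper's estimator; the point of the paper is that such individual scores need not add up to the true collective influence.

```python
import numpy as np

# Toy sketch of the influence-based greedy heuristic for Most Influential
# Subset Selection (MISS) in linear regression, via brute-force refitting.

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
x_test = np.array([1.0, 1.0, 1.0])

def fit_predict(idx):
    # least-squares fit on the rows in idx, then predict at x_test
    w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return x_test @ w

full = fit_predict(np.arange(len(y)))

# individual influence: change in the test prediction when one point is removed
influence = np.array([
    full - fit_predict(np.delete(np.arange(len(y)), i))
    for i in range(len(y))
])

# greedy heuristic: take the k points with the largest individual influence
k = 3
top_k = np.argsort(-np.abs(influence))[:k]

# collective effect of removing all k at once -- generally NOT the sum of
# the individual influences, which is exactly the failure mode studied
collective = full - fit_predict(np.delete(np.arange(len(y)), top_k))
```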


A Taxonomy of Challenges to Curating Fair Datasets

Neural Information Processing Systems

Despite extensive efforts to create fairer machine learning (ML) datasets, there remains a limited understanding of the practical aspects of dataset curation. Drawing from interviews with 30 ML dataset curators, we present a comprehensive taxonomy of the challenges and trade-offs encountered throughout the dataset curation lifecycle. Our findings underscore overarching issues within the broader fairness landscape that impact data curation. We conclude with recommendations aimed at fostering systemic changes to better facilitate fair dataset curation practices.


When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

Neural Information Processing Systems

Past analyses of reinforcement learning from human feedback (RLHF) assume that the human evaluators fully observe the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deceptive inflation and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. Under the new assumption that the human's partial observability is known and accounted for, we then analyze how much information the feedback process provides about the return function.
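The Boltzmann-rational human model admits a compact sketch: the probability of preferring one trajectory over another is a softmax over the returns the human *believes* each trajectory achieved. Under partial observability those believed returns can diverge from the true ones, which is the lever behind deceptive inflation. The numbers below are illustrative, not from the paper.

```python
import math

# Sketch of a Boltzmann-rational preference model over trajectory pairs.
# Under partial observability the human compares returns inferred from
# observations, which may disagree with the true returns.

def boltzmann_pref(obs_return_a, obs_return_b, beta=1.0):
    """P(human prefers trajectory A), softmax over observed returns."""
    za = math.exp(beta * obs_return_a)
    zb = math.exp(beta * obs_return_b)
    return za / (za + zb)

# Toy case: trajectory A truly performs worse but *looks* better under
# the human's partial observations, so the modeled human prefers it.
true_returns = (2.0, 5.0)      # (A, B): B is genuinely better
observed_returns = (4.0, 3.0)  # what the human's observations suggest
p_prefer_a = boltzmann_pref(*observed_returns)
```

Here the feedback systematically rewards the worse trajectory, illustrating how a policy can inflate its apparent performance without improving its true return.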


Human Digital Twins in Personalized Healthcare: An Overview and Future Perspectives

Mokhtari, Melvin

arXiv.org Artificial Intelligence

This evolution indicates an expansion from industrial uses into diverse fields, including healthcare [61], [59]. The core functionalities of digital twins include an accurate mirroring of their physical counterparts, capturing all associated processes in a data-driven manner, maintaining a continuous connection that synchronizes with the real-time state of their physical twins, and simulating physical behavior for predictive analysis [85]. In the context of healthcare, a novel extension of this technology manifests in the form of Human Digital Twins (HDTs), designed to provide a comprehensive digital mirror of individual patients. HDTs not only represent physical attributes but also integrate dynamic changes across molecular, physiological, and behavioral dimensions. This advancement is aligned with a shift toward personalized healthcare (PH) paradigms, enabling tailored treatment strategies based on a patient's unique health profile, thereby enhancing preventive, diagnostic, and therapeutic processes in clinical settings [44], [50]. The personalization aspect of HDTs underscores their potential to revolutionize healthcare by facilitating precise and individualized treatment plans that optimize patient outcomes [72]. Although the potential of digital twins in healthcare has garnered much attention, practical applications are still in early development, with critical literature highlighting that many implementations remain exploratory [59]. Notably, institutions like the IEEE Computer Society and Gartner recognize this technology as a pivotal component in the ongoing evolution of healthcare systems that emphasize both precision and personalization [31], [89].


PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs

van der Wal, Oskar, Lesci, Pietro, Muller-Eberstein, Max, Saphra, Naomi, Schoelkopf, Hailey, Zuidema, Willem, Biderman, Stella

arXiv.org Artificial Intelligence

The stability of language model pre-training and its effects on downstream performance are still understudied. Prior work shows that the training process can yield significantly different results in response to slight variations in initial conditions, e.g., the random seed. Crucially, the research community still lacks sufficient resources and tools to systematically investigate pre-training stability, particularly for decoder-only language models. We introduce the PolyPythias, a set of 45 new training runs for the Pythia model suite: 9 new seeds across 5 model sizes, from 14M to 410M parameters, resulting in about 7k new checkpoints that we release. Using these new 45 training runs, in addition to the 5 already available, we study the effects of different initial conditions determined by the seed -- i.e., parameters' initialisation and data order -- on (i) downstream performance, (ii) learned linguistic representations, and (iii) emergence of training phases. In addition to common scaling behaviours, our analyses generally reveal highly consistent training dynamics across both model sizes and initial conditions. Further, the new seeds for each model allow us to identify outlier training runs and delineate their characteristics. Our findings show the potential of using these methods to predict training stability.


LLMs' Reshaping of People, Processes, Products, and Society in Software Development: A Comprehensive Exploration with Early Adopters

Tabarsi, Benyamin, Reichert, Heidi, Limke, Ally, Kuttal, Sandeep, Barnes, Tiffany

arXiv.org Artificial Intelligence

Large language models (LLMs) like OpenAI ChatGPT, Google Gemini, and GitHub Copilot are rapidly gaining traction in the software industry, but their full impact on software engineering remains insufficiently explored. Despite their growing adoption, there is a notable lack of formal, qualitative assessments of how LLMs are applied in real-world software development contexts. To fill this gap, we conducted semi-structured interviews with sixteen early-adopter professional developers to explore their use of LLMs throughout various stages of the software development life cycle. Our investigation examines four dimensions: people - how LLMs affect individual developers and teams; process - how LLMs alter software engineering workflows; product - LLM impact on software quality and innovation; and society - the broader socioeconomic and ethical implications of LLM adoption. Thematic analysis of our data reveals that while LLMs have not fundamentally revolutionized the development process, they have substantially enhanced routine coding tasks, including code generation, refactoring, and debugging. Developers reported the most effective outcomes when providing LLMs with clear, well-defined problem statements, indicating that LLMs excel with decomposed problems and specific requirements. Furthermore, these early adopters found that LLMs offer significant value for personal and professional development, aiding in learning new languages and concepts. Early adopters, highly skilled in software engineering and familiar with how LLMs work, identified early and persistent challenges, such as inaccuracies in generated content and the need for careful manual review before integrating LLM outputs into production environments. Our study provides a nuanced understanding of how LLMs are shaping the landscape of software development, with their benefits, limitations, and ongoing implications.


Towards Data-Efficient Language Models: A Child-Inspired Approach to Language Learning

Ghanizadeh, Mohammad Amin, Dousti, Mohammad Javad

arXiv.org Artificial Intelligence

In this work, we explain the approach we employed in the BabyLM Challenge, which trains language models (LMs) with significantly less data than traditional large language models (LLMs) and is inspired by how human children learn. While a human child is exposed to far less linguistic input than an LLM, they still achieve remarkable language understanding and generation abilities. To this end, we develop a model trained on a curated dataset consisting of 10 million words, primarily sourced from child-directed transcripts. The 2024 BabyLM Challenge initial dataset of 10M words is filtered to 8.5M. Next, it is supplemented with a randomly selected subset of the TVR dataset consisting of 1.5M words of television dialogues. The latter dataset ensures that, like children, the model is also exposed to language through media. Furthermore, we reduce the vocabulary size to 32,000 tokens, aligning it with the limited vocabulary of children in the early stages of language acquisition. We use curriculum learning and are able to match the baseline on certain benchmarks while surpassing the baseline on others. Additionally, incorporating common LLM training datasets, such as MADLAD-400, degrades performance. These findings underscore the importance of dataset selection, vocabulary scaling, and curriculum learning in creating more data-efficient language models that better mimic human learning processes.
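One simple instantiation of curriculum learning is ordering training utterances from shorter (simpler) to longer, so early training mirrors the short child-directed speech a child hears first. The ordering criterion and toy corpus below are illustrative assumptions, not necessarily the curriculum used in the paper.

```python
# Toy sketch of a length-based curriculum: present shorter utterances first.
# The corpus and the word-count criterion are illustrative only; the paper's
# actual data is child-directed transcripts plus TV dialogue.

corpus = [
    "the cat sat",
    "dogs bark",
    "a very long and complicated utterance about many different things",
    "hi",
]

# order by number of words, ascending; a stable sort keeps ties in corpus order
curriculum = sorted(corpus, key=lambda s: len(s.split()))
```

In a real training loop, batches would be drawn from the front of this ordering first and the cutoff gradually relaxed as training progresses.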