 pareto-optimal





some common concerns, including how the user study is more nuanced than it seems, eliciting target sets (and

Neural Information Processing Systems

We thank the reviewers for their thoughtful feedback. R2, we do not know of any existing benchmarks to evaluate MCPL methodologies. We thank R2 for their encouraging remarks. We shall include the missing references in the updated draft. For general convex sets, the distance oracle may be computationally intensive.


Deviations from the Nash equilibrium and emergence of tacit collusion in a two-player optimal execution game with reinforcement learning

Lillo, Fabrizio, Macrì, Andrea

arXiv.org Machine Learning

The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modeled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, using the Almgren-Chriss (2000) framework. Our results show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies exhibit tacit collusion, closely aligning with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases.


The Survival Bandit Problem

Riou, Charles, Honda, Junya, Sugiyama, Masashi

arXiv.org Machine Learning

We introduce and study a new variant of the multi-armed bandit problem (MAB), called the survival bandit problem (S-MAB). While the objective in both problems is to maximize the so-called cumulative reward, in this new variant the procedure is interrupted if the cumulative reward falls below a preset threshold. This simple yet unexplored extension of the MAB follows from many practical applications. For example, when testing two medicines against each other on voluntary patients, people's health is at stake, and it is necessary to be able to interrupt experiments if serious side effects occur or if the disease syndromes are not dissipated by the treatment. From a theoretical perspective, the S-MAB is the first variant of the MAB where the procedure may or may not be interrupted. We start by formalizing the S-MAB and we define its objective as the minimization of the so-called survival regret, which naturally generalizes the regret of the MAB. Then, we show that the objective of the S-MAB is considerably more difficult than the MAB, in the sense that contrary to the MAB, no policy can achieve a reasonably small (i.e., sublinear) survival regret. Instead, we minimize the survival regret in the sense of Pareto, i.e., we seek a policy whose cumulative reward cannot be improved for some problem instance without being sacrificed for another one. For that purpose, we identify two key components in the survival regret: the regret given no ruin (which corresponds to the regret in the MAB), and the probability that the procedure is interrupted, called the probability of ruin. We derive a lower bound on the probability of ruin, as well as policies whose probability of ruin matches the lower bound. Finally, based on a doubling trick on those policies, we derive a policy which minimizes the survival regret in the sense of Pareto, giving an answer to an open problem by Perotto et al. (COLT 2019).
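The interruption rule described in this abstract can be illustrated with a minimal sketch. It only mirrors the wording above (stop once the cumulative reward falls below a preset threshold); the paper's exact formalization may differ. The names `run_survival_bandit`, `choose_arm`, `pull`, `horizon`, and `threshold`, as well as the toy deterministic arms, are assumptions made for illustration.

```python
from typing import Callable, Tuple


def run_survival_bandit(
    choose_arm: Callable[[int], int],
    pull: Callable[[int, int], float],
    horizon: int,
    threshold: float,
) -> Tuple[float, int, bool]:
    """Play up to `horizon` rounds, stopping early on ruin.

    Returns (cumulative reward, rounds played, ruined).
    """
    cumulative = 0.0
    for t in range(horizon):
        cumulative += pull(choose_arm(t), t)
        if cumulative < threshold:  # ruin: the procedure is interrupted
            return cumulative, t + 1, True
    return cumulative, horizon, False


# Toy arms (assumed, not from the paper): arm 0 always loses 1, arm 1 always gains 1.
def toy_pull(arm: int, t: int) -> float:
    return -1.0 if arm == 0 else 1.0


# A policy that always plays the losing arm is interrupted early.
ruined_run = run_survival_bandit(lambda t: 0, toy_pull, horizon=10, threshold=-3.0)
# A policy that always plays the winning arm survives the full horizon.
safe_run = run_survival_bandit(lambda t: 1, toy_pull, horizon=10, threshold=-3.0)
```

In this toy setting the first policy is ruined after four rounds, so it forfeits the remaining six rounds of reward. That lost opportunity is what distinguishes this setting from the standard MAB.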


Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities

Li, Guihong, Hoang, Duc, Bhardwaj, Kartikeya, Lin, Ming, Wang, Zhangyang, Marculescu, Radu

arXiv.org Artificial Intelligence

Recently, zero-shot (or training-free) Neural Architecture Search (NAS) approaches have been proposed to liberate NAS from the expensive training process. The key idea behind zero-shot NAS approaches is to design proxies that can predict the accuracy of some given networks without training the network parameters. The proxies proposed so far are usually inspired by recent progress in theoretical understanding of deep learning and have shown great potential on several datasets and NAS benchmarks. This paper aims to comprehensively review and compare the state-of-the-art (SOTA) zero-shot NAS approaches, with an emphasis on their hardware awareness. To this end, we first review the mainstream zero-shot proxies and discuss their theoretical underpinnings. We then compare these zero-shot proxies through large-scale experiments and demonstrate their effectiveness in both hardware-aware and hardware-oblivious NAS scenarios. Finally, we point out several promising ideas to design better proxies. In recent years, deep neural networks have made significant breakthroughs in many applications, such as recommendation systems, image classification, and natural language modeling [1], [2], [3], [4], [5], [6], [7]. To automatically design [...] via a hyper-network [11], [32], [33], [34], [35], [36], [37]. As shown in Figure 2, one-shot NAS only needs to train a single hyper-network instead of multiple candidate architectures whose number is usually exponentially large.


Mind the Gap: Measuring Generalization Performance Across Multiple Objectives

Feurer, Matthias, Eggensperger, Katharina, Bergman, Edward, Pfisterer, Florian, Bischl, Bernd, Hutter, Frank

arXiv.org Artificial Intelligence

Modern machine learning models are often constructed taking into account multiple objectives, e.g., minimizing inference time while also maximizing accuracy. Multi-objective hyperparameter optimization (MHPO) algorithms return such candidate models, and the approximation of the Pareto front is used to assess their performance. In practice, we also want to measure generalization when moving from the validation to the test set. However, some of the models might no longer be Pareto-optimal, which makes it unclear how to quantify the performance of the MHPO method when evaluated on the test set. To resolve this, we provide a novel evaluation protocol that allows measuring the generalization performance of MHPO methods and studying its capabilities for comparing two optimization experiments.
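The issue raised in this abstract, that a model on the validation Pareto front may be dominated on the test set, can be seen in a small sketch. This is not the evaluation protocol proposed by the authors. The helper `pareto_optimal` and all numbers are hypothetical, and accuracy is expressed as an error rate so that both objectives are minimized.

```python
from typing import Sequence, Set


def pareto_optimal(points: Sequence[Sequence[float]]) -> Set[int]:
    """Return indices of non-dominated points; every objective is minimized."""

    def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
        # a dominates b if it is no worse everywhere and strictly better somewhere.
        return all(x <= y for x, y in zip(a, b)) and any(
            x < y for x, y in zip(a, b)
        )

    return {
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    }


# Hypothetical (inference time in ms, error rate) for the same four models.
validation = [(10.0, 0.30), (20.0, 0.20), (30.0, 0.10), (40.0, 0.15)]
test = [(10.0, 0.30), (20.0, 0.32), (30.0, 0.12), (40.0, 0.11)]

val_front = pareto_optimal(validation)   # models selected on validation data
test_front = pareto_optimal(test)        # models that are optimal on test data
no_longer_optimal = val_front - test_front
```

Here model 1 is on the validation front but is dominated by model 0 on the test set, while model 3 becomes non-dominated only on the test set. A front-based metric computed naively on the test values of the validation front would therefore be ambiguous.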


Quantifying Complexity: An Object-Relations Approach to Complex Systems

Casey, Stephen

arXiv.org Artificial Intelligence

The best way to model, understand, and quantify the information contained in complex systems is an open question in physics, mathematics, and computer science. The uncertain relationship between entropy and complexity further complicates this question. With ideas drawn from the object-relations theory of psychology, this paper develops an object-relations model of complex systems which generalizes to systems of all types, including mathematical operations, machines, biological organisms, and social structures. The resulting Complex Information Entropy (CIE) equation is a robust method to quantify complexity across various contexts. The paper also describes algorithms to iteratively update and improve approximate solutions to the CIE equation, to recursively infer the composition of complex systems, and to discover the connections among objects across different lengthscales and timescales. Applications are discussed in the fields of engineering design, atomic and molecular physics, chemistry, materials science, neuroscience, psychology, sociology, ecology, economics, and medicine.


Generalization In Multi-Objective Machine Learning

Súkeník, Peter, Lampert, Christoph H.

arXiv.org Artificial Intelligence

Modern machine learning tasks often require considering not just one but multiple objectives. For example, besides the prediction quality, this could be the efficiency, robustness or fairness of the learned models, or any of their combinations. Multi-objective learning offers a natural framework for handling such problems without having to commit to early trade-offs. Surprisingly, statistical learning theory so far offers almost no insight into the generalization properties of multi-objective learning. In this work, we make first steps to fill this gap: we establish foundational generalization bounds for the multi-objective setting as well as generalization and excess bounds for learning with scalarizations. We also provide the first theoretical analysis of the relation between the Pareto-optimal sets of the true objectives and the Pareto-optimal sets of their empirical approximations from training data. In particular, we show a surprising asymmetry: all Pareto-optimal solutions can be approximated by empirically Pareto-optimal ones, but not vice versa.
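As a rough illustration of learning with scalarizations, the sketch below uses a weighted sum, which is one common scalarization and not necessarily the family analyzed in the paper. The candidate values, the choice of objectives (prediction error and an unfairness score), and the function names are assumptions. With strictly positive weights, the minimizer of a weighted sum is always Pareto-optimal among the candidates.

```python
from typing import Sequence


def linear_scalarization(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse several objective values into one number via a weighted sum."""
    return sum(w * l for w, l in zip(weights, losses))


def best_by_scalarization(
    candidates: Sequence[Sequence[float]], weights: Sequence[float]
) -> int:
    """Index of the candidate with the smallest scalarized loss."""
    return min(
        range(len(candidates)),
        key=lambda i: linear_scalarization(candidates[i], weights),
    )


# Hypothetical (prediction error, unfairness score) for four learned models.
candidates = [(0.10, 0.50), (0.20, 0.20), (0.50, 0.10), (0.45, 0.45)]

quality_focused = best_by_scalarization(candidates, (0.9, 0.1))
balanced = best_by_scalarization(candidates, (0.5, 0.5))
fairness_focused = best_by_scalarization(candidates, (0.1, 0.9))
```

Different weight vectors select different trade-offs (models 0, 1, and 2 respectively), and the dominated model 3 is never selected. Choosing the weights up front is the kind of early commitment that the multi-objective view avoids.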