Combining Reinforcement Learning and Optimal Transport for the Traveling Salesman Problem
Goh, Yong Liang, Lee, Wee Sun, Bresson, Xavier, Laurent, Thomas, Lim, Nicholas
The traveling salesman problem is a fundamental combinatorial optimization problem with strong exact algorithms. However, as problems scale up, these exact algorithms fail to provide a solution in a reasonable time. To resolve this, current works use deep learning to construct reasonable solutions. Such efforts have been very successful, but tend to be slow and compute intensive. This paper demonstrates the integration of entropic regularized optimal transport techniques as a layer in a deep reinforcement learning network. We show that we can construct a model that learns without supervision and performs inference significantly faster than current autoregressive approaches. We also empirically evaluate the benefits of including optimal transport algorithms within deep learning models to enforce assignment constraints during end-to-end training.
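To make "entropic regularized optimal transport" concrete, the following is a minimal sketch of Sinkhorn iterations, the standard solver for this problem. It is an illustration only: the abstract does not say which solver the paper uses or how the layer is wired into the network, so the function name `sinkhorn`, the uniform marginals, and the parameters `epsilon` and `n_iters` are all assumptions.

```python
import math


def sinkhorn(cost, epsilon=0.1, n_iters=200):
    """Approximate an entropic-regularized transport plan for a cost matrix.

    Assumes uniform marginals (each row and column carries equal mass).
    This is a generic sketch, not the layer described in the paper.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # target row sums
    b = [1.0 / m] * m  # target column sums
    # Gibbs kernel: small costs give large kernel entries.
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy cost matrix: matching i to i is free, anything else costs 1.
cost = [[0.0, 1.0, 1.0],
        [1.0, 0.0, 1.0],
        [1.0, 1.0, 0.0]]
plan = sinkhorn(cost)
```

Each row and column of `plan` sums to roughly 1/3, and most of the mass sits on the diagonal, which is the sense in which such a layer can softly enforce one-to-one assignment constraints. Every step is differentiable, which is what allows this kind of iteration to sit inside end-to-end training.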
Making use of supercomputers in financial machine learning
Cotte, Philippe, Lagier, Pierre, Margot, Vincent, Geissler, Christophe
This article is the result of a collaboration between Fujitsu and Advestis. This collaboration aims at refactoring and running an algorithm based on systematic exploration producing investment recommendations on a high-performance computer of the Fugaku type [11], to see whether a very high number of cores could allow for a deeper exploration of the data compared to a cloud machine, hopefully resulting in better predictions. We found that an increase in the number of explored rules results in a net increase in the predictive performance of the final ruleset. Also, in the particular case of this study, we found that using more than around 40 cores does not bring a significant computation time gain. This limitation, however, stems from a threshold-based search heuristic used to prune the search space. We have evidence that for similar data sets with less restrictive thresholds, the number of cores actually used could very well be much higher, allowing parallelization to have a much greater effect.
Hyper Parameter Tuning with Uninformed and Informed Search
Hyperparameters are the parameters of a machine learning algorithm that control its learning process. Hyperparameter tuning is the process of finding the hyperparameter values that yield the most accurate model. Note: model parameters and hyperparameters are different. Model parameters are learned from data (e.g., the slope and intercept in a linear regression model), whereas hyperparameters are set by us before training (e.g., the choice of L1 or L2 regularization in a regression model).
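As a rough illustration of hyperparameter tuning, the sketch below compares grid search and random search, two strategies commonly classed as uninformed because they ignore the results of earlier trials. The excerpt does not show the article's own code, so everything here is hypothetical: `validation_loss` is a toy stand-in for "train a model and measure its validation error", and the hyperparameter names and ranges are made up for the example.

```python
import itertools
import random


def validation_loss(learning_rate, reg_strength):
    # Hypothetical stand-in for training a model and scoring it on held-out data.
    return (learning_rate - 0.1) ** 2 + (reg_strength - 0.01) ** 2


def grid_search(grid):
    """Try every combination of the listed values; return (best_loss, best_params)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


def random_search(bounds, n_trials, seed=0):
    """Sample each hyperparameter uniformly within its bounds for n_trials trials."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


grid_best = grid_search({
    "learning_rate": [0.01, 0.1, 1.0],
    "reg_strength": [0.001, 0.01, 0.1],
})
random_best = random_search(
    {"learning_rate": (0.0, 1.0), "reg_strength": (0.0, 0.1)},
    n_trials=50,
)
```

Grid search costs one trial per combination, so it grows multiplicatively with each added hyperparameter, while random search keeps a fixed budget of `n_trials`. Informed methods such as Bayesian optimization would instead use the losses observed so far to choose the next candidate.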
Congratulations to the #AAAI2022 award winners
As a key contribution, we adapt tools from the scenario approach to compute probably approximately correct (PAC) bounds on these transition probabilities, based on a finite number of samples of the noise. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP). This iMDP is robust against uncertainty in the transition probabilities, and the tightness of the probability intervals can be controlled through the number of samples. We use state-of-the-art verification techniques to provide guarantees on the iMDP, and compute a controller for which these guarantees carry over to the autonomous system. Realistic benchmarks show the practical applicability of our method, even when the iMDP has millions of states or transitions.
A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning
Diouane, Youssef, Lucchi, Aurelien, Patil, Vihang
Evolutionary strategies have recently been shown to achieve competitive levels of performance for complex optimization problems in reinforcement learning. In such problems, one often needs to optimize an objective function subject to a set of constraints, including, for instance, constraints on the entropy of a policy or restrictions on the set of actions or states accessible to an agent. Convergence guarantees for evolutionary strategies applied to stochastic constrained problems are, however, lacking in the literature. In this work, we address this problem by designing a novel optimization algorithm with a sufficient decrease mechanism that ensures convergence and that is based only on estimates of the functions. We demonstrate the applicability of this algorithm on two types of experiments: i) a control task for maximizing rewards and ii) maximizing rewards subject to a non-relaxable set of constraints.
Interactive Visual Pattern Search on Graph Data via Graph Representation Learning
Song, Huan, Dai, Zeng, Xu, Panpan, Ren, Liu
Graphs are a ubiquitous data structure to model processes and relations in a wide range of domains. Examples include control-flow graphs in programs and semantic scene graphs in images. Identifying subgraph patterns in graphs is an important approach to understanding their structural properties. We propose a visual analytics system GraphQ to support human-in-the-loop, example-based, subgraph pattern search in a database containing many individual graphs. To support fast, interactive queries, we use graph neural networks (GNNs) to encode a graph as a fixed-length latent vector representation, and perform subgraph matching in the latent space. Due to the complexity of the problem, it is still difficult to obtain accurate one-to-one node correspondences in the matching results that are crucial for visualization and interpretation. We, therefore, propose a novel GNN for node-alignment called NeuroAlign, to facilitate easy validation and interpretation of the query results. GraphQ provides a visual query interface with a query editor and a multi-scale visualization of the results, as well as a user feedback mechanism for refining the results with additional constraints. We demonstrate GraphQ through two example usage scenarios: analyzing reusable subroutines in program workflows and semantic scene graph search in images. Quantitative experiments show that NeuroAlign achieves 19-29% improvement in node-alignment accuracy compared to a baseline GNN and provides up to 100x speedup compared to combinatorial algorithms. Our qualitative study with domain experts confirms the system's effectiveness in both usage scenarios.
Code Search based on Context-aware Code Translation
Sun, Weisong, Fang, Chunrong, Chen, Yuchen, Tao, Guanhong, Han, Tingxu, Zhang, Quanjun
Code search is a technique widely used by developers during software development. It provides semantically similar implementations from a large code corpus to developers based on their queries. Existing techniques leverage deep learning models to construct embedding representations for code snippets and queries, respectively. Features such as abstract syntax trees, control flow graphs, etc., are commonly employed for representing the semantics of code snippets. However, the same structure of these features does not necessarily denote the same semantics of code snippets, and vice versa. In addition, these techniques utilize multiple different word mapping functions that map query words/code tokens to embedding representations. This causes the embeddings of the same word/token to diverge between queries and code snippets. We propose a novel context-aware code translation technique that translates code snippets into natural language descriptions (called translations). The code translation is conducted on machine instructions, where the context information is collected by simulating the execution of instructions. We further design a shared word mapping function using one single vocabulary for generating embeddings for both translations and queries. We evaluate the effectiveness of our technique, called TranCS, on the CodeSearchNet corpus with 1,000 queries. Experimental results show that TranCS significantly outperforms state-of-the-art techniques by 49.31% to 66.50% in terms of MRR (mean reciprocal rank).
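For readers unfamiliar with the metric, mean reciprocal rank averages, over all queries, the reciprocal of the rank at which the first correct result appears. The sketch below is a generic illustration with made-up queries and snippet identifiers; it is not the TranCS evaluation code, and the function name and data layout are assumptions.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Average of 1/rank of the relevant item per query (0 if it never appears)."""
    total = 0.0
    for query, results in ranked_results.items():
        for rank, item in enumerate(results, start=1):
            if item == relevant[query]:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


# Hypothetical search output: the correct snippet "a" is ranked 1st, 2nd, and 3rd.
ranked = {
    "q1": ["a", "b", "c"],
    "q2": ["b", "a", "c"],
    "q3": ["c", "b", "a"],
}
relevant = {"q1": "a", "q2": "a", "q3": "a"}
mrr = mean_reciprocal_rank(ranked, relevant)  # (1 + 1/2 + 1/3) / 3 = 11/18
```

An MRR of 1.0 means the correct snippet is always ranked first, so higher values indicate that developers find the right code nearer the top of the list.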
Information-Theoretic Analysis of Minimax Excess Risk
Hafez-Kolahi, Hassan, Moniri, Behrad, Kasaei, Shohreh
Two main concepts studied in machine learning theory are the generalization gap (the difference between train and test error) and the excess risk (the difference between test error and the minimum possible error). While information-theoretic tools have been used extensively to study the generalization gap of learning algorithms, the information-theoretic nature of excess risk has not yet been fully investigated. In this paper, some steps are taken toward this goal. We consider the frequentist problem of minimax excess risk as a zero-sum game between the algorithm designer and the world. Then, we argue that it is desirable to modify this game in a way that the order of play can be swapped. We prove that, under some regularity conditions, if the world and the designer can play randomly, the duality gap is zero and the order of play can be changed. In this case, a Bayesian problem surfaces in the dual representation. This makes it possible to utilize recent information-theoretic results on minimum excess risk in Bayesian learning to provide bounds on the minimax excess risk. We demonstrate the applicability of the results by providing information-theoretic insight into two important classes of problems: classification when the hypothesis space has finite VC-dimension, and regularized least squares.
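The two quantities named at the start of this abstract, and the swap of the order of play, can be written out as follows. This is a sketch in standard textbook notation rather than the paper's own: the symbols $\ell$, $P$, $S$, $\mathcal{H}$, $A$, and $\pi$ are assumptions, as is taking the "minimum possible error" to be an infimum over a hypothesis class $\mathcal{H}$.

```latex
% Population (test) risk and empirical (train) risk of a hypothesis h,
% for a loss \ell, data distribution P, and sample S = (z_1, \dots, z_n) \sim P^n:
L_P(h) = \mathbb{E}_{z \sim P}\,[\ell(h, z)], \qquad
\hat{L}_S(h) = \frac{1}{n} \sum_{i=1}^{n} \ell(h, z_i).

% Generalization gap and excess risk:
\mathrm{gen}(h, S) = L_P(h) - \hat{L}_S(h), \qquad
\mathcal{E}_P(h) = L_P(h) - \inf_{h' \in \mathcal{H}} L_P(h').

% Minimax excess risk: the designer picks an algorithm A, then the world picks P.
\inf_{A} \, \sup_{P} \; \mathbb{E}_{S \sim P^n}\,[\mathcal{E}_P(A(S))].

% Swapped order with a randomizing world: the world commits to a prior \pi over P,
% then the designer responds, which gives a Bayesian problem.
\sup_{\pi} \, \inf_{A} \; \mathbb{E}_{P \sim \pi}\, \mathbb{E}_{S \sim P^n}\,[\mathcal{E}_P(A(S))].
```

The last expression never exceeds the one before it. A zero duality gap means the two are equal, which is what lets Bayesian bounds carry over to the minimax problem.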
Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
Zhong, Han, Xiong, Wei, Tan, Jiyuan, Wang, Liwei, Zhang, Tong, Wang, Zhaoran, Yang, Zhuoran
We study episodic two-player zero-sum Markov games (MGs) in the offline setting, where the goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori. When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state space, and (iii) minimax optimization for equilibrium solving. We propose a pessimism-based algorithm, dubbed pessimistic minimax value iteration (PMVI), which overcomes the distributional shift by constructing pessimistic estimates of the value functions for both players and outputs a policy pair by solving NEs based on the two value functions. Furthermore, we establish a data-dependent upper bound on the suboptimality which recovers a sublinear rate without assuming uniform coverage of the dataset. We also prove an information-theoretic lower bound, which suggests that the data-dependent term in the upper bound is intrinsic. Our theoretical results also highlight a notion of "relative uncertainty", which characterizes the necessary and sufficient condition for achieving sample efficiency in offline MGs. To the best of our knowledge, we provide the first nearly minimax optimal result for offline MGs with function approximation.