AITopics

2102.09198

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Akagi, Yasunori, Marumo, Naoki, Kim, Hideaki, Kurashima, Takeshi, Toda, Hiroyuki

Non-approximate Inference for Collective Graphical Models on Path Graphs via Discrete Difference of Convex Algorithm

arXiv.org Machine LearningFeb-18-2021

The importance of aggregated count data, which is calculated from the data of multiple individuals, continues to increase. Collective Graphical Model (CGM) is a probabilistic approach to the analysis of aggregated data. One of the most important operations in CGM is maximum a posteriori (MAP) inference of unobserved variables under given observations. Because the MAP inference problem for general CGMs has been shown to be NP-hard, an approach that solves an approximate problem has been proposed. However, this approach has two major drawbacks. First, the quality of the solution deteriorates when the values in the count tables are small, because the approximation becomes inaccurate. Second, since continuous relaxation is applied, the integrality constraints of the output are violated. To resolve these problems, this paper proposes a new method for MAP inference for CGMs on path graphs. First we show that the MAP inference problem can be formulated as a (non-linear) minimum cost flow problem. Then, we apply Difference of Convex Algorithm (DCA), which is a general methodology to minimize a function represented as the sum of a convex function and a concave function. In our algorithm, important subroutines in DCA can be efficiently calculated by minimum convex cost flow algorithms. Experiments show that the proposed method outputs higher quality solutions than the conventional approach.

algorithm, collective graphical model, non-approximate inference, (7 more...)

2102.09191

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Chalumeau, Félix, Coulon, Ilan, Cappart, Quentin, Rousseau, Louis-Martin

SeaPearl: A Constraint Programming Solver guided by Reinforcement Learning

arXiv.org Artificial IntelligenceFeb-18-2021

The design of efficient and generic algorithms for solving combinatorial optimization problems has been an active field of research for many years. Standard exact solving approaches are based on a clever and complete enumeration of the solution set. A critical and non-trivial design choice with such methods is the branching strategy, directing how the search is performed. The last decade has shown an increasing interest in the design of machine learning-based heuristics to solve combinatorial optimization problems. The goal is to leverage knowledge from historical data to solve similar new instances of a problem. Used alone, such heuristics are only able to provide approximate solutions efficiently, but cannot prove optimality nor bounds on their solution. Recent works have shown that reinforcement learning can be successfully used for driving the search phase of constraint programming (CP) solvers. However, it has also been shown that this hybridization is challenging to build, as standard CP frameworks do not natively include machine learning mechanisms, leading to some sources of inefficiencies. This paper presents the proof of concept for SeaPearl, a new CP solver implemented in Julia, that supports machine learning routines in order to learn branching decisions using reinforcement learning. Support for modeling the learning component is also provided. We illustrate the modeling and solution performance of this new solver on two problems. Although not yet competitive with industrial solvers, SeaPearl aims to provide a flexible and open-source framework in order to facilitate future research in the hybridization of constraint programming and machine learning.

reinforcement, seapearl, solver, (14 more...)

arXiv.org Artificial Intelligence

2102.09193

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > France > Bourgogne-Franche-Comté > Doubs > Besançon (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Xu, Xinping | Li, Bo (Department of Computer Science, University of Oxford) | Li, Minming (Department of Computer Science, City University of Hong Kong) | Duan, Lingjie (Singapore University of Technology and Design)

Two-facility Location Games with Minimum Distance Requirement

Journal of Artificial Intelligence ResearchFeb-17-2021

We study the mechanism design problem of a social planner for locating two facilities on a line interval [0, 1], where a set of n strategic agents report their locations and a mechanism determines the locations of the two facilities. We consider the requirement of a minimum distance 0 ≤ d ≤ 1 between the two facilities. Given the two facilities are heterogeneous, we model the cost/utility of an agent as the sum of his distances to both facilities. In the heterogeneous two-facility location game to minimize the social cost, we show that the optimal solution can be computed in polynomial time and prove that carefully choosing one optimal solution as output is strategyproof. We also design a strategyproof mechanism minimizing the maximum cost. Given the two facilities are homogeneous, we model the cost/utility of an agent as his distance to the closer facility. In the homogeneous two-facility location game for minimizing the social cost, we show that any deterministic strategyproof mechanism has unbounded approximation ratio. Moreover, in the obnoxious heterogeneous two-facility location game for maximizing the social utility, we propose new deterministic group strategyproof mechanisms with provable approximation ratios and establish a lower bound (7 − d)/6 for any deterministic strategyproof mechanism. We also design a strategyproof mechanism maximizing the minimum utility. In the obnoxious homogeneous two-facility location game for maximizing the social utility, we propose deterministic group strategyproof mechanisms with provable approximation ratios and establish a lower bound 4/3. Besides, in the two-facility location game with triple-preference, where each facility may be favorable, obnoxious, indifferent for any agent, we further motivate agents to report both their locations and preferences towards the two facilities truthfully, and design a deterministic group strategyproof mechanism with an approximation ratio 4.

location game, mechanism, two-facility location game, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.12319

AI Access Foundation

12319

Journal of Artificial Intelligence Research

Country:

Asia > Singapore (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > New Jersey (0.04)
(5 more...)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Using Distance Correlation for Efficient Bayesian Optimization

Kanazawa, Takuya

We propose a novel approach for Bayesian optimization, called $\textsf{GP-DC}$, which combines Gaussian processes with distance correlation. It balances exploration and exploitation automatically, and requires no manual parameter tuning. We evaluate $\textsf{GP-DC}$ on a number of benchmark functions and observe that it outperforms state-of-the-art methods such as $\textsf{GP-UCB}$ and max-value entropy search, as well as the classical expected improvement heuristic. We also apply $\textsf{GP-DC}$ to optimize sequential integral observations with a variable integration range and verify its empirical efficiency on both synthetic and real-world datasets.

algorithm, gp-dc, optimization, (14 more...)

2102.08993

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > Promising Solution (0.54)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation

He, Jiafan, Zhou, Dongruo, Gu, Quanquan

We study the reinforcement learning for finite-horizon episodic Markov decision processes with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping. We propose an optimistic policy optimization algorithm with Bernstein bonus and show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode, $T$ is the number of interaction with the MDP and $d$ is the dimension of the feature mapping. Furthermore, we also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors. To the best of our knowledge, this is the first computationally efficient, nearly minimax optimal algorithm for adversarial Markov decision processes with linear function approximation.

algorithm, inequality hold, lemma 6, (13 more...)

2102.0894

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games

Zhao, Yulai, Tian, Yuandong, Lee, Jason D., Du, Simon S.

Policy gradient methods are widely used in solving two-player zero-sum games to achieve superhuman performance in practice. However, it remains elusive when they can provably find a near-optimal solution and how many samples and iterations are needed. The current paper studies natural extensions of Natural Policy Gradient algorithm for solving two-player zero-sum games where function approximation is used for generalization across states. We thoroughly characterize the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error. To our knowledge, this is the first quantitative analysis of policy gradient methods with function approximation for two-player zero-sum Markov games.

algorithm, function approximation, two-player zero-sum markov game, (9 more...)

2102.08903

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Tai, Kai Sheng, Bailis, Peter, Valiant, Gregory

Sinkhorn Label Allocation: Semi-Supervised Classification via Annealed Self-Training

Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training. In this paper, we reinterpret this label assignment process as an optimal transportation problem between examples and classes, wherein the cost of assigning an example to a class is mediated by the current predictions of the classifier. This formulation facilitates a practical annealing strategy for label assignment and allows for the inclusion of prior knowledge on class proportions via flexible upper bound constraints. The solutions to these assignment problems can be efficiently approximated using Sinkhorn iteration, thus enabling their use in the inner loop of standard stochastic optimization algorithms. We demonstrate the effectiveness of our algorithm on the CIFAR-10, CIFAR-100, and SVHN datasets in comparison with FixMatch, a state-of-the-art self-training algorithm. Additionally, we elucidate connections between our proposed algorithm and existing confidence thresholded self-training approaches in the context of homotopy methods in optimization. Our code is available at https://github.com/stanford-futuredata/sinkhorn-label-allocation.

algorithm, constraint, label distribution, (15 more...)

2102.08622

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Zhang, Junyu, Ni, Chengzhuo, Yu, Zheng, Szepesvari, Csaba, Wang, Mengdi

Policy gradient gives rise to a rich class of reinforcement learning (RL) methods, for example the REINFORCE. Yet the best known sample complexity result for such methods to find an $\epsilon$-optimal policy is $\mathcal{O}(\epsilon^{-3})$, which is suboptimal. In this paper, we study the fundamental convergence properties and sample efficiency of first-order policy optimization method. We focus on a generalized variant of policy gradient method, which is able to maximize not only a cumulative sum of rewards but also a general utility function over a policy's long-term visiting distribution. By exploiting the problem's hidden convex nature and leveraging techniques from composition optimization, we propose a Stochastic Incremental Variance-Reduced Policy Gradient (SIVR-PG) approach that improves a sequence of policies to provably converge to the global optimal solution and finds an $\epsilon$-optimal policy using $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.

arxiv preprint arxiv, estimator, optimization, (13 more...)

2102.08607

Country:

North America > Canada > Alberta (0.14)
Asia > Singapore (0.04)

Genre: Research Report (0.63)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Miller, John, Perdomo, Juan C., Zrnic, Tijana

Outside the Echo Chamber: Optimizing the Performative Risk

arXiv.org Machine LearningFeb-16-2021

In performative prediction, predictions guide decision-making and hence can influence the distribution of future data. To date, work on performative prediction has focused on finding performatively stable models, which are the fixed points of repeated retraining. However, stable solutions can be far from optimal when evaluated in terms of the performative risk, the loss experienced by the decision maker when deploying a model. In this paper, we shift attention beyond performative stability and focus on optimizing the performative risk directly. We identify a natural set of properties of the loss function and model-induced distribution shift under which the performative risk is convex, a property which does not follow from convexity of the loss alone. Furthermore, we develop algorithms that leverage our structural assumptions to optimize the performative risk with better sample efficiency than generic methods for derivative-free convex optimization.

algorithm, convex, performative risk, (16 more...)

2102.0857

Country:

North America > United States > New York (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)

Genre: Research Report (0.82)

Industry: Government > Voting & Elections (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)