AITopics

2006.05482

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Spain > Canary Islands (0.04)
(3 more...)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Agrawal, Akshay, Barratt, Shane, Boyd, Stephen

Learning Convex Optimization Models

arXiv.org Machine LearningJun-18-2020

A convex optimization model predicts an output from an input by solving a convex optimization problem. The class of convex optimization models is large, and includes as special cases many well-known models like linear and logistic regression. We propose a heuristic for learning the parameters in a convex optimization model given a dataset of input-output pairs, using recently developed methods for differentiating the solution of a convex optimization problem with respect to its parameters. We describe three general classes of convex optimization models, maximum a posteriori (MAP) models, utility maximization models, and agent models, and present a numerical experiment for each.

artificial intelligence, convex optimization model, machine learning, (13 more...)

2006.04248

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

AhmadiTeshnizi, Ali, Salehkaleybar, Saber, Kiyavash, Negar

LazyIter: A Fast Algorithm for Counting Markov Equivalent DAGs and Designing Experiments

arXiv.org Artificial IntelligenceJun-17-2020

The causal relationships among a set of random variables are commonly represented by a Directed Acyclic Graph (DAG), where there is a directed edge from variable $X$ to variable $Y$ if $X$ is a direct cause of $Y$. From the purely observational data, the true causal graph can be identified up to a Markov Equivalence Class (MEC), which is a set of DAGs with the same conditional independencies between the variables. The size of an MEC is a measure of complexity for recovering the true causal graph by performing interventions. We propose a method for efficient iteration over possible MECs given intervention results. We utilize the proposed method for computing MEC sizes and experiment design in active and passive learning settings. Compared to previous work for computing the size of MEC, our proposed algorithm reduces the time complexity by a factor of $O(n)$ for sparse graphs where $n$ is the number of variables in the system. Additionally, integrating our approach with dynamic programming, we design an optimal algorithm for passive experiment design. Experimental results show that our proposed algorithms for both computing the size of MEC and experiment design outperform the state of the art.

artificial intelligence, machine learning, optimization problem, (13 more...)

arXiv.org Artificial Intelligence

2006.0967

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

arXiv.org Machine LearningJun-17-2020

Maximum Roaming Multi-Task Learning

Pascal, Lucas, Michiardi, Pietro, Bost, Xavier, Huet, Benoit, Zuluaga, Maria A.

Multi-task learning has gained popularity due to the advantages it provides with respect to resource usage and performance. Nonetheless, the joint optimization of parameters with respect to multiple tasks remains an active research topic. Sub-partitioning the parameters between different tasks has proven to be an efficient way to relax the optimization constraints over the shared weights, may the partitions be disjoint or overlapping. However, one drawback of this approach is that it can weaken the inductive bias generally set up by the joint task optimization. In this work, we present a novel way to partition the parameter space without weakening the inductive bias. Specifically, we propose Maximum Roaming, a method inspired by dropout that randomly varies the parameter partitioning, while forcing them to visit as many tasks as possible at a regulated frequency, so that the network fully adapts to each update. We study the properties of our method through experiments on a variety of visual multi-task data sets. Experimental results suggest that the regularization brought by roaming has more impact on performance than usual partitioning optimization strategies. The overall method is flexible, easily applicable, provides superior regularization and consistently achieves improved performances compared to recent multi-task learning formulations.

artificial intelligence, computer vision, machine learning, (15 more...)

2006.09762

Country:

Europe > France (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Ho, Chin Pang, Petrik, Marek, Wiesemann, Wolfram

Partial Policy Iteration for L1-Robust Markov Decision Processes

Robust Markov decision processes (MDPs) allow to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which severely limits their scalability. This paper describes new efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted $L_1$ norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the linear complexity the non-robust Bellman operator. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach which uses linear programming solvers combined with a robust value iteration.

artificial intelligence, iteration, machine learning, (18 more...)

2006.09484

Country:

Asia > China > Hong Kong (0.04)
North America > United States > New Hampshire (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Structured Stochastic Quasi-Newton Methods for Large-Scale Optimization Problems

Yang, Minghan, Xu, Dong, Li, Yongfeng, Wen, Zaiwen, Chen, Mengyun

In this paper, we consider large-scale finite-sum nonconvex problems arising from machine learning. Since the Hessian is often a summation of a relative cheap and accessible part and an expensive or even inaccessible part, a stochastic quasi-Newton matrix is constructed using partial Hessian information as much as possible. By further exploiting the low-rank structures based on the Nystr\"om approximation, the computation of the quasi-Newton direction is affordable. To make full use of the gradient estimation, we also develop an extra-step strategy for this framework. Global convergence to stationary point in expectation and local suplinear convergence rate are established under some mild assumptions. Numerical experiments on logistic regression, deep autoencoder networks and deep learning problems show that the efficiency of our proposed method is at least comparable with the state-of-the-art methods.

artificial intelligence, machine learning, optimization problem, (16 more...)

2006.09606

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Falkner, Jonas K., Schmidt-Thieme, Lars

Learning to Solve Vehicle Routing Problems with Time Windows through Joint Attention

Many real-world vehicle routing problems involve rich sets of constraints with respect to the capacities of the vehicles, time windows for customers etc. While in recent years first machine learning models have been developed to solve basic vehicle routing problems faster than optimization heuristics, complex constraints rarely are taken into consideration. Due to their general procedure to construct solutions sequentially route by route, these methods generalize unfavorably to such problems. In this paper, we develop a policy model that is able to start and extend multiple routes concurrently by using attention on the joint action space of several tours. In that way the model is able to select routes and customers and thus learns to make difficult trade-offs between routes. In comprehensive experiments on three variants of the vehicle routing problem with time windows we show that our model called JAMPR works well for different problem sizes and outperforms the existing state-of-the-art constructive model. For two of the three variants it also creates significantly better solutions than a comparable meta-heuristic solver.

artificial intelligence, machine learning, vehicle, (20 more...)

2006.091

Country: Europe > Germany (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Freight & Logistics Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Hsieh, Ya-Ping, Mertikopoulos, Panayotis, Cevher, Volkan

The limits of min-max optimization algorithms: convergence to spurious non-critical sets

Compared to minimization problems, the min-max landscape in machine learning applications is considerably more convoluted because of the existence of cycles and similar phenomena. Such oscillatory behaviors are well-understood in the convex-concave regime, and many algorithms are known to overcome them. In this paper, we go beyond the convex-concave setting and we characterize the convergence properties of a wide class of zeroth-, first-, and (scalable) second-order methods in non-convex/non-concave problems. In particular, we show that these state-of-the-art min-max optimization algorithms may converge with arbitrarily high probability to attractors that are in no way min-max optimal or even stationary. Spurious convergence phenomena of this type can arise even in two-dimensional problems, a fact which corroborates the empirical evidence surrounding the formidable difficulty of training GANs.

algorithm, artificial intelligence, machine learning, (16 more...)

2006.09065

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Nørregaard, Jeppe, Hansen, Lars Kai

Probabilistic Decoupling of Labels in Classification

A common approach, called transductive In this paper we develop a principled, probabilistic, semi-supervised learning (Zhu & Goldberg, 2009; Triguero unified approach to nonstandard classification et al., 2015), is to attempt to predict labels on the unlabelled tasks, such as semi-supervised, positiveunlabelled, dataset and then use the combined dataset to train final multi-positive-unlabelled and noisylabel models. One transductive method is self-training in which learning. We train a classifier on the given a model switches between training and relabelling its labels to predict the label-distribution.

artificial intelligence, machine learning, optimization problem, (17 more...)

2006.09046

Country:

Europe > Denmark (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Tan, Yingcong, Terekhov, Daria, Delong, Andrew

Learning Linear Programs from Optimal Decisions

We propose a flexible gradient-based framework for learning linear programs from optimal decisions. Linear programs are often specified by hand, using prior knowledge of relevant costs and constraints. In some applications, linear programs must instead be learned from observations of optimal decisions. Learning from optimal decisions is a particularly challenging bi-level problem, and much of the related inverse optimization literature is dedicated to special cases. We tackle the general problem, learning all parameters jointly while allowing flexible parametrizations of costs, constraints, and loss functions. We also address challenges specific to learning linear programs, such as empty feasible regions and non-unique optimal decisions. Experiments show that our method successfully learns synthetic linear programs and minimum-cost multi-commodity flow instances for which previous methods are not directly applicable. We also provide a fast batch-mode PyTorch implementation of the homogeneous interior point algorithm, which supports gradients by implicit differentiation or backpropagation.

artificial intelligence, constraint, machine learning, (17 more...)

2006.08923

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)