AITopics

Solving linear programs by using entropic penalization has recently attracted new interest in the optimization community, since this strategy forms the basis for the fastest-known algorithms for the optimal transport problem, with many applications in modern large-scale machine learning. Crucial to these applications has been an analysis of how quickly solutions to the penalized program approach true optima to the original linear program. More than 20 years ago, Cominetti and San Mart\'in showed that this convergence is exponentially fast; however, their proof is asymptotic and does not give any indication of how accurately the entropic program approximates the original program for any particular choice of the penalization parameter. We close this long-standing gap in the literature regarding entropic penalization by giving a new proof of the exponential convergence, valid for any linear program. Our proof is non-asymptotic, yields explicit constants, and has the virtue of being extremely simple. We provide matching lower bounds and show that the entropic approach does not lead to a near-linear time approximation scheme for the linear assignment problem.

artificial intelligence, entropic penalty, optimization problem, (17 more...)

1806.01879

Country:

Europe (0.68)
North America > United States > Massachusetts (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Jankowiak, Martin, Obermeyer, Fritz

Pathwise Derivatives Beyond the Reparameterization Trick

We observe that gradients computed via the reparameterization trick are in direct correspondence with solutions of the transport equation in the formalism of optimal transport. We use this perspective to compute (approximate) pathwise gradients for probability distributions not directly amenable to the reparameterization trick: Gamma, Beta, and Dirichlet. We further observe that when the reparameterization trick is applied to the Cholesky-factorized multivariate Normal distribution, the resulting gradients are suboptimal in the sense of optimal transport. We derive the optimal gradients and show that they have reduced variance in a Gaussian Process regression task. We demonstrate with a variety of synthetic experiments and stochastic variational inference tasks that our pathwise gradients are competitive with other methods.

artificial intelligence, estimator, machine learning, (15 more...)

1806.01851

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Veldt, Nate, Gleich, David, Wirth, Anthony, Saunderson, James

A Projection Method for Metric-Constrained Optimization

We outline a new approach for solving optimization problems which enforce triangle inequalities on output variables. We refer to this as metric-constrained optimization, and give several examples where problems of this form arise in machine learning applications and theoretical approximation algorithms for graph clustering. Although these problem are interesting from a theoretical perspective, they are challenging to solve in practice due to the high memory requirement of black-box solvers. In order to address this challenge we first prove that the metric-constrained linear program relaxation of correlation clustering is equivalent to a special case of the metric nearness problem. We then developed a general solver for metric-constrained linear and quadratic programs by generalizing and improving a simple projection algorithm originally developed for metric nearness. We give several novel approximation guarantees for using our framework to find lower bounds for optimal solutions to several challenging graph clustering problems. We also demonstrate the power of our framework by solving optimizing problems involving up to 10^{8} variables and 10^{11} constraints.

artificial intelligence, constraint, machine learning, (18 more...)

1806.01678

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Oh, ChangYong, Gavves, Efstratios, Welling, Max

BOCK : Bayesian Optimization with Cylindrical Kernels

A major challenge in Bayesian Optimization is the boundary issue (Swersky, 2017) where an algorithm spends too many evaluations near the boundary of its search space. In this paper, we propose BOCK, Bayesian Optimization with Cylindrical Kernels, whose basic idea is to transform the ball geometry of the search space using a cylindrical transformation. Because of the transformed geometry, the Gaussian Process-based surrogate model spends less budget searching near the boundary, while concentrating its efforts relatively more near the center of the search region, where we expect the solution to be located. We evaluate BOCK extensively, showing that it is not only more accurate and efficient, but it also scales successfully to problems with a dimensionality as high as 500. We show that the better accuracy and scalability of BOCK even allows optimizing modestly sized neural network layers, as well as neural network hyperparameters.

artificial intelligence, bayesian optimization, machine learning, (17 more...)

1806.01619

Country: Europe (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.58)

Carbonnelle, Simon, De Vleeschouwer, Christophe

On layer-level control of DNN training and its impact on generalization

The generalization ability of a neural network depends on the optimization procedure used for training it. For practitioners and theoreticians, it is essential to identify which properties of the optimization procedure influence generalization. In this paper, we observe that prioritizing the training of distinct layers in a network significantly impacts its generalization ability, sometimes causing differences of up to 30% in test accuracy. In order to better monitor and control such prioritization, we propose to define layer-level training speed as the rotation rate of the layer's weight vector (denoted by layer rotation rate hereafter), and develop Layca, an optimization algorithm that enables direct control over it through each layer's learning rate parameter, without being affected by gradient propagation phenomena (e.g. vanishing gradients). We show that controlling layer rotation rates enables Layca to significantly outperform SGD with the same amount of learning rate tuning on three different tasks (up to 10% test error improvement). Furthermore, we provide experiments that suggest that several intriguing observations related to the training of deep models, i.e. the presence of plateaus in learning curves, the impact of weight decay, and the bad generalization properties of adaptive gradient methods, are all due to specific configurations of layer rotation rates. Overall, our work reveals that layer rotation rates are an important factor for generalization, and that monitoring it should be a key component of any deep learning experiment.

artificial intelligence, generalization, machine learning, (20 more...)

1806.01603

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Martins, Marcella S. R., Yafrani, Mohamed El, Santana, Roberto, Delgado, Myriam, Lüders, Ricardo, Ahiod, Belaïd

On the performance of multi-objective estimation of distribution algorithms for combinatorial problems

arXiv.org Artificial IntelligenceJun-4-2018

Fitness landscape analysis investigates features with a high influence on the performance of optimization algorithms, aiming to take advantage of the addressed problem characteristics. In this work, a fitness landscape analysis using problem features is performed for a Multi-objective Bayesian Optimization Algorithm (mBOA) on instances of MNK-landscape problem for 2, 3, 5 and 8 objectives. We also compare the results of mBOA with those provided by NSGA-III through the analysis of their estimated runtime necessary to identify an approximation of the Pareto front. Moreover, in order to scrutinize the probabilistic graphic model obtained by mBOA, the Pareto front is examined according to a probabilistic view. The fitness landscape study shows that mBOA is moderately or loosely influenced by some problem features, according to a simple and a multiple linear regression model, which is being proposed to predict the algorithms performance in terms of the estimated runtime. Besides, we conclude that the analysis of the probabilistic graphic model produced at the end of evolution can be useful to understand the convergence and diversity performances of the proposed approach.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1806.09935

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)
(2 more...)

Jiang, Dizuo, Faiz, Tasnim Ibn, Hassan, Md Mahmudul, Noor-E-Alam, Md.

A Possibility Distribution Based Multi-Criteria Decision Algorithm for Resilient Supplier Selection Problems

arXiv.org Artificial IntelligenceJun-3-2018

Resilient supplier selection problem is a key decision problem for an organization to gain competitive advantage. In the presence of multiple conflicting evaluation criteria, contradicting decision makers, and imprecise information sources, this problem becomes even more difficult to solve with the classical optimization approaches. Multi-Criteria Decision Analysis (MCDA) is a viable alternative approach for handling the imprecise information associated with the evaluation proffered by the decision makers. In this work, we present a comprehensive algorithm for ranking a set of suppliers based on aggregated information obtained from crisp numerical assessments and reliability adjusted linguistic appraisals from a group of decision makers. We adapted two popular tools - Single Valued Neutrosophic Sets (SVNS) and Interval-valued fuzzy sets (IVFS) and extended them to incorporate both crisp and linguistic evaluations from the decision makers to obtain aggregated SVNS and IVFS. This information is then used to rank the suppliers by using TOPSIS method. We present a case study to illustrate the mechanism of the proposed algorithm and show sensitivity of the supplier ranking with respect to the priorities of evaluation criteria.

artificial intelligence, criteria, fuzzy logic, (19 more...)

arXiv.org Artificial Intelligence

1806.0165

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.68)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

arXiv.org Machine LearningJun-2-2018

Minnorm training: an algorithm for training overcomplete deep neural networks

Bansal, Yamini, Advani, Madhu, Cox, David D, Saxe, Andrew M

In this work, we propose a new training method for finding minimum weight norm solutions in over-parameterized neural networks (NNs). This method seeks to improve training speed and generalization performance by framing NN training as a constrained optimization problem wherein the sum of the norm of the weights in each layer of the network is minimized, under the constraint of exactly fitting training data. It draws inspiration from support vector machines (SVMs), which are able to generalize well, despite often having an infinite number of free parameters in their primal form, and from recent theoretical generalization bounds on NNs which suggest that lower norm solutions generalize better. To solve this constrained optimization problem, our method employs Lagrange multipliers that act as integrators of error over training and identify `support vector'-like examples. The method can be implemented as a wrapper around gradient based methods and uses standard back-propagation of gradients from the NN for both regression and classification versions of the algorithm. We provide theoretical justifications for the effectiveness of this algorithm in comparison to early stopping and $L_2$-regularization using simple, analytically tractable settings. In particular, we show faster convergence to the max-margin hyperplane in a shallow network (compared to vanilla gradient descent); faster convergence to the minimum-norm solution in a linear chain (compared to $L_2$-regularization); and initialization-independent generalization performance in a deep linear network. Finally, using the MNIST dataset, we demonstrate that this algorithm can boost test accuracy and identify difficult examples in real-world datasets.

artificial intelligence, machine learning, optimization problem, (18 more...)

1806.0073

Country: North America (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Beck, Christian, Becker, Sebastian, Grohs, Philipp, Jaafari, Nor, Jentzen, Arnulf

Solving stochastic differential equations and Kolmogorov equations by means of deep learning

arXiv.org Machine LearningJun-1-2018

Stochastic differential equations (SDEs) and the Kolmogorov partial differential equations (PDEs) associated to them have been widely used in models from engineering, finance, and the natural sciences. In particular, SDEs and Kolmogorov PDEs, respectively, are highly employed in models for the approximative pricing of financial derivatives. Kolmogorov PDEs and SDEs, respectively, can typically not be solved explicitly and it has been and still is an active topic of research to design and analyze numerical methods which are able to approximately solve Kolmogorov PDEs and SDEs, respectively. Nearly all approximation methods for Kolmogorov PDEs in the literature suffer under the curse of dimensionality or only provide approximations of the solution of the PDE at a single fixed space-time point. In this paper we derive and propose a numerical approximation method which aims to overcome both of the above mentioned drawbacks and intends to deliver a numerical approximation of the Kolmogorov PDE on an entire region $[a,b]^d$ without suffering from the curse of dimensionality. Numerical results on examples including the heat equation, the Black-Scholes model, the stochastic Lorenz equation, and the Heston model suggest that the proposed approximation algorithm is quite effective in high dimensions in terms of both accuracy and speed.

approximation, differential equation, equation, (16 more...)

1806.00421

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Kyrillidis, Anastasios, Ubaru, Shashanka, Kollias, Georgios, Bouchard, Kristofer

Run Procrustes, Run! On the convergence of accelerated Procrustes Flow

arXiv.org Machine LearningJun-1-2018

In this work, we present theoretical results on the convergence of non-convex accelerated gradient descent in matrix factorization models. The technique is applied to matrix sensing problems with squared loss, for the estimation of a rank $r$ optimal solution $X^\star \in \mathbb{R}^{n \times n}$. We show that the acceleration leads to linear convergence rate, even under non-convex settings where the variable $X$ is represented as $U U^\top$ for $U \in \mathbb{R}^{n \times r}$. Our result has the same dependence on the condition number of the objective --and the optimal solution-- as that of the recent results on non-accelerated algorithms. However, acceleration is observed in practice, both in synthetic examples and in two real applications: neuronal multi-unit activities recovery from single electrode recordings, and quantum state tomography on quantum computing simulators.

artificial intelligence, machine learning, optimization problem, (18 more...)

1806.00534

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)