AITopics

Hernández-Orozco, Santiago, Zenil, Hector, Riedel, Jürgen, Uccello, Adam, Kiani, Narsis A., Tegnér, Jesper

Algorithmic Probability-guided Supervised Machine Learning on Non-differentiable Spaces

arXiv.org Artificial Intelligence, Oct-8-2019

We show how complexity theory can be introduced in machine learning to help bring together apparently disparate areas of current research. We show that this new approach requires less training data and generalizes better, as it shows greater resilience to random attacks. We investigate the shape of the discrete algorithmic space when performing regression or classification using a loss function parametrized by algorithmic complexity, demonstrating that differentiability is not necessary to achieve results similar to those obtained using differentiable programming approaches such as deep learning. In doing so we use examples that allow the two approaches to be compared; these examples are small, given the computational power required to estimate algorithmic complexity. We find and report that (i) machine learning can successfully be performed on a non-smooth surface using algorithmic complexity; (ii) parameter solutions can be found using an algorithmic-probability classifier, establishing a bridge between a fundamentally discrete theory of computability and a fundamentally continuous mathematical theory of optimization methods; (iii) an algorithmically directed search technique in non-smooth manifolds can be formulated and conducted; and (iv) exploitation techniques and numerical methods for algorithmic search can be used to navigate these discrete, non-differentiable spaces. These are applied to (a) the identification of generative rules from data observations; (b) solutions to image classification problems that are more resilient against pixel attacks than neural networks; (c) the identification of equation parameters from a small data set in the presence of noise in a continuous ODE system problem; and (d) the classification of Boolean NK networks by (1) network topology, (2) underlying Boolean function, and (3) number of incoming edges.
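The idea of learning by searching a discrete, non-differentiable space with a complexity-based loss can be illustrated on rule identification for elementary cellular automata. This is a minimal sketch, and it rests on loud assumptions:

- Algorithmic complexity is approximated by zlib-compressed length. This is a crude, well-known proxy, not the BDM/CTM estimates the paper relies on.
- The "model parameter" is just a rule number in 0-255.
- All function names are hypothetical.

No gradient is used. The search simply enumerates the discrete space and keeps the rule whose prediction leaves the least complex residual.

```python
import zlib


def eca_step(row, rule):
    """One update of elementary cellular automaton `rule` (0-255), periodic boundary."""
    n = len(row)
    return [
        (rule >> (4 * row[(i - 1) % n] + 2 * row[i] + row[(i + 1) % n])) & 1
        for i in range(n)
    ]


def evolve(rule, init, steps):
    """Space-time diagram: the initial row followed by `steps` updates."""
    rows = [list(init)]
    for _ in range(steps):
        rows.append(eca_step(rows[-1], rule))
    return rows


def complexity_proxy(bits):
    """Crude stand-in for algorithmic complexity: zlib-compressed length (NOT BDM/CTM)."""
    return len(zlib.compress(bytes(bits), 9))


def loss(rule, observed):
    """Estimated complexity of the residual between prediction and observation,
    with the Hamming distance as a tie-breaker."""
    predicted = evolve(rule, observed[0], len(observed) - 1)
    residual = [p ^ o for prow, orow in zip(predicted, observed)
                for p, o in zip(prow, orow)]
    return (complexity_proxy(residual), sum(residual))


def identify_rule(observed):
    """Exhaustive, gradient-free search over the 256 candidate rules."""
    return min(range(256), key=lambda rule: loss(rule, observed))


# Usage: observe rule 30 from a single seed cell, then recover a rule that explains it.
init = [0] * 15 + [1] + [0] * 15
observed = evolve(30, init, 15)
found = identify_rule(observed)
```

Several rules can produce identical evolutions from a single seed, so the search may return a rule number other than 30 whose output matches the observation exactly.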

artificial intelligence, bdm, machine learning

1910.02758

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (0.34)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Béjar, Benjamín, Dokmanić, Ivan, Vidal, René

The fastest $\ell_{1,\infty}$ prox in the west

Proximal operators are of particular interest in optimization problems with non-smooth objectives because, in many practical cases, they lead to optimization algorithms whose updates can be computed in closed form or very efficiently. A well-known example is the proximal operator of the vector $\ell_1$ norm, which is given by the soft-thresholding operator. In this paper we study the proximal operator of the mixed $\ell_{1,\infty}$ matrix norm and show that it can be computed in closed form by applying the well-known soft-thresholding operator to each column of the matrix. However, unlike the vector $\ell_1$ norm case, where the threshold is constant, in the mixed $\ell_{1,\infty}$ norm case each column of the matrix might require a different threshold, and all thresholds depend on the given matrix. We propose a general iterative algorithm for computing these thresholds, as well as two efficient implementations that further exploit easy-to-compute lower bounds for the mixed norm of the optimal solution. Experiments on large-scale synthetic and real data indicate that the proposed methods can be orders of magnitude faster than state-of-the-art methods.
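The abstract's building block, soft-thresholding with a possibly different threshold per column, can be sketched as below. The functions do three things:

- Shrink every entry toward zero by t. This is the prox of t·‖x‖₁.
- Apply that shrinkage column by column, with thresholds assumed to be given.
- Compute one standard example of a data-dependent threshold: the threshold that turns soft-thresholding into the Euclidean projection onto an ℓ1-ball, found with the usual sort-based method.

This is not the paper's threshold-finding algorithm, and it assumes no particular convention for the mixed norm. All names are hypothetical.

```python
import math


def soft_threshold(x, t):
    """Prox of t * ||x||_1: shrink every entry toward zero by t (t >= 0)."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in x]


def l1_ball_threshold(x, radius):
    """Threshold theta such that soft_threshold(x, theta) is the Euclidean
    projection of x onto {y : ||y||_1 <= radius}, assuming radius > 0."""
    if sum(abs(v) for v in x) <= radius:
        return 0.0
    u = sorted((abs(v) for v in x), reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        candidate = (cumulative - radius) / j
        if uj - candidate > 0:
            theta = candidate  # the last j satisfying the test defines theta
    return theta


def columnwise_soft_threshold(columns, thresholds):
    """Soft-threshold each column (given as a list) with its own threshold."""
    return [soft_threshold(col, t) for col, t in zip(columns, thresholds)]


# Usage: the threshold depends on the data, as in the mixed-norm case.
theta = l1_ball_threshold([3.0, -1.0], 2.0)          # 1.0
projected = soft_threshold([3.0, -1.0], theta)        # l1 norm is now 2
```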

algorithm, operator, proximal operator

1910.03749

Country:

North America > United States > Illinois (0.04)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Nguyen, Tan, Ye, Nan, Bartlett, Peter L.

Learning Near-optimal Convex Combinations of Basis Models with Generalization Guarantees

The problem of learning an optimal convex combination of basis models has been studied in a number of works, with a focus on theoretical analysis but little investigation of the empirical performance of the approach. In this paper, we present some new theoretical insights and empirical results that demonstrate the effectiveness of the approach. Theoretically, we first consider whether we can replace convex combinations by linear combinations, and obtain convergence results similar to existing results for learning from a convex hull. We present a negative result showing that the linear hull of very simple basis functions can have unbounded capacity, and is thus prone to overfitting. On the other hand, convex hulls are still rich but have bounded capacities. In addition, we obtain a generalization bound for a general class of Lipschitz loss functions. Empirically, we first discuss how a convex combination can be greedily learned with early stopping, and how a convex combination can be non-greedily learned when the number of basis models is known a priori. Our experiments suggest that the greedy scheme is competitive with or better than several baselines, including boosting and random forests. The greedy algorithm requires little effort in hyperparameter tuning, and also seems to adapt to the underlying complexity of the problem.
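Greedily learning a convex combination with early stopping can be sketched as follows. This is a hypothetical illustration under simple assumptions, not the authors' algorithm:

- Squared loss.
- Basis models are represented only by their precomputed predictions.
- Each step picks the basis model and mixing weight α (from a fixed grid) that minimize training loss.
- The update f ← (1 − α)·f + α·g keeps the weights in the simplex.

Validation loss drives early stopping.

```python
def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def mix(f, g, a):
    """Convex update (1 - a) * f + a * g, pointwise."""
    return [(1 - a) * fi + a * gi for fi, gi in zip(f, g)]


def greedy_convex_combination(train_preds, y_train, val_preds, y_val,
                              max_steps=50, patience=5, grid=20):
    """train_preds[j][i] is basis model j's prediction on training point i.
    Returns the convex weights with the best validation loss seen."""
    m = len(train_preds)
    weights = [0.0] * m
    f_tr, f_val = [0.0] * len(y_train), [0.0] * len(y_val)
    best_loss, best_weights, since_best = float("inf"), list(weights), 0
    for step in range(max_steps):
        # The first step puts all mass on a single model (alpha = 1).
        alphas = [1.0] if step == 0 else [k / grid for k in range(1, grid + 1)]
        _, j, a = min((mse(mix(f_tr, train_preds[j], a), y_train), j, a)
                      for j in range(m) for a in alphas)
        weights = [(1 - a) * w for w in weights]
        weights[j] += a
        f_tr = mix(f_tr, train_preds[j], a)
        f_val = mix(f_val, val_preds[j], a)
        val_loss = mse(f_val, y_val)
        if val_loss < best_loss:
            best_loss, best_weights, since_best = val_loss, list(weights), 0
        else:
            since_best += 1
            if since_best >= patience:  # early stopping
                break
    return best_weights


# Usage: the target is the average of two basis models.
preds = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
y = [0.5] * 4
weights = greedy_convex_combination(preds, y, preds, y)
```

The α grid is a simplification; a line search over α ∈ (0, 1] would serve the same role.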

algorithm, convex combination, dataset

1910.03742

Country:

Oceania > Australia > Queensland (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Schultheis, Matthias, Belousov, Boris, Abdulsamad, Hany, Peters, Jan

Receding Horizon Curiosity

Sample-efficient exploration is crucial not only for discovering rewarding experiences but also for adapting to environment changes in a task-agnostic fashion. A principled treatment of the problem of optimal input synthesis for system identification is provided within the framework of sequential Bayesian experimental design. In this paper, we present an effective trajectory-optimization-based approximate solution of this otherwise intractable problem that models optimal exploration in an unknown Markov decision process (MDP). By interleaving episodic exploration with Bayesian nonlinear system identification, our algorithm takes advantage of the inductive bias to explore in a directed manner, without assuming prior knowledge of the MDP. Empirical evaluations indicate a clear advantage of the proposed algorithm in terms of the rate of convergence and the final model fidelity when compared to intrinsic-motivation-based algorithms employing exploration bonuses such as prediction error and information gain. Moreover, our method maintains a computational advantage over a recent model-based active exploration (MAX) algorithm by focusing on the information gain along trajectories instead of seeking a global exploration policy. A reference implementation of our algorithm and of the conducted experiments is publicly available.
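A toy version of receding-horizon, information-gain-driven exploration is sketched below. It is a hypothetical simplification, not the authors' method, and it swaps in several choices:

- A scalar linear system x' = θ·x + u with unknown θ, in place of a nonlinear model.
- A conjugate Gaussian posterior over θ.
- Random shooting over bounded action sequences, in place of trajectory optimization.

The planner scores each candidate sequence by the information its predicted trajectory would carry about θ. It executes only the first action, updates the posterior, and replans. The parameter values are arbitrary.

```python
import math
import random


def info_gain(states, prior_var, noise_var):
    """Mutual information between theta ~ N(., prior_var) and observations
    y_t = theta * x_t + noise collected at the given states."""
    return 0.5 * math.log(1.0 + prior_var * sum(x * x for x in states) / noise_var)


def rollout(x0, actions, theta_mean):
    """States visited when simulating the mean model x' = theta_mean * x + u."""
    states, x = [], x0
    for u in actions:
        states.append(x)
        x = theta_mean * x + u
    return states


def plan_first_action(x0, theta_mean, theta_var, noise_var, horizon, n_candidates, rng):
    """Random shooting: keep the most informative sequence, return its first action."""
    best_u, best_gain = 0.0, -1.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        gain = info_gain(rollout(x0, actions, theta_mean), theta_var, noise_var)
        if gain > best_gain:
            best_u, best_gain = actions[0], gain
    return best_u


def explore(true_theta=0.8, steps=30, horizon=5, noise_std=0.1, seed=0):
    """Alternate planning, acting, and a conjugate Gaussian update of theta."""
    rng = random.Random(seed)
    mean, var, noise_var = 0.0, 1.0, noise_std ** 2
    x = 0.0
    for _ in range(steps):
        u = plan_first_action(x, mean, var, noise_var, horizon, 64, rng)
        x_next = true_theta * x + u + rng.gauss(0.0, noise_std)
        # Observation y = x_next - u = theta * x + noise.
        precision = 1.0 / var + x * x / noise_var
        mean = (mean / var + x * (x_next - u) / noise_var) / precision
        var = 1.0 / precision
        x = x_next
    return mean, var


theta_mean, theta_var = explore()
```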

algorithm, exploration, optimization

1910.0362

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

Improved Regret Bounds for Projection-free Bandit Convex Optimization

Garber, Dan, Kretzu, Ben

We revisit the challenge of designing online algorithms for the bandit convex optimization (BCO) problem that are also scalable to high-dimensional problems. Hence, we consider algorithms that are projection-free, i.e., based on the conditional gradient method, whose only access to the feasible decision set is through a linear optimization oracle (as opposed to other methods, which require potentially much more computationally expensive subprocedures, such as computing Euclidean projections). We present the first such algorithm that attains $O(T^{3/4})$ expected regret using only $O(T)$ overall calls to the linear optimization oracle, in expectation, where $T$ is the number of prediction rounds. This improves over the $O(T^{4/5})$ expected regret bound recently obtained by [Karbasi19], and matches the current best regret bound for projection-free online learning in the full-information setting.
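The two ingredients named in the abstract can be sketched separately: a linear optimization oracle driving a conditional-gradient (Frank-Wolfe) update, and a single-query bandit gradient estimate. This is not the authors' BCO algorithm. The sketch assumes:

- The probability simplex as the feasible set, where the oracle just returns a vertex.
- The classic 2/(t + 2) step size.
- The standard one-point sphere-sampling estimator.

In real bandit methods the perturbed query point must stay feasible, typically by shrinking the set; this sketch ignores that. All names are hypothetical.

```python
import math
import random


def simplex_lmo(grad):
    """Linear optimization oracle for the probability simplex: a minimizer of
    <grad, v> is the vertex at the smallest gradient entry."""
    i = min(range(len(grad)), key=lambda k: grad[k])
    return [1.0 if k == i else 0.0 for k in range(len(grad))]


def frank_wolfe(grad_fn, x0, steps):
    """Conditional gradient: mixing with an LMO vertex keeps iterates feasible
    without any projection."""
    x = list(x0)
    for t in range(steps):
        v = simplex_lmo(grad_fn(x))
        step = 2.0 / (t + 2)
        x = [(1 - step) * xi + step * vi for xi, vi in zip(x, v)]
    return x


def one_point_gradient(f, x, delta, rng):
    """Bandit estimate from one function value: (d / delta) * f(x + delta*u) * u,
    with u uniform on the unit sphere (x + delta*u may leave the feasible set)."""
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in u))
    u = [c / norm for c in u]
    value = f([xi + delta * ui for xi, ui in zip(x, u)])
    return [(d / delta) * value * ui for ui in u]


# Usage: minimize a quadratic whose minimizer lies inside the simplex.
target = [0.2, 0.5, 0.3]


def objective(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def gradient(x):
    return [2 * (xi - ti) for xi, ti in zip(x, target)]


x = frank_wolfe(gradient, [1 / 3] * 3, 200)
g_hat = one_point_gradient(objective, x, 0.1, random.Random(0))
```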

algorithm, convex optimization, optimization

1910.03374

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)

arXiv.org Artificial Intelligence, Oct-8-2019

Optimal Delivery with Budget Constraint in E-Commerce Advertising

Wei, Chao, Zhang, Weiru, Sun, Shengjie, Li, Fei, Meng, Xiaonan, Hu, Yi, Wang, Hao

Online advertising on e-commerce platforms gives sellers an opportunity to reach potential audiences with different target goals. Ad serving systems (such as display and search advertising systems) that assign ads to pages should satisfy objectives such as sufficient audience reach for branding advertisers and clicks or conversions for performance-based advertisers, while at the same time maximizing the overall revenue of the platform. In this paper, we propose an approach based on constrained linear programming that optimizes revenue and improves different performance goals simultaneously. We validated our algorithm by implementing an offline simulation system on the Alibaba e-commerce platform and running the auctions from online requests, taking system performance, ranking, and pricing schemas into account. We also compared our algorithm with related work, and the results show that it can effectively improve both campaign performance and platform revenue.
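The simplest LP of this kind can be solved without an LP solver. Take a single campaign with one budget and decide which fraction of each impression opportunity to buy. That LP relaxation is a fractional knapsack, and the greedy value-per-cost rule solves it exactly.

This is a hypothetical sketch, not the paper's formulation. It assumes known per-impression value and cost estimates with positive costs. The paper's LP handles multiple campaigns, goals, and constraints, which this sketch does not.

```python
def allocate_budget(impressions, budget):
    """impressions: list of (value, cost) pairs with cost > 0.
    Solves  max sum_i v_i x_i  s.t.  sum_i c_i x_i <= budget, 0 <= x_i <= 1
    by taking impressions in decreasing value/cost order."""
    order = sorted(range(len(impressions)),
                   key=lambda i: impressions[i][0] / impressions[i][1],
                   reverse=True)
    fractions = [0.0] * len(impressions)
    remaining = budget
    for i in order:
        value, cost = impressions[i]
        if remaining <= 0 or value <= 0:
            break  # budget exhausted, or no remaining item adds value
        take = min(1.0, remaining / cost)
        fractions[i] = take
        remaining -= take * cost
    return fractions


# Usage: value/cost ratios are 3, 2 and 1; a budget of 4 buys the first
# impression fully and 40% of the second.
fractions = allocate_budget([(6.0, 2.0), (10.0, 5.0), (3.0, 3.0)], 4.0)
```

Once several budgets or supply constraints interact, the greedy rule no longer suffices and a general LP solver is needed.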

algorithm, performance goal, proceedings

1909.13221

Country: Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.70)

Industry:

Marketing (1.00)
Information Technology > Services > e-Commerce Services (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > e-Commerce (0.94)

Bertsimas, Dimitris, Cory-Wright, Ryan

On Polyhedral and Second-Order-Cone Decompositions of Semidefinite Optimization Problems

arXiv.org Machine Learning, Oct-7-2019

However, it is notoriously difficult to solve in practice, because the memory requirements of IPMs scale at a demanding rate. Indeed, state-of-the-art SDO solvers such as MOSEK cannot solve constrained instances of Problem (1) with n 250 variables on a standard laptop, and it is optimization folklore that there is a gap between the theoretical and practical tractability of SDOs. Motivated by the demanding memory requirements of IPMs, a stream of literature studies inexact methods for SDOs, which replace the semidefinite constraint with weaker yet less computationally demanding constraints. This approach was first investigated by Kim and Kojima [13], who observed that relaxing a positive semidefinite constraint to the weaker constraint that all 2×2 minors of a matrix are positive semidefinite yields a second-order cone (SOC)-representable outer approximation of the positive semidefinite (PSD) cone. In a related line of work, Krishnan and Mitchell [15] propose applying Kelley [12]'s cutting plane method to generate
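The outer approximation described above can be checked numerically. For a symmetric matrix, the 2×2 principal minor on indices i < j is PSD exactly when X_ii, X_jj ≥ 0 and X_ii·X_jj ≥ X_ij². This is equivalent to the second-order-cone constraint ‖(2X_ij, X_ii − X_jj)‖₂ ≤ X_ii + X_jj.

The minimal sketch below tests that condition for every pair. It then exhibits a 3×3 matrix, chosen for illustration, that satisfies every 2×2 constraint but is not PSD, which shows the relaxation is strictly weaker. Function names are hypothetical.

```python
import math


def minors_2x2_psd(X, tol=1e-12):
    """True if every 2x2 principal minor of symmetric X is PSD, checked as the
    SOC constraint ||(2*X_ij, X_ii - X_jj)||_2 <= X_ii + X_jj for all i < j."""
    n = len(X)
    if any(X[i][i] < -tol for i in range(n)):
        return False
    for i in range(n):
        for j in range(i + 1, n):
            if math.hypot(2 * X[i][j], X[i][i] - X[j][j]) > X[i][i] + X[j][j] + tol:
                return False
    return True


def quad_form(X, v):
    """v^T X v; a negative value certifies that X is not PSD."""
    n = len(v)
    return sum(v[i] * X[i][j] * v[j] for i in range(n) for j in range(n))


# Every 2x2 minor is 1 - 0.81 > 0, yet v = (1, -1, 1) gives v^T X v = 3 - 5.4 < 0.
X = [[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]]
```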

algorithm 1, approximation, constraint

1910.03143

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Watson, Joe, Abdulsamad, Hany, Peters, Jan

Stochastic Optimal Control as Approximate Input Inference

arXiv.org Machine Learning, Oct-7-2019

Optimal control of stochastic nonlinear dynamical systems is a major challenge in the domain of robot learning. Given the intractability of the global control problem, state-of-the-art algorithms focus on approximate sequential optimization techniques, which rely heavily on heuristics for regularization in order to achieve stable convergence. By building upon the duality between inference and control, we develop the view of Optimal Control as Input Estimation, devising a probabilistic stochastic optimal control formulation that iteratively infers the optimal input distributions by minimizing an upper bound of the control cost. Inference is performed through Expectation Maximization and message passing on a probabilistic graphical model of the dynamical system, and time-varying linear Gaussian feedback controllers are extracted from the joint state-action distribution. This perspective incorporates uncertainty quantification, effective initialization through priors, and the principled regularization inherent to the Bayesian treatment. Moreover, it can be shown that, for deterministic linearized systems, our framework yields the maximum entropy linear quadratic optimal control law. We provide a complete and detailed derivation of our probabilistic approach and highlight its advantages in comparison to other deterministic and probabilistic solvers.
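For reference, the classical deterministic linear quadratic regulator, the baseline that the linear-quadratic result above relates to, can be computed by a backward Riccati recursion. The sketch is scalar to stay short; the matrix case replaces the scalar products with matrix products and the division with a matrix inverse. It is not the authors' inference-based method, and the function name and parameter values are hypothetical.

```python
def lqr_gains(a, b, q, r, qf, horizon):
    """Backward Riccati recursion for x_{t+1} = a*x_t + b*u_t with cost
    sum_t (q*x_t**2 + r*u_t**2) + qf*x_T**2. The optimal control is u_t = -K_t*x_t."""
    p = qf
    gains = []
    for _ in range(horizon):
        k = a * b * p / (r + b * b * p)   # K_t = (r + b^2 P)^-1 b P a
        p = q + a * a * p - a * b * p * k  # P_t from P_{t+1}
        gains.append(k)
    gains.reverse()  # gains[t] is K_t for t = 0 .. horizon - 1
    return gains


# Usage: with a = b = q = r = 1 the stationary P solves P^2 - P - 1 = 0
# (the golden ratio), so early gains approach 1/phi = (sqrt(5) - 1) / 2.
gains = lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, qf=1.0, horizon=100)
```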

controller, inference, optimal control

1910.03003

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Denmark (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

arXiv.org Machine Learning, Oct-7-2019

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

Jain, Paras, Jain, Ajay, Nrusimha, Aniruddha, Gholami, Amir, Abbeel, Pieter, Keutzer, Kurt, Stoica, Ion, Gonzalez, Joseph E.

Modern neural networks are increasingly bottlenecked by the limited capacity of on-device GPU memory. Prior work explores dropping activations as a strategy to scale to larger neural networks under memory constraints. However, these heuristics assume uniform per-layer costs and are limited to simple architectures with linear graphs, which limits their usability. In this paper, we formalize the problem of trading off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies. We introduce Checkmate, a system that solves for optimal schedules in reasonable times (under an hour) using off-the-shelf MILP solvers, then uses these schedules to accelerate millions of training iterations. Our method scales to complex, realistic architectures and is hardware-aware through the use of accelerator-specific, profile-based cost models. In addition to reducing training cost, Checkmate enables real-world networks to be trained with up to 5.1$\times$ larger input sizes.
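The memory/compute trade-off behind rematerialization can be made concrete on a tiny linear chain. Brute force stands in for the MILP that Checkmate solves on general graphs. This is a hypothetical toy model, not Checkmate's formulation. It assumes:

- Layers have arbitrary per-layer compute and memory costs.
- Checkpointed activations stay resident.
- During the backward pass, each maximal run of non-checkpointed layers is recomputed once and held in memory.

Under that model, peak memory is the stored checkpoints plus the largest run, and the extra cost is the compute of all non-checkpointed layers.

```python
from itertools import combinations


def schedule_cost(checkpoints, compute, memory):
    """(peak memory, recomputation cost) of one checkpoint set on a linear chain."""
    stored = sum(memory[i] for i in checkpoints)
    runs, run = [], 0
    for i in range(len(compute)):
        if i in checkpoints:
            if run:
                runs.append(run)
            run = 0
        else:
            run += memory[i]  # activation that must be rematerialized
    if run:
        runs.append(run)
    peak = stored + max(runs, default=0)
    recompute = sum(compute[i] for i in range(len(compute)) if i not in checkpoints)
    return peak, recompute


def best_schedule(compute, memory, budget):
    """Exhaustive search: cheapest recomputation whose peak fits the budget, or None."""
    best = None
    n = len(compute)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            peak, recompute = schedule_cost(set(subset), compute, memory)
            if peak <= budget and (best is None or recompute < best[0]):
                best = (recompute, subset)
    return best


# Usage: four unit-cost layers; tightening the memory budget forces recomputation.
print(best_schedule([1] * 4, [1] * 4, 3))
```

The enumeration is exponential in the number of layers, which is why a MILP formulation with a solver is needed for realistic graphs.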

batch size, graph, memory usage

1910.02653

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)