AITopics

1709.00537

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

A Brief Introduction to Machine Learning for Engineers

Simeone, Osvaldo

Department of Informatics, King's College London; osvaldo.simeone@kcl.ac.uk

This monograph aims at providing an introduction to key concepts, algorithms, and theoretical frameworks in machine learning, including supervised and unsupervised learning, statistical learning theory, probabilistic graphical models and approximate inference. The intended readership consists of electrical engineers with a background in probability and linear algebra. The treatment builds on first principles, and organizes the main ideas according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, directed and undirected models, and convex and non-convex optimization. The mathematical framework uses information-theoretic measures as a unifying tool. The text offers simple and reproducible numerical examples providing insights into key motivations and conclusions. Rather than providing exhaustive details on the existing myriad solutions in each specific category, for which the reader is referred to textbooks and papers, this monograph is meant as an entry point for an engineer into the literature on machine learning.

artificial intelligence, bayesian inference, machine learning

1709.0284

Country:

North America > United States (0.45)
Europe (0.28)

Genre:

Research Report (1.00)
Overview (1.00)
Summary/Review (0.92)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Education (1.00)
Leisure & Entertainment (0.92)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Joulani, Pooria, György, András, Szepesvári, Csaba

A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds

Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms. In this paper we contribute to this effort in two ways: First, based on a new regret decomposition and a generalization of Bregman divergences, we provide a self-contained, modular analysis of the two workhorses of online learning: (general) adaptive versions of Mirror Descent (MD) and the Follow-the-Regularized-Leader (FTRL) algorithms. The analysis is done with extra care so as not to introduce assumptions not needed in the proofs, and it allows one to combine, in a straightforward way, different algorithmic ideas (e.g., adaptivity, optimism, implicit updates) and learning settings (e.g., strongly convex or composite objectives). This way we are able to reprove, extend and refine a large body of the literature, while keeping the proofs concise. The second contribution is a byproduct of this careful analysis: We present algorithms with improved variational bounds for smooth, composite objectives, including a new family of optimistic MD algorithms with only one projection step per round. Furthermore, we provide a simple extension of adaptive regret bounds to practically relevant non-convex problem settings with essentially no extra effort.
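As a concrete reference point for Mirror Descent, the sketch below runs the textbook entropic instance on the probability simplex (often called exponentiated gradient or Hedge) against a toy sequence of linear losses. It is not the adaptive or optimistic variant analyzed in the paper; the function name, the fixed step size `eta`, and the two-expert loss sequence are assumptions made only for illustration.

```python
import math

def exponentiated_gradient(loss_vectors, eta):
    """Online Mirror Descent on the simplex with the entropic regularizer.

    loss_vectors: list of per-round loss vectors (entries assumed in [0, 1]).
    eta: fixed step size (an illustrative choice, not an adaptive one).
    Returns the final weights and the cumulative loss of the algorithm.
    """
    dim = len(loss_vectors[0])
    weights = [1.0 / dim] * dim          # start from the uniform distribution
    total_loss = 0.0
    for g in loss_vectors:
        # Suffer the linear loss <weights, g> before seeing g.
        total_loss += sum(w * gi for w, gi in zip(weights, g))
        # Multiplicative update, then renormalize back onto the simplex.
        unnormalized = [w * math.exp(-eta * gi) for w, gi in zip(weights, g)]
        z = sum(unnormalized)
        weights = [u / z for u in unnormalized]
    return weights, total_loss

# Toy sequence: expert 0 never errs, expert 1 always suffers loss 1.
losses = [[0.0, 1.0]] * 50
w_final, alg_loss = exponentiated_gradient(losses, eta=0.5)
best_fixed = min(sum(column) for column in zip(*losses))
regret = alg_loss - best_fixed
```

The regret stays below the classical bound ln(d)/eta + eta*T/8 for losses in [0, 1], which is the kind of guarantee the modular analysis generalizes.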

algorithm, artificial intelligence, machine learning

1709.02726

Country:

North America > Canada > Alberta (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Granziol, Diego, Roberts, Stephen

Entropic Determinants

The ability of many powerful machine learning algorithms to deal with large data sets without compromise is often hampered by computationally expensive linear algebra tasks, of which calculating the log determinant is a canonical example. In this paper we demonstrate the optimality of Maximum Entropy methods in approximating such calculations. We prove three results: the equivalence between mean value constraints and sample expectations in the big data limit; that covariance matrix eigenvalue distributions can be completely defined by moment information; and that the reduction of the self entropy of a maximum entropy proposal distribution, achieved by adding more moments, reduces the KL divergence between the proposal and the true eigenvalue distribution. We empirically verify our results on a variety of SparseSuite matrices and establish best practices.
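The two quantities the abstract connects can be made concrete on a tiny matrix. For a symmetric positive definite A of size n, log det A equals the sum of the log eigenvalues, and the k-th moment of the eigenvalue distribution is tr(A^k)/n. The sketch below computes an exact log determinant by Cholesky factorization and the spectral moments by repeated multiplication. It does not implement the maximum entropy reconstruction itself, and the function names and the 2x2 example are illustrative assumptions.

```python
import math

def cholesky_logdet(a):
    """Exact log det of a symmetric positive definite matrix.

    Uses A = L L^T, so log det A = 2 * sum_i log L_ii.
    """
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))

def spectral_moments(a, num_moments):
    """Moments E[lambda^k] = tr(A^k) / n for k = 1..num_moments."""
    n = len(a)
    power = [row[:] for row in a]        # holds A^k, starting with k = 1
    moments = []
    for _ in range(num_moments):
        moments.append(sum(power[i][i] for i in range(n)) / n)
        power = [[sum(power[i][k] * a[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return moments

a = [[2.0, 1.0], [1.0, 2.0]]             # eigenvalues are 1 and 3
logdet = cholesky_logdet(a)              # equals log(1) + log(3)
moments = spectral_moments(a, 2)         # [(1 + 3) / 2, (1 + 9) / 2]
```

At scale the exact factorization is the expensive step, and the traces would typically be estimated stochastically rather than by forming matrix powers as done here.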

artificial intelligence, constraint, machine learning

1709.02702

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Ablin, Pierre, Cardoso, Jean-François, Gramfort, Alexandre

Faster independent component analysis by preconditioning with Hessian approximations

Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data that is widely used in observational sciences. In its classic form, ICA relies on modeling the data as linear mixtures of non-Gaussian independent sources. The maximization of the corresponding likelihood is a challenging problem if it has to be completed quickly and accurately on large sets of real data. We introduce the Preconditioned ICA for Real Data (Picard) algorithm, which is a relative L-BFGS algorithm preconditioned with sparse Hessian approximations. Extensive numerical comparisons to several algorithms of the same class demonstrate the superior performance of the proposed technique, especially on real data, for which the ICA model does not necessarily hold.

algorithm, artificial intelligence, machine learning

1706.08171

Country: North America > United States (0.67)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Mathematics of Computing (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Jamshidi, Pooyan, Siegmund, Norbert, Velez, Miguel, Kästner, Christian, Patel, Akshay, Agarwal, Yuvraj

Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis

arXiv.org Machine Learning, Sep-7-2017

Highly configurable software systems, such as mobile apps, compilers, and big data engines, are increasingly exposed to end users and developers on a daily basis for varying use cases. Users are interested not only in the fastest configuration, but also in whether the fastest configuration for their applications also remains the fastest when the environmental situation has been changed. For instance, a mobile developer might be interested to know if the software that she has configured to consume minimal energy on a testing platform will also remain energy efficient on the users' mobile platform; or, in general, whether the configuration will remain optimal when the software is used in a different environment (e.g., with a different workload, on different hardware). Performance models have been extensively used to learn and describe the performance behavior of configurable systems [15], [19], [21], [23], [33], [43]-[45], [54], [61], [63]. However, the exponentially growing configuration space, complex interactions, and unknown constraints among configuration options [56] often make it costly and difficult to learn an accurate and reliable performance model. Even worse, existing techniques usually consider only a fixed environment (e.g., fixed workload, fixed hardware, fixed versions of the dependent libraries); should that environment change, a new performance model may need to be learned from scratch. This strong assumption limits the reusability of performance models across environments.

artificial intelligence, configuration, machine learning

1709.0228

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

arXiv.org Artificial Intelligence, Sep-7-2017

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Salimans, Tim, Ho, Jonathan, Chen, Xi, Sidor, Szymon, Sutskever, Ilya

We explore the use of Evolution Strategies (ES), a class of black box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using a novel communication strategy based on common random numbers, our ES implementation only needs to communicate scalars, making it possible to scale to over a thousand parallel workers. This allows us to solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training. In addition, we highlight several advantages of ES as a black box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
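The communication trick described above can be sketched in a few lines: if every worker knows the random seeds, each perturbation can be regenerated locally, so only one scalar return per worker has to be exchanged. The following is a single-process simulation under several assumptions that are not from the paper: a toy quadratic objective, a mean-return baseline in place of the authors' fitness shaping, and hypothetical parameter names (`sigma`, `alpha`, `n_workers`).

```python
import random

def evolution_strategies(f, theta, sigma=0.1, alpha=0.05,
                         n_workers=20, n_iters=200, seed=0):
    """Maximize f with a basic ES loop that shares only seeds and scalars."""
    master = random.Random(seed)
    dim = len(theta)
    theta = list(theta)
    for _ in range(n_iters):
        # Seeds are agreed on up front; a worker's message is its scalar return.
        seeds = [master.randrange(2 ** 31) for _ in range(n_workers)]
        returns = []
        for s in seeds:
            rng = random.Random(s)
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            returns.append(f([t + sigma * e for t, e in zip(theta, eps)]))
        # Any worker can rebuild every perturbation from the shared seeds.
        baseline = sum(returns) / n_workers
        grad = [0.0] * dim
        for s, r in zip(seeds, returns):
            rng = random.Random(s)
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for i in range(dim):
                grad[i] += (r - baseline) * eps[i]
        scale = alpha / (n_workers * sigma)
        theta = [t + scale * g for t, g in zip(theta, grad)]
    return theta

target = [1.0, -2.0]

def toy_return(x):
    """Toy objective: negative squared distance to a hypothetical target."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))

theta_final = evolution_strategies(toy_return, [0.0, 0.0])
```

Regenerating the noise costs extra computation per worker, which is the trade accepted in exchange for exchanging scalars instead of full parameter or gradient vectors.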

evolutionary algorithm, machine learning, reinforcement learning


1703.03864

Genre: Research Report (0.50)

Industry:

Transportation > Air (0.88)
Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Boyd, Zachary, Bae, Egil, Tai, Xue-Cheng, Bertozzi, Andrea L.

Simplified Energy Landscape for Modularity Using Total Variation

arXiv.org Machine Learning, Sep-6-2017

Networks capture pairwise interactions between entities and are frequently used in applications such as social networks, food networks, and protein interaction networks, to name a few. Communities, cohesive groups of nodes, often form in these applications, and identifying them gives insight into the overall organization of the network. One common quality function used to identify community structure is modularity. In Hu et al. [SIAM J. Appl. Math., 73(6), 2013], it was shown that modularity optimization is equivalent to minimizing a particular nonconvex total variation (TV) based functional over a discrete domain. They solve this problem, assuming the number of communities is known, using a Merriman-Bence-Osher (MBO) scheme. We show that modularity optimization is equivalent to minimizing a convex TV-based functional over a discrete domain, again assuming the number of communities is known. Furthermore, we show that modularity has no convex relaxation satisfying certain natural conditions. Despite this, we partially relax the discrete constraint using a Ginzburg-Landau functional, yielding an optimization problem that is more nearly convex. We then derive an MBO algorithm that has fewer parameters than the one in Hu et al. and is 7 times faster at solving the associated diffusion equation, because the underlying discretization is unconditionally stable. Our numerical tests include a hyperspectral video whose associated graph has 29 million edges, which is roughly 37 times larger than was handled in the paper of Hu et al.
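For readers unfamiliar with the quality function being optimized, the sketch below evaluates modularity for a given partition. It assumes the standard Newman-Girvan definition on an unweighted, undirected graph without self-loops, Q = sum over communities of (internal edges / m) - (total degree / 2m)^2. It evaluates a fixed partition only and has nothing to do with the TV or MBO machinery of the paper; the function name and the toy graph are illustrative.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    internal = {}      # number of edges with both endpoints in a community
    degree_sum = {}    # total degree of the nodes in a community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)   # 2 * (3/7 - (7/14)**2) = 5/14
q_merged = modularity(edges, one_group)   # a single community scores 0
```

Splitting along the bridge scores higher than lumping everything together, which is the behavior a modularity optimizer exploits.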

algorithm, artificial intelligence, optimization problem

1707.09285

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Norway (0.14)
Europe > Spain (0.14)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Berthet, Quentin, Perchet, Vianney

Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe

arXiv.org Machine Learning, Sep-6-2017

We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we analyze the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization. We give theoretical guarantees for the performance of this algorithm over various classes of functions, and discuss the optimality of these results.
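The Frank-Wolfe step underlying the algorithm can be illustrated without the bandit part. The sketch below is plain, noiseless Frank-Wolfe over the probability simplex with exact gradients and the classical 2/(t+2) step size. The Upper-Confidence variant in the paper replaces the exact gradient with optimistic estimates built from noisy feedback, which is not reproduced here. The function name and the quadratic toy loss are assumptions for illustration.

```python
def frank_wolfe_simplex(grad, dim, n_iters):
    """Minimize a smooth convex function over the probability simplex.

    grad: callable returning the gradient (a list) at a point p.
    Each step moves toward the simplex vertex with the smallest gradient
    coordinate, so iterates stay feasible without any projection.
    """
    p = [1.0 / dim] * dim
    for t in range(n_iters):
        g = grad(p)
        # Linear minimization oracle on the simplex: pick the best vertex.
        k = min(range(dim), key=lambda i: g[i])
        gamma = 2.0 / (t + 2.0)
        p = [(1.0 - gamma) * pi for pi in p]
        p[k] += gamma
    return p

q = [0.2, 0.3, 0.5]                       # hypothetical target proportions

def toy_loss(p):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def toy_grad(p):
    return [2.0 * (pi - qi) for pi, qi in zip(p, q)]

p_final = frank_wolfe_simplex(toy_grad, dim=3, n_iters=2000)
final_loss = toy_loss(p_final)
```

Choosing a vertex at every round maps naturally onto choosing an arm, which is why the Frank-Wolfe template fits the bandit setting.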

algorithm, artificial intelligence, machine learning

1702.06917

Country:

Europe > United Kingdom > England (0.28)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Mara, Alexandru, Jung, Alexander

Recovery Conditions and Sampling Strategies for Network Lasso

arXiv.org Machine Learning, Sep-3-2017

The network Lasso is a recently proposed convex optimization method for machine learning from massive network structured datasets, i.e., big data over networks. It is a variant of the well-known least absolute shrinkage and selection operator (Lasso), which underlies many methods in learning and signal processing involving sparse models. Highly scalable implementations of the network Lasso can be obtained by state-of-the-art proximal methods, e.g., the alternating direction method of multipliers (ADMM). By generalizing the concept of the compatibility condition put forward by van de Geer and Buehlmann as a powerful tool for the analysis of plain Lasso, we derive a sufficient condition, i.e., the network compatibility condition (NCC), on the underlying network topology such that network Lasso accurately learns a clustered underlying graph signal. This network compatibility condition relates the location of the sampled nodes with the clustering structure of the network. In particular, the NCC informs the choice of which nodes to sample, or in machine learning terms, which data points provide most information if labeled.

artificial intelligence, machine learning, optimization problem

1709.01402

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.35)