AITopics

1810.0757

Country:

North America > United States (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

CHOPT : Automated Hyperparameter Optimization Framework for Cloud-Based Machine Learning Platforms

Kim, Jinwoong, Kim, Minkyu, Park, Heungseok, Kusdavletov, Ernar, Lee, Dongjun, Kim, Adrian, Kim, Ji-Hoon, Ha, Jung-Woo, Sung, Nako

Deep neural networks (DNNs) have become an essential method for solving difficult tasks in computer vision, signal processing, and natural language processing (He et al., 2016; Choi et al., 2018; Han et al., 2017; Van Den Oord et al., 2016; Seo et al., 2016; Vaswani et al., 2017). As the capabilities of deep learning have expanded with more modular architectures and advanced optimization methods, the number of hyperparameters has generally increased. This growth makes it more difficult for a researcher to optimize a model, wasting human effort and potentially leading to unfair comparisons, which reinforces the importance of efficient automated hyperparameter tuning methods and interfaces. To address this problem, several hyperparameter optimization (HyperOpt) methods have been proposed (Jaderberg et al., 2017; Falkner et al., 2018; Li et al., 2017). These methods offer advantages such as strong final performance, parallelism, and early stopping, which significantly improve computing resource efficiency and optimization time.
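
As a rough illustration of the early-stopping idea mentioned in the abstract above, the sketch below implements successive halving, a well-known scheme that evaluates many configurations on a small budget and keeps only the best fraction for the next, larger budget. It is not CHOPT's implementation: the function names, the `toy_loss` objective, and the parameters `min_budget` and `eta` are assumptions made for illustration.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configurations each round and
    multiply the per-configuration budget by eta (lower loss is better)."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Rank the surviving configurations at the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(ranked) // eta)
        survivors = ranked[:keep]  # stop the rest early
        budget *= eta
    return survivors[0]


def toy_loss(config, budget):
    # Hypothetical stand-in for a validation loss: best at lr = 0.1,
    # and improving as more training budget is spent.
    return (config["lr"] - 0.1) ** 2 + 1.0 / budget


candidates = [{"lr": lr} for lr in (0.001, 0.01, 0.1, 1.0)]
best = successive_halving(candidates, toy_loss)
print(best)
```

In a real setting `evaluate` would train a model for `budget` epochs (or steps) and return a validation metric; the toy objective only stands in for that.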

artificial intelligence, machine learning, optimization problem

1810.03527

Genre: Research Report (0.50)

Industry: Information Technology > Services (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Geng, Sinong, Kolar, Mladen, Koyejo, Oluwasanmi

Joint Nonparametric Precision Matrix Estimation with Confounding

We consider the problem of precision matrix estimation where, due to extraneous confounding of the underlying precision matrix, the data are independent but not identically distributed. While such confounding occurs in many scientific problems, our approach is inspired by recent neuroscientific research suggesting that brain function, as measured using functional magnetic resonance imaging (fMRI), is susceptible to confounding by physiological noise such as breathing and subject motion. Following the scientific motivation, we propose a graphical model, which in turn motivates a joint nonparametric estimator. We provide theoretical guarantees for the consistency and the convergence rate of the proposed estimator. In addition, we demonstrate that the optimization of the proposed estimator can be transformed into a series of linear programming problems, and thus be efficiently solved in parallel. Empirical results are presented using simulated and real brain imaging data, which suggest that our approach improves precision matrix estimation, as compared to baselines, when confounding is present.

artificial intelligence, machine learning, precision matrix

1810.07147

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Maximizing Monotone DR-submodular Continuous Functions by Derivative-free Optimization

Zhang, Yibo, Qian, Chao, Tang, Ke

In this paper, we study the problem of monotone (weakly) DR-submodular continuous maximization. While previous methods require the gradient information of the objective function, we propose a derivative-free algorithm LDGM for the first time. We define $\beta$ and $\alpha$ to characterize how close a function is to continuous DR-submodular and submodular, respectively. Under a convex polytope constraint, we prove that LDGM can achieve a $(1-e^{-\beta}-\epsilon)$-approximation guarantee after $O(1/\epsilon)$ iterations, which is the same as the best previous gradient-based algorithm. Moreover, in some special cases, a variant of LDGM can achieve a $((\alpha/2)(1-e^{-\alpha})-\epsilon)$-approximation guarantee for (weakly) submodular functions. We also compare LDGM with the gradient-based algorithm Frank-Wolfe under noise, and show that LDGM can be more robust. Empirical results on budget allocation verify the effectiveness of LDGM.

artificial intelligence, machine learning, frontier

1810.06833

Country:

Europe (0.93)
North America > United States (0.46)
Asia > China (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

Mishne, Gal, Chi, Eric C., Coifman, Ronald R.

Co-manifold learning with missing data

Representation learning is typically applied to only one mode of a data matrix, either its rows or columns. Yet in many applications, there is an underlying geometry to both the rows and the columns. We propose utilizing this coupled structure to perform co-manifold learning: uncovering the underlying geometry of both the rows and the columns of a given matrix, where we focus on a missing data setting. Our unsupervised approach consists of three components. We first solve a family of optimization problems to estimate a complete matrix at multiple scales of smoothness. We then use this collection of smooth matrix estimates to compute pairwise distances on the rows and columns based on a new multi-scale metric that implicitly introduces a coupling between the rows and the columns. Finally, we construct row and column representations from these multi-scale metrics. We demonstrate that our approach outperforms competing methods in both data visualization and clustering.

artificial intelligence, machine learning, row and column

1810.06803

Country: North America > United States (0.68)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Education (0.63)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Lessard, Laurent, Zhang, Xuezhou, Zhu, Xiaojin

An Optimal Control Approach to Sequential Machine Teaching

Given a sequential learning algorithm and a target model, sequential machine teaching aims to find the shortest training sequence to drive the learning algorithm to the target model. We present the first principled way to find such shortest training sequences. Our key insight is to formulate sequential machine teaching as a time-optimal control problem. This allows us to solve sequential teaching by leveraging key theoretical and computational tools developed over the past 60 years in the optimal control community. Specifically, we study the Pontryagin Maximum Principle, which yields a necessary condition for optimality of a training sequence. We present analytic, structural, and numerical implications of this approach on a case study with a least-squares loss function and gradient descent learner. We compute optimal training sequences for this problem, and although the sequences seem circuitous, we find that they can vastly outperform the best available heuristics for generating training sequences.
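
To make the case study in the abstract above concrete, the sketch below simulates a gradient descent learner on a least-squares loss and a simple greedy teacher that always picks the example pointing straight at the target model. This is only a baseline-style heuristic written for illustration, not the optimal-control solution the authors compute; the names `learner_step` and `greedy_teach`, the input-norm bound `x_norm`, and the stopping tolerance are assumptions.

```python
import math


def learner_step(w, x, y, eta):
    """One gradient descent step on the squared loss 0.5 * (w.x - y)^2."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [wi - eta * residual * xi for wi, xi in zip(w, x)]


def greedy_teach(w0, w_target, eta, x_norm=1.0, tol=1e-6, max_steps=1000):
    """Hypothetical greedy teacher: at every step, choose the input x of
    norm x_norm that points from the current model toward the target,
    and label it with the target model's own prediction."""
    w = list(w0)
    sequence = []
    for _ in range(max_steps):
        d = [t - wi for t, wi in zip(w_target, w)]
        dist = math.sqrt(sum(di * di for di in d))
        if dist <= tol:
            break
        x = [x_norm * di / dist for di in d]
        y = sum(t * xi for t, xi in zip(w_target, x))
        sequence.append((x, y))
        w = learner_step(w, x, y, eta)
    return w, sequence


w_final, seq = greedy_teach([0.0, 0.0], [1.0, 2.0], eta=0.5)
print(len(seq), w_final)
```

With this teacher each step shrinks the distance to the target by the factor `1 - eta * x_norm**2`, so the length of the training sequence depends directly on the step size and the allowed input norm. The abstract's claim is that time-optimal sequences can do much better than heuristics of this kind.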

artificial intelligence, machine learning, trajectory

1810.06175

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Jubran, Ibrahim, Feldman, Dan

Minimizing Sum of Non-Convex but Piecewise log-Lipschitz Functions using Coresets

We suggest a new optimization technique for minimizing the sum $\sum_{i=1}^n f_i(x)$ of $n$ non-convex real functions that satisfy a property that we call piecewise log-Lipschitz. This is by forging links between techniques in computational geometry, combinatorics and convex optimization. Example applications include the first constant-factor approximation algorithms whose running-time is polynomial in $n$ for the following fundamental problems: (i) Constrained $\ell_z$ Linear Regression: Given $z>0$, $n$ vectors $p_1,\cdots,p_n$ on the plane, and a vector $b\in\mathbb{R}^n$, compute a unit vector $x$ and a permutation $\pi:[n]\to[n]$ that minimizes $\sum_{i=1}^n |p_ix-b_{\pi(i)}|^z$. (ii) Points-to-Lines alignment: Given $n$ lines $\ell_1,\cdots,\ell_n$ on the plane, compute the matching $\pi:[n]\to[n]$ and alignment (rotation matrix $R$ and a translation vector $t$) that minimize the sum of Euclidean distances \[ \sum_{i=1}^n \mathrm{dist}(Rp_i-t,\ell_{\pi(i)})^z \] between each point and its corresponding line. These problems are open even if $z=1$ and the matching $\pi$ is given. In this case, the running time of our algorithms reduces to $O(n)$ using core-sets that support streaming, dynamic, and distributed parallel computations (e.g. on the cloud) in poly-logarithmic update time. Generalizations for handling e.g. outliers or pseudo-distances such as $M$-estimators for these problems are also provided. Experimental results show that our provable algorithms also improve on existing heuristics in practice. A demonstration in the context of Augmented Reality shows how such algorithms may be used in real-time systems.

artificial intelligence, machine learning, optimization problem

1807.08446

Country:

Asia (0.67)
North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Balcan, Maria-Florina, Nagarajan, Vaishnavh, Vitercik, Ellen, White, Colin

Learning-Theoretic Foundations of Algorithm Configuration for Combinatorial Partitioning Problems

arXiv.org Artificial Intelligence Oct-16-2018

Max-cut, clustering, and many other partitioning problems that are of significant importance to machine learning and other scientific fields are NP-hard, a reality that has motivated researchers to develop a wealth of approximation algorithms and heuristics. Although the best algorithm to use typically depends on the specific application domain, a worst-case analysis is often used to compare algorithms. This may be misleading if worst-case instances occur infrequently, and thus there is a demand for optimization methods which return the algorithm configuration best suited for the given application's typical inputs. We address this problem for clustering, max-cut, and other partitioning problems, such as integer quadratic programming, by designing computationally efficient and sample efficient learning algorithms which receive samples from an application-specific distribution over problem instances and learn a partitioning algorithm with high expected performance. Our algorithms learn over common integer quadratic programming and clustering algorithm families: SDP rounding algorithms and agglomerative clustering algorithms with dynamic programming. For our sample complexity analysis, we provide tight bounds on the pseudo-dimension of these algorithm classes, and show that, surprisingly, even for classes of algorithms parameterized by a single parameter, the pseudo-dimension is superconstant. In this way, our work both contributes to the foundations of algorithm configuration and pushes the boundaries of learning theory, since the algorithm classes we analyze consist of multi-stage optimization procedures and are significantly more complex than classes typically studied in learning theory.

algorithm, artificial intelligence, machine learning

1611.04535

Genre:

Research Report (0.63)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning Oct-15-2018

Predictor-Corrector Policy Optimization

Cheng, Ching-An, Yan, Xinyan, Ratliff, Nathan, Boots, Byron

We present a predictor-corrector framework, called PicCoLO, that can transform a first-order model-free reinforcement or imitation learning algorithm into a new hybrid method that leverages predictive models to accelerate policy learning. The new "PicCoLOed" algorithm optimizes a policy by recursively repeating two steps: In the Prediction Step, the learner uses a model to predict the unseen future gradient and then applies the predicted estimate to update the policy; in the Correction Step, the learner runs the updated policy in the environment, receives the true gradient, and then corrects the policy using the gradient error. Unlike previous algorithms, PicCoLO corrects for the mistakes of using imperfect predicted gradients and hence does not suffer from model bias. The development of PicCoLO is made possible by a novel reduction from predictable online learning to adversarial online learning, which provides a systematic way to modify existing first-order algorithms to achieve the optimal regret with respect to predictable information. We show, in both theory and simulation, that the convergence rate of several first-order model-free algorithms can be improved by PicCoLO.
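
The two steps described in the abstract above can be sketched on a toy problem. The example below is an assumption-laden illustration, not the PicCoLO implementation: it uses plain gradient steps as the base first-order method, a scalar quadratic objective in place of a policy objective, and a deliberately crude "predictive model" that just reuses the last observed gradient. All names (`grad`, `predictor_corrector`, `step_size`) are hypothetical.

```python
def grad(theta):
    # True gradient of the toy objective f(theta) = 0.5 * (theta - 3)^2.
    return theta - 3.0


def predictor_corrector(theta0, step_size=0.5, num_iters=100):
    theta = theta0
    predicted_grad = 0.0  # assumed model: predict the last true gradient
    for _ in range(num_iters):
        # Prediction step: update using the model's predicted gradient.
        theta_hat = theta - step_size * predicted_grad
        # Correction step: observe the true gradient at the updated point
        # and correct the update by the prediction error.
        true_grad = grad(theta_hat)
        theta = theta_hat - step_size * (true_grad - predicted_grad)
        # Refresh the model with the newly observed gradient.
        predicted_grad = true_grad
    return theta


theta_final = predictor_corrector(theta0=0.0)
print(theta_final)
```

Because the correction subtracts the prediction error, the net update always uses the true gradient observed at `theta_hat`; a poor prediction changes where that gradient is evaluated but is not left uncorrected in the iterate.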

algorithm, artificial intelligence, machine learning

1810.06509

Genre:

Research Report (1.00)
Overview (0.93)

Industry: Education > Educational Setting > Online (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.56)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Froese, Vincent, Jain, Brijnesh, Niedermeier, Rolf, Renken, Malte

Comparing Temporal Graphs Using Dynamic Time Warping

arXiv.org Machine Learning Oct-15-2018

The connections within many real-world networks change over time. Thus, there has been a recent boom in studying temporal graphs. Recognizing patterns in temporal graphs requires a similarity measure to compare different temporal graphs. To this end, we initiate the study of dynamic time warping (an established concept for mining time series data) on temporal graphs. We propose the dynamic temporal graph warping distance (dtgw) to determine the (dis-)similarity of two temporal graphs. Our novel measure is flexible and can be applied in various application domains. We show that computing the dtgw-distance is a challenging (NP-hard) optimization problem and identify some polynomial-time solvable special cases. Moreover, we develop a quadratic programming formulation and an efficient heuristic. Preliminary experiments indicate that the heuristic performs very well and that our concept yields meaningful results on real-world instances.
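
For readers unfamiliar with the underlying concept, the sketch below shows classic dynamic time warping on two numeric time series, using the standard quadratic-time dynamic program. It is not the dtgw-distance proposed in the abstract above, which additionally involves matching vertices between graphs; the function name `dtw_distance` and the absolute-difference local cost are choices made for this illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences,
    with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal warping cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessors.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated 2 is absorbed by warping
```

The warping lets one element of a series align with several consecutive elements of the other, which is why series that differ only in local speed get distance zero.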

data mining, machine learning, temporal graph

1810.0624

Genre: Research Report (0.40)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)