AITopics

1602.04391

Country: North America > United States > California > Santa Clara County (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Fusi, Nicolo, Elibol, Huseyn Melih

Probabilistic Matrix Factorization for Automated Machine Learning

In order to achieve state-of-the-art performance, modern machine learning techniques require careful data pre-processing and hyperparameter tuning. Moreover, given the ever increasing number of machine learning models being developed, model selection is becoming increasingly important. Automating the selection and tuning of machine learning pipelines consisting of data pre-processing methods and machine learning models, has long been one of the goals of the machine learning community. In this paper, we tackle this meta-learning task by combining ideas from collaborative filtering and Bayesian optimization. Using probabilistic matrix factorization techniques and acquisition functions from Bayesian optimization, we exploit experiments performed in hundreds of different datasets to guide the exploration of the space of possible pipelines. In our experiments, we show that our approach quickly identifies high-performing pipelines across a wide range of datasets, significantly outperforming the current state-of-the-art.

artificial intelligence, machine learning, pipeline, (16 more...)

1705.05355

Country: North America > United States (0.29)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Wimalawarne, Kishan, Yamada, Makoto, Mamitsuka, Hiroshi

Convex Coupled Matrix and Tensor Completion

We propose a set of convex low rank inducing norms for a coupled matrices and tensors (hereafter coupled tensors), which shares information between matrices and tensors through common modes. More specifically, we propose a mixture of the overlapped trace norm and the latent norms with the matrix trace norm, and then, we propose a new completion algorithm based on the proposed norms. A key advantage of the proposed norms is that it is convex and can find a globally optimal solution, while existing methods for coupled learning are non-convex. Furthermore, we analyze the excess risk bounds of the completion model regularized by our proposed norms which show that our proposed norms can exploit the low rankness of coupled tensors leading to better bounds compared to uncoupled norms. Through synthetic and real-world data experiments, we show that the proposed completion algorithm compares favorably with existing completion algorithms.

artificial intelligence, machine learning, tensor, (19 more...)

1705.05197

Country: Asia > Japan (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Jung, Alexander, Hero, Alfred O. III, Mara, Alexandru, Jahromi, Saeed

Semi-Supervised Learning via Sparse Label Propagation

This work proposes a novel method for semi-supervised learning from partially labeled massive network-structured datasets, i.e., big data over networks. We model the underlying hypothesis, which relates data points to labels, as a graph signal, defined over some graph (network) structure intrinsic to the dataset. Following the key principle of supervised learning, i.e., "similar inputs yield similar outputs", we require the graph signals induced by labels to have small total variation. Accordingly, we formulate the problem of learning the labels of data points as a non-smooth convex optimization problem which amounts to balancing between the empirical loss, i.e., the discrepancy with some partially available label information, and the smoothness quantified by the total variation of the learned graph signal. We solve this optimization problem by appealing to a recently proposed preconditioned variant of the popular primal-dual method by Pock and Chambolle, which results in a sparse label propagation algorithm. This learning algorithm allows for a highly scalable implementation as message passing over the underlying data graph. By applying concepts of compressed sensing to the learning problem, we are also able to provide a transparent sufficient condition on the underlying network structure such that accurate learning of the labels is possible. We also present an implementation of the message passing formulation allows for a highly scalable implementation in big data frameworks.

artificial intelligence, inductive learning, machine learning, (14 more...)

1612.01414

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Yu, Jiaqian, Blaschko, Matthew

The Lov\'asz Hinge: A Novel Convex Surrogate for Submodular Losses

Learning with non-modular losses is an important problem when sets of predictions are made simultaneously. The main tools for constructing convex surrogate loss functions for set prediction are margin rescaling and slack rescaling. In this work, we show that these strategies lead to tight convex surrogates iff the underlying loss function is increasing in the number of incorrect predictions. However, gradient or cutting-plane computation for these functions is NP-hard for non-supermodular loss functions. We propose instead a novel surrogate loss function for submodular losses, the Lov\'asz hinge, which leads to O(p log p) complexity with O(p) oracle accesses to the loss function to compute a gradient or cutting-plane. We prove that the Lov\'asz hinge is convex and yields an extension. As a result, we have developed the first tractable convex surrogates in the literature for submodular losses. We demonstrate the utility of this novel convex surrogate through several set prediction tasks, including on the PASCAL VOC and Microsoft COCO datasets.

artificial intelligence, machine learning, optimization problem, (18 more...)

1512.07797

Country: Europe (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

@machinelearnbotMay-14-2017, 18:45:11 GMT

Data Analysis Method: Mathematics Optimization to Build Decision Making

Optimization is a problem associated with the best decision that is effective and efficient decisions whether it is worth maximum or minimum by way of determining a satisfactory solution. Optimization is not a new science. It has grown even since Newton in the 17th century discovered how to count roots. Currently the science of optimization is still evolving in terms of techniques and applications. Many cases or problems in everyday life that involve optimization to solve them.

artificial intelligence, optimization, social media, (7 more...)

@machinelearnbot

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

arXiv.org Machine LearningMay-14-2017

Discrete Sequential Prediction of Continuous Actions for Deep RL

Metz, Luke, Ibarz, Julian, Jaitly, Navdeep, Davidson, James

It has long been assumed that high dimensional continuous control problems cannot be solved effectively by discretizing individual dimensions of the action space due to the exponentially large number of bins over which policies would have to be learned. In this paper, we draw inspiration from the recent success of sequence-to-sequence models for structured prediction problems to develop policies over discretized spaces. Central to this method is the realization that complex functions over high dimensional spaces can be modeled by neural networks that use next step prediction. Specifically, we show how Q-values and policies over continuous spaces can be modeled using a next step prediction model over discretized dimensions. With this parameterization, it is possible to both leverage the compositional structure of action spaces during learning, as well as compute maxima over action spaces (approximately). On a simple example task we demonstrate empirically that our method can perform global search, which effectively gets around the local optimization issues that plague DDPG and NAF. We apply the technique to off-policy (Q-learning) methods and show that our method can achieve the state-of-the-art for off-policy methods on several continuous control tasks.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1705.05035

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Javadi, Hamid, Montanari, Andrea

Non-negative Matrix Factorization via Archetypal Analysis

arXiv.org Machine LearningMay-8-2017

Given a collection of data points, non-negative matrix factorization (NMF) suggests to express them as convex combinations of a small set of `archetypes' with non-negative entries. This decomposition is unique only if the true archetypes are non-negative and sufficiently sparse (or the weights are sufficiently sparse), a regime that is captured by the separability condition and its generalizations. In this paper, we study an approach to NMF that can be traced back to the work of Cutler and Breiman (1994) and does not require the data to be separable, while providing a generally unique decomposition. We optimize the trade-off between two objectives: we minimize the distance of the data points from the convex envelope of the archetypes (which can be interpreted as an empirical risk), while minimizing the distance of the archetypes from the convex envelope of the data (which can be interpreted as a data-dependent regularization). The archetypal analysis method of (Cutler, Breiman, 1994) is recovered as the limiting case in which the last term is given infinite weight. We introduce a `uniqueness condition' on the data which is necessary for exactly recovering the archetypes from noiseless data. We prove that, under uniqueness (plus additional regularity conditions on the geometry of the archetypes), our estimator is robust. While our approach requires solving a non-convex optimization problem, we find that standard optimization methods succeed in finding good solutions both for real and synthetic data.

artificial intelligence, conv, machine learning, (16 more...)

1705.02994

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

arXiv.org Machine LearningMay-8-2017

Machine Learning with World Knowledge: The Position and Survey

Song, Yangqiu, Roth, Dan

Machine learning has become pervasive in multiple domains, impacting a wide variety of applications, such as knowledge discovery and data mining, natural language processing, information retrieval, computer vision, social and health informatics, ubiquitous computing, etc. Two essential problems of machine learning are how to generate features and how to acquire labels for machines to learn. Particularly, labeling large amount of data for each domain-specific problem can be very time consuming and costly. It has become a key obstacle in making learning protocols realistic in applications. In this paper, we will discuss how to use the existing general-purpose world knowledge to enhance machine learning processes, by enriching the features or reducing the labeling work. We start from the comparison of world knowledge with domain-specific knowledge, and then introduce three key problems in using world knowledge in learning processes, i.e., explicit and implicit feature representation, inference for knowledge linking and disambiguation, and learning with direct or indirect supervision. Finally we discuss the future directions of this research topic.

information retrieval, machine learning, natural language, (22 more...)

1705.02908

Country: North America > United States > Illinois (0.46)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(7 more...)

Jiang, Daniel R., Powell, Warren B.

Risk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures

arXiv.org Artificial IntelligenceMay-8-2017

In this paper, we consider a finite-horizon Markov decision process (MDP) for which the objective at each stage is to minimize a quantile-based risk measure (QBRM) of the sequence of future costs; we call the overall objective a dynamic quantile-based risk measure (DQBRM). In particular, we consider optimizing dynamic risk measures where the one-step risk measures are QBRMs, a class of risk measures that includes the popular value at risk (VaR) and the conditional value at risk (CVaR). Although there is considerable theoretical development of risk-averse MDPs in the literature, the computational challenges have not been explored as thoroughly. We propose data-driven and simulation-based approximate dynamic programming (ADP) algorithms to solve the risk-averse sequential decision problem. We address the issue of inefficient sampling for risk applications in simulated settings and present a procedure, based on importance sampling, to direct samples toward the "risky region" as the ADP algorithm progresses. Finally, we show numerical results of our algorithms in the context of an application involving risk-averse bidding for energy storage.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1509.0192

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Industry:

Energy > Energy Storage (0.66)
Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)