AITopics

doi: 10.1109/TSP.2016.2546222

1506.05215

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

arXiv.org Machine LearningJun-13-2015

A General Framework for Fast Stagewise Algorithms

Tibshirani, Ryan J.

Forward stagewise regression follows a very simple strategy for constructing a sequence of sparse regression estimates: it starts with all coefficients equal to zero, and iteratively updates the coefficient (by a small amount $\epsilon$) of the variable that achieves the maximal absolute inner product with the current residual. This procedure has an interesting connection to the lasso: under some conditions, it is known that the sequence of forward stagewise estimates exactly coincides with the lasso path, as the step size $\epsilon$ goes to zero. Furthermore, essentially the same equivalence holds outside of least squares regression, with the minimization of a differentiable convex loss function subject to an $\ell_1$ norm constraint (the stagewise algorithm now updates the coefficient corresponding to the maximal absolute component of the gradient). Even when they do not match their $\ell_1$-constrained analogues, stagewise estimates provide a useful approximation, and are computationally appealing. Their success in sparse modeling motivates the question: can a simple, effective strategy like forward stagewise be applied more broadly in other regularization settings, beyond the $\ell_1$ norm and sparsity? The current paper is an attempt to do just this. We present a general framework for stagewise estimation, which yields fast algorithms for problems such as group-structured learning, matrix completion, image denoising, and more.

algorithm, artificial intelligence, machine learning, (17 more...)

1408.5801

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Hage, Clemens, Kleinsteuber, Martin

Robust Structured Low-Rank Approximation on the Grassmannian

arXiv.org Machine LearningJun-12-2015

Over the past years Robust PCA has been established as a standard tool for reliable low-rank approximation of matrices in the presence of outliers. Recently, the Robust PCA approach via nuclear norm minimization has been extended to matrices with linear structures which appear in applications such as system identification and data series analysis. At the same time it has been shown how to control the rank of a structured approximation via matrix factorization approaches. The drawbacks of these methods either lie in the lack of robustness against outliers or in their static nature of repeated batch-processing. We present a Robust Structured Low-Rank Approximation method on the Grassmannian that on the one hand allows for fast re-initialization in an online setting due to subspace identification with manifolds, and that is robust against outliers due to a smooth approximation of the $\ell_p$-norm cost function on the other hand. The method is evaluated in online time series forecasting tasks on simulated and real-world data.

approximation, artificial intelligence, machine learning, (15 more...)

1506.03958

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry:

Transportation (0.71)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Clevert, Djork-Arné, Mayr, Andreas, Unterthiner, Thomas, Hochreiter, Sepp

Rectified Factor Networks

arXiv.org Machine LearningJun-11-2015

We propose rectified factor networks (RFNs) to efficiently construct very sparse, non-linear, high-dimensional representations of the input. RFN models identify rare and small events in the input, have a low interference between code units, have a small reconstruction error, and explain the data covariance structure. RFN learning is a generalized alternating minimization algorithm derived from the posterior regularization method which enforces non-negative and normalized posterior means. We proof convergence and correctness of the RFN learning algorithm. On benchmarks, RFNs are compared to other unsupervised methods like autoencoders, RBMs, factor analysis, ICA, and PCA. In contrast to previous sparse coding methods, RFNs yield sparser codes, capture the data's covariance structure more precisely, and have a significantly smaller reconstruction error. We test RFNs as pretraining technique for deep networks on different vision datasets, where RFNs were superior to RBMs and autoencoders. On gene expression data from two pharmaceutical drug discovery studies, RFNs detected small and rare gene modules that revealed highly relevant new biological insights which were so far missed by other unsupervised methods.

artificial intelligence, constraint, machine learning, (18 more...)

1502.06464

Country: North America > United States > Massachusetts (0.27)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine LearningJun-10-2015

Large-scale randomized-coordinate descent methods with non-separable linear constraints

Reddi, Sashank, Hefny, Ahmed, Downey, Carlton, Dubey, Avinava, Sra, Suvrit

We develop randomized (block) coordinate descent (CD) methods for linearly constrained convex optimization. Unlike most CD methods, we do not assume the constraints to be separable, but let them be coupled linearly. To our knowledge, ours is the first CD method that allows linear coupling constraints, without making the global iteration complexity have an exponential dependence on the number of constraints. We present algorithms and analysis for four key problem scenarios: (i) smooth; (ii) smooth + nonsmooth separable; (iii) asynchronous parallel; and (iv) stochastic. We illustrate empirical behavior of our algorithms by simulation experiments.

constraint, data mining, machine learning, (19 more...)

1409.2617

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Data Science > Data Mining (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

arXiv.org Machine LearningJun-9-2015

Clustering by transitive propagation

Kumar, Vijay, Levy, Dan

We present a global optimization algorithm for clustering data given the ratio of likelihoods that each pair of data points is in the same cluster or in different clusters. To define a clustering solution in terms of pairwise relationships, a necessary and sufficient condition is that belonging to the same cluster satisfies transitivity. We define a global objective function based on pairwise likelihood ratios and a transitivity constraint over all triples, assigning an equal prior probability to all clustering solutions. We maximize the objective function by implementing max-sum message passing on the corresponding factor graph to arrive at an O(N^3) algorithm. Lastly, we demonstrate an application inspired by mutational sequencing for decoding random binary words transmitted through a noisy channel.

algorithm, configuration, ijk, (14 more...)

1506.03072

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Heinze, Christina, McWilliams, Brian, Meinshausen, Nicolai, Krummenacher, Gabriel

LOCO: Distributing Ridge Regression with Random Projections

We propose Loco, an algorithm for large-scale ridge regression which distributes the features across workers on a cluster. Important dependencies between variables are preserved using structured random projections which are cheap to compute and must only be communicated once. We show that Loco obtains a solution which is close to the exact ridge regression solution in the fixed design setting. We verify this experimentally in a simulation study as well as an application to climate prediction. Furthermore, we show that Loco achieves significant speedups compared with a state-of-the-art distributed algorithm on a large-scale regression problem.

artificial intelligence, coefficient, machine learning, (18 more...)

1406.3469

Country: North America > United States (0.92)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Ma, Chenxin, Tappenden, Rachael, Takáč, Martin

Linear Convergence of the Randomized Feasible Descent Method Under the Weak Strong Convexity Assumption

In this paper we generalize the framework of the feasible descent method (FDM) to a randomized (R-FDM) and a coordinate-wise random feasible descent method (RC-FDM) framework. We show that the famous SDCA algorithm for optimizing the SVM dual problem, or the stochastic coordinate descent method for the LASSO problem, fits into the framework of RC-FDM. We prove linear convergence for both R-FDM and RC-FDM under the weak strong convexity assumption. Moreover, we show that the duality gap converges linearly for RC-FDM, which implies that the duality gap also converges linearly for SDCA applied to the SVM dual problem.

artificial intelligence, descent method, machine learning, (15 more...)

1506.0253

Country: North America > United States (0.28)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.30)

Tamar, Aviv, Chow, Yinlam, Ghavamzadeh, Mohammad, Mannor, Shie

Policy Gradient for Coherent Risk Measures

Several authors have recently developed risk-sensitive policy gradient methods that augment the standard expected cost minimization problem with a measure of variability in cost. These studies have focused on specific risk-measures, such as the variance or conditional value at risk (CVaR). In this work, we extend the policy gradient method to the whole class of coherent risk measures, which is widely accepted in finance and operations research, among other fields. We consider both static and time-consistent dynamic risk measures. For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming. For dynamic risk measures, our approach is actor-critic style and involves explicit approximation of value function. Most importantly, our contribution presents a unified approach to risk-sensitive reinforcement learning that generalizes and extends previous results.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1502.03919

Country: North America (0.28)

Genre: Research Report (0.64)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Contal, Emile, Perchet, Vianney, Vayatis, Nicolas

Gaussian Process Optimization with Mutual Information

In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes. The upper bounds we derive on the cumulative regret for this generic algorithm improve by an exponential factor the previously known bounds for algorithms like GP-UCB. We also introduce the novel Gaussian Process Mutual Information algorithm (GP-MI), which significantly improves further these upper bounds for the cumulative regret. We confirm the efficiency of this algorithm on synthetic and real tasks against the natural competitor, GP-UCB, and also the Expected Improvement heuristic. After the publication of our article, we found an error in the proof of Lemma 1 which invalidates the main theorem.

artificial intelligence, machine learning, modeling & simulation, (17 more...)

1311.4825

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Modeling & Simulation (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)