AITopics

1404.7796

Country: Europe > France (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Zhang, Ziming, Saligrama, Venkatesh

RAPID: Rapidly Accelerated Proximal Gradient Algorithms for Convex Minimization

arXiv.org Machine Learning Jun-18-2014

In this paper, we propose a new algorithm to speed up the convergence of accelerated proximal gradient (APG) methods. In order to minimize a convex function $f(\mathbf{x})$, our algorithm introduces a simple line search step after each proximal gradient step in APG so that a biconvex function $f(\theta\mathbf{x})$ is minimized over scalar variable $\theta>0$ while fixing variable $\mathbf{x}$. We propose two new ways of constructing the auxiliary variables in APG based on the intermediate solutions of the proximal gradient and the line search steps. We prove that at an arbitrary iteration step $t$ ($t\geq1$), our algorithm can achieve a smaller upper bound for the gap between the current and optimal objective values than those in the traditional APG methods such as FISTA, making it converge faster in practice. In fact, our algorithm can be potentially applied to many important convex optimization problems, such as sparse linear regression and kernel SVMs. Our experimental results clearly demonstrate that our algorithm converges faster than APG in all of the applications above, and is even comparable to some sophisticated solvers.
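As a rough illustration of the scaling idea in the abstract, the sketch below minimizes $f(\theta\mathbf{x})$ over $\theta$ for one specific objective, the lasso $f(\mathbf{x}) = \tfrac12\|A\mathbf{x}-\mathbf{b}\|^2 + \lambda\|\mathbf{x}\|_1$, where the one-dimensional problem is quadratic in $\theta$ and has a closed form. The choice of objective, the function names, and the handling of the degenerate case are assumptions made for this example; it is not the authors' full algorithm and omits the proximal gradient step and the auxiliary-variable construction.

```python
def lasso_objective(A, b, x, lam):
    """f(x) = 0.5 * ||A x - b||^2 + lam * ||x||_1, with A given as a list of rows."""
    residual = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
                for row, b_i in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def best_scale(A, b, x, lam):
    """Minimize f(theta * x) over theta >= 0 for the lasso objective above.

    For fixed x, f(theta * x) is a convex quadratic in theta:
        0.5 * theta^2 * ||A x||^2 - theta * (b . A x - lam * ||x||_1) + const
    so the minimizer is the stationary point clipped at zero.
    """
    Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    curvature = sum(v * v for v in Ax)
    if curvature == 0.0:
        return 1.0  # degenerate case (A x = 0): leave x unscaled
    slope = sum(ax_i * b_i for ax_i, b_i in zip(Ax, b)) - lam * sum(abs(v) for v in x)
    return max(slope / curvature, 0.0)


# Hypothetical toy problem, chosen only to exercise the functions.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [2.0, 2.0]
x = [1.0, 1.0]
lam = 0.5

theta = best_scale(A, b, x, lam)
x_scaled = [theta * v for v in x]
```

On this toy problem the rescaled point has a lower objective value than the original one, which is the effect the line search step is meant to provide after each proximal gradient step.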

algorithm, artificial intelligence, machine learning, (15 more...)

1406.4445

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

arXiv.org Artificial Intelligence Jun-18-2014

Exact Decoding on Latent Variable Conditional Models is NP-Hard

Sun, Xu

Latent variable conditional models, including the latent conditional random fields as a special case, are popular models for many natural language processing and vision processing tasks. The computational complexity of the exact decoding/inference in latent conditional random fields is unclear. In this paper, we try to clarify the computational complexity of the exact decoding. We analyze the complexity and demonstrate that it is an NP-hard problem even in a sequential labeling setting. Furthermore, we propose the latent-dynamic inference (LDI-Naive) method and its bounded version (LDI-Bounded), which are able to perform exact inference or almost-exact inference by using top-$n$ search and dynamic programming.

artificial intelligence, machine learning, natural language, (19 more...)

1406.4682

Country: Europe (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Choi, Joon Hee, Vishwanathan, S. V. N.

DFacTo: Distributed Factorization of Tensors

We present a technique for significantly speeding up Alternating Least Squares (ALS) and Gradient Descent (GD), two widely used algorithms for tensor factorization. By exploiting properties of the Khatri-Rao product, we show how to efficiently address a computationally challenging sub-step of both algorithms. Our algorithm, DFacTo, only requires two sparse matrix-vector products and is easy to parallelize. DFacTo is not only scalable but also on average 4 to 10 times faster than competing algorithms on a variety of datasets. For instance, DFacTo only takes 480 seconds on 4 machines to perform one iteration of the ALS algorithm and 1,143 seconds to perform one iteration of the GD algorithm on a 6.5 million x 2.5 million x 1.5 million dimensional tensor with 1.2 billion non-zero entries.
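For readers unfamiliar with the Khatri-Rao product mentioned above, the following minimal sketch shows its definition as a column-wise Kronecker product. It only illustrates the operator; the function name and the list-of-rows matrix layout are assumptions for this example, and the sketch says nothing about how DFacTo itself organizes the computation.

```python
def khatri_rao(B, C):
    """Column-wise Kronecker product of B (J x R) and C (K x R).

    The result has J*K rows and R columns; row j*K + k holds the
    elementwise product of row j of B and row k of C.
    """
    if len(B[0]) != len(C[0]):
        raise ValueError("B and C must have the same number of columns")
    return [[b_val * c_val for b_val, c_val in zip(b_row, c_row)]
            for b_row in B
            for c_row in C]


# Hypothetical small factor matrices with R = 2 columns each.
B = [[1, 2],
     [3, 4]]
C = [[1, 0],
     [0, 1],
     [2, 2]]

product = khatri_rao(B, C)  # 6 rows, 2 columns
```

Because the number of rows is the product of the two input row counts, forming this matrix explicitly becomes expensive for large tensors, which is why the sub-step involving it is computationally challenging.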

algorithm, dataset, dfacto, (14 more...)

1406.4519

Country:

Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Cuturi, Marco, Doucet, Arnaud

Fast Computation of Wasserstein Barycenters

We present new algorithms to compute the mean of a set of empirical probability measures under the optimal transport metric. This mean, known as the Wasserstein barycenter, is the measure that minimizes the sum of its Wasserstein distances to each element in that set. We propose two original algorithms to compute Wasserstein barycenters that build upon the subgradient method. A direct implementation of these algorithms is, however, too costly because it would require the repeated resolution of large primal and dual optimal transport problems to compute subgradients. Extending the work of Cuturi (2013), we propose to smooth the Wasserstein distance used in the definition of Wasserstein barycenters with an entropic regularizer and recover in doing so a strictly convex objective whose gradients can be computed at a considerably lower computational cost using matrix scaling algorithms. We use these algorithms to visualize a large family of images and to solve a constrained clustering problem.
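The matrix scaling idea referred to above can be sketched for a single pair of histograms as follows. This is a minimal Sinkhorn-style iteration for the entropy-regularized transport problem, not the barycenter algorithms of the paper; the function name, the regularization strength `reg`, and the fixed iteration count are assumptions chosen for illustration.

```python
import math


def sinkhorn_cost(a, b, cost, reg=0.5, iters=200):
    """Entropy-regularized transport between histograms a and b via matrix scaling.

    Returns the transport cost of the regularized plan and the plan itself.
    """
    n, m = len(a), len(b)
    # Elementwise kernel exp(-cost / reg); strictly positive by construction.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns so the marginals match a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


# Hypothetical two-bin example with unit cost for moving mass between bins.
a = [0.5, 0.5]
b = [0.5, 0.5]
cost = [[0.0, 1.0],
        [1.0, 0.0]]

total, plan = sinkhorn_cost(a, b, cost)
```

Each iteration only needs products with the kernel matrix, which is what makes this cheaper than solving a full optimal transport linear program. Smaller values of `reg` approximate the unregularized distance more closely but typically need more iterations.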

algorithm, barycenter, wasserstein barycenter, (12 more...)

1310.4375

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Primitives for Dynamic Big Model Parallelism

Lee, Seunghak, Kim, Jin Kyu, Zheng, Xun, Ho, Qirong, Gibson, Garth A., Xing, Eric P.

When training large machine learning models with many variables or parameters, a single machine is often inadequate since the model may be too large to fit in memory, while training can take a long time even with stochastic updates. A natural recourse is to turn to distributed cluster computing, in order to harness additional memory and processors. However, naive, unstructured parallelization of ML algorithms can make inefficient use of distributed memory, while failing to obtain proportional convergence speedups - or can even result in divergence. We develop a framework of primitives for dynamic model-parallelism, STRADS, in order to explore partitioning and update scheduling of model variables in distributed ML algorithms - thus improving their memory efficiency while presenting new opportunities to speed up convergence without compromising inference correctness. We demonstrate the efficacy of model-parallel algorithms implemented in STRADS versus popular implementations for Topic Modeling, Matrix Factorization and Lasso.

artificial intelligence, machine learning, model variable, (17 more...)

1406.458

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Chwialkowski, Kacper, Gretton, Arthur

A Kernel Independence Test for Random Processes

A new nonparametric approach to the problem of testing the independence of two random processes is developed. The test statistic is the Hilbert-Schmidt Independence Criterion (HSIC), which was used previously in testing independence for i.i.d. pairs of variables. The asymptotic behaviour of HSIC is established when computed from samples drawn from random processes. It is shown that earlier bootstrap procedures which worked in the i.i.d. case will fail for random processes, and an alternative consistent estimate of the p-values is proposed. Tests on artificial data and real-world Forex data indicate that the new test procedure discovers dependence which is missed by linear approaches, while the earlier bootstrap procedure returns an elevated number of false positives. The code is available online: https://github.com/kacperChwialkowski/HSIC.
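As background for the statistic named above, here is a minimal sketch of the standard biased empirical HSIC estimate, $\frac{1}{n^2}\operatorname{tr}(KHLH)$, for scalar samples with Gaussian kernels. It covers only the statistic itself; the p-value estimate for random processes that the paper proposes is not shown. The function names, the scalar inputs, and the shared `bandwidth` parameter are assumptions for this example and are not taken from the linked repository.

```python
import math


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel on a list of scalar samples."""
    return [[math.exp(-(p - q) ** 2 / (2.0 * bandwidth ** 2)) for q in xs]
            for p in xs]


def center(G):
    """Double-center a square matrix: H G H with H = I - (1/n) * ones."""
    n = len(G)
    row_mean = [sum(row) / n for row in G]
    col_mean = [sum(G[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[G[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, bandwidth))
    L = gaussian_gram(ys, bandwidth)
    # For symmetric matrices, trace(Kc @ L) equals the sum of elementwise products.
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n * n)


# Hypothetical samples: one fully dependent pair and one with a constant variable.
xs = [0.0, 1.0, 2.0, 3.0]
dependent = hsic(xs, xs)
constant = hsic(xs, [0.0] * len(xs))
```

The statistic is zero when one variable is constant and positive when the two samples share structure; turning its value into a decision requires a null distribution, which is where the i.i.d. bootstrap and the paper's alternative differ.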

artificial intelligence, dependence, machine learning, (14 more...)

1402.4501

Genre: Research Report > Experimental Study (0.34)

Industry: Banking & Finance > Trading (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

Swersky, Kevin, Snoek, Jasper, Adams, Ryan Prescott

Freeze-Thaw Bayesian Optimization

arXiv.org Machine Learning Jun-15-2014

In this paper we develop a dynamic form of Bayesian optimization for machine learning models with the goal of rapidly finding good hyperparameter settings. Our method uses the partial information gained during the training of a machine learning model in order to decide whether to pause training and start a new model, or resume the training of a previously considered model. We specifically tailor our method to machine learning problems by developing a novel positive-definite covariance kernel to capture a variety of training curves. Furthermore, we develop a Gaussian process prior that scales gracefully with additional temporal observations. Finally, we provide an information-theoretic framework to automate the decision process. Experiments on several common machine learning models show that our approach is extremely effective in practice.

artificial intelligence, machine learning, optimization, (16 more...)

1406.3896

Country:

North America > Canada > Alberta (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Wang, Weiran, Carreira-Perpiñán, Miguel Á.

The Laplacian K-modes algorithm for clustering

arXiv.org Machine Learning Jun-15-2014

In addition to finding meaningful clusters, centroid-based clustering algorithms such as K-means or mean-shift should ideally find centroids that are valid patterns in the input space, representative of data in their cluster. This is challenging with data having a nonconvex or manifold structure, as with images or text. We introduce a new algorithm, Laplacian K-modes, which naturally combines three powerful ideas in clustering: the explicit use of assignment variables (as in K-means); the estimation of cluster centroids which are modes of each cluster's density estimate (as in mean-shift); and the regularizing effect of the graph Laplacian, which encourages similar assignments for nearby points (as in spectral clustering). The optimization algorithm alternates an assignment step, which is a convex quadratic program, and a mean-shift step, which separates for each cluster centroid. The algorithm finds meaningful density estimates for each cluster, even with challenging problems where the clusters have manifold structure, are highly nonconvex or in high dimension. It also provides centroids that are valid patterns, truly representative of their cluster (unlike K-means), and an out-of-sample mapping that predicts soft assignments for a new point.

algorithm, artificial intelligence, machine learning, (18 more...)

1406.3895

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

arXiv.org Machine Learning Jun-15-2014

An Incremental Reseeding Strategy for Clustering

Bresson, Xavier, Hu, Huiyi, Laurent, Thomas, Szlam, Arthur, von Brecht, James

In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning. The algorithm alternates between three basic components: diffusing seed vertices over the graph, thresholding the diffused seeds, and then randomly reseeding the thresholded clusters. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves state-of-the-art performance in terms of cluster purity on standard benchmark datasets. Moreover, the algorithm runs an order of magnitude faster than the other algorithms that achieve comparable results in terms of accuracy. We also describe a coarsen, cluster and refine approach similar to GRACLUS and METIS that removes an additional order of magnitude from the runtime of our algorithm while still maintaining competitive accuracy.

artificial intelligence, data mining, machine learning, (18 more...)

1406.3837

Country: North America > United States > California (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)