AITopics

1403.755

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Industry: Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Smith, Michael R., Mitchell, Logan, Giraud-Carrier, Christophe, Martinez, Tony

Recommending Learning Algorithms and Their Associated Hyperparameters

arXiv.org Machine LearningJul-7-2014

The success of machine learning on a given task dependson, among other things, which learning algorithm is selected and its associated hyperparameters. Selecting an appropriate learning algorithm and setting its hyperparameters for a given data set can be a challenging task, especially for users who are not experts in machine learning. Previous work has examined using meta-features to predict which learning algorithm and hyperparameters should be used. However, choosing a set of meta-features that are predictive of algorithm performance is difficult. Here, we propose to apply collaborative filtering techniques to learning algorithm and hyperparameter selection, and find that doing so avoids determining which meta-features to use and outperforms traditional meta-learning approaches in many cases.

algorithm, artificial intelligence, machine learning, (14 more...)

1407.189

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.70)

Wang, Huahua, Banerjee, Arindam

Bregman Alternating Direction Method of Multipliers

arXiv.org Machine LearningJul-7-2014

The mirror descent algorithm (MDA) generalizes gradient descent by using a Bregman divergence to replace squared Euclidean distance. In this paper, we similarly generalize the alternating direction method of multipliers (ADMM) to Bregman ADMM (BADMM), which allows the choice of different Bregman divergences to exploit the structure of problems. BADMM provides a unified framework for ADMM and its variants, including generalized ADMM, inexact ADMM and Bethe ADMM. We establish the global convergence and the $O(1/T)$ iteration complexity for BADMM. In some cases, BADMM can be faster than ADMM by a factor of $O(n/\log(n))$. In solving the linear program of mass transportation problem, BADMM leads to massive parallelism and can easily run on GPU. BADMM is several times faster than highly optimized commercial software Gurobi.

artificial intelligence, divergence, machine learning, (16 more...)

1306.3203

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

arXiv.org Machine LearningJul-6-2014

Kinetic Energy Plus Penalty Functions for Sparse Estimation

Zhang, Zhihua, Zhao, Shibo, Shen, Zebang, Zhou, Shuchang

In this paper we propose and study a family of sparsity-inducing penalty functions. Since the penalty functions are related to the kinetic energy in special relativity, we call them \emph{kinetic energy plus} (KEP) functions. We construct the KEP function by using the concave conjugate of a $\chi^2$-distance function and present several novel insights into the KEP function with $q=1$. In particular, we derive a thresholding operator based on the KEP function, and prove its mathematical properties and asymptotic properties in sparsity modeling. Moreover, we show that a coordinate descent algorithm is especially appropriate for the KEP function. Additionally, we discuss the relationship of KEP with the penalty functions $\ell_{1/2}$ and MCP. The theoretical and empirical analysis validates that the KEP function is effective and efficient in high-dimensional data modeling.

artificial intelligence, machine learning, penalty, (16 more...)

1307.5601

Country:

Asia > China (0.46)
North America > United States > California (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Harandi, Mehrtash T., Salzmann, Mathieu, Jayasumana, Sadeep, Hartley, Richard, Li, Hongdong

Expanding the Family of Grassmannian Kernels: An Embedding Perspective

arXiv.org Machine LearningJul-4-2014

Modeling videos and image-sets as linear subspaces has proven beneficial for many visual recognition tasks. However, it also incurs challenges arising from the fact that linear subspaces do not obey Euclidean geometry, but lie on a special type of Riemannian manifolds known as Grassmannian. To leverage the techniques developed for Euclidean spaces (e.g., support vector machines) with subspaces, several recent studies have proposed to embed the Grassmannian into a Hilbert space by making use of a positive definite kernel. Unfortunately, only two Grassmannian kernels are known, none of which -as we will show-is universal, which limits their ability to approximate a target function arbitrarily well. Here, we introduce several positive definite Grassmannian kernels, including universal ones, and demonstrate their superiority over previously-known kernels in various tasks, such as classification, clustering, sparse coding and hashing.

artificial intelligence, kernel, machine learning, (16 more...)

1407.1123

Country: Oceania > Australia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)

Tulabandhula, Theja, Rudin, Cynthia

Robust Optimization using Machine Learning for Uncertainty Sets

arXiv.org Machine LearningJul-3-2014

Our goal is to build robust optimization problems for making decisions based on complex data from the past. In robust optimization (RO) generally, the goal is to create a policy for decision-making that is robust to our uncertainty about the future. In particular, we want our policy to best handle the the worst possible situation that could arise, out of an uncertainty set of possible situations. Classically, the uncertainty set is simply chosen by the user, or it might be estimated in overly simplistic ways with strong assumptions; whereas in this work, we learn the uncertainty set from data collected in the past. The past data are drawn randomly from an (unknown) possibly complicated high-dimensional distribution. We propose a new uncertainty set design and show how tools from statistical learning theory can be employed to provide probabilistic guarantees on the robustness of the policy.

artificial intelligence, machine learning, probability, (16 more...)

1407.1097

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

On the Consistency of AUC Pairwise Optimization

Gao, Wei, Zhou, Zhi-Hua

AUC (area under ROC curve) is an important evaluation criterion, which has been popularly used in many learning tasks such as class-imbalance learning, cost-sensitive learning, learning to rank, etc. Many learning approaches try to optimize AUC, while owing to the non-convexity and discontinuousness of AUC, almost all approaches work with surrogate loss functions. Thus, the consistency of AUC is crucial; however, it has been almost untouched before. In this paper, we provide a sufficient condition for the asymptotic consistency of learning approaches based on surrogate loss functions. Based on this result, we prove that exponential loss and logistic loss are consistent with AUC, but hinge loss is inconsistent. Then, we derive the $q$-norm hinge loss and general hinge loss that are consistent with AUC. We also derive the consistent bounds for exponential loss and logistic loss, and obtain the consistent bounds for many surrogate loss functions under the non-noise setting. Further, we disclose an equivalence between the exponential surrogate loss of AUC and exponential surrogate loss of accuracy, and one straightforward consequence of such finding is that AdaBoost and RankBoost are equivalent.

consistency, exponential loss, surrogate loss, (15 more...)

1208.0645

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > New York (0.04)
(14 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Structured Learning via Logistic Regression

Domke, Justin

A successful approach to structured learning is to write the learning objective as a joint function of linear parameters and inference messages, and iterate between updates to each. This paper observes that if the inference problem is "smoothed" through the addition of entropy terms, for fixed messages, the learning objective reduces to a traditional (non-structured) logistic regression problem with respect to parameters. In these logistic regression problems, each training example has a bias term determined by the current set of messages. Based on this insight, the structured energy function can be extended from linear factors to any function class where an "oracle" exists to minimize a logistic loss.

artificial intelligence, example prediction, machine learning, (18 more...)

1407.0754

Genre: Research Report > New Finding (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.83)

Boullé, Marc, Guigourès, Romain, Rossi, Fabrice

Nonparametric Hierarchical Clustering of Functional Data

In this paper, we deal with the problem of curves clustering. We propose a nonparametric method which partitions the curves into clusters and discretizes the dimensions of the curve points into intervals. The cross-product of these partitions forms a data-grid which is obtained using a Bayesian model selection approach while making no assumptions regarding the curves. Finally, a post-processing technique, aiming at reducing the number of clusters in order to improve the interpretability of the clustering, is proposed. It consists in optimally merging the clusters step by step, which corresponds to an agglomerative hierarchical classification whose dissimilarity measure is the variation of the criterion. Interestingly this measure is none other than the sum of the Kullback-Leibler divergences between clusters distributions before and after the merges. The practical interest of the approach for functional data exploratory analysis is presented and compared with an alternative approach on an artificial and a real world data set.

artificial intelligence, bayesian inference, machine learning, (16 more...)

doi: 10.1007/978-3-319-02999-3_2

1407.0612

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need?

Rossi, Fabrice

In numerous applicative contexts, data are too rich and too complex to be represented by numerical vectors. A general approach to extend machine learning and data mining techniques to such data is to really on a dissimilarity or on a kernel that measures how different or similar two objects are. This approach has been used to define several variants of the Self Organizing Map (SOM). This paper reviews those variants in using a common set of notations in order to outline differences and similarities between them.

artificial intelligence, machine learning, survey article, (18 more...)

doi: 10.1007/978-3-319-07695-9_1

1407.0611

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre:

Research Report (0.90)
Overview (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)