AITopics

In this paper, we examine the generalization error of regularized distance metric learning. We show that with appropriate constraints, the generalization error of regularized distance metric learning could be independent from the dimensionality, making it suitable for handling high dimensional data. In addition, we present an efficient online learning algorithm for regularized distance metric learning. Our empirical studies with data classification and face recognition show that the proposed algorithm is (i) effective for distance metric learning when compared to the state-of-the-art methods, and (ii) efficient and robust for high dimensional data.

artificial intelligence, machine learning, metric learning, (15 more...)

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.88)

Industry:

Health & Medicine (0.68)
Government (0.47)
Education > Educational Setting > Online (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.67)

Hu, Chonghai, Pan, Weike, Kwok, James T.

Accelerated Gradient Methods for Stochastic Optimization and Online Learning

Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., $\ell_1$-regularizer). Gradient descent methods, though highly scalable and easy to implement, are known to converge slowly on these problems. In this paper, we develop novel accelerated gradient methods for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic optimization with both convex and strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple but powerful algorithm.

algorithm, artificial intelligence, machine learning, (14 more...)

Country:

North America > United States (0.68)
Asia > China > Hong Kong (0.14)

Genre: Research Report (0.34)

Industry: Education > Educational Setting > Online (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)

Periodic Step Size Adaptation for Single Pass On-line Learning

Hsu, Chun-nan, Chang, Yu-ming, Huang, Hanshen, Lee, Yuh-jye

It has been established that the second-order stochastic gradient descent (2SGD) method can potentially achieve generalization performance as well as empirical optimum in a single pass (i.e., epoch) through the training examples. However, 2SGD requires computing the inverse of the Hessian matrix of the loss function, which is prohibitively expensive. This paper presents Periodic Step-size Adaptation (PSA), which approximates the Jacobian matrix of the mapping function and explores a linear relation between the Jacobian and Hessian to approximate the Hessian periodically and achieve near-optimal results in experiments on a wide variety of models and tasks.

artificial intelligence, machine learning, psa, (18 more...)

Country: North America > United States (0.69)

Genre:

Instructional Material > Online (0.50)
Research Report > New Finding (0.47)

Industry: Education > Educational Setting > Online (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Hsu, Anne, Griffiths, Thomas L.

Differential Use of Implicit Negative Evidence in Generative and Discriminative Language Learning

A classic debate in cognitive science revolves around understanding how children learn complex linguistic rules, such as those governing restrictions on verb alternations, without negative evidence. Traditionally, formal learnability arguments have been used to claim that such learning is impossible without the aid of innate language-specific knowledge. However, recently, researchers have shown that statistical models are capable of learning complex rules from only positive evidence. These two kinds of learnability analyses differ in their assumptions about the role of the distribution from which linguistic input is generated. The former analyses assume that learners seek to identify grammatical sentences in a way that is robust to the distribution from which the sentences are generated, analogous to discriminative approaches in machine learning. The latter assume that learners are trying to estimate a generative model, with sentences being sampled from that model. We show that these two learning approaches differ in their use of implicit negative evidence -- the absence of a sentence -- when learning verb alternations, and demonstrate that human learners can produce results consistent with the predictions of both approaches, depending on the context in which the learning problem is presented.

artificial intelligence, machine learning, sentence structure, (15 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report > New Finding (0.96)

Industry: Education > Curriculum > Subject-Specific Education (0.43)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)

Singer, Yoram, Duchi, John C.

Efficient Learning using Forward-Backward Splitting

We describe, analyze, and experiment with a new framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an {\em unconstrained} gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term while keeping close proximity to the result of the first phase. This yields a simple yet effective algorithm for both batch penalized risk minimization and online learning. Furthermore, the two phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as $\ell_1$. We derive concrete and very simple algorithms for minimization of loss functions with $\ell_1$, $\ell_2$, $\ell_2^2$, and $\ell_\infty$ regularization. We also show how to construct efficient algorithms for mixed-norm $\ell_1/\ell_q$ regularization. We further extend the algorithms and give efficient implementations for very high-dimensional data with sparsity. We demonstrate the potential of the proposed framework in experiments with synthetic and natural datasets.

artificial intelligence, machine learning, regularization, (14 more...)

Industry: Education (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Distribution-Calibrated Hierarchical Classification

Dekel, Ofer

While many advances have already been made in hierarchical classification learning, wetake a step back and examine how a hierarchical classification problem should be formally defined. We pay particular attention to the fact that many arbitrary decisionsgo into the design of the label taxonomy that is given with the training data. Moreover, many hand-designed taxonomies are unbalanced and misrepresent the class structure in the underlying data distribution. We attempt to correct these problems by using the data distribution itself to calibrate the hierarchical classificationloss function. This distribution-based correction must be done with care, to avoid introducing unmanageable statistical dependencies into the learning problem. This leads us off the beaten path of binomial-type estimation andinto the unfamiliar waters of geometric-type estimation. In this paper, we present a new calibrated definition of statistical risk for hierarchical classification, anunbiased estimator for this risk, and a new algorithmic reduction from hierarchical classification to cost-sensitive classification.

artificial intelligence, machine learning, taxonomy, (15 more...)

Country: North America > United States (0.28)

Industry:

Leisure & Entertainment (0.47)
Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Crammer, Koby, Kulesza, Alex, Dredze, Mark

Adaptive Regularization of Weight Vectors

We present AROW, a new online learning algorithm that combines several properties of successful : large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, which does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and empirically show that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data.

algorithm, artificial intelligence, machine learning, (18 more...)

Country: North America > United States > Pennsylvania (0.28)

Industry: Education > Educational Setting > Online (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.35)

Manifold Regularization for SIR with Rate Root-n Convergence

Bian, Wei, Tao, Dacheng

In this paper, we study the manifold regularization for the Sliced Inverse Regression (SIR). The manifold regularization improves the standard SIR in two aspects: 1) it encodes the local geometry for SIR and 2) it enables SIR to deal with transductive and semi-supervised learning problems. We prove that the proposed graph Laplacian based regularization is convergent at rate root-n. The projection directions of the regularized SIR are optimized by using a conjugate gradient method on the Grassmann manifold. Experimental results support our theory.

artificial intelligence, machine learning, regularization, (13 more...)

Country: North America > United States (0.15)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

On Learning Rotations

Arora, Raman

An algorithm is presented for online learning of rotations. The proposed algorithm involves matrix exponentiated gradient updates and is motivated by the Von Neumann divergence. The additive updates are skew-symmetric matrices with trace zero which comprise the Lie algebra of the rotation group. The orthogonality and unit determinant of the matrix parameter are preserved using matrix logarithms and exponentials and the algorithm lends itself to interesting interpretations in terms of the computational topology of the compact Lie groups. The stability and the computational complexity of the algorithm are discussed.

artificial intelligence, machine learning, matrix, (16 more...)

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Industry: Education (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)