AITopics

1410.3595

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)

İrsoy, Ozan, Alpaydın, Ethem

Autoencoder Trees

arXiv.org Machine LearningSep-25-2014

We discuss an autoencoder model in which the encoding and decoding functions are implemented by decision trees. We use the soft decision tree where internal nodes realize soft multivariate splits given by a gating function and the overall output is the average of all leaves weighted by the gating values on their path. The encoder tree takes the input and generates a lower dimensional representation in the leaves and the decoder tree takes this and reconstructs the original input. Exploiting the continuity of the trees, autoencoder trees are trained with stochastic gradient descent. On handwritten digit and news data, we see that the autoencoder trees yield good reconstruction error compared to traditional autoencoder percep-trons. We also see that the autoencoder tree captures hierarchical representations at different granularities of the data on its different levels and the leaves capture the localities in the input space.

artificial intelligence, machine learning, representation, (15 more...)

1409.7461

Country: Asia > Middle East (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Mokhtari, Aryan, Ribeiro, Alejandro

Global Convergence of Online Limited Memory BFGS

arXiv.org Machine LearningSep-6-2014

Global convergence of an online (stochastic) limited memory version of the Broyden-Fletcher- Goldfarb-Shanno (BFGS) quasi-Newton method for solving optimization problems with stochastic objectives that arise in large scale machine learning is established. Lower and upper bounds on the Hessian eigenvalues of the sample functions are shown to suffice to guarantee that the curvature approximation matrices have bounded determinants and traces, which, in turn, permits establishing convergence to optimal arguments with probability 1. Numerical experiments on support vector machines with synthetic data showcase reductions in convergence time relative to stochastic gradient descent algorithms as well as reductions in storage and computation relative to other online quasi-Newton methods. Experimental evaluation on a search engine advertising problem corroborates that these advantages also manifest in practical applications.

artificial intelligence, feature vector, machine learning, (15 more...)

1409.2045

Country:

North America > United States > Pennsylvania (0.27)
North America > United States > Michigan (0.27)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Duchi, John C., Jordan, Michael I., Wainwright, Martin J., Wibisono, Andre

Optimal rates for zero-order convex optimization: the power of two function evaluations

arXiv.org Machine LearningAug-20-2014

We consider derivative-free algorithms for stochastic and non-stochastic convex optimization problems that use only function values rather than gradients. Focusing on non-asymptotic bounds on convergence rates, we show that if pairs of function values are available, algorithms for $d$-dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most $\sqrt{d}$ in convergence rate over traditional stochastic gradient methods. We establish such results for both smooth and non-smooth cases, sharpening previous analyses that suggested a worse dimension dependence, and extend our results to the case of multiple ($m \ge 2$) evaluations. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, establishing the sharpness of our achievable results up to constant (sometimes logarithmic) factors.

artificial intelligence, inequality, machine learning, (16 more...)

1312.2139

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

AAAI ConferencesJul-22-2014

ProPPR: Efficient First-Order Probabilistic Logic Programming for Structure Discovery, Parameter Learning, and Scalable Inference

Wang, William Yang (Carnegie Mellon University) | Mazaitis, Kathryn (Carnegie Mellon University) | Cohen, William W (Carnegie Mellon University)

A key challenge in statistical relational learning is to develop a semantically rich formalism that supports efficient probabilistic reasoning using large collections of extracted information. This paper presents a new, scalable probabilistic logic called ProPPR, which further extends stochastic logic programs (SLP) to a framework that enables efficient learning and inference on graphs: using an abductive second-order probabilistic logic, we show that first-order theories can be automatically generated via parameter learning; that in parameter learning, weight learning can be performed using parallel stochastic gradient descent with a supervised personalized PageRank algorithm; and that most importantly, queries can be approximately grounded with a small graph, and inference is independent of the size of the database.

efficient first-order probabilistic logic programming, parameter learning, structure discovery, (2 more...)

Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.53)

Zhong, Wenliang (Hong Kong University of Science and Technology) | Kwok, James T. (Hong Kong University of Science and Technology)

Gradient Descent with Proximal Average for Nonconvex and Composite Regularization

Sparse modeling has been highly successful in many real-world applications. While a lot of interests have been on convex regularization, recent studies show that nonconvexregularizers can outperform their convex counterparts in many situations.However, the resulting nonconvex optimization problems are often challenging, especiallyfor composite regularizers such as the nonconvex overlapping group lasso. In thispaper, byusing a recent mathematical tool known as the proximal average,we propose a novel proximal gradient descent method for optimization with a wide class of nonconvex and composite regularizers.Instead of directlysolving the proximal stepassociated with a composite regularizer, we average thesolutions from the proximal problems of the constituent regularizers. This simple strategy has guaranteed convergenceand low per-iteration complexity.Experimental results on a number of synthetic andreal-world data sets demonstrate the effectiveness and efficiency of theproposed optimization algorithm, and also the improved classification performanceresulting from thenonconvex regularizers.

artificial intelligence, machine learning, regularizer, (14 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Arizona (0.04)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

Learning Relative Similarity by Stochastic Dual Coordinate Ascent

Wu, Pengcheng (Nanyang Technological University) | Ding, Yi (Nanyang Technological University) | Zhao, Peilin (Rutgers University) | Miao, Chunyan (Nanyang Technological University) | Hoi, Steven C. H. (Nanyang Technological University)

Learning relative similarity from pairwise instances is an important problem in machine learning and has a wide range of applications. Despite being studied for years, some existing methods solved by Stochastic Gradient Descent (SGD) techniques generally suffer from slow convergence. In this paper, we investigate the application of Stochastic Dual Coordinate Ascent (SDCA) technique to tackle the optimization task of relative similarity learning by extending from vector to matrix parameters. Theoretically, we prove the optimal linear convergence rate for the proposed SDCA algorithm, beating the well-known sublinear convergence rate by the previous best metric learning algorithms. Empirically, we conduct extensive experiments on both standard and large-scale data sets to validate the effectiveness of the proposed algorithm for retrieval tasks.

algorithm, artificial intelligence, machine learning, (14 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

Asia > Singapore (0.14)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Fast Multi-Instance Multi-Label Learning

Huang, Sheng-Jun (Nanjing University) | Gao, Wei (Nanjing University) | Zhou, Zhi-Hua (Nanjing University)

In multi-instance multi-label learning (MIML), one object is represented by multiple instances and simultaneously associated with multiple labels. Existing MIML approaches have been found useful in many applications; however, most of them can only handle moderate-sized data. To efficiently handle large data sets, we propose the MIMLfast approach, which first constructs a low-dimensional subspace shared by all labels, and then trains label specific linear models to optimize approximated ranking loss via stochastic gradient descent. Although the MIML problem is complicated, MIMLfast is able to achieve excellent performance by exploiting label relations with shared space and discovering sub-concepts for complicated labels. Experiments show that the performance of MIMLfast is highly competitive to state-of-the-art techniques, whereas its time cost is much less; particularly, on a data set with 30K bags and 270K instances, where none of existing approaches can return results in 24 hours, MIMLfast takes only 12 minutes. Moreover, our approach is able to identify the most representative instance for each label, and thus providing a chance to understand the relation between input patterns and output semantics.

artificial intelligence, machine learning, mimlfast, (16 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Dabney, William (University of Massachusetts Amherst) | Thomas, Philip (University of Massachusetts Amherst)

Natural Temporal Difference Learning

In this paper we investigate the application of natural gradient descent to Bellman error based reinforcement learning algorithms. This combination is interesting because natural gradient descent is invariant to the parameterization of the value function. This invariance property means that natural gradient descent adapts its update directions to correct for poorly conditioned representations. We present and analyze quadratic and linear time natural temporal difference learning algorithms, and prove that they are covariant. We conclude with experiments which suggest that the natural algorithms can match or outperform their non-natural counterparts using linear function approximation, and drastically improve upon their non-natural counterparts when using non-linear function approximation.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

arXiv.org Machine LearningJun-15-2014

Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

Orabona, Francesco

Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection phase is often ignored. In fact, in theoretical works most of the time assumptions are made, for example, on the prior knowledge of the norm of the optimal solution, while in the practical world validation methods remain the only viable approach. In this paper, we propose a new kernel-based stochastic gradient descent algorithm that performs model selection while training, with no parameters to tune, nor any form of cross-validation. The algorithm builds on recent advancement in online learning theory for unconstrained settings, to estimate over time the right regularization in a data-dependent way. Optimal rates of convergence are proved under standard smoothness assumptions on the target function, using the range space of the fractional integral operator associated with the kernel.

algorithm, artificial intelligence, machine learning, (19 more...)

1406.3816

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)