AITopics

1509.09259

Country: Europe > Switzerland (0.15)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Erdogdu, Murat A., Montanari, Andrea

Convergence rates of sub-sampled Newton methods

arXiv.org Machine LearningDec-1-2015

We consider the problem of minimizing a sum of $n$ functions over a convex parameter set $\mathcal{C} \subset \mathbb{R}^p$ where $n\gg p\gg 1$. In this regime, algorithms which utilize sub-sampling techniques are known to be effective. In this paper, we use sub-sampling techniques together with low-rank approximation to design a new randomized batch algorithm which possesses comparable convergence rate to Newton's method, yet has much smaller per-iteration cost. The proposed algorithm is robust in terms of starting point and step size, and enjoys a composite convergence rate, namely, quadratic convergence at start and linear convergence when the iterate is close to the minimizer. We develop its theoretical analysis which also allows us to select near-optimal algorithm parameters. Our theoretical results can be used to obtain convergence rates of previously proposed sub-sampling based algorithms as well. We demonstrate how our results apply to well-known machine learning problems. Lastly, we evaluate the performance of our algorithm on several datasets under various scenarios.

algorithm, artificial intelligence, machine learning, (17 more...)

1508.0281

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Xu, Yangyang, Akrotirianakis, Ioannis, Chakraborty, Amit

Proximal gradient method for huberized support vector machine

arXiv.org Machine LearningNov-30-2015

The Support Vector Machine (SVM) has been used in a wide variety of classification problems. The original SVM uses the hinge loss function, which is non-differentiable and makes the problem difficult to solve in particular for regularized SVMs, such as with $\ell_1$-regularization. This paper considers the Huberized SVM (HSVM), which uses a differentiable approximation of the hinge loss function. We first explore the use of the Proximal Gradient (PG) method to solving binary-class HSVM (B-HSVM) and then generalize it to multi-class HSVM (M-HSVM). Under strong convexity assumptions, we show that our algorithm converges linearly. In addition, we give a finite convergence result about the support of the solution, based on which we further accelerate the algorithm by a two-stage method. We present extensive numerical experiments on both synthetic and real datasets which demonstrate the superiority of our methods over some state-of-the-art methods for both binary- and multi-class SVMs.

algorithm, dataset, prediction accuracy, (13 more...)

doi: 10.1007/s10044-015-0485-z

1511.09159

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.69)

Industry: Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Hu, Tao, Pehlevan, Cengiz, Chklovskii, Dmitri B.

A Hebbian/Anti-Hebbian Network for Online Sparse Dictionary Learning Derived from Symmetric Matrix Factorization

arXiv.org Machine LearningNov-30-2015

Olshausen and Field (OF) proposed that neural computations in the primary visual cortex (V1) can be partially modeled by sparse dictionary learning. By minimizing the regularized representation error they derived an online algorithm, which learns Gabor-filter receptive fields from a natural image ensemble in agreement with physiological experiments. Whereas the OF algorithm can be mapped onto the dynamics and synaptic plasticity in a single-layer neural network, the derived learning rule is nonlocal - the synaptic weight update depends on the activity of neurons other than just pre- and postsynaptic ones - and hence biologically implausible. Here, to overcome this problem, we derive sparse dictionary learning from a novel cost-function - a regularized error of the symmetric factorization of the input's similarity matrix. Our algorithm maps onto a neural network of the same architecture as OF but using only biologically plausible local learning rules. When trained on natural images our network learns Gabor-filter receptive fields and reproduces the correlation among synaptic weights hard-wired in the OF network. Therefore, online symmetric matrix factorization may serve as an algorithmic theory of neural computation.

algorithm, artificial intelligence, machine learning, (14 more...)

doi: 10.1109/ACSSC.2014.7094519

1503.0069

Country: North America > United States (0.93)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.56)

Xu, Yangyang, Akrotirianakis, Ioannis, Chakraborty, Amit

Alternating direction method of multipliers for regularized multiclass support vector machines

The support vector machine (SVM) was originally designed for binary classifications. A lot of effort has been put to generalize the binary SVM to multiclass SVM (MSVM) which are more complex problems. Initially, MSVMs were solved by considering their dual formulations which are quadratic programs and can be solved by standard second-order methods. However, the duals of MSVMs with regularizers are usually more difficult to formulate and computationally very expensive to solve. This paper focuses on several regularized MSVMs and extends the alternating direction method of multiplier (ADMM) to these MSVMs. Using a splitting technique, all considered MSVMs are written as two-block convex programs, for which the ADMM has global convergence guarantees. Numerical experiments on synthetic and real data demonstrate the high efficiency and accuracy of our algorithms.

admm, artificial intelligence, machine learning, (14 more...)

1511.09153

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Ren, Mengye, Kiros, Ryan, Zemel, Richard

Exploring Models and Data for Image Question Answering

arXiv.org Artificial IntelligenceNov-29-2015

This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing image QA dataset. We also present a question generation algorithm that converts image descriptions, which are widely available, into QA form. We used this algorithm to produce an order-of-magnitude larger dataset, with more evenly distributed answers. A suite of baseline results on this new dataset are also presented.

machine learning, natural language, question answering, (18 more...)

arXiv.org Artificial Intelligence

1505.02074

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
(2 more...)

Weston, Jason, Chopra, Sumit, Bordes, Antoine

Memory Networks

arXiv.org Artificial IntelligenceNov-29-2015

We describe a new class of learning models called memory networks. Memory networks reason with inference components combined with a long-term memory component; they learn how to use these jointly. The long-term memory can be read and written to, with the goal of using it for prediction. We investigate these models in the context of question answering (QA) where the long-term memory effectively acts as a (dynamic) knowledge base, and the output is a textual response. We evaluate them on a large-scale QA task, and a smaller, but more complex, toy task generated from a simulated world. In the latter, we show the reasoning power of such models by chaining multiple supporting sentences to answer questions that require understanding the intension of verbs.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

1410.3916

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

Stalidis, Panagiotis, Giatsoglou, Maria, Diamantaras, Konstantinos, Sarigiannidis, George, Chatzisavvas, Konstantinos Ch.

Machine Learning Sentiment Prediction based on Hybrid Document Representation

Automated sentiment analysis and opinion mining is a complex process concerning the extraction of useful subjective information from text. The explosion of user generated content on the Web, especially the fact that millions of users, on a daily basis, express their opinions on products and services to blogs, wikis, social networks, message boards, etc., render the reliable, automated export of sentiments and opinions from unstructured text crucial for several commercial applications. In this paper, we present a novel hybrid vectorization approach for textual resources that combines a weighted variant of the popular Word2Vec representation (based on Term Frequency-Inverse Document Frequency) representation and with a Bag- of-Words representation and a vector of lexicon-based sentiment values. The proposed text representation approach is assessed through the application of several machine learning classification algorithms on a dataset that is used extensively in literature for sentiment detection. The classification accuracy derived through the proposed hybrid vectorization approach is higher than when its individual components are used for text represenation, and comparable with state-of-the-art sentiment detection methodologies.

artificial intelligence, machine learning, natural language, (21 more...)

1511.09107

Country:

North America (0.28)
Europe > Greece (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(5 more...)

Auvolat, Alex, Chandar, Sarath, Vincent, Pascal, Larochelle, Hugo, Bengio, Yoshua

Clustering is Efficient for Approximate Maximum Inner Product Search

Efficient Maximum Inner Product Search (MIPS) is an important task that has a wide applicability in recommendation systems and classification with a large number of classes. Solutions based on locality-sensitive hashing (LSH) as well as tree-based solutions have been investigated in the recent literature, to perform approximate MIPS in sublinear time. In this paper, we compare these to another extremely simple approach for solving approximate MIPS, based on variants of the k-means clustering algorithm. Specifically, we propose to train a spherical k-means, after having reduced the MIPS problem to a Maximum Cosine Similarity Search (MCSS). Experiments on two standard recommendation system benchmarks as well as on large vocabulary word embeddings, show that this simple approach yields much higher speedups, for the same retrieval precision, than current state-of-the-art hashing-based and tree-based methods. This simple method also yields more robust retrievals when the query is corrupted by noise.

artificial intelligence, machine learning, natural language, (18 more...)

1507.0591

Country: North America > Canada > Quebec (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Zhang, Yuchen, Wainwright, Martin J., Jordan, Michael I.

Optimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators

For the problem of high-dimensional sparse linear regression, it is known that an $\ell_0$-based estimator can achieve a $1/n$ "fast" rate on the prediction error without any conditions on the design matrix, whereas in absence of restrictive conditions on the design matrix, popular polynomial-time methods only guarantee the $1/\sqrt{n}$ "slow" rate. In this paper, we show that the slow rate is intrinsic to a broad class of M-estimators. In particular, for estimators based on minimizing a least-squares cost function together with a (possibly non-convex) coordinate-wise separable regularizer, there is always a "bad" local optimum such that the associated prediction error is lower bounded by a constant multiple of $1/\sqrt{n}$. For convex regularizers, this lower bound applies to all global optima. The theory is applicable to many popular estimators, including convex $\ell_1$-based methods as well as M-estimators based on nonconvex regularizers, including the SCAD penalty or the MCP regularizer. In addition, for a broad class of nonconvex regularizers, we show that the bad local optima are very common, in that a broad class of local minimization algorithms with random initialization will typically converge to a bad solution.

artificial intelligence, estimator, machine learning, (17 more...)

1503.03188

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)