AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Super-Bit Locality-Sensitive Hashing

Ji, Jianqiu, Li, Jianmin, Yan, Shuicheng, Zhang, Bo, Tian, Qi

Neural Information Processing SystemsDec-31-2012

Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose the Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement, which orthogonalizes the random projection vectors in batches, and it is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within $(0,\pi/2]$. The extensive experiments on real data well validate that given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH shows the superiority over SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.14)
Asia > China (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.34)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.30)

Add feedback

Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

Xu, Minjie, Zhu, Jun, Zhang, Bo

Neural Information Processing SystemsDec-31-2012

We present a probabilistic formulation of max-margin matrix factorization and build accordingly a nonparametric Bayesian model which automatically resolves the unknown number of latent factors. Our work demonstrates a successful example thatintegrates Bayesian nonparametrics and max-margin learning, which are conventionally two separate paradigms and enjoy complementary advantages. We develop an efficient variational algorithm for posterior inference, and our extensive empiricalstudies on large-scale MovieLens and EachMovie data sets appear to justify the aforementioned dual advantages.

artificial intelligence, machine learning, matrix factorization, (14 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Learning from Distributions via Support Measure Machines

Muandet, Krikamol, Fukumizu, Kenji, Dinuzzo, Francesco, Schölkopf, Bernhard

Neural Information Processing SystemsDec-31-2012

This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provides several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (Flex-SVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework.

artificial intelligence, inductive learning, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

Add feedback

Bethe Bounds and Approximating the Global Optimum

Weller, Adrian, Jebara, Tony

arXiv.org Machine LearningDec-31-2012

Inference in general Markov random fields (MRFs) is NP-hard, though identifying the maximum a posteriori (MAP) configuration of pairwise MRFs with submodular cost functions is efficiently solvable using graph cuts. Marginal inference, however, even for this restricted class, is in #P. We prove new formulations of derivatives of the Bethe free energy, provide bounds on the derivatives and bracket the locations of stationary points, introducing a new technique called Bethe bound propagation. Several results apply to pairwise models whether associative or not. Applying these to discretized pseudo-marginals in the associative case we present a polynomial time approximation scheme for global optimization provided the maximum degree is $O(\log n)$, and discuss several extensions.

artificial intelligence, bayesian inference, machine learning, (13 more...)

arXiv.org Machine Learning

1301.0015

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Add feedback

Autonomously Learning to Visually Detect Where Manipulation Will Succeed

Nguyen, Hai, Kemp, Charles C.

arXiv.org Artificial IntelligenceDec-31-2012

Visual features can help predict if a manipulation behavior will succeed at a given location. For example, the success of a behavior that flips light switches depends on the location of the switch. Within this paper, we present methods that enable a mobile manipulator to autonomously learn a function that takes an RGB image and a registered 3D point cloud as input and returns a 3D location at which a manipulation behavior is likely to succeed. Given a pair of manipulation behaviors that can change the state of the world between two sets (e.g., light switch up and light switch down), classifiers that detect when each behavior has been successful, and an initial hint as to where one of the behaviors will be successful, the robot autonomously trains a pair of support vector machine (SVM) classifiers by trying out the behaviors at locations in the world and observing the results. When an image feature vector associated with a 3D location is provided as input to one of the SVMs, the SVM predicts if the associated manipulation behavior will be successful at the 3D location. To evaluate our approach, we performed experiments with a PR2 robot from Willow Garage in a simulated home using behaviors that flip a light switch, push a rocker-type light switch, and operate a drawer. By using active learning, the robot efficiently learned SVMs that enabled it to consistently succeed at these tasks. After training, the robot also continued to learn in order to adapt in the event of failure.

artificial intelligence, machine learning, robot, (14 more...)

arXiv.org Artificial Intelligence

1212.6837

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.89)

Add feedback

Focus of Attention for Linear Predictors

Pelossof, Raphael, Ying, Zhiliang

arXiv.org Artificial IntelligenceDec-29-2012

We present a method to stop the evaluation of a prediction process when the result of the full evaluation is obvious. This trait is highly desirable in prediction tasks where a predictor evaluates all its features for every example in large datasets. We observe that some examples are easier to classify than others, a phenomenon which is characterized by the event when most of the features agree on the class of an example. By stopping the feature evaluation when encountering an easy- to-classify example, the predictor can achieve substantial gains in computation. Our method provides a natural attention mechanism for linear predictors where the predictor concentrates most of its computation on hard-to-classify examples and quickly discards easy-to-classify ones. By modifying a linear prediction algorithm such as an SVM or AdaBoost to include our attentive method we prove that the average number of features computed is O(sqrt(n log 1/sqrt(delta))) where n is the original number of features, and delta is the error rate incurred due to early stopping. We demonstrate the effectiveness of Attentive Prediction on MNIST, Real-sim, Gisette, and synthetic datasets.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1212.6659

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Add feedback

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes

Shamir, Ohad, Zhang, Tong

arXiv.org Machine LearningDec-28-2012

Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness assumptions, which do not apply to many modern applications of SGD with non-smooth objective functions such as support vector machines. In this paper, we investigate the performance of SGD without such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy. In this framework, we prove that after T rounds, the suboptimality of the last SGD iterate scales as O(log(T)/\sqrt{T}) for non-smooth convex objective functions, and O(log(T)/T) in the non-smooth strongly convex case. To the best of our knowledge, these are the first bounds of this kind, and almost match the minimax-optimal rates obtainable by appropriate averaging schemes. We also propose a new and simple averaging scheme, which not only attains optimal rates, but can also be easily computed on-the-fly (in contrast, the suffix averaging scheme proposed in Rakhlin et al. (2011) is not as simple to implement). Finally, we provide some experimental illustrations.

artificial intelligence, iterate, machine learning, (16 more...)

arXiv.org Machine Learning

1212.1824

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Discovering Basic Emotion Sets via Semantic Clustering on a Twitter Corpus

Bann, Eugene Yuta

arXiv.org Artificial IntelligenceDec-28-2012

A plethora of words are used to describe the spectrum of human emotions, but how many emotions are there really, and how do they interact? Over the past few decades, several theories of emotion have been proposed, each based around the existence of a set of 'basic emotions', and each supported by an extensive variety of research including studies in facial expression, ethology, neurology and physiology. Here we present research based on a theory that people transmit their understanding of emotions through the language they use surrounding emotion keywords. Using a labelled corpus of over 21,000 tweets, six of the basic emotion sets proposed in existing literature were analysed using Latent Semantic Clustering (LSC), evaluating the distinctiveness of the semantic meaning attached to the emotional label. We hypothesise that the more distinct the language is used to express a certain emotion, then the more distinct the perception (including proprioception) of that emotion is, and thus more 'basic'. This allows us to select the dimensions best representing the entire spectrum of emotion. We find that Ekman's set, arguably the most frequently used for classifying emotions, is in fact the most semantically distinct overall. Next, taking all analysed (that is, previously proposed) emotion terms into account, we determine the optimal semantically irreducible basic emotion set using an iterative LSC algorithm. Our newly-derived set (Accepting, Ashamed, Contempt, Interested, Joyful, Pleased, Sleepy, Stressed) generates a 6.1% increase in distinctiveness over Ekman's set (Angry, Disgusted, Joyful, Sad, Scared). We also demonstrate how using LSC data can help visualise emotions. We introduce the concept of an Emotion Profile and briefly analyse compound emotions both visually and mathematically.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

1212.6527

Country:

Europe (0.92)
Asia (0.92)
North America > United States > California (0.45)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Information Technology > Services (0.92)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Add feedback

Learning to Predict from Textual Data

Radinsky, K., Davidovich, S., Markovitch, S.

Journal of Artificial Intelligence ResearchDec-26-2012

Given a current news event, we tackle the problem of generating plausible predictions of future events it might cause. We present a new methodology for modeling and predicting such future news events using machine learning and data mining techniques. Our Pundit algorithm generalizes examples of causality pairs to infer a causality predictor. To obtain precisely labeled causality examples, we mine 150 years of news articles and apply semantic natural language modeling techniques to headlines containing certain predefined causality patterns. For generalization, the model uses a vast number of world knowledge ontologies. Empirical evaluation on real news articles shows that our Pundit algorithm performs as well as non-expert humans.

algorithm, prediction, proceedings, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3865

AI Access Foundation

10792

Journal of Artificial Intelligence Research

Country:

Europe > Germany (0.14)
Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.04)
North America > United States > Texas (0.04)
(31 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Military (0.68)
Government > Regional Government > North America Government > United States Government (0.68)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
(8 more...)

Add feedback

Gaussian Process Regression with Heteroscedastic or Non-Gaussian Residuals

Wang, Chunyi, Neal, Radford M.

arXiv.org Machine LearningDec-26-2012

Gaussian Process (GP) regression models typically assume that residuals are Gaussian and have the same variance for all observations. However, applications with input-dependent noise (heteroscedastic residuals) frequently arise in practice, as do applications in which the residuals do not have a Gaussian distribution. In this paper, we propose a GP Regression model with a latent variable that serves as an additional unobserved covariate for the regression. This model (which we call GPLC) allows for heteroscedasticity since it allows the function to have a changing partial derivative with respect to this unobserved covariate. With a suitable covariance function, our GPLC model can handle (a) Gaussian residuals with input-dependent variance, or (b) non-Gaussian residuals with input-dependent variance, or (c) Gaussian residuals with constant variance. We compare our model, using synthetic datasets, with a model proposed by Goldberg, Williams and Bishop (1998), which we refer to as GPLV, which only deals with case (a), as well as a standard GP model which can handle only case (c). Markov Chain Monte Carlo methods are developed for both modelsl. Experiments show that when the data is heteroscedastic, both GPLC and GPLV give better results (smaller mean squared error and negative log-probability density) than standard GP regression. In addition, when the residual are Gaussian, our GPLC model is generally nearly as good as GPLV, while when the residuals are non-Gaussian, our GPLC model is better than GPLV.

artificial intelligence, hyperparameter, machine learning, (17 more...)

arXiv.org Machine Learning

1212.6246

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)

Add feedback