AITopics

1402.4862

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Teso, Stefano, Sebastiani, Roberto, Passerini, Andrea

Hybrid SRL with Optimization Modulo Theories

arXiv.org Machine Learning, Feb-18-2014

Generally speaking, the goal of constructive learning could be seen as, given an example set of structured objects, to generate novel objects with similar properties. From a statistical-relational learning (SRL) viewpoint, the task can be interpreted as a constraint satisfaction problem, i.e. the generated objects must obey a set of soft constraints, whose weights are estimated from the data. Traditional SRL approaches rely on (finite) First-Order Logic (FOL) as a description language, and on MAX-SAT solvers to perform inference. Alas, FOL is unsuited for constructive problems where the objects contain a mixture of Boolean and numerical variables. It is in fact difficult to implement, e.g., linear arithmetic constraints within the language of FOL. In this paper we propose a novel class of hybrid SRL methods that rely on Satisfiability Modulo Theories, an alternative class of formal languages that allow to describe, and reason over, mixed Boolean-numerical objects and constraints. The resulting methods, which we call Learning Modulo Theories, are formulated within the structured output SVM framework, and employ a weighted SMT solver as an optimization oracle to perform efficient inference and discriminative max margin weight learning. We also present a few examples of constructive learning applications enabled by our method.

artificial intelligence, constraint-based reasoning, machine learning, (17 more...)

1402.4354

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Klami, Arto, Bouchard, Guillaume, Tripathi, Abhishek

Group-sparse Embeddings in Collective Matrix Factorization

arXiv.org Machine Learning, Feb-18-2014

Collective matrix factorization (CMF) is a technique for simultaneously learning low-rank representations based on a collection of matrices with shared entities. A typical example is the joint modeling of user-item, item-property, and user-feature matrices in a recommender system. The key idea in CMF is that the embeddings are shared across the matrices, which enables transferring information between them. The existing solutions, however, break down when the individual matrices have low-rank structure not shared with others. In this work we present a novel CMF solution that allows each of the matrices to have a separate low-rank structure that is independent of the other matrices, as well as structures that are shared only by a subset of them. We compare MAP and variational Bayesian solutions based on alternating optimization algorithms and show that the model automatically infers the nature of each factor using group-wise sparsity. Our approach supports continuous, binary, and count observations in a principled way and is efficient for sparse matrices involving missing data. We illustrate the solution on a number of examples, focusing in particular on an interesting use-case of augmented multi-view learning.

artificial intelligence, bayesian inference, machine learning, (18 more...)

1312.5921

Country: North America > United States (0.68)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Journal of Artificial Intelligence Research, Feb-17-2014

An Empirical Evaluation of Ranking Measures With Respect to Robustness to Noise

Berrar, D.

Ranking measures play an important role in model evaluation and selection. Using both synthetic and real-world data sets, we investigate how different types and levels of noise affect the area under the ROC curve (AUC), the area under the ROC convex hull, the scored AUC, the Kolmogorov-Smirnov statistic, and the H-measure. In our experiments, the AUC was, overall, the most robust among these measures, thereby reinvigorating it as a reliable metric despite its well-known deficiencies. This paper also introduces a novel ranking measure, which is remarkably robust to noise yet conceptually simple.
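As a rough illustration of two of the measures named above, the following minimal sketch computes the AUC (in its Mann-Whitney form) and the Kolmogorov-Smirnov statistic on a tiny data set, then recomputes them after one class label is flipped. The scores, labels, and the helper `flip_labels` are hypothetical and chosen only for this example; this is not the experimental protocol of the paper.

```python
def split_by_class(scores, labels):
    """Return (positive_scores, negative_scores) for binary labels in {0, 1}."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos, neg = split_by_class(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores, labels):
    """Largest gap between the empirical score CDFs of the two classes."""
    pos, neg = split_by_class(scores, labels)
    gaps = []
    for t in sorted(set(scores)):
        cdf_pos = sum(p <= t for p in pos) / len(pos)
        cdf_neg = sum(n <= t for n in neg) / len(neg)
        gaps.append(abs(cdf_pos - cdf_neg))
    return max(gaps)


def flip_labels(labels, indices):
    """Simulate class-label noise by flipping the labels at the given positions."""
    flipped = list(labels)
    for i in indices:
        flipped[i] = 1 - flipped[i]
    return flipped


# Hypothetical classifier scores and true classes (assumptions for illustration).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
noisy = flip_labels(labels, [0])  # mislabel the top-scored example

print("clean:", auc(scores, labels), ks_statistic(scores, labels))
print("noisy:", auc(scores, noisy), ks_statistic(scores, noisy))
```

On this toy data both measures drop after the single flip; a study of robustness would repeat such perturbations over many noise levels and data sets.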

experiment, noise, threshold, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4136

AI Access Foundation

10864

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
North America > United States > California > Orange County > Irvine (0.04)
(3 more...)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Data Science > Data Mining (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

arXiv.org Machine Learning, Feb-17-2014

Continuous Learning: Engineering Super Features With Feature Algebras

Tetelman, Michael

In this paper we consider the problem of searching a space of predictive models for a given training data set. We propose an iterative procedure for deriving a sequence of improving models and a corresponding sequence of sets of non-linear features on the original input space. After a finite number of iterations N, the non-linear features become 2^N-degree polynomials on the original space. We show that in the limit of an infinite number of iterations the derived non-linear features must form an associative algebra: a product of two features is equal to a linear combination of features from the same feature space for any given input point. Because each iteration consists of solving a series of convex problems that contain all previous solutions, the likelihood of the models in the sequence increases with each iteration while the dimension of the model parameter space is kept at a limited, controlled value.

artificial intelligence, iteration, machine learning, (17 more...)

1312.5398

Country: North America > United States (0.14)

Genre: Research Report (0.65)

Industry:

Education > Educational Setting > Continuing Education (0.41)
Education > Curriculum > Subject-Specific Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)

Aravkin, Aleksandr Y., Choromanska, Anna, Jebara, Tony, Kanevsky, Dimitri

Semistochastic Quadratic Bound Methods

arXiv.org Machine Learning, Feb-17-2014

Partition functions arise in a variety of settings, including conditional random fields, logistic regression, and latent Gaussian models. In this paper, we consider semistochastic quadratic bound (SQB) methods for maximum likelihood estimation based on partition function optimization. Batch methods based on the quadratic bound were recently proposed for this class of problems, and performed favorably in comparison to state-of-the-art techniques. Semistochastic methods fall in between batch algorithms, which use all the data, and stochastic gradient type methods, which use small random selections at each iteration. We build semistochastic quadratic bound-based methods, and prove both global convergence (to a stationary point) under very weak assumptions, and a linear convergence rate under stronger assumptions on the objective. To make the proposed methods faster and more stable, we consider inexact subproblem minimization and batch-size selection schemes. The efficacy of SQB methods is demonstrated via comparison with several state-of-the-art techniques on commonly used datasets.

artificial intelligence, iteration, machine learning, (14 more...)

1309.1369

Country: Europe (0.28)

Genre:

Research Report > Promising Solution (0.55)
Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.37)

Reverdy, Paul, Srivastava, Vaibhav, Leonard, Naomi E.

Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits

We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain rewards. We address the standard multi-armed bandit problem, the multi-armed bandit problem with transition costs, and the multi-armed bandit problem on graphs. We focus on the case of Gaussian rewards in a setting where the decision-maker uses Bayesian inference to estimate the reward values. We model the decision-maker's prior knowledge with the Bayesian prior on the mean reward. We develop the upper credible limit (UCL) algorithm for the standard multi-armed bandit problem and show that this deterministic algorithm achieves logarithmic cumulative expected regret, which is optimal performance for uninformative priors. We show how good priors and good assumptions on the correlation structure among arms can greatly enhance decision-making performance, even over short time horizons. We extend to the stochastic UCL algorithm and draw several connections to human decision-making behavior. We present empirical data from human experiments and show that human performance is efficiently captured by the stochastic UCL algorithm with appropriate parameters. For the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs, we generalize the UCL algorithm to the block UCL algorithm and the graphical block UCL algorithm, respectively. We show that these algorithms also achieve logarithmic cumulative expected regret and require a sub-logarithmic expected number of transitions among arms. We further illustrate the performance of these algorithms with numerical examples.
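The deterministic upper-credible-limit idea described above can be sketched as follows: keep a Gaussian posterior on each arm's mean reward and, at every step, pull the arm whose posterior upper credible limit is largest. This is a minimal sketch under stated assumptions, not the authors' implementation: the credible-level schedule `1 - 1/(K*t)` with `K = sqrt(2*pi*e)`, the known noise standard deviation, the prior parameters, and the function name `ucl_bandit` are all assumptions made for this illustration.

```python
import math
import random
from statistics import NormalDist

# Assumed constant in the credible-level schedule (not stated in the abstract).
K = math.sqrt(2 * math.pi * math.e)


def ucl_bandit(true_means, horizon, noise_sd=1.0, prior_mean=0.0, prior_sd=10.0, seed=0):
    """Run a UCL-style policy on a Gaussian bandit and return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # number of pulls per arm
    sums = [0.0] * n_arms    # sum of observed rewards per arm
    prior_prec = 1.0 / prior_sd ** 2
    noise_prec = 1.0 / noise_sd ** 2

    for t in range(1, horizon + 1):
        # Standard-normal quantile for the (assumed) credible level 1 - 1/(K t).
        z = NormalDist().inv_cdf(1.0 - 1.0 / (K * t))
        best_arm, best_q = 0, -math.inf
        for i in range(n_arms):
            # Conjugate Gaussian update with known noise variance.
            post_prec = prior_prec + counts[i] * noise_prec
            post_mean = (prior_prec * prior_mean + noise_prec * sums[i]) / post_prec
            q = post_mean + z / math.sqrt(post_prec)  # upper credible limit
            if q > best_q:
                best_arm, best_q = i, q
        reward = rng.gauss(true_means[best_arm], noise_sd)
        counts[best_arm] += 1
        sums[best_arm] += reward
    return counts


# Hypothetical three-armed problem; the last arm has the highest mean reward.
counts = ucl_bandit([0.0, 1.0, 3.0], horizon=300)
print(counts)
```

With a weakly informative prior (large `prior_sd`), every arm is tried early on, after which the policy concentrates on the arm with the highest posterior limit. A more informative prior would change which arms are explored first.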

algorithm, bayesian inference, upstream oil & gas, (20 more...)

1307.6134

Country:

North America > United States > Wisconsin (0.14)
Europe > United Kingdom > Scotland (0.14)
Europe > Spain (0.14)
(3 more...)

Genre:

Research Report > Experimental Study (0.92)
Research Report > New Finding (0.67)

Industry:

Health & Medicine (1.00)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Brümmer, Niko, Garcia-Romero, Daniel

Generative Modelling for Unsupervised Score Calibration

Score calibration enables automatic speaker recognizers to make cost-effective accept/reject decisions. Traditional calibration requires supervised data, which is an expensive resource. We propose a 2-component GMM for unsupervised calibration and demonstrate good performance relative to a supervised baseline on NIST SRE'10 and SRE'12. A Bayesian analysis demonstrates that the uncertainty associated with the unsupervised calibration parameter estimates is surprisingly small.

Index Terms: calibration, unsupervised learning, Laplace approximation, automatic speaker recognition

1. Introduction: Automatic speaker recognizers map trials to scores.
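To make the idea of fitting a 2-component GMM to unlabeled scores concrete, here is a minimal sketch using plain expectation-maximization on one-dimensional scores, followed by a log-likelihood ratio between the two fitted components. It is a generic EM sketch, not the model or training procedure of the paper: the synthetic scores, the initialization, the function names, and the convention that the higher-mean component represents target trials are all assumptions.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of a univariate Gaussian."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def fit_two_component_gmm(scores, n_iter=100):
    """EM for a 1-D, 2-component GMM; returns (weights, means, sds)."""
    weights = [0.5, 0.5]
    means = [min(scores), max(scores)]  # start the components at the extremes
    spread = (max(scores) - min(scores)) / 4.0 or 1.0
    sds = [spread, spread]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each score.
        resp = []
        for x in scores:
            joint = [weights[k] * math.exp(log_normal_pdf(x, means[k], sds[k])) for k in range(2)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means, and standard deviations.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            sds[k] = math.sqrt(max(var, 1e-6))
    return weights, means, sds


def calibrated_llr(x, means, sds):
    """Log-likelihood ratio of component 1 (assumed targets) versus component 0."""
    return log_normal_pdf(x, means[1], sds[1]) - log_normal_pdf(x, means[0], sds[0])


# Synthetic, unlabeled scores (assumption): 300 non-target and 100 target trials.
rng = random.Random(0)
scores = [rng.gauss(-2.0, 1.0) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(100)]
weights, means, sds = fit_two_component_gmm(scores)
print(weights, means, sds)
print(calibrated_llr(3.0, means, sds), calibrated_llr(-2.0, means, sds))
```

The labels are never used during fitting; the two components are recovered from the shape of the score distribution alone, which is what makes the approach unsupervised.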

calibration, machine learning, pattern recognition, (18 more...)

1311.0707

Country:

North America > United States (0.46)
Europe (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.36)

Berglund, Mathias, Raiko, Tapani

Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence

Contrastive Divergence (CD) and Persistent Contrastive Divergence (PCD) are popular methods for training the weights of Restricted Boltzmann Machines. However, both methods use an approximate method for sampling from the model distribution. As a side effect, these approximations yield significantly different biases and variances for stochastic gradient estimates of individual data points. It is well known that CD yields a biased gradient estimate. In this paper, however, we show empirically that CD has a lower stochastic gradient estimate variance than exact sampling, while the mean of subsequent PCD estimates has a higher variance than exact sampling. The results give one explanation for the finding that CD can be used with smaller minibatches or higher learning rates than PCD.

artificial intelligence, machine learning, variance, (15 more...)

1312.6002

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.39)

The Law of Total Odds

Tasche, Dirk

The law of total probability may be deployed in binary classification exercises to estimate the unconditional class probabilities if the class proportions in the training set are not representative of the population class proportions. We argue that this is not a conceptually sound approach and suggest an alternative based on the new law of total odds. We quantify the bias of the total probability estimator of the unconditional class probabilities and show that the total odds estimator is unbiased. The sample version of the total odds estimator is shown to coincide with a maximum-likelihood estimator known from the literature. The law of total odds can also be used for transforming the conditional class probabilities if independent estimates of the unconditional class probabilities of the population are available. Keywords: Total probability, likelihood ratio, Bayes' formula, binary classification, relative odds, unbiased estimator, supervised learning, dataset shift.

artificial intelligence, bayesian inference, machine learning, (19 more...)

1312.0365

Country: Europe > United Kingdom (0.14)

Genre: Research Report (0.50)

Industry: Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)