AITopics

1409.5391

Country: North America > United States > Washington > King County > Seattle (0.24)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.86)

Dahlin, Johan, Lindsten, Fredrik, Schön, Thomas B.

Particle Metropolis-Hastings using gradient and Hessian information

arXiv.org Machine LearningSep-18-2014

Particle Metropolis-Hastings (PMH) allows for Bayesian parameter inference in nonlinear state space models by combining Markov chain Monte Carlo (MCMC) and particle filtering. The latter is used to estimate the intractable likelihood. In its original formulation, PMH makes use of a marginal MCMC proposal for the parameters, typically a Gaussian random walk. However, this can lead to a poor exploration of the parameter space and an inefficient use of the generated particles. We propose a number of alternative versions of PMH that incorporate gradient and Hessian information about the posterior into the proposal. This information is more or less obtained as a byproduct of the likelihood estimation. Indeed, we show how to estimate the required information using a fixed-lag particle smoother, with a computational cost growing linearly in the number of particles. We conclude that the proposed methods can: (i) decrease the length of the burn-in phase, (ii) increase the mixing of the Markov chain at the stationary phase, and (iii) make the proposal distribution scale invariant which simplifies tuning.

algorithm, artificial intelligence, machine learning, (18 more...)

doi: 10.1007/s11222-014-9510-0

1311.0686

Country:

Europe > Sweden (0.28)
North America > United States (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Bloodgood, Michael, Vijay-Shanker, K.

A Method for Stopping Active Learning Based on Stabilizing Predictions and the Need for User-Adjustable Stopping

arXiv.org Machine LearningSep-17-2014

A survey of existing methods for stopping active learning (AL) reveals the needs for methods that are: more widely applicable; more aggressive in saving annotations; and more stable across changing datasets. A new method for stopping AL based on stabilizing predictions is presented that addresses these needs. Furthermore, stopping methods are required to handle a broad range of different annotation/performance tradeoff valuations. Despite this, the existing body of work is dominated by conservative methods with little (if any) attention paid to providing users with control over the behavior of stopping methods. The proposed method is shown to fill a gap in the level of aggressiveness available for stopping AL and supports providing users with control over stopping behavior.

artificial intelligence, machine learning, natural language, (18 more...)

1409.5165

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

Bagnall, Anthony, Younsi, Reda

Ensembles of Random Sphere Cover Classifiers

arXiv.org Artificial IntelligenceSep-17-2014

We propose and evaluate alternative ensemble schemes for a new instance based learning classifier, the Randomised Sphere Cover (RSC) classifier. RSC fuses instances into spheres, then bases classification on distance to spheres rather than distance to instances. The randomised nature of RSC makes it ideal for use in ensembles. We propose two ensemble methods tailored to the RSC classifier; $\alpha \beta$RSE, an ensemble based on instance resampling and $\alpha$RSSE, a subspace ensemble. We compare $\alpha \beta$RSE and $\alpha$RSSE to tree based ensembles on a set of UCI datasets and demonstrates that RSC ensembles perform significantly better than some of these ensembles, and not significantly worse than the others. We demonstrate via a case study on six gene expression data sets that $\alpha$RSSE can outperform other subspace ensemble methods on high dimensional data when used in conjunction with an attribute filter. Finally, we perform a set of Bias/Variance decomposition experiments to analyse the source of improvement in comparison to a base classifier.

artificial intelligence, classifier, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1409.4936

Country:

North America > United States (0.46)
Europe > United Kingdom > England (0.15)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.66)

Bresler, Guy, Gamarnik, David, Shah, Devavrat

Hardness of parameter estimation in graphical models

arXiv.org Artificial IntelligenceSep-17-2014

We consider the problem of learning the canonical parameters specifying an undirected graphical model (Markov random field) from the mean parameters. For graphical models representing a minimal exponential family, the canonical parameters are uniquely determined by the mean parameters, so the problem is feasible in principle. The goal of this paper is to investigate the computational feasibility of this statistical task. Our main result shows that parameter estimation is in general intractable: no algorithm can learn the canonical parameters of a generic pair-wise binary graphical model from the mean parameters in time bounded by a polynomial in the number of variables (unless RP = NP). Indeed, such a result has been believed to be true (see the monograph by Wainwright and Jordan (2008)) but no proof was known. Our proof gives a polynomial time reduction from approximating the partition function of the hard-core model, known to be hard, to learning approximate parameters. Our reduction entails showing that the marginal polytope boundary has an inherent repulsive property, which validates an optimization procedure over the polytope that does not use any knowledge of its structure (as required by the ellipsoid method and others).

artificial intelligence, hardcore model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1409.3836

Country: Asia > Middle East > Jordan (0.24)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Ishiguro, Katsuhiko, Sato, Issei, Ueda, Naonori

Collapsed Variational Bayes Inference of Infinite Relational Model

The Infinite Relational Model (IRM) is a probabilistic model for relational data clustering that partitions objects into clusters based on observed relationships. This paper presents Averaged CVB (ACVB) solutions for IRM, convergence-guaranteed and practically useful fast Collapsed Variational Bayes (CVB) inferences. We first derive ordinary CVB and CVB0 for IRM based on the lower bound maximization. CVB solutions yield deterministic iterative procedures for inferring IRM given the truncated number of clusters. Our proposal includes CVB0 updates of hyperparameters including the concentration parameter of the Dirichlet Process, which has not been studied in the literature. To make the CVB more practically useful, we further study the CVB inference in two aspects. First, we study the convergence issues and develop a convergence-guaranteed algorithm for any CVB-based inferences called ACVB, which enables automatic convergence detection and frees non-expert practitioners from difficult and costly manual monitoring of inference processes. Second, we present a few techniques for speeding up IRM inferences. In particular, we describe the linear time inference of CVB0, allowing the IRM for larger relational data uses. The ACVB solutions of IRM showed comparable or better performance compared to existing inference methods in experiments, and provide deterministic, faster, and easier convergence detection.

data mining, machine learning, natural language, (20 more...)

1409.4757

Country:

Asia > Japan (0.28)
North America > United States (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.54)
(2 more...)

Bloodgood, Michael, Vijay-Shanker, K.

Taking into Account the Differences between Actively and Passively Acquired Data: The Case of Active Learning with Support Vector Machines for Imbalanced Datasets

Actively sampled data can have very different characteristics than passively sampled data. Therefore, it's promising to investigate using different inference procedures during AL than are used during passive learning (PL). This general idea is explored in detail for the focused case of AL with cost-weighted SVMs for imbalanced data, a situation that arises for many HLT tasks. The key idea behind the proposed InitPA method for addressing imbalance is to base cost models during AL on an estimate of overall corpus imbalance computed via a small unbiased sample rather than the imbalance in the labeled training data, which is the leading method used during PL.

artificial intelligence, imbalance, machine learning, (14 more...)

1409.4835

Country: North America > United States > Delaware > New Castle County > Newark (0.14)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Yildiz, Olcay Taner, Alpaydin, Ethem

Multivariate Comparison of Classification Algorithms

Statistical tests that compare classification algorithms are univariate and use a single performance measure, e.g., misclassification error, $F$ measure, AUC, and so on. In multivariate tests, comparison is done using multiple measures simultaneously. For example, error is the sum of false positives and false negatives and a univariate test on error cannot make a distinction between these two sources, but a 2-variate test can. Similarly, instead of combining precision and recall in $F$ measure, we can have a 2-variate test on (precision, recall). We use Hotelling's multivariate $T^2$ test for comparing two algorithms, and when we have three or more algorithms we use the multivariate analysis of variance (MANOVA) followed by pairwise post hoc tests. In our experiments, we see that multivariate tests have higher power than univariate tests, that is, they can detect differences that univariate tests cannot. We also discuss how multivariate analysis allows us to automatically extract performance measures that best distinguish the behavior of multiple algorithms.

artificial intelligence, machine learning, multivariate test, (16 more...)

1409.4566

Country: Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Trillos, Nicolás García, Slepčev, Dejan

Continuum limit of total variation on point clouds

We consider point clouds obtained as random samples of a measure on a Euclidean domain. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points they connect. Our goal is to develop mathematical tools needed to study the consistency, as the number of available data points increases, of graph-based machine learning algorithms for tasks such as clustering. In particular, we study when is the cut capacity, and more generally total variation, on these graphs a good approximation of the perimeter (total variation) in the continuum setting. We address this question in the setting of $\Gamma$-convergence. We obtain almost optimal conditions on the scaling, as number of points increases, of the size of the neighborhood over which the points are connected by an edge for the $\Gamma$-convergence to hold. Taking the limit is enabled by a transportation based metric which allows to suitably compare functionals defined on different point clouds.

artificial intelligence, machine learning, sequence, (17 more...)

doi: 10.1007/s00205-015-0929-z

1403.6355

Country: North America > United States (0.45)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceSep-16-2014

ICE: Enabling Non-Experts to Build Models Interactively for Large-Scale Lopsided Problems

Simard, Patrice, Chickering, David, Lakshmiratan, Aparna, Charles, Denis, Bottou, Leon, Suarez, Carlos Garcia Jurado, Grangier, David, Amershi, Saleema, Verwey, Johan, Suh, Jina

Quick interaction between a human teacher and a learning machine presents numerous benefits and challenges when working with web-scale data. The human teacher guides the machine towards accomplishing the task of interest. The learning machine leverages big data to find examples that maximize the training value of its interaction with the teacher. When the teacher is restricted to labeling examples selected by the machine, this problem is an instance of active learning. When the teacher can provide additional information to the machine (e.g., suggestions on what examples or predictive features should be used) as the learning task progresses, then the problem becomes one of interactive learning. To accommodate the two-way communication channel needed for efficient interactive learning, the teacher and the machine need an environment that supports an interaction language. The machine can access, process, and summarize more examples than the teacher can see in a lifetime. Based on the machine's output, the teacher can revise the definition of the task or make it more precise. Both the teacher and the machine continuously learn and benefit from the interaction. We have built a platform to (1) produce valuable and deployable models and (2) support research on both the machine learning and user interface challenges of the interactive learning problem. The platform relies on a dedicated, low-latency, distributed, in-memory architecture that allows us to construct web-scale learning machines with quick interaction speed. The purpose of this paper is to describe this architecture and demonstrate how it supports our research efforts. Preliminary results are presented as illustrations of the architecture but are not the primary focus of the paper.

classifier, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1409.4814

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Online (0.75)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)