AITopics

We discuss theoretical aspects of the product rule for classification problems in supervised machine learning for the case of combining classifiers. We show that (1) the product rule arises from the MAP classifier supposing equivalent priors and conditional independence given a class; (2) under some conditions, the product rule is equivalent to minimizing the sum of the squared distances to the respective centers of the classes related with different features, such distances being weighted by the spread of the classes; (3) observing some hypothesis, the product rule is equivalent to concatenating the vectors of features. With the advance of the Machine Learning field, and the discovery of many different techniques, the subject of combining multiple learners [2] eventually drove attention, in particular the problem of combining classifiers. Many different methods appeared, and soon they were compared in terms of efficiency in solving problems. The product rule has been present in some of these works (e.g., [1, 7, 3, 6, 5, 4, 8]), in contexts ranging from the accuracy of the different combination rules to some analytical properties of the different methods.

artificial intelligence, machine learning, product rule, (15 more...)

1301.4157

Country: North America > United States (0.48)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.36)

Hage, Clemens, Kleinsteuber, Martin

Robust PCA and subspace tracking from incomplete observations using L0-surrogates

Many applications in data analysis rely on the decomposition of a data matrix into a low-rank and a sparse component. Existing methods that tackle this task use the nuclear norm and L1-cost functions as convex relaxations of the rank constraint and the sparsity measure, respectively, or employ thresholding techniques. We propose a method that allows for reconstructing and tracking a subspace of upper-bounded dimension from incomplete and corrupted observations. It does not require any a priori information about the number of outliers. The core of our algorithm is an intrinsic Conjugate Gradient method on the set of orthogonal projection matrices, the so-called Grassmannian. Non-convex sparsity measures are used for outlier detection, which leads to improved performance in terms of robustly recovering and tracking the low-rank matrix. In particular, our approach can cope with more outliers and with an underlying matrix of higher rank than other state-of-the-art methods.

artificial intelligence, data mining, machine learning, (17 more...)

1210.0805

Country:

North America > United States (0.46)
Europe > Germany (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Anandkumar, Animashree, Foster, Dean P., Hsu, Daniel, Kakade, Sham M., Liu, Yi-Kai

A Spectral Algorithm for Latent Dirichlet Allocation

The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated due to multiple latent factors (e.g., the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden. We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on $k\times k$ matrices, where $k$ is the number of latent factors (e.g. the number of topics), rather than in the $d$-dimensional observed space (typically $d \gg k$).

artificial intelligence, machine learning, natural language, (20 more...)

1204.6703

Country:

Asia (1.00)
North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Law (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.84)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Vavoulis, Dimitrios V., Gough, Julian

Non-parametric Bayesian modelling of digital gene expression data

Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or "reads", which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number or even absence of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often been based on modified tests originally devised for analysing microarrays; both these and even de novo methods for the analysis of RNA-seq data are plagued by the common problem of low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model for modelling over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of model parameters conditional on these counts. The algorithm compensates for the problem of low numbers of biological replicates by clustering together genes with tag counts that are likely sampled from a common distribution and using this augmented sample for estimating the parameters of this distribution. The number of clusters is not decided a priori, but it is inferred along with the remaining model parameters. We demonstrate the ability of this approach to model biological data with high fidelity by applying the algorithm on a public dataset obtained from cancerous and non-cancerous neural tissues.

artificial intelligence, machine learning, negative binomial distribution, (15 more...)

doi: 10.4172/jcsb.1000131

1301.4144

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

de Rooij, Steven, van Erven, Tim, Grünwald, Peter D., Koolen, Wouter M.

Follow the Leader If You Can, Hedge If You Must

Follow-the-Leader (FTL) is an intuitive sequential prediction strategy that guarantees constant regret in the stochastic setting, but has terrible performance for worst-case data. Other hedging strategies have better worst-case guarantees but may perform much worse than FTL if the data are not maximally adversarial. We introduce the FlipFlop algorithm, which is the first method that provably combines the best of both worlds. As part of our construction, we develop AdaHedge, which is a new way of dynamically tuning the learning rate in Hedge without using the doubling trick. AdaHedge refines a method by Cesa-Bianchi, Mansour and Stoltz (2007), yielding slightly improved worst-case guarantees. By interleaving AdaHedge and FTL, the FlipFlop algorithm achieves regret within a constant factor of the FTL regret, without sacrificing AdaHedge's worst-case guarantees. AdaHedge and FlipFlop do not need to know the range of the losses in advance; moreover, unlike earlier methods, both have the intuitive property that the issued weights are invariant under rescaling and translation of the losses. The losses are also allowed to be negative, in which case they may be interpreted as gains.

artificial intelligence, bayesian inference, machine learning, (17 more...)

1301.0534

Country:

Europe > Netherlands (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Rizvandi, Nikzad Babaii, Taheri, Javid, Zomaya, Albert Y., Moraveji, Reza

A Study on Using Uncertain Time Series Matching Algorithms in MapReduce Applications

arXiv.org Artificial IntelligenceJan-17-2013

This paper has been originally published as "A study on using uncertain time series matching algorithms for MapReduce applications" in Journal of Concurrency and Computation: Practice and Experience - Special Issue in Cloud Computing Scalability, John Wiley Publisher. We realized that the original title is not appropriate and cannot be found by people working in this area. Therefore, this text is for changing the title but the original paper can be found at the rest of this text (starting from the next page). For citation, please cite the original title as: NB Rizvandi, J Taheri, R Moraveji, AY Zomaya, "A study on using uncertain time series matching algorithms for MapReduce applications", Journal of Concurrency and Computation: Practice and Experience - Special Issue in Cloud Computing Scalability, John Wiley Publisher (2012) A Study on Using Uncertain Time Series Matching Algorithms for MapReduce Applications Abstract--In this paper, we study CPU utilization time patterns of several MapReduce applications. After extracting running patterns of several applications, the patterns along with their statistical information are saved in a reference database to be later used to tweak system parameters to efficiently execute future unknown applications. To achieve this goal, CPU utilization patterns of new applications along with its statistical information are compared with the already known ones in the reference database to find/predict their most probable execution patterns. Because of different pattern lengths, the Dynamic Time Warping (DTW) is utilized for such comparison; a statistical analysis is then applied to DTWs' outcomes to select the most suitable candidates. Furthermore, under a hypothesis, we also proposed another algorithm to classify applications under similar CPU utilization patterns. Finally, dependency between minimum distance/maximum similarity of applications and their scalability (in both input size and number of virtual nodes) are studied.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

1112.5505

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report (0.64)

Industry: Energy (0.68)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Yu, Jingjin, LaValle, Steven M.

Planning Optimal Paths for Multiple Robots on Graphs

arXiv.org Artificial IntelligenceJan-17-2013

In this paper, we study the problem of optimal multi-robot path planning (MPP) on graphs. We propose two multiflow based integer linear programming (ILP) models that computes minimum last arrival time and minimum total distance solutions for our MPP formulation, respectively. The resulting algorithms from these ILP models are complete and guaranteed to yield true optimal solutions. In addition, our flexible framework can easily accommodate other variants of the MPP problem. Focusing on the time optimal algorithm, we evaluate its performance, both as a stand alone algorithm and as a generic heuristic for quickly solving large problem instances. Computational results confirm the effectiveness of our method.

artificial intelligence, optimization problem, robot, (17 more...)

arXiv.org Artificial Intelligence

1204.383

Country: North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.56)

Liu, Song, Yamada, Makoto, Collier, Nigel, Sugiyama, Masashi

Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation

arXiv.org Machine LearningJan-16-2013

The objective of change-point detection is to discover abrupt property changes lying behind time-series data. In this paper, we present a novel statistical change-point detection algorithm based on non-parametric divergence estimation between time-series samples from two retrospective segments. Our method uses the relative Pearson divergence as a divergence measure, and it is accurately and efficiently estimated by a method of direct density-ratio estimation. Through experiments on artificial and real-world datasets including human-activity sensing, speech, and Twitter messages, we demonstrate the usefulness of the proposed method.

change-point detection, survey article, upstream oil & gas, (16 more...)

doi: 10.1016/j.neunet.2013.01.012

1203.0453

Country:

North America > United States (1.00)
Asia > Japan > Honshū (0.28)
Europe > Netherlands (0.28)
(3 more...)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

arXiv.org Artificial IntelligenceJan-16-2013

Game Networks

La Mura, Pierfrancesco

We introduce Game networks (G nets), a novel representation for multi-agent decision problems. Compared to other game-theoretic representations, such as strategic or extensive forms, G nets are more structured and more compact; more fundamentally, G nets constitute a computationally advantageous framework for strategic inference, as both probability and utility independencies are captured in the structure of the network and can be exploited in order to simplify the inference process. An important aspect of multi-agent reasoning is the identification of some or all of the strategic equilibria in a game; we present original convergence methods for strategic equilibrium which can take advantage of strategic separabilities in the G net structure in order to simplify the computations. Specifically, we describe a method which identifies a unique equilibrium as a function of the game payoffs, and one which identifies all equilibria.

equilibria, equilibrium, representation, (15 more...)

arXiv.org Artificial Intelligence

1301.387

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.31)

Sebban, Marc, Nock, Richard

Combining Feature and Prototype Pruning by Uncertainty Minimization

arXiv.org Machine LearningJan-16-2013

We focus in this paper on dataset reduction techniques for use in k-nearest neighbor classification. In such a context, feature and prototype selections have always been independently treated by the standard storage reduction algorithms. While this certifying is theoretically justified by the fact that each subproblem is NP-hard, we assume in this paper that a joint storage reduction is in fact more intuitive and can in practice provide better results than two independent processes. Moreover, it avoids a lot of distance calculations by progressively removing useless instances during the feature pruning. While standard selection algorithms often optimize the accuracy to discriminate the set of solutions, we use in this paper a criterion based on an uncertainty measure within a nearest-neighbor graph. This choice comes from recent results that have proven that accuracy is not always the suitable criterion to optimize. In our approach, a feature or an instance is removed if its deletion improves information of the graph. Numerous experiments are presented in this paper and a statistical analysis shows the relevance of our approach, and its tolerance in the presence of noise.

accuracy, algorithm, reduction, (16 more...)

1301.3891

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)