Improved Mean and Variance Approximations for Belief Net Responses via Network Doubling

arXiv.org Artificial Intelligence

A Bayesian belief network models a joint distribution with a directed acyclic graph representing dependencies among variables and network parameters characterizing conditional distributions. The parameters are viewed as random variables to quantify uncertainty about their values. Belief nets are used to compute responses to queries; i.e., conditional probabilities of interest. A query is a function of the parameters, hence a random variable. Van Allen et al. (2001, 2008) showed how to quantify uncertainty about a query via a delta method approximation of its variance. We develop more accurate approximations for both query mean and variance. The key idea is to extend the query mean approximation to a "doubled network" involving two independent replicates. Our method assumes complete data and can be applied to discrete, continuous, and hybrid networks (provided discrete variables have only discrete parents). We analyze several improvements and provide empirical studies demonstrating their effectiveness.
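To make the setup concrete, here is a small illustrative sketch (not the paper's method, and with made-up Dirichlet counts): when the parameters carry Beta/Dirichlet uncertainty, the mean and variance of a query can be estimated by brute-force Monte Carlo, which is exactly the quantity the paper's closed-form approximations target.

```python
# Hypothetical sketch (not the paper's algorithm): Monte Carlo estimate of the
# mean and variance of the query P(Y=1) in a toy network X -> Y, where the CPT
# parameters are independent Beta (two-state Dirichlet) random variables.
# All posterior counts below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

theta_x1 = rng.beta(3.0, 2.0, size=N)       # P(X=1) ~ Beta(3, 2)
theta_y1_x1 = rng.beta(4.0, 1.0, size=N)    # P(Y=1 | X=1) ~ Beta(4, 1)
theta_y1_x0 = rng.beta(1.0, 3.0, size=N)    # P(Y=1 | X=0) ~ Beta(1, 3)

# The query P(Y=1) is a function of the parameters, hence a random variable.
query = theta_y1_x1 * theta_x1 + theta_y1_x0 * (1.0 - theta_x1)
print("query mean ~", query.mean(), " query variance ~", query.var())
```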


Effects of Treatment on the Treated: Identification and Generalization

arXiv.org Artificial Intelligence

Many applications of causal analysis call for assessing, retrospectively, the effect of withholding an action that has in fact been implemented. This counterfactual quantity, sometimes called the "effect of treatment on the treated" (ETT), has been used to evaluate educational programs, critique public policies, and justify individual decision making. In this paper we explore the conditions under which ETT can be estimated from (i.e., identified in) experimental and/or observational studies. We show that, when the action invokes a singleton variable, the conditions for ETT identification have simple characterizations in terms of causal diagrams. We further give a graphical characterization of the conditions under which the effects of multiple treatments on the treated can be identified, as well as ways in which the ETT estimand can be constructed from both interventional and observational distributions.
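For a single binary treatment, one known special case is simple arithmetic combining experimental and observational estimates. By the law of total expectation, E[Y_{X=0}] = E[Y_{X=0} | X=1] P(X=1) + E[Y_{X=0} | X=0] P(X=0), and by consistency E[Y_{X=0} | X=0] = E[Y | X=0]; solving for the counterfactual term yields the ETT. A minimal numeric sketch, with all input values made up for illustration:

```python
# Hedged numeric sketch of the binary-treatment special case. The numbers are
# invented; the identity combines an experimental estimate of E[Y | do(X=0)]
# with observational quantities.

p_x1 = 0.4            # P(X=1) in the observational study
ey_x1 = 0.70          # E[Y | X=1], observational
ey_x0 = 0.50          # E[Y | X=0], observational
ey_do_x0 = 0.55       # E[Y_{X=0}] = E[Y | do(X=0)], from a randomized experiment

# Law of total expectation + consistency (E[Y_{X=0} | X=0] = E[Y | X=0]):
#   E[Y_{X=0}] = E[Y_{X=0} | X=1] P(X=1) + E[Y | X=0] P(X=0)
ey_cf = (ey_do_x0 - ey_x0 * (1 - p_x1)) / p_x1   # E[Y_{X=0} | X=1]

ett = ey_x1 - ey_cf   # effect of treatment on the treated
print("E[Y_{X=0} | X=1] =", ey_cf, " ETT =", ett)   # 0.625 and 0.075 here
```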


Measuring Inconsistency in Probabilistic Knowledge Bases

arXiv.org Artificial Intelligence

This paper develops an inconsistency measure on conditional probabilistic knowledge bases. The measure is based on fundamental principles for inconsistency measures and thus provides a solid theoretical framework for the treatment of inconsistencies in probabilistic expert systems. We illustrate its usefulness and immediate application on several examples and present some formal results. Building on this measure, we use the Shapley value, a well-known solution concept from coalition games, to define a sophisticated indicator that is not only able to measure inconsistencies but also to reveal their causes in the knowledge base. Altogether, these tools guide the knowledge engineer in restoring consistency and thus enable the construction of a consistent and usable knowledge base that can be employed in probabilistic expert systems.
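As a rough illustration of the Shapley-value indicator, the sketch below attributes a knowledge base's total inconsistency to its individual conditionals. The measure v is a stand-in (it flags only the full set as inconsistent), not the paper's measure; the Shapley computation itself is the standard one.

```python
# Hypothetical sketch: distributing inconsistency blame over conditionals via
# the Shapley value. v() is a toy stand-in inconsistency measure.
from itertools import combinations
from math import factorial

statements = ["(bird|penguin)[1]", "(flies|bird)[0.9]", "(flies|penguin)[0]"]
n = len(statements)

def v(subset):
    # Toy measure: only the full triple of statements clashes.
    return 1.0 if len(subset) == n else 0.0

def shapley(i):
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for r in range(len(others) + 1):
        for S in combinations(others, r):
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += w * (v(set(S) | {i}) - v(set(S)))
    return phi

for i, s in enumerate(statements):
    print(s, "->", shapley(i))
# By symmetry each statement is assigned 1/3 of the total inconsistency,
# and the shares sum to v(full set) = 1 (the efficiency property).
```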


L2 Regularization for Learning Kernels

arXiv.org Machine Learning

The choice of the kernel is critical to the success of many learning algorithms but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of non-negative linear combinations of p base kernels, constrained by a trace or L1 regularization. This paper studies the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and for regression problems. We analyze the problem of learning kernels with ridge regression. We derive the form of the solution of the optimization problem and give an efficient iterative algorithm for computing that solution. We present a novel theoretical analysis of the problem based on stability and give learning bounds for orthogonal kernels that contain only an additive term O(√p/m) when compared to the standard kernel ridge regression stability bound. We also report the results of experiments indicating that L1 regularization can lead to modest improvements for a small number of kernels, but to performance degradations in larger-scale cases. In contrast, L2 regularization never degrades performance and in fact achieves significant improvements with a large number of kernels.
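One plausible shape for such an iterative algorithm, sketched below under our own assumptions (this is not the paper's exact update rule): alternate between solving the kernel ridge regression dual for a fixed kernel combination and re-weighting the base kernels, projecting the weights back onto the L2 ball.

```python
# A minimal sketch of alternating optimization for learning a non-negative
# combination K = sum_k mu_k * K_k with kernel ridge regression under an L2
# constraint on mu. Update rules are illustrative assumptions, not the paper's.
import numpy as np

def learn_kernel_l2(Ks, y, lam=1.0, radius=1.0, iters=20):
    p, m = len(Ks), len(y)
    mu = np.full(p, radius / np.sqrt(p))                  # start on the L2 ball
    for _ in range(iters):
        K = sum(w * Kk for w, Kk in zip(mu, Ks))
        alpha = np.linalg.solve(K + lam * np.eye(m), y)   # ridge dual solution
        g = np.array([alpha @ Kk @ alpha for Kk in Ks])   # per-kernel alignment
        g = np.maximum(g, 0.0)                            # keep mu non-negative
        mu = radius * g / (np.linalg.norm(g) + 1e-12)     # project onto L2 sphere
    return mu, alpha

# Toy usage with two made-up base kernels on random regression data.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
K1 = X @ X.T                                                       # linear kernel
K2 = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))   # RBF kernel
mu, alpha = learn_kernel_l2([K1, K2], y)
print("learned kernel weights:", mu)
```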


Domain Knowledge Uncertainty and Probabilistic Parameter Constraints

arXiv.org Machine Learning

Incorporating domain knowledge into the modeling process is an effective way to improve learning accuracy. However, as it is provided by humans, domain knowledge can only be specified with some degree of uncertainty. We propose to explicitly model such uncertainty through probabilistic constraints over the parameter space. In contrast to hard parameter constraints, our approach is effective also when the domain knowledge is inaccurate and generally results in superior modeling accuracy. We focus on generative and conditional modeling where the parameters are assigned a Dirichlet or Gaussian prior and demonstrate the framework with experiments on both synthetic and real-world data.
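A minimal sketch of the generative case under one simple assumption: the uncertain statement "this probability is around 0.7" is encoded as a Dirichlet prior whose equivalent sample size reflects the expert's confidence. All counts and numbers are made up.

```python
# Illustrative sketch (not the paper's exact formulation): a soft Dirichlet
# prior encoding uncertain domain knowledge, versus a hard parameter constraint.
import numpy as np

counts = np.array([12, 38])        # observed counts for a binary variable (made up)
belief, confidence = 0.7, 10.0     # expert: P(outcome 0) ~ 0.7, worth ~10 samples

# Soft constraint: Dirichlet(confidence * [belief, 1 - belief]) prior, MAP estimate
# (n_i + alpha_i - 1) / (N + alpha_0 - K) for a K-outcome multinomial.
alpha = confidence * np.array([belief, 1 - belief])
theta_map = (counts + alpha - 1) / (counts.sum() + alpha.sum() - 2)

theta_mle = counts / counts.sum()              # ignores the domain knowledge
theta_hard = np.array([belief, 1 - belief])    # hard constraint ignores the data

print("MLE:", theta_mle, " soft MAP:", theta_map, " hard:", theta_hard)
# The expert is wrong here (data suggest 0.24, not 0.7): the soft version stays
# close to the data, while the hard constraint cannot recover.
```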


Group Sparse Priors for Covariance Estimation

arXiv.org Machine Learning

Recently it has become popular to learn sparse Gaussian graphical models (GGMs) by imposing l1 or group l1,2 penalties on the elements of the precision matrix. This penalized likelihood approach results in a tractable convex optimization problem. In this paper, we reinterpret these results as performing MAP estimation under a novel prior which we call the group l1 and l1,2 positive-definite matrix distributions. This enables us to build a hierarchical model in which the l1 regularization terms vary depending on which group the entries are assigned to, which in turn allows us to learn block structured sparse GGMs with unknown group assignments. Exact inference in this hierarchical model is intractable, due to the need to compute the normalization constant of these matrix distributions. However, we derive upper bounds on the partition functions, which lets us use fast variational inference (optimizing a lower bound on the joint posterior). We show that on two real world data sets (motion capture and financial data), our method which infers the block structure outperforms a method that uses a fixed block structure, which in turn outperforms baseline methods that ignore block structure.
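The l1 baseline being reinterpreted corresponds to the graphical lasso, which can be run directly with scikit-learn; the sketch below shows only that baseline on made-up block-structured data. The paper's hierarchical block-structured prior is not implemented here.

```python
# Baseline sketch: l1-penalized precision estimation (graphical lasso), which
# the paper reinterprets as MAP estimation under an l1 matrix prior.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)
# Made-up data with block structure: variables 0-2 share one latent factor,
# variables 3-4 share another.
z1, z2 = rng.normal(size=(200, 1)), rng.normal(size=(200, 1))
X = np.hstack([z1 + 0.3 * rng.normal(size=(200, 3)),
               z2 + 0.3 * rng.normal(size=(200, 2))])

model = GraphicalLasso(alpha=0.1).fit(X)   # alpha is the l1 penalty weight
print(np.round(model.precision_, 2))
# Precision entries linking the two blocks should be (near) zero; a group l1,2
# prior would instead shrink whole blocks of entries together.
```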


The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use

arXiv.org Machine Learning

The natural gradient allows for more efficient gradient descent by removing dependencies and biases inherent in a function's parameterization. Several papers present the topic thoroughly and precisely. However, it remains a difficult idea to get one's head around. The intent of this note is to provide simple intuition for the natural gradient and its use. We review how an ill-conditioned parameter space can undermine learning, introduce the natural gradient by analogy to the more widely understood concept of signal whitening, and present tricks and specific prescriptions for applying the natural gradient to learning problems.
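As a minimal illustration of the whitening analogy, consider gradient descent on an ill-conditioned quadratic, with the curvature matrix standing in for the Fisher information F: preconditioning the gradient by F^{-1} equalizes convergence across directions, just as whitening equalizes signal variances. The numbers below are arbitrary.

```python
# Plain gradient descent on an ill-conditioned quadratic versus the
# natural-gradient step F^{-1} g (here F doubles as the exact curvature).
import numpy as np

F = np.array([[100.0, 0.0],     # one direction is 100x steeper than the other
              [0.0,   1.0]])
grad = lambda w: F @ w          # gradient of the loss 0.5 * w^T F w

w_plain = np.array([1.0, 1.0])
w_nat = np.array([1.0, 1.0])
F_inv = np.linalg.inv(F)
for _ in range(20):
    w_plain = w_plain - 0.005 * grad(w_plain)        # step size capped by the stiff direction
    w_nat = w_nat - 0.5 * F_inv @ grad(w_nat)        # natural-gradient step
print("plain GD:", w_plain, " natural gradient:", w_nat)
# Plain GD has barely moved along the shallow direction, while the
# natural-gradient iterate shrinks at the same rate in every direction.
```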


Graph-based Learning with Unbalanced Clusters

arXiv.org Machine Learning

Graph construction is a crucial step in spectral clustering (SC) and graph-based semi-supervised learning (SSL). Spectral methods applied on standard graphs such as full-RBF, $\epsilon$-graphs and $k$-NN graphs can lead to poor performance in the presence of proximal and unbalanced data. This is because spectral methods based on minimizing RatioCut or normalized cut on these graphs tend to favor balancing cluster sizes over reducing cut values. We propose a novel graph construction technique and show that the RatioCut solution on this new graph is able to handle proximal and unbalanced data. Our method is based on adaptively modulating the neighborhood degrees in a $k$-NN graph, which tends to sparsify neighborhoods in low density regions. Our method adapts to data with varying levels of unbalancedness and can be naturally used for small cluster detection. We justify our ideas through limit cut analysis. Unsupervised and semi-supervised experiments on synthetic and real data sets demonstrate the superiority of our method.
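A rough sketch of the idea, under our own simplifying assumptions rather than the paper's exact construction: modulate each point's k-NN degree by an inverse-radius density estimate, so that neighborhoods in low-density regions are sparsified, then apply a spectral method to the resulting graph.

```python
# Hedged sketch of density-modulated k-NN graph construction followed by
# spectral clustering; the degree rule below is an illustrative assumption.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(3)
# Unbalanced, proximal clusters: 300 points versus 30 points.
X = np.vstack([rng.normal([0, 0], 0.5, size=(300, 2)),
               rng.normal([2, 0], 0.3, size=(30, 2))])

k_max = 15
nn = NearestNeighbors(n_neighbors=k_max + 1).fit(X)
dist, idx = nn.kneighbors(X)                      # column 0 is the point itself
density = 1.0 / (dist[:, k_max] + 1e-12)          # inverse k-NN radius ~ density
# Denser points keep more neighbors (with a floor of 3); sparse regions thin out.
degrees = np.clip((k_max * density / density.max()).astype(int), 3, k_max)

W = np.zeros((len(X), len(X)))
for i, k in enumerate(degrees):
    W[i, idx[i, 1:k + 1]] = 1.0
W = np.maximum(W, W.T)                            # symmetrize the adjacency

labels = SpectralClustering(n_clusters=2, affinity="precomputed").fit_predict(W)
print("cluster sizes:", np.bincount(labels))
```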


Document summarization using positive pointwise mutual information

arXiv.org Artificial Intelligence

The degree of success in document summarization depends on the performance of the method used to identify significant sentences in the documents. The collection of unique words characterizes the major signature of the document and forms the basis for the Term-Sentence-Matrix (TSM). Positive Pointwise Mutual Information, which works well for measuring semantic similarity in the Term-Sentence-Matrix, is used in our method to assign a weight to each entry of the Term-Sentence-Matrix. The Sentence-Rank-Matrix generated from this weighted TSM is then used to extract a summary from the document. Our experiments show that this method outperforms most existing methods in producing summaries from large documents.
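A minimal sketch of the described pipeline, with one caveat: the final sentence score below (column sums of the PPMI-weighted TSM) is a simple stand-in for the paper's Sentence-Rank-Matrix step.

```python
# Sketch: Term-Sentence-Matrix of counts, PPMI weighting, then sentence ranking.
import numpy as np

sentences = ["the cat sat on the mat",
             "the dog chased the cat",
             "stock prices fell sharply today"]
vocab = sorted({w for s in sentences for w in s.split()})
t_index = {w: i for i, w in enumerate(vocab)}

# Term-Sentence-Matrix of raw counts.
C = np.zeros((len(vocab), len(sentences)))
for j, s in enumerate(sentences):
    for w in s.split():
        C[t_index[w], j] += 1

# Positive Pointwise Mutual Information: max(0, log p(t,s) / (p(t) p(s))).
P = C / C.sum()
pt = P.sum(axis=1, keepdims=True)          # marginal over terms
ps = P.sum(axis=0, keepdims=True)          # marginal over sentences
with np.errstate(divide="ignore", invalid="ignore"):
    ppmi = np.maximum(0.0, np.log(P / (pt @ ps)))
ppmi[~np.isfinite(ppmi)] = 0.0

scores = ppmi.sum(axis=0)                  # one score per sentence (stand-in rank)
print("top-ranked sentence:", sentences[int(np.argmax(scores))])
```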


A Novel Method For Speech Segmentation Based On Speakers' Characteristics

arXiv.org Artificial Intelligence

Speech segmentation is the process of detecting change points in order to partition an input audio stream into regions, each of which corresponds to a single audio source or speaker. One application of such a system is in speaker diarization. There are several methods for speaker segmentation; however, most speaker diarization systems use BIC-based segmentation methods. The main goal of this paper is to propose a new method for speaker segmentation that is faster than current methods (e.g., BIC) while retaining acceptable accuracy. Our proposed method is based on the pitch frequency of the speech. The accuracy of this method is similar to that of common speaker segmentation methods, but its computation cost is much lower. We show that our method is about 2.4 times faster than the BIC-based method, while the average accuracy of the pitch-based method is slightly higher than that of the BIC-based method.
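A rough sketch of the general approach, not the paper's tuned system: estimate a per-frame pitch track by autocorrelation, then flag change points where the pitch statistics of adjacent windows diverge. The window sizes and thresholds below are illustrative assumptions.

```python
# Hedged sketch: autocorrelation pitch tracking plus a simple change detector.
import numpy as np

def pitch_track(signal, sr=16000, frame=400, hop=160, fmin=60, fmax=400):
    """Crude per-frame pitch estimate (Hz) from the autocorrelation peak."""
    lo, hi = sr // fmax, sr // fmin        # lag search range for fmin..fmax Hz
    pitches = []
    for start in range(0, len(signal) - frame, hop):
        x = signal[start:start + frame]
        x = x - x.mean()
        ac = np.correlate(x, x, mode="full")[frame - 1:]   # lags 0..frame-1
        lag = lo + np.argmax(ac[lo:hi])
        pitches.append(sr / lag)
    return np.array(pitches)

def change_points(pitches, win=25, thresh=30.0):
    """Flag frames where mean pitch shifts by more than `thresh` Hz."""
    cps = []
    for t in range(win, len(pitches) - win):
        left, right = pitches[t - win:t], pitches[t:t + win]
        if abs(left.mean() - right.mean()) > thresh:
            cps.append(t)
    return cps

# Toy usage: two synthetic "speakers" at ~120 Hz and ~210 Hz, 2 seconds each.
sr = 16000
t = np.arange(sr * 2) / sr
sig = np.concatenate([np.sin(2 * np.pi * 120 * t), np.sin(2 * np.pi * 210 * t)])
print("change detected near frames:", change_points(pitch_track(sig, sr))[:3])
```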