What's in an 'is about' link? Chemical diagrams and the Information Artifact Ontology

arXiv.org Artificial Intelligence

The Information Artifact Ontology is an ontology in the domain of information entities. Central to the definition of what it is to be an information entity is the claim that an information entity must be 'about' something, which is encoded in an axiom stating that every information entity is about some entity. This axiom conflicts with ontological realism, since many information entities seem to be about non-existent entities, such as hypothetical molecules. We discuss this problem in the context of diagrams of molecules, a kind of information entity used pervasively throughout computational chemistry. We then propose a solution that recognizes that information entities such as diagrams are expressions of diagrammatic languages. In so doing, we not only address the problem of classifying diagrams that seem to be about non-existent entities but also allow a more sophisticated categorisation of information entities.


Efficient hierarchical clustering for continuous data

arXiv.org Machine Learning

Learning hierarchical structures from observed data is a common practice in many knowledge domains. Examples include phylogenies and signaling pathways in biology, and language models in linguistics. Agglomerative clustering is still the most popular approach to hierarchical clustering due to its efficiency, ease of implementation, and the wide range of distance metrics it supports. However, because it is algorithmic in nature, there is no principled way in which agglomerative clustering can be used as a building block in more complex models. Bayesian priors for structure learning, on the other hand, are perfectly suited to being employed in larger models. For example, several authors have proposed using hierarchical structure priors to model correlation in factor models (Rai and Daume III, 2009; Henao et al., 2012; Zhang et al., 2011). Ricardo Henao is a Postdoctoral Associate and Joseph E. Lucas is an Assistant Research Professor at the Institute for Genome Sciences and Policy (IGSP), Duke University, Durham, NC 27710.
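As a point of reference for the algorithmic baseline mentioned above, here is a minimal sketch of agglomerative clustering. It assumes Euclidean distance and single linkage (the abstract names neither) and uses a naive search over all cluster pairs for clarity, not the efficient implementations that make the method popular. The function name `single_linkage` is our own.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Naive agglomerative clustering with single linkage.

    Returns a list of merges (cluster_a, cluster_b, distance), where each
    cluster is a frozenset of point indices. Assumes Euclidean distance.
    """
    # Start with every point in its own cluster.
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
for a, b, d in single_linkage(points):
    print(sorted(a), sorted(b), round(d, 3))
```

Read in order, the merges define the dendrogram; nothing in the procedure is probabilistic, which is why it does not plug directly into a larger generative model.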


Structured sparsity through convex optimization

arXiv.org Machine Learning

Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the $\ell_1$-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of non-linear variable selection.
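To make the structured-norm idea concrete, the sketch below computes a group norm Omega(w) = sum over groups g of ||w_g||_2 and the proximal operator of lam * Omega (block soft-thresholding), a standard building block of proximal methods for such norms. Restricting the proximal step to disjoint groups and using the Euclidean norm within each group are our assumptions; the overlapping-group case discussed above needs more elaborate algorithms. Function names are hypothetical.

```python
import math


def group_norm(w, groups):
    """Omega(w): sum over groups g of the Euclidean norm of w restricted to g."""
    return sum(math.sqrt(sum(w[i] ** 2 for i in g)) for g in groups)


def prox_group_norm(w, groups, lam):
    """Proximal operator of lam * Omega, assuming disjoint groups.

    Each group is shrunk by block soft-thresholding,
    w_g <- max(0, 1 - lam / ||w_g||) * w_g, so whole groups can become zero.
    """
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * w[i]
    return out


w = [3.0, 4.0, 0.1, -0.1, 1.0]
groups = [[0, 1], [2, 3], [4]]
print(group_norm(w, groups))
print(prox_group_norm(w, groups, lam=0.5))
```

The weak group [2, 3] is zeroed as a block while the strong group [0, 1] is only rescaled, which is the structured counterpart of the entrywise sparsity induced by the plain l1-norm.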


Automatic Sampling of Geographic Objects

arXiv.org Artificial Intelligence

Today, large datasets composed of thousands of geographic objects are available. However, for many processes that require expert appraisal or substantial computation time, only a small subset of these objects can be taken into account. In this context, robust sampling methods become necessary. In this paper, we propose a sampling method based on clustering techniques. Our method divides the objects into clusters, then selects the most representative objects in each cluster. A case study in the context of a process dedicated to knowledge revision for geographic data generalisation is presented. This case study shows that our method selects relevant samples of objects.
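The abstract does not fix a clustering algorithm or a definition of "most representative", so the sketch below assumes plain k-means (Lloyd's algorithm with a seeded random initialisation) on numeric feature vectors and keeps, in each cluster, the objects closest to the cluster centroid. The feature vectors and function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct objects
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each object goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: centroid = mean of its members (kept if cluster is empty).
        new = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new.append(centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


def sample_objects(points, k, per_cluster=1, seed=0):
    """Cluster the objects, then keep the `per_cluster` objects nearest to
    each centroid. Returns a sorted list of object indices."""
    centroids, labels = kmeans(points, k, seed=seed)
    chosen = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        members.sort(key=lambda i: math.dist(points[i], centroids[c]))
        chosen.extend(members[:per_cluster])
    return sorted(chosen)


# Hypothetical 2-D features (e.g., area and elongation) for six objects.
objects = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
print(sample_objects(objects, k=2))
```

Picking the object nearest the centroid (rather than the centroid itself) guarantees that every sampled item is a real geographic object that an expert can inspect.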


A Privacy-Aware Bayesian Approach for Combining Classifier and Cluster Ensembles

arXiv.org Machine Learning

This paper introduces a privacy-aware Bayesian approach that combines ensembles of classifiers and clusterers to perform semi-supervised and transductive learning. We consider scenarios where instances and their classification/clustering results are distributed across different data sites and are subject to sharing restrictions. As a special case, we also discuss privacy-aware computation of the model when instances of the target data are distributed across different data sites. Experimental results show that the proposed approach can provide good classification accuracy while adhering to the data/model sharing constraints.


The Stick-Breaking Construction of the Beta Process as a Poisson Process

arXiv.org Machine Learning

We show that the stick-breaking construction of the beta process due to Paisley et al. (2010) can be obtained from the characterization of the beta process as a Poisson process. Specifically, we show that the mean measure of the underlying Poisson process is equal to that of the beta process. We use this underlying representation to derive error bounds on truncated beta processes that are tighter than those in the literature. We also develop a new MCMC inference algorithm for beta processes, based in part on our new Poisson process construction.
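For orientation, the sketch below draws a truncated beta process using the stick-breaking construction as we understand it from Paisley et al. (2010): in round i, C_i ~ Poisson(gamma) new atoms appear, and each atom's weight is V^(i) * prod_{l<i} (1 - V^(l)) with all V ~ Beta(1, alpha), where gamma is the total mass of the base measure. Truncating after a fixed number of rounds is the kind of approximation whose error the abstract bounds. The uniform base measure on [0, 1], the Knuth Poisson sampler, and all function names are our assumptions; this is an illustration, not the paper's code.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's method for a Poisson(lam) draw (adequate for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def stick_breaking_beta_process(alpha, gamma, rounds, seed=0):
    """Truncated draw of B ~ BP(alpha, mu), assuming mu = gamma * Uniform(0, 1).

    Returns (location, weight, round) triples for all atoms in rounds 1..rounds.
    """
    rng = random.Random(seed)
    atoms = []
    for i in range(1, rounds + 1):
        for _ in range(poisson(rng, gamma)):
            # Break a fresh stick i times and keep the piece from the i-th break.
            vs = [rng.betavariate(1.0, alpha) for _ in range(i)]
            weight = vs[-1] * math.prod(1.0 - v for v in vs[:-1])
            atoms.append((rng.random(), weight, i))
    return atoms


atoms = stick_breaking_beta_process(alpha=2.0, gamma=3.0, rounds=10)
print(len(atoms), "atoms; total mass", sum(w for _, w, _ in atoms))
```

Summing the expected mass over rounds gives gamma * sum_i (1/(1+alpha)) * (alpha/(1+alpha))^(i-1) = gamma, so the expected total mass of the truncated draw approaches gamma as the number of rounds grows.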


The Discrete Infinite Logistic Normal Distribution

arXiv.org Machine Learning

We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN is a generalization of the hierarchical Dirichlet process (HDP) that models correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables, and study its statistical properties. We consider applications to topic modeling and derive a variational inference algorithm for approximate posterior inference. We study the empirical performance of the DILN topic model on four corpora, comparing performance with the HDP and the correlated topic model (CTM). To deal with large-scale data sets, we also develop an online inference algorithm for DILN and compare it with online HDP and online LDA on a corpus of approximately 350,000 articles from Nature magazine.
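The normalized-gamma representation mentioned above can be illustrated with the following sketch, which rests on simplifying assumptions of our own: top-level weights p_k come from a truncated stick-breaking (GEM(alpha)) construction, each atom has a hypothetical one-dimensional location, a Gaussian vector Y with squared-exponential covariance over those locations supplies the correlation, and each group draws Z_k ~ Gamma(shape = beta * p_k, scale = exp(Y_k)) and normalizes. The paper's exact parameterisation may differ, so treat this as an illustration of the mechanism rather than DILN itself; all names are hypothetical.

```python
import math
import random


def stick_breaking(rng, alpha, K):
    """Truncated GEM(alpha) weights; the last weight takes the leftover mass."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def cholesky(A):
    """Lower-triangular L with L L^T = A for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def normalized_gamma_group(rng, p, locations, beta, length_scale=1.0):
    """One group's atom weights: correlated Gaussian Y, then
    Z_k ~ Gamma(beta * p_k, scale=exp(Y_k)), then Z / sum(Z)."""
    K = len(p)
    # Squared-exponential covariance over atom locations, plus jitter.
    cov = [[math.exp(-((a - b) / length_scale) ** 2) + (1e-6 if i == j else 0.0)
            for j, b in enumerate(locations)] for i, a in enumerate(locations)]
    L = cholesky(cov)
    eps = [rng.gauss(0.0, 1.0) for _ in range(K)]
    y = [sum(L[i][m] * eps[m] for m in range(i + 1)) for i in range(K)]
    # random.gammavariate(shape, scale); using exp(Y_k) as scale is our assumption.
    z = [rng.gammavariate(beta * pk, math.exp(yk)) for pk, yk in zip(p, y)]
    total = sum(z)
    return [zk / total for zk in z]


rng = random.Random(0)
K = 10
p = stick_breaking(rng, alpha=1.0, K=K)
locations = [float(k) for k in range(K)]  # hypothetical atom locations
for group in range(3):
    weights = normalized_gamma_group(rng, p, locations, beta=5.0)
    print(group, [round(x, 3) for x in weights])
```

Because nearby atoms share correlated Y values, their gamma scales move together across groups, which is the kind of correlation between atom weights that the plain HDP cannot express.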


Learning in Riemannian Orbifolds

arXiv.org Artificial Intelligence

Statistical data analysis and learning in Riemannian orbifolds is motivated by applications where the data we want to learn on are naturally represented by finite combinatorial structures such as point patterns, trees, and graphs. Examples from structural pattern recognition that learn on structured data include estimating central points of a distribution on graphs, such as the mean and median [9, 16, 15, 21], central clustering of graphs [10, 12, 13, 14, 19, 15, 23], learning graph quantization [17], and multilayer perceptrons for graphs [20]. In retrospect, the structure space framework proposed by [18] theoretically justifies the above approaches in the sense that they actually minimize an empirical risk function on structures. Since minimizing an empirical risk function on structures is usually computationally intractable, the ultimate challenge consists in constructing efficient algorithms capable of returning optimal or at least suboptimal solutions. From the point of view of statistical pattern recognition, however, the ultimate goal is not to determine a good solution of an empirical risk function, but rather to discover the true but unknown structure of the data with respect to its distribution.
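As a small illustration of "minimizing an empirical risk function on structures", the sketch below computes a set median of a sample of graphs. It assumes graphs with the same number of vertices, given as weighted adjacency matrices, and measures dissimilarity by the minimum Frobenius distance over all vertex relabellings, a quotient distance of the kind the orbifold view induces, as we understand it. The brute-force search over permutations is feasible only for tiny graphs and shows concretely where the intractability comes from. The set median restricts candidates to the sample itself; function names are hypothetical.

```python
import itertools
import math


def graph_distance(A, B):
    """min over vertex permutations pi of ||A - B_pi||_F, where
    B_pi[i][j] = B[pi[i]][pi[j]] (brute force over all n! relabellings)."""
    n = len(A)
    best = math.inf
    for pi in itertools.permutations(range(n)):
        d2 = sum((A[i][j] - B[pi[i]][pi[j]]) ** 2
                 for i in range(n) for j in range(n))
        best = min(best, d2)
    return math.sqrt(best)


def set_median(graphs):
    """Index and risk of the sample graph minimising sum_j d(G, G_j)."""
    risks = [sum(graph_distance(g, h) for h in graphs) for g in graphs]
    best = min(range(len(graphs)), key=risks.__getitem__)
    return best, risks[best]


P = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path 0-1-2
Q = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # same path, relabelled (centre is 0)
T = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # triangle
E = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]  # empty graph
index, risk = set_median([P, Q, T, E])
print(index, round(risk, 3))
```

The two differently labelled paths are at distance zero, so the median is determined by the structure of the graphs rather than by an arbitrary vertex numbering.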


The Complexity of Manipulating $k$-Approval Elections

arXiv.org Artificial Intelligence

An important problem in computational social choice theory is the complexity of undesirable behavior among agents, such as control, manipulation, and bribery in election systems. These kinds of voting strategies are often tempting at the individual level but disastrous for the agents as a whole. Creating election systems in which determining such strategies is computationally difficult is thus an important goal. Scoring protocols form an interesting class of election systems. Previous work in this area has demonstrated the complexity of misuse in cases involving a fixed number of candidates, and for specific election systems with an unbounded number of candidates, such as Borda. In contrast, we take the first step toward generalizing the computational complexity results for election misuse to infinitely many scoring protocols with an unbounded number of candidates. Interesting families of systems include $k$-approval and $k$-veto elections, in which voters distinguish $k$ candidates from the candidate set. Our main result partitions the problems of these families by complexity: we show that each is polynomial-time computable, NP-hard, or polynomial-time equivalent to another problem of interest. We also demonstrate a surprising connection between manipulation in election systems and some graph theory problems.
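To pin down the objects involved, the sketch below implements $k$-approval scoring and decides constructive coalitional manipulation (can a coalition of manipulators vote so that a preferred candidate p wins?) by exhaustive search. We assume the co-winner model (p wins if no candidate has a strictly higher score); with m candidates, $k$-veto is the same as $(m-k)$-approval. The search is exponential and only defines the problem; it is not one of the paper's polynomial-time algorithms or reductions. Function names and the example profile are hypothetical.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement


def k_approval_scores(candidates, ballots, k):
    """Each ballot is a ranking (best first); its top k candidates get 1 point."""
    scores = Counter({c: 0 for c in candidates})
    for ranking in ballots:
        for c in ranking[:k]:
            scores[c] += 1
    return scores


def can_manipulate(candidates, honest_ballots, k, n_manipulators, p):
    """Brute force: can the manipulators approve k-subsets so that p is a co-winner?"""
    base = k_approval_scores(candidates, honest_ballots, k)
    subsets = list(combinations(candidates, k))
    # Only the approved set matters in k-approval, and the order of the
    # manipulators is irrelevant, so enumerate multisets of k-subsets.
    for choice in combinations_with_replacement(subsets, n_manipulators):
        scores = base.copy()
        for approved in choice:
            for c in approved:
                scores[c] += 1
        if all(scores[p] >= scores[c] for c in candidates):
            return True
    return False


candidates = ["a", "b", "c", "p"]
honest = [["a", "b", "c", "p"], ["a", "b", "p", "c"], ["a", "c", "b", "p"]]
print(k_approval_scores(candidates, honest, k=2))
for n in (1, 2, 3):
    print(n, can_manipulate(candidates, honest, k=2, n_manipulators=n, p="p"))
```

Here a leads with 3 points and p has none, so p needs three manipulators, who must also spread their second approvals over b and c to avoid lifting either above p.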


Avian Influenza (H5N1) Expert System using Dempster-Shafer Theory

arXiv.org Artificial Intelligence

According to the Cumulative Number of Confirmed Human Cases of Avian Influenza (H5N1) reported to the World Health Organization (WHO) in 2011 from 15 countries, Indonesia had the largest number of deaths from avian influenza, with 146 deaths. In this research, we built an Avian Influenza (H5N1) Expert System that identifies avian influenza and displays the result of the identification process. In this paper, we describe five major symptoms: depression; bluish combs, wattle, and face region; swollen face region; narrowness of the eyes; and balance disorders. We use chickens as the research object. The research location is Lampung Province, South Sumatera, which we chose because it has a high poultry population. As the inference engine of the expert system, our approach uses Dempster-Shafer theory to quantify degrees of belief and to combine beliefs under conditions of uncertainty and ignorance, which allows quantitative measurement of the belief and plausibility of our identification result. The results show that the Avian Influenza (H5N1) Expert System successfully identified the existence of avian influenza and displayed the result of the identification process.
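Dempster's rule of combination, which the abstract names as the mechanism for combining beliefs, can be sketched as follows. A mass function maps subsets of a frame of discernment to masses summing to one; combining two of them multiplies the masses of every pair of focal sets, assigns the product to their intersection, and renormalises away the conflicting mass that falls on the empty set. The frame {"AI", "ND", "FC"} ("ND" and "FC" are placeholder labels for alternative diagnoses) and all mass values are hypothetical, not taken from the paper's knowledge base.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule for mass functions keyed by frozensets of hypotheses."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, h):
    """Bel(h): total mass of focal sets contained in h."""
    return sum(v for s, v in m.items() if s <= h)


def plausibility(m, h):
    """Pl(h): total mass of focal sets that intersect h."""
    return sum(v for s, v in m.items() if s & h)


THETA = frozenset({"AI", "ND", "FC"})  # hypothetical frame of discernment
m_swollen = {frozenset({"AI", "ND"}): 0.6, THETA: 0.4}  # hypothetical masses
m_balance = {frozenset({"AI"}): 0.7, THETA: 0.3}
m = combine(m_swollen, m_balance)
ai = frozenset({"AI"})
print(round(belief(m, ai), 3), round(plausibility(m, ai), 3))
```

Mass left on the whole frame THETA represents ignorance rather than evidence against any diagnosis, which is why belief and plausibility give a lower and an upper bound rather than a single probability.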