Feature Selection as State-Space Search: An Empirical Study in Clustering Problems

AAAI Conferences

In this paper we treat the problem of feature selection in unsupervised learning as a state-space search problem. We introduce three different heuristic functions and perform extensive experiments on datasets with tens, hundreds, and thousands of features. Namely, we test different search algorithms using the heuristic functions we introduce. Our results show that the heuristic search approach for feature selection in unsupervised learning problems can be far superior to traditional baselines such as PCA and random projections.
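The state-space view described above can be illustrated with a minimal sketch. It assumes that a state is a set of selected feature indices and that a successor adds one feature. The `variance_score` heuristic and the greedy hill-climbing strategy are illustrative assumptions only; they are not the heuristic functions or search algorithms introduced in the paper.

```python
from statistics import pvariance


def variance_score(data, subset):
    """Hypothetical heuristic (an assumption, not from the paper):
    total population variance of the selected columns."""
    return sum(pvariance([row[j] for row in data]) for j in subset)


def greedy_forward_search(data, k, heuristic=variance_score):
    """Greedy hill-climbing over feature subsets.

    A state is a frozenset of column indices. Each step expands the
    current state into all successors that add one unused feature and
    moves to the successor with the highest heuristic value.
    """
    n_features = len(data[0])
    state = frozenset()
    while len(state) < min(k, n_features):
        successors = [state | {j} for j in range(n_features) if j not in state]
        state = max(successors, key=lambda s: heuristic(data, s))
    return sorted(state)


# Toy data: column 0 is constant, column 1 varies widely, column 2 varies slightly.
data = [
    [1.0, 10.0, 0.5],
    [1.0, 20.0, 0.6],
    [1.0, 30.0, 0.4],
    [1.0, 40.0, 0.5],
]
selected = greedy_forward_search(data, k=2)
print(selected)
```

On this toy data the constant column is never chosen, because it contributes nothing to the assumed score. Swapping in a different heuristic or a different search strategy (for example, best-first search) only changes the `heuristic` argument or the loop that picks the next state.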


Caching in Context-Minimal OR Spaces

AAAI Conferences

In empirical studies we observed that caching can have very little impact on the search effort of Branch and Bound search over context-minimal OR spaces. For example, in one of the problem domains used in our experiments, caching reduces the number of nodes expanded by only 1% in context-minimal OR spaces. By contrast, caching reduces the number of nodes expanded by 74% in context-minimal AND/OR spaces on the same instances. In this work we document this unexpected empirical finding and provide explanations for the phenomenon.


No One SATPlan Encoding To Rule Them All

AAAI Conferences

Solving planning problems via translation to propositional satisfiability (SAT) is one of the most successful approaches to automated planning. An important aspect of this approach is the encoding, i.e., the construction of a propositional formula from a given planning problem instance. Numerous encoding schemes have been proposed in recent years, each aiming to outperform the previous encodings on the majority of the benchmark problems. In this paper we take a different approach. Instead of trying to develop a new encoding that is better for all kinds of benchmarks, we take recently developed specialized encoding schemes and design a method to automatically select the proper encoding for a given planning problem instance. In the paper we also examine ranking heuristics for the Relaxed Relaxed Exists-Step encoding, which plays an important role in our algorithm. Experiments show that our new approach significantly outperforms the state-of-the-art encoding schemes when compared on the benchmarks of the 2011 International Planning Competition.


Exploring the Synergy between Two Modular Learning Techniques for Automated Planning

AAAI Conferences

In the last decade the emphasis on improving the operational performance of domain-independent automated planners has been on developing complex techniques which merge a range of different strategies. This quest for operational advantage, driven by the regular international planning competitions, has not made it easy to study, understand and predict what combinations of techniques will have what effect on a planner’s behaviour in a particular application domain. In this paper, we consider two machine learning techniques for planner performance improvement, and exploit a modular approach to their combination in order to facilitate the analysis of the impact of each individual component. We believe this can contribute to the development of more transparent planning engines, which are designed using modular, interchangeable, and well-founded components. Specifically, we combined two previously unrelated learning techniques, entanglements and relational decision trees, to guide a “vanilla” search algorithm. We report on a large experimental analysis which demonstrates the effectiveness of the approach in terms of performance improvements, resulting in a very competitive planning configuration despite the use of a more modular and transparent architecture. This gives insights into the strengths and weaknesses of the considered approaches, which will help their future exploitation.


Beta diffusion trees and hierarchical feature allocations

arXiv.org Machine Learning

We define the beta diffusion tree, a random tree structure with a set of leaves that defines a collection of overlapping subsets of objects, known as a feature allocation. A generative process for the tree structure is defined in terms of particles (representing the objects) diffusing in some continuous space, analogously to the Dirichlet diffusion tree (Neal, 2003b), which defines a tree structure over partitions (i.e., non-overlapping subsets) of the objects. Unlike in the Dirichlet diffusion tree, multiple copies of a particle may exist and diffuse along multiple branches in the beta diffusion tree, and an object may therefore belong to multiple subsets of particles. We demonstrate how to build a hierarchically-clustered factor analysis model with the beta diffusion tree and how to perform inference over the random tree structures with a Markov chain Monte Carlo algorithm. We conclude with several numerical experiments on missing data problems with data sets of gene expression microarrays, international development statistics, and intranational socioeconomic measurements.


Optimizing Hybrid Spreading in Metapopulations

arXiv.org Artificial Intelligence

Epidemic spreading phenomena are ubiquitous in nature and society. Examples include the spreading of diseases, information, and computer viruses. Epidemics can spread by local spreading, where infected nodes can only infect a limited set of direct target nodes, and by global spreading, where an infected node can infect every other node. In reality, many epidemics spread using a hybrid mixture of both types of spreading. In this study we develop a theoretical framework for studying hybrid epidemics, and examine the optimum balance between spreading mechanisms in terms of achieving the maximum outbreak size. We show the existence of critically hybrid epidemics where neither spreading mechanism alone can cause a noticeable spread but a combination of the two spreading mechanisms would produce an enormous outbreak. Our results provide new strategies for maximising beneficial epidemics and estimating the worst outcome of damaging hybrid epidemics.


Reports of the Workshops Held at the Tenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment

AI Magazine

The AIIDE-14 Workshop program was held Friday and Saturday, October 3–4, 2014 at North Carolina State University in Raleigh, North Carolina. The workshop program included five workshops covering a wide range of topics. The titles of the workshops held Friday were Games and Natural Language Processing, and Artificial Intelligence in Adversarial Real-Time Games. The titles of the workshops held Saturday were Diversity in Games Research, Experimental Artificial Intelligence in Games, and Musical Metacreation. This article presents short summaries of those events.


Exploiting Semantics for Big Data Integration

AI Magazine

There is a great deal of interest in big data, focusing mostly on data set size. The use of semantics in this integration process is key to building an approach that scales to large numbers of heterogeneous sources. [...] descriptions and then integrating the data within this unified framework. [...] and (4) integrate the data across sources using this model. Karma has been used on a variety of types of data, including biological data, mobile phone data, geospatial data, and cultural heritage data. One challenge in integrating diverse data sources is the ability to import different data formats into a [...] For example, in our museum use case, we received data in spreadsheets (figure 1), comma-separated values (CSV), JSON (figure 3), XML, and relational databases (figure [...] In order to illustrate the approach to integrating data in Karma, we will use an example from the cultural heritage domain.


Indian Buffet process for model selection in convolved multiple-output Gaussian processes

arXiv.org Machine Learning

Multi-output Gaussian processes have received increasing attention during the last few years as a natural mechanism to extend the powerful flexibility of Gaussian processes to the setup of multiple output variables. The key point here is the ability to design kernel functions that allow exploiting the correlations between the outputs while fulfilling the positive definiteness requirement for the covariance function. Alternatives for constructing these covariance functions are the linear model of coregionalization and process convolutions. Each of these methods demands the specification of the number of latent Gaussian processes used to build the covariance function for the outputs. In this paper, we propose the use of an Indian Buffet process as a way to perform model selection over the number of latent Gaussian processes. This type of model is particularly important in the context of latent force models, where the latent forces are associated with physical quantities like protein profiles or latent forces in mechanical systems. We use variational inference to estimate posterior distributions over the variables involved, and show examples of the model performance over artificial data, a motion capture dataset, and a gene expression dataset.


A Neurodynamical System for finding a Minimal VC Dimension Classifier

arXiv.org Machine Learning

The recently proposed Minimal Complexity Machine (MCM) finds a hyperplane classifier by minimizing an exact bound on the Vapnik-Chervonenkis (VC) dimension. The VC dimension measures the capacity of a learning machine, and a smaller VC dimension leads to improved generalization. On many benchmark datasets, the MCM generalizes better than SVMs and uses far fewer support vectors. In this paper, we describe a neural network based on a linear dynamical system that converges to the MCM solution. The proposed MCM dynamical system is conducive to an analogue circuit implementation on a chip or simulation using Ordinary Differential Equation (ODE) solvers. Numerical experiments on benchmark datasets from the UCI repository show that the proposed approach is scalable and accurate, as we obtain improved accuracies and fewer support vectors (up to 74.3% reduction) with the MCM dynamical system.