Goto

Collaborating Authors

 Europe


Intelligent Self-Repairable Web Wrappers

arXiv.org Artificial Intelligence

The amount of information available on the Web grows at an incredible high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated during the last years. On the one hand, reliable solutions should provide robust algorithms of Web data mining which could automatically face possible malfunctioning or failures. On the other, in literature there is a lack of solutions about the maintenance of these systems. Procedures that extract Web data may be strictly interconnected with the structure of the data source itself; thus, malfunctioning or acquisition of corrupted data could be caused, for example, by structural modifications of data sources brought by their owners. Nowadays, verification of data integrity and maintenance are mostly manually managed, in order to ensure that these systems work correctly and reliably. In this paper we propose a novel approach to create procedures able to extract data from Web sources -- the so called Web wrappers -- which can face possible malfunctioning caused by modifications of the structure of the data source, and can automatically repair themselves.


From Cognitive Binary Logic to Cognitive Intelligent Agents

arXiv.org Artificial Intelligence

The relation between self awareness and intelligence is an open problem these days. Despite the fact that self awarness is usually related to Emotional Intelligence, this is not the case here. The problem described in this paper is how to model an agent which knows (Cognitive) Binary Logic and which is also able to pass (without any mistake) a certain family of Turing Tests designed to verify its knowledge and its discourse about the modal states of truth corresponding to well-formed formulae within the language of Propositional Binary Logic.


How Insight Emerges in a Distributed, Content-addressable Memory

arXiv.org Artificial Intelligence

We begin this chapter with the bold claim that it provides a neuroscientific explanation of the magic of creativity. Creativity presents a formidable challenge for neuroscience. Neuroscience generally involves studying what happens in the brain when someone engages in a task that involves responding to a stimulus, or retrieving information from memory and using it the right way, or at the right time. If the relevant information is not already encoded in memory, the task generally requires that the individual make systematic use of information that is encoded in memory. But creativity is different. It paradoxically involves studying how someone pulls out of their brain something that was never put into it! Moreover, it must be something both new and useful, or appropriate to the task at hand. The ability to pull out of memory something new and appropriate that was never stored there in the first place is what we refer to as the magic of creativity. Even if we are so fortunate as to determine which areas of the brain are active and how these areas interact during creative thought, we will not have an answer to the question of how the brain comes up with solutions and artworks that are new and appropriate. On the other hand, since the representational capacity of neurons emerges at a level that is higher than that of the individual neurons themselves, the inner workings of neurons is too low a level to explain the magic of creativity. Thus we look to a level that is midway between gross brain regions and neurons. Since creativity generally involves combining concepts from different domains, or seeing old ideas from new perspectives, we focus our efforts on the neural mechanisms underlying the representation of concepts and ideas. Thus we ask questions about the brain at the level that accounts for its representational capacity, i.e. at the level of distributed aggregates of neurons.


Extensional Higher-Order Logic Programming

arXiv.org Artificial Intelligence

We propose a purely extensional semantics for higher-order logic programming. In this semantics program predicates denote sets of ordered tuples, and two predicates are equal iff they are equal as sets. Moreover, every program has a unique minimum Herbrand model which is the greatest lower bound of all Herbrand models of the program and the least fixed-point of an immediate consequence operator. We also propose an SLD-resolution proof procedure which is proven sound and complete with respect to the minimum model semantics. In other words, we provide a purely extensional theoretical framework for higher-order logic programming which generalizes the familiar theory of classical (first-order) logic programming.


Random forest models of the retention constants in the thin layer chromatography

arXiv.org Artificial Intelligence

In the current study we examine an application of the machine learning methods to model the retention constants in the thin layer chromatography (TLC). This problem can be described with hundreds or even thousands of descriptors relevant to various molecular properties, most of them redundant and not relevant for the retention constant prediction. Hence we employed feature selection to significantly reduce the number of attributes. Additionally we have tested application of the bagging procedure to the feature selection. The random forest regression models were built using selected variables. The resulting models have better correlation with the experimental data than the reference models obtained with linear regression. The cross-validation confirms robustness of the models.


A fuzzified BRAIN algorithm for learning DNF from incomplete data

arXiv.org Artificial Intelligence

Aim of this paper is to address the problem of learning Boolean functions from training data with missing values. We present an extension of the BRAIN algorithm, called U-BRAIN (Uncertainty-managing Batch Relevance-based Artificial INtelligence), conceived for learning DNF Boolean formulas from partial truth tables, possibly with uncertain values or missing bits. Such an algorithm is obtained from BRAIN by introducing fuzzy sets in order to manage uncertainty. In the case where no missing bits are present, the algorithm reduces to the original BRAIN.


Using More Data to Speed-up Training Time

arXiv.org Machine Learning

In many recent applications, data is plentiful. By now, we have a rather clear understanding of how more data can be used to improve the accuracy of learning algorithms. Recently, there has been a growing interest in understanding how more data can be leveraged to reduce the required training runtime. In this paper, we study the runtime of learning as a function of the number of available training examples, and underscore the main high-level techniques. We provide some initial positive results showing that the runtime can decrease exponentially while only requiring a polynomial growth of the number of examples, and spell-out several interesting open problems.


Source Separation and Clustering of Phase-Locked Subspaces: Derivations and Proofs

arXiv.org Machine Learning

Due to space limitations, our submission "Source Separation and Clustering of Phase-Locked Subspaces", accepted for publication on the IEEE Transactions on Neural Networks in 2011, presented some results without proof. Those proofs are provided in this paper.


Fast, Linear Time Hierarchical Clustering using the Baire Metric

arXiv.org Machine Learning

The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partititioning, in terms of quality of results. For the latter, we carry out an in depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.


Clustering with Multi-Layer Graphs: A Spectral Perspective

arXiv.org Machine Learning

Observational data usually comes with a multimodal nature, which means that it can be naturally represented by a multi-layer graph whose layers share the same set of vertices (users) with different edges (pairwise relationships). In this paper, we address the problem of combining different layers of the multi-layer graph for improved clustering of the vertices compared to using layers independently. We propose two novel methods, which are based on joint matrix factorization and graph regularization framework respectively, to efficiently combine the spectrum of the multiple graph layers, namely the eigenvectors of the graph Laplacian matrices. In each case, the resulting combination, which we call a "joint spectrum" of multiple graphs, is used for clustering the vertices. We evaluate our approaches by simulations with several real world social network datasets. Results demonstrate the superior or competitive performance of the proposed methods over state-of-the-art technique and common baseline methods, such as co-regularization and summation of information from individual graphs.