Goto

Collaborating Authors

 Industry


Understanding the Social Cascading of Geekspeak and the Upshots for Social Cognitive Systems

arXiv.org Artificial Intelligence

Barring swarm robotics, a substantial share of current machine-human and machine-machine learning and interaction mechanisms are being developed and fed by results of agent-based computer simulations, game-theoretic models, or robotic experiments based on a dyadic communication pattern. Yet, in real life, humans no less frequently communicate in groups, and gain knowledge and take decisions basing on information cumulatively gleaned from more than one single source. These properties should be taken into consideration in the design of autonomous artificial cognitive systems construed to interact with learn from more than one contact or 'neighbour'. To this end, significant practical import can be gleaned from research applying strict science methodology to human and social phenomena, e.g. to discovery of realistic creativity potential spans, or the 'exposure thresholds' after which new information could be accepted by a cognitive agent. The results will be presented of a project analysing the social propagation of neologisms in a microblogging service. From local, low-level interactions and information flows between agents inventing and imitating discrete lexemes we aim to describe the processes of the emergence of more global systemic order and dynamics, using the latest methods of complexity science. Whether in order to mimic them, or to 'enhance' them, parameters gleaned from complexity science approaches to humans' social and humanistic behaviour should subsequently be incorporated as points of reference in the field of robotics and human-machine interaction.


A Unified Approach for Modeling and Recognition of Individual Actions and Group Activities

arXiv.org Machine Learning

Recognizing group activities is challenging due to the difficulties in isolating individual entities, finding the respective roles played by the individuals and representing the complex interactions among the participants. Individual actions and group activities in videos can be represented in a common framework as they share the following common feature: both are composed of a set of low-level features describing motions, e.g., optical flow for each pixel or a trajectory for each feature point, according to a set of composition constraints in both temporal and spatial dimensions. In this paper, we present a unified model to assess the similarity between two given individual or group activities. Our approach avoids explicit extraction of individual actors, identifying and representing the inter-person interactions. With the proposed approach, retrieval from a video database can be performed through Query-by-Example; and activities can be recognized by querying videos containing known activities. The suggested video matching process can be performed in an unsupervised manner. We demonstrate the performance of our approach by recognizing a set of human actions and football plays.


Performance Tuning Of J48 Algorithm For Prediction Of Soil Fertility

arXiv.org Machine Learning

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use("data mining",Wikipedia). A soil test is the analysis of a soil sample to determine nutrient content, composition and other characteristics. Tests are usually performed to measure fertility and indicate deficiencies that need to be remedied ("Soil Test", Wikipedia).. In this research, soil dataset containing soil test results has been used to apply various classification techniques in data mining. Soil fertility is a crucial attribute which is considered for land evaluation, also achieving and maintaining necessary levels of fertility is important for nurturing crop production, hence this paper includes steps for building an efficient and accurate predictive model of soil fertility with the help of J48 algorithm.


Learning LiNGAM based on data with more variables than observations

arXiv.org Machine Learning

A very important topic in systems biology is developing statistical methods that automatically find causal relations in gene regulatory networks with no prior knowledge of causal connectivity. Many methods have been developed for time series data. However, discovery methods based on steady-state data are often necessary and preferable since obtaining time series data can be more expensive and/or infeasible for many biological systems. A conventional approach is causal Bayesian networks. However, estimation of Bayesian networks is ill-posed. In many cases it cannot uniquely identify the underlying causal network and only gives a large class of equivalent causal networks that cannot be distinguished between based on the data distribution. We propose a new discovery algorithm for uniquely identifying the underlying causal network of genes. To the best of our knowledge, the proposed method is the first algorithm for learning gene networks based on a fully identifiable causal model called LiNGAM. We here compare our algorithm with competing algorithms using artificially-generated data, although it is definitely better to test it based on real microarray gene expression data.


The Logical Difference for the Lightweight Description Logic EL

Journal of Artificial Intelligence Research

We study a logic-based approach to versioning of ontologies. Under this view, ontologies provide answers to queries about some vocabulary of interest. The difference between two versions of an ontology is given by the set of queries that receive different answers. We investigate this approach for terminologies given in the description logic EL extended with role inclusions and domain and range restrictions for three distinct types of queries: subsumption, instance, and conjunctive queries. In all three cases, we present polynomial-time algorithms that decide whether two terminologies give the same answers to queries over a given vocabulary and compute a succinct representation of the difference if it is non- empty. We present an implementation, CEX2, of the developed algorithms for subsumption and instance queries and apply it to distinct versions of Snomed CT and the NCI ontology.


Bayesian and L1 Approaches to Sparse Unsupervised Learning

arXiv.org Artificial Intelligence

The use of L1 regularisation for sparse learning has generated immense research interest, with successful application in such diverse areas as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically underperforms in terms of predictive performance when compared with other methods for inferring sparsity. We focus on unsupervised latent variable models, and develop L1 minimising factor models, Bayesian variants of "L1", and Bayesian models with a stronger L0-like sparsity induced through spike-and-slab distributions. These spike-and-slab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner and avoiding unnecessary shrinkage of non-zero values. We demonstrate on a number of data sets that in practice spike-and-slab Bayesian methods outperform L1 minimisation, even on a computational budget. We thus highlight the need to re-assess the wide use of L1 methods in sparsity-reliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that, over many varying conditions, provides improved generalisation performance.


Efficient Algorithm for Extremely Large Multi-task Regression with Massive Structured Sparsity

arXiv.org Machine Learning

We develop a highly scalable optimization method called "hierarchical group-thresholding" for solving a multi-task regression model with complex structured sparsity constraints on both input and output spaces. Despite the recent emergence of several efficient optimization algorithms for tackling complex sparsity-inducing regularizers, true scalability in practical high-dimensional problems where a huge amount (e.g., millions) of sparsity patterns need to be enforced remains an open challenge, because all existing algorithms must deal with ALL such patterns exhaustively in every iteration, which is computationally prohibitive. Our proposed algorithm addresses the scalability problem by screening out multiple groups of coefficients simultaneously and systematically. We employ a hierarchical tree representation of group constraints to accelerate the process of removing irrelevant constraints by taking advantage of the inclusion relationships between group sparsities, thereby avoiding dealing with all constraints in every optimization step, and necessitating optimization operation only on a small number of outstanding coefficients. In our experiments, we demonstrate the efficiency of our method on simulation datasets, and in an application of detecting genetic variants associated with gene expression traits.


A Plea for Neutral Comparison Studies in Computational Sciences

arXiv.org Machine Learning

In a context where most published articles are devoted to the development of "new methods", comparison studies are generally appreciated by readers but surprisingly given poor consideration by many scientific journals. In connection with recent articles on over-optimism and epistemology published in Bioinformatics, this letter stresses the importance of neutral comparison studies for the objective evaluation of existing methods and the establishment of standards by drawing parallels with clinical research.


Nonparametric sparsity and regularization

arXiv.org Machine Learning

It is now common to see practical applications, for example in bioinformatics and computer vision, where the dimensionality of the data is in the order of hundreds, thousands and even tens of thousands. It is known that learning in such a high dimensional regime is feasible only if the quantity to be estimated satisfies some regularity assumptions [24]. In particular, the idea behind, so called, sparsity is that the quantity of interest depends only on a few relevant variables (dimensions). In turn, this latter assumption is often at the basis of the construction of interpretable data models, since the relevant dimensions allow for a compact, hence interpretable, representation. An instance of the above situation is the problem of learning from samples a multivariate function which depends only on a (possibly small) subset of relevant variables. Detecting such variables is the problem of variable selection. Largely motivated by recent advances in compressed sensing [15, 25], the above problem has been extensively studied under the assumption that the function of interest (target function) depends linearly to the relevant variables.


Detecting Events and Patterns in Large-Scale User Generated Textual Streams with Statistical Learning Methods

arXiv.org Machine Learning

A vast amount of textual web streams is influenced by events or phenomena emerging in the real world. The social web forms an excellent modern paradigm, where unstructured user generated content is published on a regular basis and in most occasions is freely distributed. The present Ph.D. Thesis deals with the problem of inferring information - or patterns in general - about events emerging in real life based on the contents of this textual stream. We show that it is possible to extract valuable information about social phenomena, such as an epidemic or even rainfall rates, by automatic analysis of the content published in Social Media, and in particular Twitter, using Statistical Machine Learning methods. An important intermediate task regards the formation and identification of features which characterise a target event; we select and use those textual features in several linear, non-linear and hybrid inference approaches achieving a significantly good performance in terms of the applied loss function. By examining further this rich data set, we also propose methods for extracting various types of mood signals revealing how affective norms - at least within the social web's population - evolve during the day and how significant events emerging in the real world are influencing them. Lastly, we present some preliminary findings showing several spatiotemporal characteristics of this textual information as well as the potential of using it to tackle tasks such as the prediction of voting intentions.