Selective Transfer Learning for Cross Domain Recommendation

arXiv.org Machine Learning

Collaborative filtering (CF) aims to predict users' ratings on items from historical user-item preference data. In many real-world applications, preference data are sparse, which can make models overfit and fail to give accurate predictions. Several recent works have shown that transferring knowledge from manually selected source domains can mitigate this data sparsity problem. However, in most cases parts of the source domain data are inconsistent with the observations in the target domain, which may misguide the construction of the target domain model. In this paper, we propose a novel criterion based on empirical prediction error and its variance to better capture the consistency across domains in CF settings. We then embed this criterion into a boosting framework to perform selective knowledge transfer. Compared with several state-of-the-art methods, we show that our proposed selective transfer learning framework significantly improves rating prediction accuracy on several real-world recommendation tasks.
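
To make the idea of an error-plus-variance consistency criterion concrete, here is a minimal Python sketch. It assumes the criterion is the mean absolute prediction error on observed target ratings plus a weight `lam` times the variance of those errors; the functions `consistency_score` and `select_sources`, the weight and the threshold are all hypothetical, and neither the paper's exact criterion nor its boosting framework is reproduced.

```python
from statistics import mean, pvariance


def consistency_score(residuals, lam=1.0):
    """Hypothetical criterion: mean absolute error plus lam times its variance.

    `residuals` are (predicted - observed) ratings on target-domain entries for a
    model built with one candidate source domain; lower means more consistent.
    """
    abs_err = [abs(r) for r in residuals]
    return mean(abs_err) + lam * pvariance(abs_err)


def select_sources(residuals_by_source, lam=1.0, threshold=1.0):
    """Keep the candidate sources whose score is below a hypothetical threshold."""
    scores = {name: consistency_score(res, lam) for name, res in residuals_by_source.items()}
    selected = sorted(name for name, s in scores.items() if s < threshold)
    return selected, scores
```

The variance term penalizes a source that fits some target ratings well and others badly, which is one way to capture the partial inconsistency between domains that the abstract describes.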


Get my pizza right: Repairing missing is-a relations in ALC ontologies (extended version)

arXiv.org Artificial Intelligence

With the increased use of ontologies in semantically-enabled applications, debugging defects in ontologies has become increasingly important, since these defects can lead to wrong or incomplete results for the applications. Debugging consists of two phases: detection and repair. In this paper we focus on the repair phase for a particular kind of defect, namely missing relations in the is-a hierarchy. Previous work has dealt with the case of taxonomies. In this work we extend the scope to ALC ontologies that can be represented using acyclic terminologies. We present algorithms and discuss a system. This is an extended version of [18].


Enhancing the functional content of protein interaction networks

arXiv.org Machine Learning

Protein interaction networks are a promising type of data for studying complex biological systems. However, despite the rich information embedded in these networks, they face important data quality challenges of noise and incompleteness that adversely affect the results obtained from their analysis. Here, we explore the use of common neighborhood similarity (CNS), a form of local structure in networks, to address these issues. Although several CNS measures have been proposed in the literature, an understanding of their relative efficacy for the analysis of interaction networks has been lacking. We follow the framework of graph transformation to convert the given interaction network into a transformed network for each of the CNS measures evaluated. The effectiveness of each measure is then estimated by comparing the quality of protein function predictions obtained from its transformed network with those obtained from the original network. Using a large set of S. cerevisiae interactions and a set of 136 GO terms, we find that several of the transformed networks produce more accurate predictions than the original network. The $HC.cont$ measure proposed here performs especially well on this task. Further investigation reveals that the two major factors contributing to this improvement are the abilities of CNS measures, especially $HC.cont$, to prune out noisy edges and to introduce new links between functionally related proteins.
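
The graph-transformation step can be sketched with the Jaccard index, one standard common-neighborhood similarity measure; it is used here only as a stand-in, since the abstract does not define $HC.cont$. The function name `jaccard_transform`, the choice to score only node pairs that share at least one neighbor, and the threshold value are assumptions for illustration.

```python
from itertools import combinations


def jaccard_transform(edges, threshold=0.3):
    """Build a transformed network whose edge weights are Jaccard CNS scores.

    Every pair of proteins sharing at least one neighbor is scored by
    |N(u) & N(v)| / |N(u) | N(v)|, and pairs scoring below `threshold` are dropped.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    transformed = {}
    for u, v in combinations(sorted(neighbors), 2):
        common = neighbors[u] & neighbors[v]
        if not common:
            continue  # no shared neighbors: the pair gets no edge, even if one existed
        score = len(common) / len(neighbors[u] | neighbors[v])
        if score >= threshold:
            transformed[(u, v)] = score
    return transformed


# Keeps A-B, A-D, B-D and C-E (each with score 1/3); the original edge C-D is pruned.
example = jaccard_transform([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
```

In this toy network the transformation both removes an original edge (C and D share no neighbors) and adds new links such as A-D through their common neighbor C, mirroring the two effects the abstract attributes to CNS measures.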


A Biomimetic Approach Based on Immune Systems for Classification of Unstructured Data

arXiv.org Artificial Intelligence

In this paper we present the results of clustering unstructured data, in this case textual data from the Reuters-21578 corpus, with a new biomimetic approach based on immune systems. Before running our immune system, we converted the textual data into numerical vectors using an n-gram approach. The novelty lies in the hybridization of n-grams and immune systems for clustering. The experimental results show that the proposed ideas are promising and that this method can solve the text clustering problem.
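
The n-gram step that turns text into vectors can be sketched as follows, assuming overlapping character trigrams and cosine similarity as the affinity that an immune-inspired clustering stage would consume. The abstract does not say whether word or character n-grams were used, the function names are hypothetical, and the immune-system algorithm itself is not shown.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Bag of overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Character n-grams need no tokenizer or stemmer, which makes them a convenient input representation for a clustering method that only needs a similarity between documents.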


Asynchronous Decentralized Algorithm for Space-Time Cooperative Pathfinding

arXiv.org Artificial Intelligence

Cooperative pathfinding is a multi-agent path planning problem in which a group of vehicles searches for a corresponding set of non-conflicting space-time trajectories. Many practical centralized methods for solving cooperative pathfinding problems are based on the prioritized planning strategy. However, in some domains (e.g., multi-robot teams of unmanned aerial vehicles, autonomous underwater vehicles, or unmanned ground vehicles), a decentralized approach may be more desirable than a centralized one because of communication limitations imposed by the domain and/or privacy concerns. In this paper we present an asynchronous decentralized variant of prioritized planning (ADPP) and its interruptible version (IADPP). The algorithm exploits the inherent parallelism of distributed systems and speeds up the computation. Unlike synchronized planning approaches, the algorithm allows an agent to react immediately to updates about other agents' paths and to invoke its local spatio-temporal path planner to find the best trajectory in response to the other agents' choices. We provide a proof of correctness of the algorithms and evaluate them experimentally on synthetic domains.
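
ADPP is a decentralized, asynchronous variant of prioritized planning. As a point of reference, the following minimal sketch shows the centralized prioritized planning it builds on: agents plan one at a time in priority order, each treating the space-time trajectories of higher-priority agents as moving obstacles. It assumes a 4-connected grid, unit-time moves, distinct start cells, and only vertex and swap conflicts; the names `plan_agent` and `prioritized_planning` are hypothetical, and none of ADPP's asynchronous messaging is reproduced.

```python
from collections import deque

MOVES = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))  # wait, down, up, right, left


def plan_agent(grid, start, goal, reserved, moves_taken, max_t=50):
    """Space-time BFS that avoids cells and swaps reserved by higher-priority agents."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([(start, 0)])
    parent = {(start, 0): None}
    while frontier:
        cell, t = frontier.popleft()
        # Stop only if no higher-priority agent passes through the goal later on.
        if cell == goal and all((goal, k) not in reserved for k in range(t, max_t + 1)):
            path, node = [], (cell, t)
            while node is not None:
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        if t >= max_t:
            continue
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            state = (nxt, t + 1)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]]:
                continue  # off the grid or blocked cell
            if state in reserved or ((nxt, cell), t) in moves_taken or state in parent:
                continue  # vertex conflict, swap conflict, or already visited
            parent[state] = (cell, t)
            frontier.append(state)
    return None


def prioritized_planning(grid, tasks, max_t=50):
    """Plan (start, goal) tasks in priority order; returns one path per agent or None."""
    reserved, moves_taken, paths = set(), set(), []
    for start, goal in tasks:
        path = plan_agent(grid, start, goal, reserved, moves_taken, max_t)
        if path is None:
            return None
        for t, cell in enumerate(path):
            reserved.add((cell, t))
            if t + 1 < len(path):
                moves_taken.add(((cell, path[t + 1]), t))
        for t in range(len(path), max_t + 1):
            reserved.add((goal, t))  # the agent stays parked at its goal
        paths.append(path)
    return paths
```

In a decentralized setting each agent would run something like `plan_agent` locally and learn the reservations of higher-priority agents through messages rather than from a shared set.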


Clustering hidden Markov models with variational HEM

arXiv.org Machine Learning

The hidden Markov model (HMM) is a widely used generative model for sequential data, which assumes that each observation is conditioned on the state of a hidden Markov chain. In this paper, we derive a novel algorithm to cluster HMMs based on the hierarchical EM (HEM) algorithm. The proposed algorithm i) clusters a given collection of HMMs into groups of HMMs that are similar, in terms of the distributions they represent, and ii) characterizes each group by a "cluster center", i.e., a novel HMM that is representative of the group, in a manner that is consistent with the underlying generative model of the HMM. To cope with intractable inference in the E-step, the HEM algorithm is formulated as a variational optimization problem, and efficiently solved for the HMM case by leveraging an appropriate variational approximation. The benefits of the proposed algorithm, which we call variational HEM (VHEM), are demonstrated on several tasks involving time-series data, such as hierarchical clustering of motion capture sequences, and automatic annotation and retrieval of music and of online handwriting data, showing improvements over current methods. In particular, our variational HEM algorithm effectively leverages large amounts of data when learning annotation models by using an efficient hierarchical estimation procedure, which reduces learning times and memory requirements while improving model robustness through better regularization.


$QD$-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus + Innovations

arXiv.org Machine Learning

The paper considers a class of multi-agent Markov decision processes (MDPs), in which the network agents respond differently (as manifested by the instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. The paper investigates a distributed reinforcement learning setup with no prior information on the global state transition and local agent cost statistics. Specifically, with the agents' objective being to minimize a network-averaged infinite-horizon discounted cost, the paper proposes a distributed version of $Q$-learning, $\mathcal{QD}$-learning, in which the network agents collaborate by means of local processing and mutual information exchange over a sparse (possibly stochastic) communication network to achieve the network goal. Under the assumption that each agent is only aware of its local online cost data and that the inter-agent communication network is \emph{weakly} connected, the proposed distributed scheme is shown to yield, almost surely (a.s.), the desired value function and the optimal stationary control policy asymptotically at each network agent. The analytical techniques developed in the paper to address the mixed time-scale stochastic dynamics of the \emph{consensus + innovations} form, which arise as a result of the proposed interactive distributed scheme, are of independent interest.
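
The consensus + innovations structure can be illustrated with a single tabular update in the cost-minimization convention of the abstract: each agent pulls its Q-value toward its neighbors' values (consensus) and toward a local Q-learning target built from its own one-stage cost (innovation). This is a minimal sketch under stated assumptions: all agents observe the same global transition `(x, u, x_next)`, communication is over a fixed graph given by `neighbors`, and the step sizes `alpha` and `beta` are supplied by the caller; the function name `qd_step` is hypothetical, and the paper's exact weight sequences and stochastic-network setting are not reproduced.

```python
def qd_step(Q, neighbors, x, u, x_next, costs, alpha, beta, gamma):
    """One synchronous consensus + innovations update of every agent's Q[x][u].

    Q[n][state][action] is agent n's table, neighbors[n] lists agent n's
    communication neighbors, and costs[n] is agent n's local one-stage cost.
    """
    old = [q[x][u] for q in Q]  # snapshot so every agent uses pre-update neighbor values
    for n, q in enumerate(Q):
        consensus = sum(old[n] - old[l] for l in neighbors[n])
        innovation = costs[n] + gamma * min(q[x_next]) - old[n]  # min: costs are minimized
        q[x][u] = old[n] - beta * consensus + alpha * innovation
    return Q
```

For example, with two connected agents, one state, one action, `gamma = 0` and local costs 1 and 3, iterating this step with a decaying innovation weight such as `alpha = 1/(t+1)` and a more slowly decaying consensus weight such as `beta = 0.3/(t+1)**0.6` drives both agents' values toward the network-average cost of 2, even though neither agent sees the other's cost directly.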


Evaluation of a Simple, Scalable, Parallel Best-First Search Strategy

arXiv.org Artificial Intelligence

Large-scale parallel clusters composed of commodity processors are increasingly available, enabling the use of vast processing capabilities and distributed RAM to solve hard search problems. We investigate Hash-Distributed A* (HDA*), a simple approach to parallel best-first search that asynchronously distributes and schedules work among processors based on a hash function of the search state. We use this approach to parallelize the A* algorithm in an optimal sequential version of the Fast Downward planner, as well as a 24-puzzle solver. The scaling behavior of HDA* is evaluated experimentally on a shared-memory multicore machine with 8 cores, a cluster of commodity machines using up to 64 cores, and large-scale high-performance clusters using up to 2400 processors. We show that this approach scales well, allowing the effective utilization of large amounts of distributed memory to optimally solve problems that require terabytes of RAM. We also compare HDA* to Transposition-table Driven Scheduling (TDS), a hash-based parallelization of IDA*, and show that, in planning, HDA* significantly outperforms TDS. A simple hybrid that combines HDA* and TDS to exploit the strengths of both algorithms is proposed and evaluated.
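
The core idea of HDA*, assigning every state to the processor given by a hash of that state, can be illustrated with a single-process simulation in which each "processor" is just an open list. This is a minimal sketch under stated assumptions: `zlib.crc32` of the state's `repr` stands in for the hash function, messages are delivered instantly, processors take turns expanding one node each, and the names `owner` and `hda_star_sim` are hypothetical. Real HDA* runs these loops asynchronously and needs a distributed termination-detection protocol.

```python
import heapq
import zlib


def owner(state, num_procs):
    """Processor that owns `state`: a stable hash of its repr, modulo the processor count."""
    return zlib.crc32(repr(state).encode()) % num_procs


def hda_star_sim(start, goal, successors, h, num_procs=4):
    """Round-robin simulation of hash-distributed A*; returns the optimal cost (inf if none)."""
    inf = float("inf")
    open_lists = [[] for _ in range(num_procs)]  # one priority queue per processor
    best_g = [{} for _ in range(num_procs)]      # each processor's local duplicate table
    p = owner(start, num_procs)
    best_g[p][start] = 0
    heapq.heappush(open_lists[p], (h(start), 0, start))
    incumbent = inf                              # cost of the best goal found so far
    while True:
        tops = [heap[0][0] for heap in open_lists if heap]
        if not tops or min(tops) >= incumbent:   # nothing left that could beat the incumbent
            return incumbent
        for p in range(num_procs):               # each processor expands one node per round
            if not open_lists[p]:
                continue
            f, g, s = heapq.heappop(open_lists[p])
            if g > best_g[p].get(s, inf) or f >= incumbent:
                continue                         # stale entry, or cannot improve the incumbent
            if s == goal:
                incumbent = min(incumbent, g)
                continue
            for t, cost in successors(s):
                dest = owner(t, num_procs)       # "send" the child to its owning processor
                g2 = g + cost
                if g2 < best_g[dest].get(t, inf):  # the owner keeps only the cheapest copy
                    best_g[dest][t] = g2
                    heapq.heappush(open_lists[dest], (g2 + h(t), g2, t))
```

Because every copy of a state is routed to the same owner, duplicate detection stays local to one processor, so no shared closed list is needed; the price is that nodes may be expanded out of global f-order, which is why the sketch reopens states when a cheaper path arrives.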


Improved Local Search in Artificial Bee Colony using Golden Section Search

arXiv.org Artificial Intelligence

Artificial bee colony (ABC), an optimization algorithm, is a recent addition to the family of population-based search algorithms. ABC takes its inspiration from the collective intelligent foraging behavior of honey bees. In this study we incorporate a golden section search mechanism into the structure of basic ABC to improve global convergence and to prevent the search from getting stuck in a local solution. The proposed variant is termed ILS-ABC. Comparative numerical results against state-of-the-art algorithms demonstrate the performance of the proposed variant on a set of unconstrained engineering design problems. The simulated results show that the proposed variant can be successfully applied to solve real-life problems.
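
Golden section search is a standard bracketing method for minimizing a unimodal function on an interval: each step shrinks the bracket by the factor 1/φ ≈ 0.618 and reuses one of the two interior evaluations, so only one new function evaluation is needed per step. The helper `local_refine`, which searches along the line from a food source `x` toward a neighboring solution `y`, is a hypothetical way of plugging the search into ABC; the abstract does not specify how ILS-ABC applies it.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_search(f, a, b, tol=1e-6):
    """Approximate minimizer of a unimodal function f on [a, b]."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]; old c becomes the new d
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]; old d becomes the new c
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


def local_refine(f, x, y, lo=-1.0, hi=1.0, tol=1e-6):
    """Hypothetical ABC local search: minimize f along x + t * (y - x) for t in [lo, hi]."""
    point = lambda t: [xi + t * (yi - xi) for xi, yi in zip(x, y)]
    t_best = golden_section_search(lambda t: f(point(t)), lo, hi, tol)
    return point(t_best)
```

The reuse of interior points works because (1/φ)² = 1 - 1/φ, so the surviving interior point of one bracket lands exactly where the next bracket needs it.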


Generating Approximate Solutions to the TTP using a Linear Distance Relaxation

Journal of Artificial Intelligence Research

In some domestic professional sports leagues, the home stadiums are located in cities connected by a common train line running in one direction. For these instances, we can incorporate this geographical information to determine optimal or nearly optimal solutions to the n-team Traveling Tournament Problem (TTP), an NP-hard sports scheduling problem whose solution is a double round-robin tournament schedule that minimizes the total distance traveled by all n teams. We introduce the Linear Distance Traveling Tournament Problem (LD-TTP) and solve it for n = 4 and n = 6, generating the complete set of possible solutions through elementary combinatorial techniques. For larger n, we propose a novel "expander construction" that generates an approximate solution to the LD-TTP. For n ≡ 4 (mod 6), we show that our expander construction produces a feasible double round-robin tournament schedule whose total distance is guaranteed to be no worse than 4/3 times the optimal solution, regardless of where the n teams are located. This 4/3-approximation for the LD-TTP is stronger than the currently best-known ratio of 5/3 + ε for the general TTP. We conclude the paper by applying this linear distance relaxation to general (non-linear) n-team TTP instances, where we develop fast approximate solutions by simply "assuming" the n teams lie on a straight line and solving the modified problem. We show that this technique surprisingly generates the distance-optimal tournament on all 6-team benchmark sets, as well as close-to-optimal schedules for larger n, even when the teams are located around a circle or positioned in three-dimensional space.
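
Under the linear distance relaxation, each team i sits at a point $p_i$ on a line and the distance between the venues of teams i and j is $|p_i - p_j|$. The minimal sketch below evaluates the objective for a given schedule, assuming the schedule is stored as each team's sequence of venues (the home team of each of its games, in order). The function names are hypothetical, and the code neither checks that the schedule is a feasible double round-robin tournament nor implements the expander construction.

```python
def team_travel(team, venues, positions):
    """Distance traveled by `team`, given the venue (home team) of each of its games in order.

    The team starts at home and returns home after its last game; under the
    linear distance relaxation, d(i, j) = |positions[i] - positions[j]|.
    """
    route = [team] + list(venues) + [team]
    return sum(abs(positions[a] - positions[b]) for a, b in zip(route, route[1:]))


def total_distance(schedule, positions):
    """Sum of team_travel over every team; `schedule` maps team -> venue sequence."""
    return sum(team_travel(team, venues, positions) for team, venues in schedule.items())
```

For instance, with teams at positions 0, 1, 3 and 6, a team 0 whose games are at venues 1, 0 and 2 travels 1 + 1 + 3 + 3 = 8. Consecutive away games are handled naturally, since the team travels directly from one venue to the next.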