U-Sem: Semantic Enrichment, User Modeling and Mining of Usage Data on the Social Web

arXiv.org Artificial Intelligence

With the growing popularity of Social Web applications, more and more user data is published on the Web every day. Our research focuses on investigating ways of mining data from such platforms that can be used for modeling users and for semantically augmenting user profiles. This process can enhance adaptation and personalization in various adaptive Web-based systems. In this paper, we present the U-Sem people modeling service, a framework for the semantic enrichment and mining of people's profiles from usage data on the Social Web. We explain the architecture of our people modeling service and describe its application in an adult e-learning context as an example.


Identifying Aspects for Web-Search Queries

Journal of Artificial Intelligence Research

Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, Aspector computes aspects that are orthogonal to each other and have high combined coverage. Aspector combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be semantically related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives: related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing.


Auto-associative models, nonlinear Principal component analysis, manifolds and projection pursuit

arXiv.org Machine Learning

In this paper, auto-associative models are proposed as candidates for generalizing Principal Component Analysis. We show that these models are dedicated to the approximation of the dataset by a manifold. Here, the word "manifold" refers to the topological properties of the structure. The approximating manifold is built by a projection pursuit algorithm. At each step of the algorithm, the dimension of the manifold is incremented. Some theoretical properties are provided. In particular, we show that, at each step of the algorithm, the mean residual norm does not increase. Moreover, it is also established that the algorithm converges in a finite number of steps. Some particular auto-associative models are exhibited and compared to classical PCA and some neural network models. Implementation aspects are discussed. We show that, in numerous cases, no optimization procedure is required. Some illustrations on simulated and real data are presented.


Regularizers for Structured Sparsity

arXiv.org Machine Learning

We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in machine learning, statistics and signal processing. It is well known that a linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be "relaxed" by regularizing the squared error with a convex penalty function like the $\ell_1$ norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease of the estimation error. In this paper, we present a family of convex penalty functions, which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the $\ell_1$ norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish the basic properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso method and other related methods.
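As a point of reference for the $\ell_1$ relaxation mentioned above, here is a minimal sketch of the plain Lasso baseline solved by iterative soft-thresholding (ISTA). It is not the paper's structured penalty family or its optimization algorithm; the function names `soft_threshold` and `ista`, the step size, and the toy data are all assumptions chosen for illustration.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(X, y, lam, step, n_iter=100):
    """Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1 by proximal gradient steps.

    X is a list of rows, y a list of targets. `step` should be at most
    1 / L, where L is the largest eigenvalue of X^T X (assumed known here).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        # Residual X w - y and gradient X^T (X w - y) of the squared-error term.
        residual = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(d)]
        # Gradient step followed by the l1 proximal step.
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(d)]
    return w


# Toy problem with an identity design: the solution is soft_threshold(y, lam).
X_toy = [[1.0, 0.0], [0.0, 1.0]]
y_toy = [3.0, 0.5]
w_hat = ista(X_toy, y_toy, lam=1.0, step=1.0)
```

With an identity design the small coefficient is set exactly to zero while the large one is shrunk, which is the sparsity-inducing behavior that structured penalties build on.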


A Discrete Evolutionary Model for Chess Players' Ratings

arXiv.org Artificial Intelligence

The Elo system for rating chess players, also used in other games and sports, was adopted by the World Chess Federation over four decades ago. Although not without controversy, it is accepted as generally reliable and provides a method for assessing players' strengths and ranking them in official tournaments. It is generally accepted that the distribution of players' rating data is approximately normal but, to date, no stochastic model of how the distribution might have arisen has been proposed. We propose such an evolutionary stochastic model, which models the arrival of players into the rating pool, the games they play against each other, and how the results of these games affect their ratings. Using a continuous approximation to the discrete model, we derive the distribution for players' ratings at time $t$ as a normal distribution, where the variance increases in time as a logarithmic function of $t$. We validate the model using published rating data from 2007 to 2010, showing that the parameters obtained from the data can be recovered through simulations of the stochastic model. The distribution of players' ratings is only approximately normal and has been shown to have a small negative skew. We show how to modify our evolutionary stochastic model to take this skewness into account, and we validate the modified model using the published official rating data.
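For readers unfamiliar with how game results affect ratings, the standard Elo update can be sketched as follows. This shows only the textbook update rule, not the paper's evolutionary stochastic model; the K-factor value of 20 and the function names are assumptions for illustration.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return the new ratings of A and B after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor (assumed value; federations use several).
    """
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)
    # Whatever A gains, B loses, so the sum of the two ratings is conserved.
    return rating_a + delta, rating_b - delta


new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
```

Because each game moves the two ratings by equal and opposite amounts, the mean rating of a closed pool stays fixed while individual ratings spread out over time.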


The Complexity of Integer Bound Propagation

Journal of Artificial Intelligence Research

Bound propagation is an important Artificial Intelligence technique used in Constraint Programming tools to deal with numerical constraints. It is typically embedded within a search procedure (branch and prune) and used at every node of the search tree to narrow down the search space, so it is critical that it be fast. The procedure invokes constraint propagators until a common fixpoint is reached, but the known algorithms for this have a pseudo-polynomial worst-case time complexity: they are fast indeed when the variables have a small numerical range, but they have the well-known problem of being prohibitively slow when these ranges are large. An important question is therefore whether strongly-polynomial algorithms exist that compute the common bound consistent fixpoint of a set of constraints. This paper answers this question. In particular we show that this fixpoint computation is in fact NP-complete, even when restricted to binary linear constraints.
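The pseudo-polynomial behavior described above can be illustrated with a classic toy case: propagating the two constraints x < y and y < x over the range [0, n]. The sketch below is an illustrative round-robin propagation loop, not an algorithm from the paper; the function name `propagate` and the choice of constraints are assumptions.

```python
from typing import Tuple


def propagate(n: int) -> Tuple[bool, int]:
    """Propagate x < y and y < x with x, y in [0, n].

    Returns (consistent, rounds), where rounds counts passes over the
    two propagators until a fixpoint or an empty domain is reached.
    """
    x_lo, x_hi = 0, n
    y_lo, y_hi = 0, n
    rounds = 0
    while True:
        rounds += 1
        old = (x_lo, x_hi, y_lo, y_hi)
        # Propagator for x < y: tighten the upper bound of x and the lower bound of y.
        x_hi = min(x_hi, y_hi - 1)
        y_lo = max(y_lo, x_lo + 1)
        # Propagator for y < x: tighten the upper bound of y and the lower bound of x.
        y_hi = min(y_hi, x_hi - 1)
        x_lo = max(x_lo, y_lo + 1)
        if x_lo > x_hi or y_lo > y_hi:
            return False, rounds  # a domain became empty: inconsistent
        if (x_lo, x_hi, y_lo, y_hi) == old:
            return True, rounds  # common fixpoint reached


small = propagate(100)
large = propagate(1000)
```

Each round shaves only a constant amount off the bounds, so the number of rounds grows linearly with n, that is, exponentially in the number of bits needed to write n.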


From Sparse Signals to Sparse Residuals for Robust Sensing

arXiv.org Machine Learning

Recent advances in sensor technology have made it feasible to deploy a network of inexpensive sensors for carrying out synergistically even sophisticated inference tasks. In applications such as environmental monitoring, surveillance of critical infrastructure, agriculture, or medical imaging, the typical concept of operation involves a large and possibly heterogeneous set of sensors locally observing the signal of interest, and transmitting their measurements to a higher-layer agent (fusion center). This so-termed layered sensing apparatus entails three operational conditions: (c1) Each node's measurement vector comprising either a collection of scalar observations across time, or a snapshot of different sensor readings, is typically assumed to be linearly related to the unknown variable(s). Such a linear model can arise when the sensing system is viewed as a linear filter with known impulse response. Even when the underlying model is nonlinear, the observations are approximately modeled as adhering to a (multivariate) linear regression; (c2) Either because readings are costly to sense and transmit, due to delay or stationarity constraints, or simply because dimensionality reduction is invoked to cope with the "curse of dimensionality," the linear model is oftentimes under-determined, i.e., the dimension of the unknown vector is larger than that of each sensor's vector observation; and (c3) Not all sensors are reliable because failures in the sensing devices, fades of the sensor-agent communication link, physical obstruction of the scene of interest, and (un)intentional interference, all can severely deteriorate the consistency and reliability of sensor data.


Sufficient Component Analysis for Supervised Dimension Reduction

arXiv.org Machine Learning

The purpose of sufficient dimension reduction (SDR) is to find the low-dimensional subspace of input features that is sufficient for predicting output values. In this paper, we propose a novel distribution-free SDR method called sufficient component analysis (SCA), which is computationally more efficient than existing methods. In our method, a solution is computed by iteratively performing dependence estimation and maximization: Dependence estimation is analytically carried out by recently-proposed least-squares mutual information (LSMI), and dependence maximization is also analytically carried out by utilizing the Epanechnikov kernel. Through large-scale experiments on real-world image classification and audio tagging problems, the proposed method is shown to compare favorably with existing dimension reduction approaches.


Worst-Case Upper Bound for (1, 2)-QSAT

arXiv.org Artificial Intelligence

Rigorous theoretical analysis of algorithms for a subclass of QSAT, namely (1, 2)-QSAT, has been proposed in the literature. (1, 2)-QSAT, first introduced in SAT'08, can be seen as quantified extended 2-CNF formulas. Until now, to our knowledge, no algorithm with a worst-case upper bound for (1, 2)-QSAT has been presented. Therefore, in this paper, we present an exact algorithm to solve (1, 2)-QSAT. By analyzing the algorithm, we obtain a worst-case upper bound of $O(1.4142^m)$, where $m$ is the number of clauses.


Algorithms for computing the greatest simulations and bisimulations between fuzzy automata

arXiv.org Artificial Intelligence

Recently, two types of simulations (forward and backward simulations) and four types of bisimulations (forward, backward, forward-backward, and backward-forward bisimulations) between fuzzy automata have been introduced. If there is at least one simulation/bisimulation of some of these types between the given fuzzy automata, it has been proved that there is the greatest simulation/bisimulation of this kind. In the present paper, for any of the above-mentioned types of simulations/bisimulations we provide an effective algorithm for deciding whether there is a simulation/bisimulation of this type between the given fuzzy automata, and for computing the greatest one, whenever it exists. The algorithms are based on the method developed in [J. Ignjatović, M. Ćirić, S. Bogdanović, On the greatest solutions to certain systems of fuzzy relation inequalities and equations, Fuzzy Sets and Systems 161 (2010) 3081-3113], which comes down to the computing of the greatest post-fixed point, contained in a given fuzzy relation, of an isotone function on the lattice of fuzzy relations.