Europe
U-Sem: Semantic Enrichment, User Modeling and Mining of Usage Data on the Social Web
Abel, Fabian, Celik, Ilknur, Hauff, Claudia, Hollink, Laura, Houben, Geert-Jan
With the growing popularity of Social Web applications, more and more user data is published on the Web everyday. Our research focuses on investigating ways of mining data from such platforms that can be used for modeling users and for semantically augmenting user profiles. This process can enhance adaptation and personalization in various adaptive Web-based systems. In this paper, we present the U-Sem people modeling service, a framework for the semantic enrichment and mining of people's profiles from usage data on the Social Web. We explain the architecture of our people modeling service and describe its application in an adult e-learning context as an example.
Identifying Aspects for Web-Search Queries
Wu, F., Madhavan, J., Halevy, A.
Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effec- tively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, Aspector computes aspects that are orthogonal to each other and to have high combined coverage. Aspector combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be semantically related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing.
Auto-associative models, nonlinear Principal component analysis, manifolds and projection pursuit
Girard, Stéphane, Iovleff, Serge
In this paper, auto-associative models are proposed as candidates to the generalization of Principal Component Analysis. We show that these models are dedicated to the approximation of the dataset by a manifold. Here, the word "manifold" refers to the topology properties of the structure. The approximating manifold is built by a projection pursuit algorithm. At each step of the algorithm, the dimension of the manifold is incremented. Some theoretical properties are provided. In particular, we can show that, at each step of the algorithm, the mean residuals norm is not increased. Moreover, it is also established that the algorithm converges in a finite number of steps. Some particular auto-associative models are exhibited and compared to the classical PCA and some neural networks models. Implementation aspects are discussed. We show that, in numerous cases, no optimization procedure is required. Some illustrations on simulated and real data are presented.
A Discrete Evolutionary Model for Chess Players' Ratings
Fenner, Trevor, Levene, Mark, Loizou, George
The Elo system for rating chess players, also used in other games and sports, was adopted by the World Chess Federation over four decades ago. Although not without controversy, it is accepted as generally reliable and provides a method for assessing players' strengths and ranking them in official tournaments. It is generally accepted that the distribution of players' rating data is approximately normal but, to date, no stochastic model of how the distribution might have arisen has been proposed. We propose such an evolutionary stochastic model, which models the arrival of players into the rating pool, the games they play against each other, and how the results of these games affect their ratings. Using a continuous approximation to the discrete model, we derive the distribution for players' ratings at time $t$ as a normal distribution, where the variance increases in time as a logarithmic function of $t$. We validate the model using published rating data from 2007 to 2010, showing that the parameters obtained from the data can be recovered through simulations of the stochastic model. The distribution of players' ratings is only approximately normal and has been shown to have a small negative skew. We show how to modify our evolutionary stochastic model to take this skewness into account, and we validate the modified model using the published official rating data.
The Complexity of Integer Bound Propagation
Bordeaux, L., Katsirelos, G., Narodytska, N., Vardi, M. Y.
Bound propagation is an important Artificial Intelligence technique used in Constraint Programming tools to deal with numerical constraints. It is typically embedded within a search procedure (branch and prune) and used at every node of the search tree to narrow down the search space, so it is critical that it be fast. The procedure invokes constraint propagators until a common fixpoint is reached, but the known algorithms for this have a pseudo-polynomial worst-case time complexity: they are fast indeed when the variables have a small numerical range, but they have the well-known problem of being prohibitively slow when these ranges are large. An important question is therefore whether strongly-polynomial algorithms exist that compute the common bound consistent fixpoint of a set of constraints. This paper answers this question. In particular we show that this fixpoint computation is in fact NP-complete, even when restricted to binary linear constraints.
Algorithms for computing the greatest simulations and bisimulations between fuzzy automata
Ćirić, Miroslav, Ignjatović, Jelena, Jančić, Ivana, Damljanović, Nada
Recently, two types of simulations (forward and backward simulations) and four types of bisimulations (forward, backward, forward-backward, and backward-forward bisimulations) between fuzzy automata have been introduced. If there is at least one simulation/bisimulation of some of these types between the given fuzzy automata, it has been proved that there is the greatest simulation/bisimulation of this kind. In the present paper, for any of the above-mentioned types of simulations/bisimulations we provide an effective algorithm for deciding whether there is a simulation/bisimulation of this type between the given fuzzy automata, and for computing the greatest one, whenever it exists. The algorithms are based on the method developed in [J. Ignjatovi\'c, M. \'Ciri\'c, S. Bogdanovi\'c, On the greatest solutions to certain systems of fuzzy relation inequalities and equations, Fuzzy Sets and Systems 161 (2010) 3081-3113], which comes down to the computing of the greatest post-fixed point, contained in a given fuzzy relation, of an isotone function on the lattice of fuzzy relations.
Closed-set-based Discovery of Bases of Association Rules
Balcázar, José L., García-Saiz, Diego, Gómez-Pérez, Domingo, Tîrnăucă, Cristina
The output of an association rule miner is often huge in practice. This is why several concise lossless representations have been proposed, such as the "essential" or "representative" rules. We revisit the algorithm given by Kryszkiewicz (Int. Symp. Intelligent Data Analysis 2001, Springer-Verlag LNCS 2189, 350-359) for mining representative rules. We show that its output is sometimes incomplete, due to an oversight in its mathematical validation. We propose alternative complete generators and we extend the approach to an existing closure-aware basis similar to, and often smaller than, the representative rules, namely the basis B*.
Reduced Ordered Binary Decision Diagram with Implied Literals: A New knowledge Compilation Approach
Lai, Yong, Liu, Dayou, Wang, Shengsheng
Knowledge compilation is an approach to tackle the computational intractability of general reasoning problems. According to this approach, knowledge bases are converted off-line into a target compilation language which is tractable for on-line querying. Reduced ordered binary decision diagram (ROBDD) is one of the most influential target languages. We generalize ROBDD by associating some implied literals in each node and the new language is called reduced ordered binary decision diagram with implied literals (ROBDD-L). Then we discuss a kind of subsets of ROBDD-L called ROBDD-i with precisely i implied literals (0 \leq i \leq \infty). In particular, ROBDD-0 is isomorphic to ROBDD; ROBDD-\infty requires that each node should be associated by the implied literals as many as possible. We show that ROBDD-i has uniqueness over some specific variables order, and ROBDD-\infty is the most succinct subset in ROBDD-L and can meet most of the querying requirements involved in the knowledge compilation map. Finally, we propose an ROBDD-i compilation algorithm for any i and a ROBDD-\infty compilation algorithm. Based on them, we implement a ROBDD-L package called BDDjLu and then get some conclusions from preliminary experimental results: ROBDD-\infty is obviously smaller than ROBDD for all benchmarks; ROBDD-\infty is smaller than the d-DNNF the benchmarks whose compilation results are relatively small; it seems that it is better to transform ROBDDs-\infty into FBDDs and ROBDDs rather than straight compile the benchmarks.
Handwritten Digit Recognition with a Committee of Deep Neural Nets on GPUs
Cireşan, Dan C., Meier, Ueli, Gambardella, Luca M., Schmidhuber, Jürgen
Current automatic handwriting recognition algorithms are already pretty good at learning to recognize handwritten digits. More than a decade ago, Multilayer Perceptrons or MLPs (Werbos, 1974; LeCun, 1985; Rumelhart et al., 1986) were among the first classifiers tested on the now famous MNIST handwritten digit recognition benchmark. Most had few layers or few artificial neurons (units) per layer (LeCun et al., 1998), but apparently back then these were the biggest feasible MLPs, trained when CPU cores were at least 20 times slower than today. A more recent MLP with a single hidden layer of 800 units achieved 0.70% error (Simard et al., 2003). The latest substantial improvement by others occurred in 2003 (Simard et al., 2003) (error rate 0.4%).
Decidability and Undecidability Results for Propositional Schemata
Aravantinos, V., Caferra, R., Peltier, N.
We define a logic of propositional formula schemata adding to the syntax of propositional logic indexed propositions and iterated connectives ranging over intervals parameterized by arithmetic variables. The satisfiability problem is shown to be undecidable for this new logic, but we introduce a very general class of schemata, called bound-linear, for which this problem becomes decidable. This result is obtained by reduction to a particular class of schemata called regular, for which we provide a sound and complete terminating proof procedure. This schemata calculus allows one to capture proof patterns corresponding to a large class of problems specified in propositional logic. We also show that the satisfiability problem becomes again undecidable for slight extensions of this class, thus demonstrating that bound-linear schemata represent a good compromise between expressivity and decidability.