Fast Convergent Algorithms for Expectation Propagation Approximate Bayesian Inference
Seeger, Matthias W., Nickisch, Hannes
A growing number of challenging machine learning applications require decision-making from incomplete data (e.g., stochastic optimization, active sampling, robotics), which relies on quantitative representations of uncertainty (e.g., Bayesian posterior, belief state) and is out of reach of the commonly used paradigm of learning as point estimation on hand-selected data. While Bayesian inference is harder than point estimation in general, it can be relaxed to variational optimization problems which can be computationally competitive, if only they are treated with the algorithmic state-of-the-art established for the latter. In this paper, we propose a novel algorithm for the expectation propagation (EP; or adaptive TAP, or expectation consistent (EC)) relaxation [11, 8, 12], which is both much faster than the commonly used sequential EP algorithm and provably convergent (the sequential algorithm lacks such a guarantee). Our method builds on the convergent double loop algorithm of [12], but runs orders of magnitude faster. We gain a deeper understanding of EP (or EC) as an optimization problem, unifying it with covariance decoupling ideas [19, 10], and allowing for "point estimation" algorithmic progress to be brought to bear on this powerful approximate inference formulation.
Translating biomarkers between multi-way time-series experiments
Huopaniemi, Ilkka, Suvitaival, Tommi, Orešič, Matej, Kaski, Samuel
Translating potential disease biomarkers between multi-species 'omics' experiments is a new direction in biomedical research. The existing methods are limited to simple experimental setups such as basic healthy-diseased comparisons. Most of these methods also require an a priori matching of the variables (e.g., genes or metabolites) between the species. However, many experiments have a complicated multi-way experimental design often involving irregularly-sampled time-series measurements, and for instance metabolites do not always have known matchings between organisms. We introduce a Bayesian modelling framework for translating between multiple species the results from 'omics' experiments having a complex multi-way, time-series experimental design. The underlying assumption is that the unknown matching can be inferred from the response of the variables to multiple covariates including time.
Descriptive-complexity based distance for fuzzy sets
The notion of distance between two objects is very general. Distance metrics and distances have now become essential tools in many areas of mathematics and its applications, including geometry, probability, statistics, coding/graph theory, data analysis, and pattern recognition. For a comprehensive source on this subject see [4]. The notion of a fuzzy set was introduced by [8]. It is a class of objects with continuous values of membership and hence extends the classical definition of a set (to distinguish it from a fuzzy set we refer to it as a crisp set).
Dynamic Knowledge Capitalization through Annotation among Economic Intelligence Actors in a Collaborative Environment
Okunoye, Olusoji, Oladejo, Bolanle, Odumuyiwa, Victor
The shift from an industrial economy to a knowledge economy has revolutionized strategic planning in organizations as well as their problem-solving approaches. The focus today is on knowledge and service production, with more emphasis being laid on knowledge capital. Many organizations are investing in tools that facilitate knowledge sharing among their employees, and they are also promoting and encouraging collaboration among their staff in order to build the organization's knowledge capital, with the ultimate goal of creating a lasting competitive advantage. One of the current leading approaches to solving an organization's decision problems is the Economic Intelligence (EI) approach, which involves interactions among various actors called EI actors. These actors collaborate to ensure the overall success of the decision problem solving process. In the course of the collaboration, the actors express knowledge which could be capitalized for future reuse. In this paper, we first propose an annotation model for knowledge elicitation among EI actors. Because of the need to build a knowledge capital, we also propose a dynamic knowledge capitalization approach for managing knowledge produced by the actors. Finally, the need to manage the interactions and interdependencies among collaborating EI actors led to our third proposition, which constitutes an awareness mechanism for group work management.
Dynamic Capitalization and Visualization Strategy in Collaborative Knowledge Management System for EI Process
Oladejo, Bolanle, Odumuyiwa, Victor, David, Amos
Knowledge is attributed to humans, whose problem-solving behavior is subjective and complex. In today's knowledge economy, the need to manage knowledge produced by a community of actors cannot be overemphasized. This is because actors possess some level of tacit knowledge, which is generally difficult to articulate. Problem-solving requires searching and sharing of knowledge among a group of actors in a particular context. Knowledge expressed within the context of a problem resolution must be capitalized for future reuse. In this paper, we propose an approach that permits dynamic capitalization of relevant and reliable actors' knowledge in solving decision problems following the Economic Intelligence process. A knowledge annotation method and temporal attributes are used for handling the complexity in the communication among actors and in contextualizing expressed knowledge. A prototype is built to demonstrate the functionalities of a collaborative Knowledge Management system based on this approach. It is tested with sample cases, and the results show that dynamic capitalization leads to knowledge validation, hence increasing the reliability of captured knowledge for reuse. The system can be adapted to various domains.
On the Implementation of GNU Prolog
Diaz, Daniel, Abreu, Salvador, Codognet, Philippe
GNU Prolog is a general-purpose implementation of the Prolog language, which distinguishes itself from most other systems by being, above all else, a native-code compiler that produces standalone executables which do not rely on any byte-code emulator or meta-interpreter. Other aspects which stand out include the explicit organization of the Prolog system as a multipass compiler, where intermediate representations are materialized, in the Unix compiler tradition. GNU Prolog also includes an extensible and high-performance finite domain constraint solver, integrated with the Prolog language but implemented using independent lower-level mechanisms. This article discusses the main issues involved in designing and implementing GNU Prolog: requirements, system organization, performance and portability issues, as well as its position with respect to other Prolog system implementations and the ISO standardization initiative.
Exchangeability and sets of desirable gambles
de Cooman, Gert, Quaeghebeur, Erik
Sets of desirable gambles constitute a quite general type of uncertainty model with an interesting geometrical interpretation. We give a general discussion of such models and their rationality criteria. We study exchangeability assessments for them, and prove counterparts of de Finetti's finite and infinite representation theorems. We show that the finite representation in terms of count vectors has a very nice geometrical interpretation, and that the representation in terms of frequency vectors is tied up with multivariate Bernstein (basis) polynomials. We also lay bare the relationships between the representations of updated exchangeable models, and discuss conservative inference (natural extension) under exchangeability and the extension of exchangeable sequences.
An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA
Hein, Matthias, Bühler, Thomas
Many problems in machine learning and statistics can be formulated as (generalized) eigenproblems. In terms of the associated optimization problem, computing linear eigenvectors amounts to finding critical points of a quadratic function subject to quadratic constraints. In this paper we show that a certain class of constrained optimization problems with nonquadratic objective and constraints can be understood as nonlinear eigenproblems. We derive a generalization of the inverse power method which is guaranteed to converge to a nonlinear eigenvector. We apply the inverse power method to 1-spectral clustering and sparse PCA which can naturally be formulated as nonlinear eigenproblems. In both applications we achieve state-of-the-art results in terms of solution quality and runtime. Moving beyond the standard eigenproblem should be useful also in many other applications and our inverse power method can be easily adapted to new problems.
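For orientation, the linear case that the abstract takes as its starting point can be sketched in a few lines. The code below is the textbook inverse power iteration for a symmetric positive definite matrix, not the nonlinear generalization the paper derives (the abstract does not spell that out). The function names `solve` and `inverse_power`, the iteration limit, and the tolerance are all illustrative assumptions.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def inverse_power(A, iters=200, tol=1e-12):
    """Approximate the smallest eigenvalue of a symmetric positive definite A.

    Assumption: the uniform start vector is not orthogonal to the
    eigenvector being sought; otherwise the iteration misses it.
    """
    n = len(A)
    x = [1.0 / math.sqrt(n)] * n
    lam = None
    for _ in range(iters):
        y = solve(A, x)                      # one step: y = A^{-1} x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]            # renormalize (quadratic constraint)
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        new_lam = sum(x[i] * Ax[i] for i in range(n))  # Rayleigh quotient
        converged = lam is not None and abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam, x
```

Each step minimizes the quadratic objective more tightly while the renormalization keeps the iterate on the unit sphere, which is the "quadratic function subject to quadratic constraints" view mentioned in the abstract.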
An Effective Algorithm for and Phase Transitions of the Directed Hamiltonian Cycle Problem
The Hamiltonian cycle problem (HCP) is an important combinatorial problem with applications in many areas. It is among the first problems used for studying intrinsic properties, including phase transitions, of combinatorial problems. While thorough theoretical and experimental analyses have been made on the HCP in undirected graphs, a limited amount of work has been done for the HCP in directed graphs (DHCP). The main contribution of this work is an effective algorithm for the DHCP. Our algorithm explores and exploits the close relationship between the DHCP and the Assignment Problem (AP) and utilizes a technique based on Boolean satisfiability (SAT). By combining effective algorithms for the AP and SAT, our algorithm significantly outperforms previous exact DHCP algorithms, including an algorithm based on the award-winning Concorde TSP algorithm. The second result of the current study is an experimental analysis of phase transitions of the DHCP, verifying and refining a known phase transition of the DHCP.
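The relationship between the DHCP and the AP mentioned above can be illustrated on a toy scale. A zero-cost assignment on a digraph picks one outgoing arc per vertex so that every vertex is entered exactly once, which is a cycle cover; a Hamiltonian cycle is the special case with a single cycle. The sketch below enumerates assignments by brute force, so it is only a small illustration of that relaxation and not the AP/SAT algorithm of the paper. The function names are hypothetical.

```python
from itertools import permutations


def cycles_of(succ):
    """Split a successor map (a permutation of range(n)) into its cycles."""
    seen, cycles = set(), []
    for start in range(len(succ)):
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = succ[v]
        cycles.append(cycle)
    return cycles


def cycle_covers(n, arcs):
    """Yield every assignment (permutation) that uses only arcs of the digraph."""
    arc_set = set(arcs)
    for succ in permutations(range(n)):
        if all((u, succ[u]) in arc_set for u in range(n)):
            yield succ


def hamiltonian_cycle(n, arcs):
    """Return a Hamiltonian cycle as a vertex list, or None if there is none."""
    for succ in cycle_covers(n, arcs):
        found = cycles_of(succ)
        if len(found) == 1:  # a cycle cover with one cycle is Hamiltonian
            return found[0]
    return None
```

A graph made of two disjoint 2-cycles has a feasible assignment but no Hamiltonian cycle, which shows why solving the AP alone is not sufficient and further reasoning about subcycles is needed.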
Classifying extremely imbalanced data sets
Britsch, Markward, Gagunashvili, Nikolai, Schmelling, Michael
Imbalanced data sets containing much more background than signal instances are very common in particle physics, and will also be characteristic of the upcoming analyses of LHC data. Following up on the work presented at ACAT 2008, we use the multivariate technique presented there (a rule growing algorithm with the meta-methods bagging and instance weighting) on much more imbalanced data sets, especially a selection of D0 decays without the use of particle identification. It turns out that the quality of the result strongly depends on the number of background instances used for training. We discuss methods to exploit this in order to improve the results significantly, and how to handle and reduce the size of large training sets without loss of result quality in general. We will also comment on how to take into account statistical fluctuations in receiver operating characteristic (ROC) curves when comparing classifier methods.
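The abstract does not say how statistical fluctuations in ROC curves are handled, so the following is only one generic possibility: summarize the ROC curve by its area (AUC) and estimate the spread of that number by bootstrap resampling of signal and background scores separately. The function names, the number of resamples, and the 16%/84% percentile choice are assumptions made for illustration.

```python
import random


def auc(signal, background):
    """Area under the ROC curve: P(signal score > background score).

    Ties count as one half. The double loop is quadratic, which is fine
    for a small illustration but not for large samples.
    """
    wins = 0.0
    for s in signal:
        for b in background:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal) * len(background))


def bootstrap_auc(signal, background, n_resamples=200, seed=0):
    """Return an approximate central 68% interval for the AUC."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        # Resample each class separately so the class imbalance is preserved.
        s = [rng.choice(signal) for _ in signal]
        b = [rng.choice(background) for _ in background]
        values.append(auc(s, b))
    values.sort()
    low = values[int(0.16 * n_resamples)]
    high = values[int(0.84 * n_resamples) - 1]
    return low, high
```

Two classifiers whose bootstrap intervals overlap strongly cannot be ranked reliably from a single ROC comparison, which matters most when the signal sample is small.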