Goto

Collaborating Authors

 Genre


Bayesian Conditional Gaussian Network Classifiers with Applications to Mass Spectra Classification

arXiv.org Machine Learning

Classifiers based on probabilistic graphical models are very effective. In continuous domains, maximum likelihood is usually used to assess the predictions of those classifiers. When data is scarce, this can easily lead to overfitting. In any probabilistic setting, Bayesian averaging (BA) provides theoretically optimal predictions and is known to be robust to overfitting. In this work we introduce Bayesian Conditional Gaussian Network Classifiers, which efficiently perform exact Bayesian averaging over the parameters. We evaluate the proposed classifiers against the maximum likelihood alternatives proposed so far over standard UCI datasets, concluding that performing BA improves the quality of the assessed probabilities (conditional log likelihood) whilst maintaining the error rate. Overfitting is more likely to occur in domains where the number of data items is small and the number of variables is large. These two conditions are met in the realm of bioinformatics, where the early diagnosis of cancer from mass spectra is a relevant task. We provide an application of our classification framework to that problem, comparing it with the standard maximum likelihood alternative, where the improvement of quality in the assessed probabilities is confirmed.


Identifiability of Gaussian structural equation models with equal error variances

arXiv.org Machine Learning

We consider structural equation models in which variables can be written as a function of their parents and noise terms, which are assumed to be jointly independent. Corresponding to each structural equation model, there is a directed acyclic graph describing the relationships between the variables. In Gaussian structural equation models with linear functions, the graph can be identified from the joint distribution only up to Markov equivalence classes, assuming faithfulness. In this work, we prove full identifiability if all noise variables have the same variances: the directed acyclic graph can be recovered from the joint Gaussian distribution. Our result has direct implications for causal inference: if the data follow a Gaussian structural equation model with equal error variances and assuming that all variables are observed, the causal structure can be inferred from observational data only. We propose a statistical method and an algorithm that exploit our theoretical findings.


Verification of Semantically-Enhanced Artifact Systems (Extended Version)

arXiv.org Artificial Intelligence

Artifact-Centric systems have emerged in the last years as a suitable framework to model business-relevant entities, by combining their static and dynamic aspects. In particular, the Guard-Stage-Milestone (GSM) approach has been recently proposed to model artifacts and their lifecycle in a declarative way. In this paper, we enhance GSM with a Semantic Layer, constituted by a full-fledged OWL 2 QL ontology linked to the artifact information models through mapping specifications. The ontology provides a conceptual view of the domain under study, and allows one to understand the evolution of the artifact system at a higher level of abstraction. In this setting, we present a technique to specify temporal properties expressed over the Semantic Layer, and verify them according to the evolution in the underlying GSM model. This technique has been implemented in a tool that exploits state-of-the-art ontology-based data access technologies to manipulate the temporal properties according to the ontology and the mappings, and that relies on the GSMC model checker for verification.


The Generalized Mean Information Coefficient

arXiv.org Machine Learning

Reshef & Reshef recently published a paper in which they present a method called the Maximal Information Coefficient (MIC) that can detect all forms of statistical dependence between pairs of variables as sample size goes to infinity. While this method has been praised by some, it has also been criticized for its lack of power in finite samples. We seek to modify MIC so that it has higher power in detecting associations for limited sample sizes. Here we present the Generalized Mean Information Coefficient (GMIC), a generalization of MIC which incorporates a tuning parameter that can be used to modify the complexity of the association favored by the measure. We define GMIC and prove it maintains several key asymptotic properties of MIC. Its increased power over MIC is demonstrated using a simulation of eight different functional relationships at sixty different noise levels. The results are compared to the Pearson correlation, distance correlation, and MIC. Simulation results suggest that while generally GMIC has slightly lower power than the distance correlation measure, it achieves higher power than MIC for many forms of underlying association. For some functional relationships, GMIC surpasses all other statistics calculated. Preliminary results suggest choosing a moderate value of the tuning parameter for GMIC will yield a test that is robust across underlying relationships. GMIC is a promising new method that mitigates the power issues suffered by MIC, at the possible expense of equitability. Nonetheless, distance correlation was in our simulations more powerful for many forms of underlying relationships. At a minimum, this work motivates further consideration of maximal information-based nonparametric exploration (MINE) methods as statistical tests of independence.


A Comparison of Algorithms for Learning Hidden Variables in Normal Graphs

arXiv.org Machine Learning

A Bayesian factor graph reduced to normal form (Forney, 2001) consists in the interconnection of diverter units (or equal constraint units) and Single-Input/Single-Output (SISO) blocks. In this framework localized adaptation rules are explicitly derived from a constrained maximum likelihood (ML) formulation and from a minimum KL-divergence criterion using KKT conditions. The learning algorithms are compared with two other updating equations based on a Viterbi-like and on a variational approximation respectively. The performance of the various algorithm is verified on synthetic data sets for various architectures. The objective of this paper is to provide the programmer with explicit algorithms for rapid deployment of Bayesian graphs in the applications.


Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness

arXiv.org Machine Learning

We investigate the learning rate of multiple kernel learning (MKL) with $\ell_1$ and elastic-net regularizations. The elastic-net regularization is a composition of an $\ell_1$-regularizer for inducing the sparsity and an $\ell_2$-regularizer for controlling the smoothness. We focus on a sparse setting where the total number of kernels is large, but the number of nonzero components of the ground truth is relatively small, and show sharper convergence rates than the learning rates have ever shown for both $\ell_1$ and elastic-net regularizations. Our analysis reveals some relations between the choice of a regularization function and the performance. If the ground truth is smooth, we show a faster convergence rate for the elastic-net regularization with less conditions than $\ell_1$-regularization; otherwise, a faster convergence rate for the $\ell_1$-regularization is shown.


A Decomposition of the Max-min Fair Curriculum-based Course Timetabling Problem

arXiv.org Artificial Intelligence

We propose a decomposition of the max-min fair curriculum-based course timetabling (MMF-CB-CTT) problem. The decomposition models the room assignment subproblem as a generalized lexicographic bottleneck optimization problem (LBOP). We show that the generalized LBOP can be solved efficiently if the corresponding sum optimization problem can be solved efficiently. As a consequence, the room assignment subproblem of the MMF-CB-CTT problem can be solved efficiently. We use this insight to improve a previously proposed heuristic algorithm for the MMF-CB-CTT problem. Our experimental results indicate that using the new decomposition improves the performance of the algorithm on most of the 21 ITC2007 test instances with respect to the quality of the best solution found. Furthermore, we introduce a measure of the quality of a solution to a max-min fair optimization problem. This measure helps to overcome some limitations imposed by the qualitative nature of max-min fairness and aids the statistical evaluation of the performance of randomized algorithms for such problems. We use this measure to show that using the new decomposition the algorithm outperforms the original one on most instances with respect to the average solution quality.


An Integrated Framework for Diagnosis and Prognosis of Hybrid Systems

arXiv.org Artificial Intelligence

Complex systems are naturally hybrid: their dynamic behavior is both continuous and discrete. For these systems, maintenance and repair are an increasing part of the total cost of final product. Efficient diagnosis and prognosis techniques have to be adopted to detect, isolate and anticipate faults. This paper presents an original integrated theoretical framework for diagnosis and prognosis of hybrid systems. The formalism used for hybrid diagnosis is enriched in order to be able to follow the evolution of an aging law for each fault of the system. The paper presents a methodology for interleaving diagnosis and prognosis in a hybrid framework.


Evolution Theory of Self-Evolving Autonomous Problem Solving Systems

arXiv.org Artificial Intelligence

The present study is a continuation work of my previous work in the art:" Algebraic Net Class Rewriting Systems, Syntax and Semantics for Knowledge Representation and Aut omated Problem Solving" in Tirri SI (2013), and preliminaries as well as related notations are to be found there. Lots of studies have been driven to clarify routes between nodes e.g. in process algebra, important topics setting ground to game theories as well as overall in halting problems. On the other hand in more complex dimensional cases ordering definitions in sets of subgraphs have been under vigorous investigations mainly concentrated in tree structures. An amazingly minute portion of studies on gr aphs concentrates to relations between graphs and abstraction of them and one explanation for this might be that transformations on conceptual levels lead joints to a succinct model proper to syntax as well as to semantic domain requiring combining algebra ic structures to loop structured graphs and realizations of them, this requiring symbiosis of abstract syntax and real case sides. The most remarkable study of human abstraction mechanism yielding a concrete result especially within mathematics in the for m of analytical tools has been manifested by French philosopher, mathematician and physicist Ren é Descartes in the 17th century in his work " Regulae ad directionem ingenii, Règles utiles et claires pour la direction de l'Esprit en la recherche de la Vérité (1628)", freely outlining: "… a t first we must organize the things which are the most essential ones in conc entrating to do that by simplifying from phase to phase the vague, indefinite original pro b lem. Then we try to understand the relations between tho se simplified parts and then compare the propositions to be proved i.e. wise versa try to see the connections between the reached relations and the original problem.


Spectral redemption: clustering sparse networks

arXiv.org Machine Learning

Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here we introduce a new class of spectral algorithms based on a non-backtracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the non-backtracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.