Goto

Collaborating Authors

 Genre


Counting in Graph Covers: A Combinatorial Characterization of the Bethe Entropy Function

arXiv.org Artificial Intelligence

We present a combinatorial characterization of the Bethe entropy function of a factor graph, such a characterization being in contrast to the original, analytical, definition of this function. We achieve this combinatorial characterization by counting valid configurations in finite graph covers of the factor graph. Analogously, we give a combinatorial characterization of the Bethe partition function, whose original definition was also of an analytical nature. As we point out, our approach has similarities to the replica method, but also stark differences. The above findings are a natural backdrop for introducing a decoder for graph-based codes that we will call symbolwise graph-cover decoding, a decoder that extends our earlier work on blockwise graph-cover decoding. Both graph-cover decoders are theoretical tools that help towards a better understanding of message-passing iterative decoding, namely blockwise graph-cover decoding links max-product (min-sum) algorithm decoding with linear programming decoding, and symbolwise graph-cover decoding links sum-product algorithm decoding with Bethe free energy function minimization at temperature one. In contrast to the Gibbs entropy function, which is a concave function, the Bethe entropy function is in general not concave everywhere. In particular, we show that every code picked from an ensemble of regular low-density parity-check codes with minimum Hamming distance growing (with high probability) linearly with the block length has a Bethe entropy function that is convex in certain regions of its domain.


Multi-view constrained clustering with an incomplete mapping between views

arXiv.org Artificial Intelligence

Multi-view learning algorithms typically assume a complete bipartite mapping between the different views in order to exchange information during the learning process. However, many applications provide only a partial mapping between the views, creating a challenge for current methods. To address this problem, we propose a multi-view algorithm based on constrained clustering that can operate with an incomplete mapping. Given a set of pairwise constraints in each view, our approach propagates these constraints using a local similarity measure to those instances that can be mapped to the other views, allowing the propagated constraints to be transferred across views via the partial mapping. It uses co-EM to iteratively estimate the propagation within each view based on the current clustering model, transfer the constraints across views, and then update the clustering model. By alternating the learning process between views, this approach produces a unified clustering model that is consistent with all views. We show that this approach significantly improves clustering performance over several other methods for transferring constraints and allows multi-view clustering to be reliably applied when given a limited mapping between the views. Our evaluation reveals that the propagated constraints have high precision with respect to the true clusters in the data, explaining their benefit to clustering performance in both single- and multi-view learning scenarios.


Level Set Estimation from Compressive Measurements using Box Constrained Total Variation Regularization

arXiv.org Machine Learning

Estimating the level set of a signal from measurements is a task that arises in a variety of fields, including medical imaging, astronomy, and digital elevation mapping. Motivated by scenarios where accurate and complete measurements of the signal may not available, we examine here a simple procedure for estimating the level set of a signal from highly incomplete measurements, which may additionally be corrupted by additive noise. The proposed procedure is based on box-constrained Total Variation (TV) regularization. We demonstrate the performance of our approach, relative to existing state-of-the-art techniques for level set estimation from compressive measurements, via several simulation examples.


Modeling Weather Conditions Consequences on Road Trafficking Behaviors

arXiv.org Machine Learning

T is commonly accepted that adverse weather conditions modify significantly traffic flow dynamics in a complex way. Actually, it is well known that bad weather conditions such as, heavy rain, fog, snow, induce a significant decrease on traffic flow speeds. Note that it can be partially explained by the legal speed regulations. However, if several studies conclude that road traffic speed decreases during adverse weather, this trend is only confirmed. Furthermore, up to our knowledge, no quantitative analysis has been conducted to forecast the evolution of the observed speed of vehicles.


A Fast Distributed Proximal-Gradient Method

arXiv.org Machine Learning

We present a distributed proximal-gradient method for optimizing the average of convex functions, each of which is the private local objective of an agent in a network with time-varying topology. The local objectives have distinct differentiable components, but they share a common nondifferentiable component, which has a favorable structure suitable for effective computation of the proximal operator. In our method, each agent iteratively updates its estimate of the global minimum by optimizing its local objective function, and exchanging estimates with others via communication in the network. Using Nesterov-type acceleration techniques and multiple communication steps per iteration, we show that this method converges at the rate 1/k (where k is the number of communication rounds between the agents), which is faster than the convergence rate of the existing distributed methods for solving this problem. The superior convergence rate of our method is also verified by numerical experiments.


Group Model Selection Using Marginal Correlations: The Good, the Bad and the Ugly

arXiv.org Machine Learning

Group model selection is the problem of determining a small subset of groups of predictors (e.g., the expression data of genes) that are responsible for majority of the variation in a response variable (e.g., the malignancy of a tumor). This paper focuses on group model selection in high-dimensional linear models, in which the number of predictors far exceeds the number of samples of the response variable. Existing works on high-dimensional group model selection either require the number of samples of the response variable to be significantly larger than the total number of predictors contributing to the response or impose restrictive statistical priors on the predictors and/or nonzero regression coefficients. This paper provides comprehensive understanding of a low-complexity approach to group model selection that avoids some of these limitations. The proposed approach, termed Group Thresholding (GroTh), is based on thresholding of marginal correlations of groups of predictors with the response variable and is reminiscent of existing thresholding-based approaches in the literature. The most important contribution of the paper in this regard is relating the performance of GroTh to a polynomial-time verifiable property of the predictors for the general case of arbitrary (random or deterministic) predictors and arbitrary nonzero regression coefficients.


Mining Permission Request Patterns from Android and Facebook Applications (extended author version)

arXiv.org Artificial Intelligence

Android and Facebook provide third-party applications with access to users' private data and the ability to perform potentially sensitive operations (e.g., post to a user's wall or place phone calls). As a security measure, these platforms restrict applications' privileges with permission systems: users must approve the permissions requested by applications before the applications can make privacy- or security-relevant API calls. However, recent studies have shown that users often do not understand permission requests and lack a notion of typicality of requests. As a first step towards simplifying permission systems, we cluster a corpus of 188,389 Android applications and 27,029 Facebook applications to find patterns in permission requests. Using a method for Boolean matrix factorization for finding overlapping clusters, we find that Facebook permission requests follow a clear structure that exhibits high stability when fitted with only five clusters, whereas Android applications demonstrate more complex permission requests. We also find that low-reputation applications often deviate from the permission request patterns that we identified for high-reputation applications suggesting that permission request patterns are indicative for user satisfaction or application quality.


Simulated Tom Thumb, the Rule Of Thumb for Autonomous Robots

arXiv.org Artificial Intelligence

For a mobile robot to be truly autonomous, it must solve the simultaneous localization and mapping (SLAM) problem. We develop a new metaheuristic algorithm called Simulated Tom Thumb (STT), based on the detailed adventure of the clever Tom Thumb and advances in researches relating to path planning based on potential functions. Investigations show that it is very promising and could be seen as an optimization of the powerful solution of SLAM with data association and learning capabilities. STT outperform JCBB. The performance is 100 % match.


Disjunctive Datalog with Existential Quantifiers: Semantics, Decidability, and Complexity Issues

arXiv.org Artificial Intelligence

Datalog is one of the best-known rule-based languages, and extensions of it are used in a wide context of applications. An important Datalog extension is Disjunctive Datalog, which significantly increases the expressivity of the basic language. Disjunctive Datalog is useful in a wide range of applications, ranging from Databases (e.g., Data Integration) to Artificial Intelligence (e.g., diagnosis and planning under incomplete knowledge). However, in recent years an important shortcoming of Datalog-based languages became evident, e.g. in the context of data-integration (consistent query-answering, ontology-based data access) and Semantic Web applications: The language does not permit any generation of and reasoning with unnamed individuals in an obvious way. In general, it is weak in supporting many cases of existential quantification. To overcome this problem, Datalogex has recently been proposed, which extends traditional Datalog by existential quantification in rule heads. In this work, we propose a natural extension of Disjunctive Datalog and Datalogex, called Datalogexor, which allows both disjunctions and existential quantification in rule heads and is therefore an attractive language for knowledge representation and reasoning, especially in domains where ontology-based reasoning is needed. We formally define syntax and semantics of the language Datalogexor, and provide a notion of instantiation, which we prove to be adequate for Datalogexor. A main issue of Datalogex and hence also of Datalogexor is that decidability is no longer guaranteed for typical reasoning tasks. In order to address this issue, we identify many decidable fragments of the language, which extend, in a natural way, analog classes defined in the non-disjunctive case. Moreover, we carry out an in-depth complexity analysis, deriving interesting results which range from Logarithmic Space to Exponential Time.


Annotating Answer-Set Programs in LANA?

arXiv.org Artificial Intelligence

While past research in answer-set programming (ASP) mainly focused on theory, ASP solver technology, and applications, the present work situates itself in the context of a quite recent research trend: development support for ASP. In particular, we propose to augment answer-set programs with additional meta-information formulated in a dedicated annotation language, called LANA. This language allows the grouping of rules into coherent blocks and to specify language signatures, types, pre- and postconditions, as well as unit tests for such blocks. While these annotations are invisible to an ASP solver, as they take the form of program comments, they can be interpreted by tools for documentation, testing, and verification purposes, as well as to eliminate sources of common programming errors by realising syntax checking or code completion features. To demonstrate its versatility, we introduce two such tools, viz. (i) ASPDOC, for generating an HTML documentation for a program based on the annotated information, and (ii) ASPUNIT, for running and monitoring unit tests on program blocks. LANA is also exploited in the SeaLion system, an integrated development environment for ASP based on Eclipse. To appear in Theory and Practice of Logic Programming