Goto

Collaborating Authors

 Country


How the initialization affects the stability of the k-means algorithm

arXiv.org Machine Learning

We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm and do not ignore its property of getting stuck in local optima. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.


Bounds Arc Consistency for Weighted CSPs

Journal of Artificial Intelligence Research

The Weighted Constraint Satisfaction Problem (WCSP) framework allows representing and solving problems involving both hard constraints and cost functions. It has been applied to various problems, including resource allocation, bioinformatics, scheduling, etc. To solve such problems, solvers usually rely on branch-and-bound algorithms equipped with local consistency filtering, mostly soft arc consistency. However, these techniques are not well suited to solve problems with very large domains. Motivated by the resolution of an RNA gene localization problem inside large genomic sequences, and in the spirit of bounds consistency for large domains in crisp CSPs, we introduce soft bounds arc consistency, a new weighted local consistency specifically designed for WCSP with very large domains. Compared to soft arc consistency, BAC provides significantly improved time and space asymptotic complexity. In this paper, we show how the semantics of cost functions can be exploited to further improve the time complexity of BAC. We also compare both in theory and in practice the efficiency of BAC on a WCSP with bounds consistency enforced on a crisp CSP using cost variables. On two different real problems modeled as WCSP, including our RNA gene localization problem, we observe that maintaining bounds arc consistency outperforms arc consistency and also improves over bounds consistency enforced on a constraint model with cost variables.


Optimal Value of Information in Graphical Models

Journal of Artificial Intelligence Research

Many real-world decision making tasks require us to choose among several expensive observations. In a sensor network, for example, it is important to select the subset of sensors that is expected to provide the strongest reduction in uncertainty. In medical decision making tasks, one needs to select which tests to administer before deciding on the most effective treatment. It has been general practice to use heuristic-guided procedures for selecting observations. In this paper, we present the first efficient optimal algorithms for selecting observations for a class of probabilistic graphical models. For example, our algorithms allow to optimally label hidden variables in Hidden Markov Models (HMMs). We provide results for both selecting the optimal subset of observations, and for obtaining an optimal conditional observation plan. Furthermore we prove a surprising result: In most graphical models tasks, if one designs an efficient algorithm for chain graphs, such as HMMs, this procedure can be generalized to polytree graphical models. We prove that the optimizing value of information is $NP^{PP}$-hard even for polytrees. It also follows from our results that just computing decision theoretic value of information objective functions, which are commonly used in practice, is a #P-complete problem even on Naive Bayes models (a simple special case of polytrees). In addition, we consider several extensions, such as using our algorithms for scheduling observation selection for multiple sensors. We demonstrate the effectiveness of our approach on several real-world datasets, including a prototype sensor network deployment for energy conservation in buildings.


A Unified Semi-Supervised Dimensionality Reduction Framework for Manifold Learning

arXiv.org Artificial Intelligence

We present a general framework of semi-supervised dimensionality reduction for manifold learning which naturally generalizes existing supervised and unsupervised learning frameworks which apply the spectral decomposition. Algorithms derived under our framework are able to employ both labeled and unlabeled examples and are able to handle complex problems where data form separate clusters of manifolds. Our framework offers simple views, explains relationships among existing frameworks and provides further extensions which can improve existing algorithms. Furthermore, a new semi-supervised kernelization framework called ``KPCA trick'' is proposed to handle non-linear problems.


Solving Weighted Constraint Satisfaction Problems with Memetic/Exact Hybrid Algorithms

Journal of Artificial Intelligence Research

A weighted constraint satisfaction problem (WCSP) is a constraint satisfaction problem in which preferences among solutions can be expressed. Bucket elimination is a complete technique commonly used to solve this kind of constraint satisfaction problem. When the memory required to apply bucket elimination is too high, a heuristic method based on it (denominated mini-buckets) can be used to calculate bounds for the optimal solution. Nevertheless, the curse of dimensionality makes these techniques impractical on large scale problems. In response to this situation, we present a memetic algorithm for WCSPs in which bucket elimination is used as a mechanism for recombining solutions, providing the best possible child from the parental set. Subsequently, a multi-level model in which this exact/metaheuristic hybrid is further hybridized with branch-and-bound techniques and mini-buckets is studied. As a case study, we have applied these algorithms to the resolution of the maximum density still life problem, a hard constraint optimization problem based on Conway's game of life. The resulting algorithm consistently finds optimal patterns for up to date solved instances in less time than current approaches. Moreover, it is shown that this proposal provides new best known solutions for very large instances.


Online Search Cost Estimation for SAT Solvers

arXiv.org Artificial Intelligence

The methods focus on the online behaviour of the backtracking solver, as well as the structure of the problem. Modern SAT solvers present several challenges to estimate search cost including coping with nonchronological backtracking, learning and restarts. Our first method adapt an existing algorithm for estimating the size of a search tree to deal with these challenges. We then suggest a second method that uses a linear model trained on data gathered online at the start of search. We compare the effectiveness of these two methods using random and structured problems. We also demonstrate that predictions made in early restarts can be used to improve later predictions. We conclude by showing that the cost of solving a set of problems can be reduced by selecting a solver from a portfolio based on such cost estimations.


Restart Strategy Selection using Machine Learning Techniques

arXiv.org Artificial Intelligence

Restart strategies are an important factor in the performance of conflict-driven Davis Putnam style SAT solvers. Selecting a good restart strategy for a problem instance can enhance the performance of a solver. Inspired by recent success applying machine learning techniques to predict the runtime of SAT solvers, we present a method which uses machine learning to boost solver performance through a smart selection of the restart strategy. Based on easy to compute features, we train both a satisfiability classifier and runtime models. We use these models to choose between restart strategies. We present experimental results comparing this technique with the most commonly used restart strategies. Our results demonstrate that machine learning is effective in improving solver performance.


Fact Sheet on Semantic Web

arXiv.org Artificial Intelligence

What is the Semantic Web? "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" Machines should not just be able to display data, but rather be able to use it for automation, integration and reuse across various applications. The European Commission is funding numerous projects related to Ontologies and the Semantic Web; even more will be funded in its recently launched Sixth Framework Research Programme. The worldwide Semantic Web community is growing rather fast and forces are being joined with other technology developments such as Web Services or multimedia. Last, but not least, vendors are already offering mature products and solutions based on semantic technologies. Thus, the Semantic Web is currently moving from being a vision to becoming reality.


Graphical Probabilistic Routing Model for OBS Networks with Realistic Traffic Scenario

arXiv.org Artificial Intelligence

Burst contention is a well-known challenging problem in Optical Burst Switching (OBS) networks. Contention resolution approaches are always reactive and attempt to minimize the BLR based on local information available at the core node. On the other hand, a proactive approach that avoids burst losses before they occur is desirable. To reduce the probability of burst contention, a more robust routing algorithm than the shortest path is needed. This paper proposes a new routing mechanism for JET-based OBS networks, called Graphical Probabilistic Routing Model (GPRM) that selects less utilized links, on a hop-by-hop basis by using a bayesian network. We assume no wavelength conversion and no buffering to be available at the core nodes of the OBS network. We simulate the proposed approach under dynamic load to demonstrate that it reduces the Burst Loss Ratio (BLR) compared to static approaches by using Network Simulator 2 (ns-2) on NSFnet network topology and with realistic traffic matrix. Simulation results clearly show that the proposed approach outperforms static approaches in terms of BLR.


How Hard Is Bribery in Elections?

Journal of Artificial Intelligence Research

We study the complexity of influencing elections through bribery: How computationally complex is it for an external actor to determine whether by paying certain voters to change their preferences a specified candidate can be made the elections winner? We study this problem for election systems as varied as scoring protocols and Dodgson voting, and in a variety of settings regarding homogeneous-vs.-nonhomogeneous electorate bribability, bounded-size-vs.-arbitrary-sized candidate sets, weighted-vs.-unweighted voters, and succinct-vs.-nonsuccinct input specification. We obtain both polynomial-time bribery algorithms and proofs of the intractability of bribery, and indeed our results show that the complexity of bribery is extremely sensitive to the setting. For example, we find settings in which bribery is NP-complete but manipulation (by voters) is in P, and we find settings in which bribing weighted voters is NP-complete but bribing voters with individual bribe thresholds is in P. For the broad class of elections (including plurality, Borda, k-approval, and veto) known as scoring protocols, we prove a dichotomy result for bribery of weighted voters: We find a simple-to-evaluate condition that classifies every case as either NP-complete or in P.