Genre
Consensus Dynamics in a non-deterministic Naming Game with Shared Memory
Filho, Reginaldo J. da Silva, Brust, Matthias R., Ribeiro, Carlos H. C.
In the naming game, individuals or agents exchange pairwise local information in order to communicate about objects in their common environment. The goal of the game is to reach a consensus about naming these objects. Originally used to investigate language formation and self-organizing vocabularies, we extend the classical naming game with a globally shared memory accessible by all agents. This shared memory can be interpreted as an external source of knowledge like a book or an Internet site. The extended naming game models an environment similar to one that can be found in the context of social bookmarking and collaborative tagging sites where users tag sites using appropriate labels, but also mimics an important aspect in the field of human-based image labeling. Although the extended naming game is non-deterministic in its word selection, we show that consensus towards a common vocabulary is reached. More importantly, we show the qualitative and quantitative influence of the external source of information, i.e. the shared memory, on the consensus dynamics between the agents.
Reduced-Rank Hidden Markov Models
Siddiqi, Sajid M., Boots, Byron, Gordon, Geoffrey J.
We introduce the Reduced-Rank Hidden Markov Model (RR-HMM), a generalization of HMMs that can model smooth state evolution as in Linear Dynamical Systems (LDSs) as well as non-log-concave predictive distributions as in continuous-observation HMMs. RR-HMMs assume an m-dimensional latent state and n discrete observations, with a transition matrix of rank k <= m. This implies the dynamics evolve in a k-dimensional subspace, while the shape of the set of predictive distributions is determined by m. Latent state belief is represented with a k-dimensional state vector and inference is carried out entirely in R^k, making RR-HMMs as computationally efficient as k-state HMMs yet more expressive. To learn RR-HMMs, we relax the assumptions of a recently proposed spectral learning algorithm for HMMs (Hsu, Kakade and Zhang 2009) and apply it to learn k-dimensional observable representations of rank-k RR-HMMs. The algorithm is consistent and free of local optima, and we extend its performance guarantees to cover the RR-HMM case. We show how this algorithm can be used in conjunction with a kernel density estimator to efficiently model high-dimensional multivariate continuous data. We also relax the assumption that single observations are sufficient to disambiguate state, and extend the algorithm accordingly. Experiments on synthetic data and a toy video, as well as on a difficult robot vision modeling problem, yield accurate models that compare favorably with standard alternatives in simulation quality and prediction capability.
Optimal construction of k-nearest neighbor graphs for identifying noisy clusters
Maier, Markus, Hein, Matthias, von Luxburg, Ulrike
We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest neighbor or symmetric k-nearest neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (rather of the order n than of the order log n) to maximize the probability of cluster identification. Secondly, the major difference between the mutual and the symmetric k-nearest neighbor graph occurs when one attempts to detect the most significant cluster only.
Industrial-Strength Formally Certified SAT Solving
Darbari, Ashish, Fischer, Bernd, Marques-Silva, Joao
Boolean Satisfiability (SAT) solvers are now routinely used in the verification of large industrial problems. However, their application in safety-critical domains such as the railways, avionics, and automotive industries requires some form of assurance for the results, as the solvers can (and sometimes do) have bugs. Unfortunately, the complexity of modern, highly optimized SAT solvers renders impractical the development of direct formal proofs of their correctness. This paper presents an alternative approach where an untrusted, industrial-strength, SAT solver is plugged into a trusted, formally certified, SAT proof checker to provide industrial-strength certified SAT solving. The key novelties and characteristics of our approach are (i) that the checker is automatically extracted from the formal development, (ii), that the combined system can be used as a standalone executable program independent of any supporting theorem prover, and (iii) that the checker certifies any SAT solver respecting the agreed format for satisfiability and unsatisfiability claims. The core of the system is a certified checker for unsatisfiability claims that is formally designed and verified in Coq. We present its formal design and outline the correctness proofs. The actual standalone checker is automatically extracted from the the Coq development. An evaluation of the certified checker on a representative set of industrial benchmarks from the SAT Race Competition shows that, albeit it is slower than uncertified SAT checkers, it is significantly faster than certified checkers implemented on top of an interactive theorem prover.
Multi-Way, Multi-View Learning
Huopaniemi, Ilkka, Suvitaival, Tommi, Nikkilรค, Janne, Oreลกiฤ, Matej, Kaski, Samuel
We extend multi-way, multivariate ANOVA-type analysis to cases where one covariate is the view, with features of each view coming from different, high-dimensional domains. The different views are assumed to be connected by having paired samples; this is a common setup in recent bioinformatics experiments, of which we analyze metabolite profiles in different conditions (disease vs. control and treatment vs. untreated) in different tissues (views). We introduce a multi-way latent variable model for this new task, by extending the generative model of Bayesian canonical correlation analysis (CCA) both to take multi-way covariate information into account as population priors, and by reducing the dimensionality by an integrated factor analysis that assumes the metabolites to come in correlated groups.
Composite Binary Losses
Reid, Mark D., Williamson, Robert C.
We study losses for binary classification and class probability estimation and extend the understanding of them from margin losses to general composite losses which are the composition of a proper loss with a link function. We characterise when margin losses can be proper composite losses, explicitly show how to determine a symmetric loss in full from half of one of its partial losses, introduce an intrinsic parametrisation of composite binary losses and give a complete characterisation of the relationship between proper losses and ``classification calibrated'' losses. We also consider the question of the ``best'' surrogate binary loss. We introduce a precise notion of ``best'' and show there exist situations where two convex surrogate losses are incommensurable. We provide a complete explicit characterisation of the convexity of composite binary losses in terms of the link function and the weight function associated with the proper loss which make up the composite loss. This characterisation suggests new ways of ``surrogate tuning''. Finally, in an appendix we present some new algorithm-independent results on the relationship between properness, convexity and robustness to misclassification noise for binary losses and show that all convex proper losses are non-robust to misclassification noise.
Variational Inducing Kernels for Sparse Convolved Multiple Output Gaussian Processes
รlvarez, Mauricio A., Luengo, David, Titsias, Michalis K., Lawrence, Neil D.
Interest in multioutput kernel methods is increasing, whether under the guise of multitask learning, multisensor networks or structured output data. From the Gaussian process perspective a multioutput Mercer kernel is a covariance function over correlated output functions. One way of constructing such kernels is based on convolution processes (CP). A key problem for this approach is efficient inference. Alvarez and Lawrence (2009) recently presented a sparse approximation for CPs that enabled efficient inference. In this paper, we extend this work in two directions: we introduce the concept of variational inducing functions to handle potential non-smooth functions involved in the kernel CP construction and we consider an alternative approach to approximate inference based on variational methods, extending the work by Titsias (2009) to the multiple output case. We demonstrate our approaches on prediction of school marks, compiler performance and financial time series.
New Generalization Bounds for Learning Kernels
Cortes, Corinna, Mohri, Mehryar, Rostamizadeh, Afshin
This paper presents several novel generalization bounds for the problem of learning kernels based on the analysis of the Rademacher complexity of the corresponding hypothesis sets. Our bound for learning kernels with a convex combination of p base kernels has only a log(p) dependency on the number of kernels, p, which is considerably more favorable than the previous best bound given for the same problem. We also give a novel bound for learning with a linear combination of p base kernels with an L_2 regularization whose dependency on p is only in p^{1/4}.
On Backtracking in Real-time Heuristic Search
Bulitko, Valeriy K., Bulitko, Vadim
Real-time heuristic search algorithms are suitable for situated agents that need to make their decisions in constant time. Since the original work by Korf nearly two decades ago, numerous extensions have been suggested. One of the most intriguing extensions is the idea of backtracking wherein the agent decides to return to a previously visited state as opposed to moving forward greedily. This idea has been empirically shown to have a significant impact on various performance measures. The studies have been carried out in particular empirical testbeds with specific real-time search algorithms that use backtracking. Consequently, the extent to which the trends observed are characteristic of backtracking in general is unclear. In this paper, we present the first entirely theoretical study of backtracking in real-time heuristic search. In particular, we present upper bounds on the solution cost exponential and linear in a parameter regulating the amount of backtracking. The results hold for a wide class of real-time heuristic search algorithms that includes many existing algorithms as a small subclass.
Condition Number Analysis of Kernel-based Density Ratio Estimation
Kanamori, Takafumi, Suzuki, Taiji, Sugiyama, Masashi
The ratio of two probability densities can be used for solving various machine learning tasks such as covariate shift adaptation (importance sampling), outlier detection (likelihood-ratio test), and feature selection (mutual information). Recently, several methods of directly estimating the density ratio have been developed, e.g., kernel mean matching, maximum likelihood density ratio estimation, and least-squares density ratio fitting. In this paper, we consider a kernelized variant of the least-squares method and investigate its theoretical properties from the viewpoint of the condition number using smoothed analysis techniques--the condition number of the Hessian matrix determines the convergence rate of optimization and the numerical stability. We show that the kernel least-squares method has a smaller condition number than a version of kernel mean matching and other M-estimators, implying that the kernel least-squares method has preferable numerical properties. We further give an alternative formulation of the kernel least-squares estimator which is shown to possess an even smaller condition number. We show that numerical studies meet our theoretical analysis.