On Introspection, Metacognitive Control and Augmented Data Mining Live Cycles
We discuss metacognitive modelling as an enhancement to cognitive modelling and computing. Metacognitive control mechanisms should enable AI systems to self-reflect, reason about their actions, and adapt to new situations. To this end, we propose implementation details for a knowledge taxonomy and an augmented data mining life cycle that supports live integration of the obtained models.
Foundations of a Multi-way Spectral Clustering Framework for Hybrid Linear Modeling
Chen, Guangliang, Lerman, Gilad
The problem of Hybrid Linear Modeling (HLM) is to model and segment data using a mixture of affine subspaces. Different strategies have been proposed to solve this problem; however, rigorous analysis justifying their performance is missing. This paper suggests the Theoretical Spectral Curvature Clustering (TSCC) algorithm for solving the HLM problem, and provides careful analysis to justify it. The TSCC algorithm is essentially a combination of Govindu's multi-way spectral clustering framework (CVPR 2005) and Ng et al.'s spectral clustering algorithm (NIPS 2001). The main result of this paper states that if the given data is sampled from a mixture of distributions concentrated around affine subspaces, then with high sampling probability the TSCC algorithm segments the different underlying clusters well. The goodness of clustering depends on the within-cluster errors, the between-clusters interaction, and a tuning parameter applied by TSCC. The proof also provides new insights for the analysis of Ng et al. (NIPS 2001).
Differential Privacy with Compression
Zhou, Shuheng, Ligett, Katrina, Wasserman, Larry
This work studies formal utility and privacy guarantees for a simple multiplicative database transformation, where the data are compressed by a random linear or affine transformation, reducing the number of data records substantially while preserving the number of original input variables. We provide an analysis framework inspired by a recent concept known as differential privacy (Dwork 06). Our goal is to show that, despite the general difficulty of achieving the differential privacy guarantee, it is possible to publish synthetic data that are useful for a number of common statistical learning applications. These include high dimensional sparse regression (Zhou et al. 07), principal component analysis (PCA), and other statistical measures (Liu et al. 06) based on the covariance of the initial data.
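As a rough illustration of the kind of transformation described above, the following sketch compresses an n x p data matrix X into an m x p matrix by multiplying it with a random m x n matrix. The Gaussian entries with variance 1/m, and the names `compress` and `gram`, are assumptions made for this example and are not taken from the paper; the sketch shows only the compression step and says nothing about the privacy analysis.

```python
import random


def compress(X, m, seed=0):
    """Return Phi @ X, where Phi is m x n with i.i.d. N(0, 1/m) entries.

    X is a list of n records, each a list of p variables. The result has
    m records (m < n in practice) and the same p variables.
    The Gaussian choice for Phi is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    scale = (1.0 / m) ** 0.5  # standard deviation of each entry of Phi
    Z = [[0.0] * p for _ in range(m)]
    for k in range(m):
        for i in range(n):
            phi = rng.gauss(0.0, scale)  # entry Phi[k][i]
            for j in range(p):
                Z[k][j] += phi * X[i][j]
    return Z


def gram(A):
    """Return the p x p matrix A^T A for a list-of-rows matrix A."""
    p = len(A[0])
    return [[sum(r[a] * r[b] for r in A) for b in range(p)] for a in range(p)]


if __name__ == "__main__":
    data_rng = random.Random(42)
    X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
    Z = compress(X, m=50)
    print(len(X), "records compressed to", len(Z))
    print("original Gram diagonal:  ", [round(gram(X)[i][i], 1) for i in range(3)])
    print("compressed Gram diagonal:", [round(gram(Z)[i][i], 1) for i in range(3)])
```

With entries of variance 1/m, the expected value of Phi^T Phi is the identity, so the Gram matrix of the compressed data matches that of the original data in expectation. This is one way to see why covariance-based statistics can remain useful after compression.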
Logical Algorithms meets CHR: A meta-complexity result for Constraint Handling Rules with rule priorities
This paper investigates the relationship between the Logical Algorithms language (LA) of Ganzinger and McAllester and Constraint Handling Rules (CHR). We present a translation schema from LA to CHR-rp (CHR with rule priorities), and show that the meta-complexity theorem for LA can be applied to a subset of CHR-rp via inverse translation. Inspired by the high-level implementation proposal for Logical Algorithms by Ganzinger and McAllester and based on a new scheduling algorithm, we propose an alternative implementation for CHR-rp that gives strong complexity guarantees and results in a new and accurate meta-complexity theorem for CHR-rp. It is furthermore shown that the translation from Logical Algorithms to CHR-rp, combined with the new CHR-rp implementation, satisfies the required complexity for the Logical Algorithms meta-complexity result to hold.
A nonclassical symbolic theory of working memory, mental computations, and mental set
The paper tackles four basic questions associated with the human brain as a learning system. How can the brain learn to (1) mentally simulate different external memory aids, (2) perform, in principle, any mental computations using imaginary memory aids, (3) recall the real sensory and motor events and synthesize a combinatorial number of imaginary events, (4) dynamically change its mental set to match a combinatorial number of contexts? We propose a uniform answer to (1)-(4) based on the general postulate that the human neocortex processes symbolic information in a "nonclassical" way. Instead of manipulating symbols in a read/write memory, as classical symbolic systems do, it manipulates the states of dynamical memory representing different temporary attributes of immovable symbolic structures stored in a long-term memory. The approach is formalized as the concept of the E-machine. Intuitively, an E-machine is a system that deals mainly with characteristic functions representing subsets of memory pointers rather than the pointers themselves. This nonclassical symbolic paradigm is Turing universal and, unlike the classical one, is efficiently implementable in homogeneous neural networks with temporal modulation topologically resembling that of the neocortex.
A Step Forward in Studying the Compact Genetic Algorithm
One of the most famous optimization procedures for combinatorial optimization is the Genetic Algorithm (GA). By maintaining a population of solutions, the GA can be viewed as implicitly modeling the solutions seen in the search process. In the standard GA, new solutions are generated by applying randomized recombination operators to two or more high-quality individuals of the current population (Goldberg, 1989). These recombination operators, such as one-point, two-point or uniform crossover, randomly select non-overlapping subsets of two "parent" solutions to form "children" solutions. By using a crossover operator that preserves groups of parameters from parents to children, the GA attempts to capture dependencies between the parameters implicitly. The poor behavior of genetic algorithms on some problems, sometimes attributed to the designed operators, has led to the development of other types of algorithms. The Probabilistic Model Building Genetic Algorithms (PMBGAs), or Estimation of Distribution Algorithms (EDAs), are a class of algorithms developed recently to preserve the building blocks (Larranaga and Lozano, 2001). The principal concept in this new technique is to prevent the disruption of partial solutions contained in a solution by building a probabilistic model.
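The recombination operators named above can be sketched as follows. This is a minimal illustration of textbook one-point and uniform crossover on list-encoded solutions, assuming two parents of equal length (at least 2); the function names are chosen for this example and are not from the paper.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length parents at a random cut point."""
    cut = rng.randrange(1, len(a))  # cut in 1..len-1, so both parents contribute
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def uniform_crossover(a, b, rng):
    """For each position, assign the two parental values to the children
    in a random order, so every gene goes to exactly one child."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            child1.append(x)
            child2.append(y)
        else:
            child1.append(y)
            child2.append(x)
    return child1, child2


if __name__ == "__main__":
    rng = random.Random(0)
    parent_a, parent_b = [0] * 8, [1] * 8
    print(one_point_crossover(parent_a, parent_b, rng))
    print(uniform_crossover(parent_a, parent_b, rng))
```

One-point crossover keeps contiguous groups of parameters together, whereas uniform crossover can separate any two positions. This difference is one reason the choice of operator affects whether partial solutions survive recombination.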
Time-Varying Networks: Recovering Temporally Rewiring Genetic Networks During the Life Cycle of Drosophila melanogaster
Ahmed, Amr, Song, Le, Xing, Eric P.
Due to the dynamic nature of biological systems, biological networks underlying temporal processes such as the development of Drosophila melanogaster can exhibit significant topological changes to facilitate dynamic regulatory functions. Thus it is essential to develop methodologies that capture the temporal evolution of networks, which makes it possible to study the driving forces underlying the dynamic rewiring of gene regulation circuitry and to predict future network structures. Using a new machine learning method called Tesla, which builds on a novel temporal logistic regression technique, we report the first successful genome-wide reverse-engineering of the latent sequence of temporally rewiring gene networks over more than 4000 genes during the life cycle of Drosophila melanogaster, given longitudinal gene expression measurements, even when only a single snapshot of such measurements resulting from each (time-specific) network is available. Our methods offer the first glimpse of time-specific snapshots and temporal evolution patterns of gene networks in a living organism during its full developmental course. The recovered networks, at this unprecedented resolution, chart the onset and duration of many gene interactions that are missed by typical static network analysis, and suggest a wide array of other temporal behaviors of the gene network not noticed before.
Information, Divergence and Risk for Binary Experiments
Reid, Mark D., Williamson, Robert C.
We unify f-divergences, Bregman divergences, surrogate loss bounds (regret bounds), proper scoring rules, matching losses, cost curves, ROC-curves and information. We do this by systematically studying integral and variational representations of these objects and, in so doing, identify their primitives, all of which are related to cost-sensitive binary classification. As well as clarifying relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate loss bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants. It also suggests new techniques for estimating f-divergences.
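To make the objects mentioned above concrete, the sketch below computes an f-divergence between two discrete distributions and checks the classical Pinsker inequality, KL(P, Q) >= V(P, Q)^2 / 2, where V is the variational divergence (the sum of absolute differences). It assumes the common convention I_f(P, Q) = sum_i q_i f(p_i / q_i) with all q_i > 0; the function names are illustrative, and the generalised inequalities of the paper are not reproduced here.

```python
import math


def f_divergence(p, q, f):
    """Return sum_i q_i * f(p_i / q_i); assumes every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    """f(t) = t log t, with the usual convention f(0) = 0 (gives KL)."""
    return t * math.log(t) if t > 0 else 0.0


def variational_generator(t):
    """f(t) = |t - 1| (gives the variational divergence sum_i |p_i - q_i|)."""
    return abs(t - 1.0)


if __name__ == "__main__":
    p, q = [0.5, 0.5], [0.9, 0.1]
    kl = f_divergence(p, q, kl_generator)
    v = f_divergence(p, q, variational_generator)
    print("KL:", kl, "V:", v, "Pinsker holds:", kl >= v * v / 2)
```

Both quantities come from the same formula with different convex generators f, which is the sense in which f-divergences form a single family.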
Thoughts on an Unified Framework for Artificial Chemistries
Artificial Chemistries (ACs) are symbolic chemical metaphors for the exploration of Artificial Life, with specific focus on the problem of biogenesis or the origin of life. This paper presents the author's thoughts towards defining a unified framework to characterize and classify symbolic artificial chemistries by devising an appropriate formalism to capture semantic and organizational information. We identify three basic high-level abstractions in the initial proposal for this framework, viz., information, computation, and communication. We present an analysis of two important notions of information, namely Shannon's Entropy and Algorithmic Information, and discuss inductive and deductive approaches for defining the framework. Work done when the author was at NUS (2002-2005).
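Of the two notions of information mentioned above, Shannon's entropy is the one that can be computed directly. The following minimal sketch estimates it from the empirical symbol frequencies of a sequence; treating a string of symbols as the object of interest is an assumption of this example, not a construction from the paper.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Return the empirical Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    for s in ("aaaa", "aabb", "abcd"):
        print(s, shannon_entropy(s))
```

Algorithmic information, by contrast, is not computable in general, so no corresponding function is given here.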