Genre
Clustering by compression
Cilibrasi, Rudi, Vitanyi, Paul
We present a new method for clustering based on compression. The method doesn't use subject-specific features or background knowledge, and works as follows: First, we determine a universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is universal in that it is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal but uses the non-computable notion of Kolmogorov complexity. We propose precise notions of similarity metric, normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates universality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (binary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block sorting compressors. In genomics we presented new evidence for major questions in Mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis.
Fuzzy Relational Modeling of Cost and Affordability for Advanced Technology Manufacturing Environment
Kohout, Ladislav J., Kim, Eunjin, Zenz, Gary
Relational representation of knowledge makes it possible to perform all the computations and decision making in a uniform relational way by means of special relational compositions called triangle and square products. In this paper some applications in manufacturing related to cost analysis are described. Testing fuzzy relational structures for various relational properties allows us to discover dependencies, hierarchies, similarities, and equivalences of the attributes characterizing technological processes and manufactured artifacts in their relationship to costs and performance. A brief overview of mathematical aspects of BK-relational products is given in Appendix 1 together with further references in the literature.
Finding Traitors in Secure Networks Using Byzantine Agreements
Wagner, Liam, McDonald, Stuart
Secure networks rely upon players to maintain security and reliability. However not every player can be assumed to have total loyalty and one must use methods to uncover traitors in such networks. We use the original concept of the Byzantine Generals Problem by Lamport, and the more formal Byzantine Agreement describe by Linial, to nd traitors in secure networks. By applying general fault-tolerance methods to develop a more formal design of secure networks we are able to uncover traitors amongst a group of players. We also propose methods to integrate this system with insecure channels. This new resiliency can be applied to broadcast and peer-to-peer secure communication systems where agents may be traitors or become unreliable due to faults.
Goedel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements
We present the first class of mathematically rigorous, general, fully self-referential, self-improving, optimally efficient problem solvers. Inspired by Kurt Goedel's celebrated self-referential formulas (1931), such a problem solver rewrites any part of its own code as soon as it has found a proof that the rewrite is useful, where the problem-dependent utility function and the hardware and the entire initial code are described by axioms encoded in an initial proof searcher which is also part of the initial code. The searcher systematically and efficiently tests computable proof techniques (programs whose outputs are proofs) until it finds a provably useful, computable self-rewrite. We show that such a self-rewrite is globally optimal - no local maxima! - since the code first had to prove that it is not useful to continue the proof search for alternative self-rewrites. Unlike previous non-self-referential methods based on hardwired proof searchers, ours not only boasts an optimal order of complexity but can optimally reduce any slowdowns hidden by the O()-notation, provided the utility of such speed-ups is provable at all.
Metric learning pairwise kernel for graph inference
Vert, Jean-Philippe, Qiu, Jian, Noble, William Stafford
Much recent work in bioinformatics has focused on the inference of various types of biological networks, representing gene regulation, metabolic processes, protein-protein interactions, etc. A common setting involves inferring network edges in a supervised fashion from a set of high-confidence edges, possibly characterized by multiple, heterogeneous data sets (protein sequence, gene expression, etc.). Here, we distinguish between two modes of inference in this setting: direct inference based upon similarities between nodes joined by an edge, and indirect inference based upon similarities between one pair of nodes and another pair of nodes. We propose a supervised approach for the direct case by translating it into a distance metric learning problem. A relaxation of the resulting convex optimization problem leads to the support vector machine (SVM) algorithm with a particular kernel for pairs, which we call the metric learning pairwise kernel (MLPK). We demonstrate, using several real biological networks, that this direct approach often improves upon the state-of-the-art SVM for indirect inference with the tensor product pairwise kernel.
Topology Induced Coarsening in Language Games
Baronchelli, A., Dall'Asta, L., Barrat, A., Loreto, V.
We investigate how very large populations are able to reach a global consensus, out of local "microscopic" interaction rules, in the framework of a recently introduced class of models of semiotic dynamics, the so-called Naming Game. We compare in particular the convergence mechanism for interacting agents embedded in a low-dimensional lattice with respect to the mean-field case. We highlight that in low-dimensions consensus is reached through a coarsening process which requires less cognitive effort of the agents, with respect to the mean-field case, but takes longer to complete. In 1-d the dynamics of the boundaries is mapped onto a truncated Markov process from which we analytically computed the diffusion coefficient. More generally we show that the convergence process requires a memory per agent scaling as N and lasts a time N^{1+2/d} in dimension d<5 (d=4 being the upper critical dimension), while in mean-field both memory and time scale as N^{3/2}, for a population of N agents. We present analytical and numerical evidences supporting this picture.
Three Logistic Models for the Ecological and Economic Interactions: Symbiosis, Predator-Prey and Competition
Lopez-Ruiz, Ricardo, Fournier-Prunaret, Daniele
If one isolated species (corporation) is supposed to evolve following the logistic mapping, then we are tempted to think that the dynamics of two species (corporations) can be expressed by a coupled system of two discrete logistic equations. As three basic relationships between two species are present in Nature, namely symbiosis, predator-prey and competition, three different models are obtained. Each model is a cubic two-dimensional discrete logistic-type equation with its own dynamical properties: stationary regime, periodicit y, quasi-periodicity and chaos. We also propose that these models could be useful for thinking in the different interactions happening in the economic world, as for instance for the competition and the collaboration between corporations. Furthermore, these models could be considered as the basic ingr edients to construct more complex interactions in the ecological and economic networks.
Modeling Endogenous Social Networks: the Example of Emergence and Stability of Cooperation without Refusal
Aggregated phenomena in social sciences and economi cs are highly dependent on the way individuals interact. To help understanding the interplay betwe en socio-economic activities and underlying social networks, this paper studies a sequential prisoner's dilemma with binary choice. It proposes an analytical and computational insight about the role of endogenous networks in emergence and sustainability of cooperation and exhibits an alternative to the choice and refusal mechanism that is often proposed to explain cooperation. The study fo cuses on heterogeneous equilibriums and emergence of cooperation from an all-defector state that are the two stylized facts that this model successfully reconstructs.
Combinatorial Approach to Object Analysis
Object Analysis, from this paper point of view, is just a continuity to the already well defined Object Oriented Programming and modeling techniques, with a difference, that is, we will be looking for automated methods realizing the analysis of the object, and eventually construct an object model of a given environment -or a signal. From one hand the "Object" concept define a central point for Object's Data storage, and the functions, interfacing it to the external world, and on the other hand, the "Object" concept, threw its hierarchy, is an actual investment of "similarities" between different object forms, known as polymorphisms . Object programming has been used, with a great success, in computer science. But the thinking process, or the analysis process, generating these models, is of course nothing but intelligence; our intelligence, with its inherent complexity. In our search for an automated object-analysis capable algorithms -or machines, image processing, and more generally signal processing, are the most capable in what we know in science. To this date, image-processing science, coupled to the information processing science, do provide us with different analysis technique of the signal that can be categorized into these categories: 1.
Intensional Models for the Theory of Types
The axiom scheme of Extensionality states that whenever two predicates or relations are coextensive they must have the same propertie s: XY ( null x(Xnull x Ynull x) Z (ZX ZY)) (1) Historically Extensionality has always been problematic, the main problem being that in many areas of application, though not perhaps in t he foundations of mathematics, the statement is simply false. This was reco gnized by Whitehead and Russell in Principia Mathematica [32], where intensional functions such as ' A believes that p ' or'it is a strange coincidence that p ' are discussed at length. However, in the introduction to the second edition ( 1927) of the Prin-cipia Whitehead and Russell (influenced by Wittgenstein's Tractatus) already entertain the possibility that "all functions of functions are extensional". Thirteen years later, in Church's [6] canonical formulation of t he Theory of Types, it is observed that axioms of Extensionality should be adopt ed "[i]n order to obtain classical real number theory (analysis)", a wording that does not seem to rule out the option of not adopting them. Church's formula tion of type theory was completely syntactic and axioms could be adopted or d ropped at will, The Journal of Symbolic Logic, to appear.