Genre
Revealing the Autonomous System Taxonomy: The Machine Learning Approach
Dimitropoulos, Xenofontas, Krioukov, Dmitri, Riley, George, claffy, kc
Although the Internet AS-level topology has been extensively studied over the past few years, little is known about the details of the AS taxonomy. An AS "node" can represent a wide variety of organizations, e.g., large ISP, or small private business, university, with vastly different network characteristics, external connectivity patterns, network growth tendencies, and other properties that we can hardly neglect while working on veracious Internet representations in simulation environments. In this paper, we introduce a radically new approach based on machine learning techniques to map all the ASes in the Internet into a natural AS taxonomy. We successfully classify 95.3% of ASes with expected accuracy of 78.1%. We release to the community the AS-level topology dataset augmented with: 1) the AS taxonomy information and 2) the set of AS attributes we used to classify ASes. We believe that this dataset will serve as an invaluable addition to further understanding of the structure and evolution of the Internet.
Semi-Supervised Learning -- A Statistical Physics Approach
Getz, Gad, Shental, Noam, Domany, Eytan
We present a novel approach to semi-supervised learning which is based on statistical physics. Most of the former work in the field of semi-supervised learning classifies the points by minimizing a certain energy function, which corresponds to a minimal k-way cut solution. In contrast to these methods, we estimate the distribution of classifications, instead of the sole minimal k-way cut, which yields more accurate and robust results. Our approach may be applied to all energy functions used for semi-supervised learning. The method is based on sampling using a Mul-ticanonical Markov chain Monte-Carlo algorithm, and has a straightforward probabilistic interpretation, which allows for soft assignments of points to classes, and also to cope with yet unseen class types. The suggested approach is demonstrated on a toy data set and on two real-life data sets of gene expression.
Can an Organism Adapt Itself to Unforeseen Circumstances?
A model of an organism as an au tonomous intelligent system has been proposed. This model was used to analyz e learning of an organism in various environmental conditions. Processes of learning were divided into two types: strong and weak processes taking place in the absence an d the presence of aprioristic information about an object respectively. Weak lear ning is synonymous to adaptation when aprioristic programs already available in a system (an organism) are started. It was shown that strong learning is impossible fo r both an organism and any autonomous intelligent system. It was shown also that the knowledge base of an organism cannot be updated. Therefore, all behavior programs of an organism are congenital. A model of a conditioned reflex as a series of consecutive measurements of environmental parameters has been advanced. Repeated measurements are necessary in this case to reduce the error during decision making.
If a tree casts a shadow is it telling the time?
Physical processes are computations only when we use them to externalize thought. Computation is the performance of one or more fixed processes within a contingent environment. We reformula te the Church-Turing thesis so that it applies to programs rather than to c omputability. When suitably formulated agent-based computing in an open, multi-scalar environment represents the current consensus view of how we interact with the world. But we don't know how to formulate multi-scalar environments. Keywords: agents, agent-based, agent-based computation, Church-Turing thesis, Church's thesis, computing, computation, envir onment ideas, interaction, interactive computation, models, multi-scalar envir onment, thought, thought tools, unconventional computation.
Approximation Algorithms for K-Modes Clustering
In this paper, we study clustering with respect to the k-modes objective function, a natural formulation of clustering for categorical data. One of the main contributions of this paper is to establish the connection between k-modes and k-median, i.e., the optimum of k-median is at most twice the optimum of k-modes for the same categorical data clustering problem. Based on this observation, we derive a deterministic algorithm that achieves an approximation factor of 2. Furthermore, we prove that the distance measure in k-modes defines a metric. Hence, we are able to extend existing approximation algorithms for metric k-median to k-modes. Empirical results verify the superiority of our method.
Asymptotic Learnability of Reinforcement Problems with Arbitrary Dependence
Ryabko, Daniil, Hutter, Marcus
We address the problem of reinforcement learning in which observations may exhibit an arbitrary form of stochastic dependence on past observations and actions. The task for an agent is to attain the best possible asymptotic reward where the true generating environment is unknown but belongs to a known countable family of environments. We find some sufficient conditions on the class of environments under which an agent exists which attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to different probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.
Topological Grammars for Data Approximation
Gorban, A. N., Sumner, N. R., Zinovyev, A. Y.
A method of {\it topological grammars} is proposed for multidimensional data approximation. For data with complex topology we define a {\it principal cubic complex} of low dimension and given complexity that gives the best approximation for the dataset. This complex is a generalization of linear and non-linear principal manifolds and includes them as particular cases. The problem of optimal principal complex construction is transformed into a series of minimization problems for quadratic functionals. These quadratic functionals have a physically transparent interpretation in terms of elastic energy. For the energy computation, the whole complex is represented as a system of nodes and springs. Topologically, the principal complex is a product of one-dimensional continuums (represented by graphs), and the grammars describe how these continuums transform during the process of optimal complex construction. This factorization of the whole process onto one-dimensional transformations using minimization of quadratic energy functionals allow us to construct efficient algorithms.
Application of Support Vector Regression to Interpolation of Sparse Shock Physics Data Sets
Sakhanenko, Nikita A., Luger, George F., Makaruk, Hanna E., Holtkamp, David B.
Experimental physics, along with many other fields in applied and basic research, uses experiments, physical tests, and observations to gain insight into various phenomena and to validate hypotheses and models. Shock p hysics is a field that explores the response of materials to the extremes of p ressure, deformation, and temperature which are present when shock waves interact with those materials [17]. High explosive (HE) or propellant guns are often used to generate these strong shock waves. Many different diagnostic ap proaches have been used to probe these phenomena [8]. Because of the energetic nature of the shock wave drive, often a large amount of experimental equipment is destroyed during the test.
Yet Another Efficient Unification Algorithm
The unification algorithm is at the heart of the logic program ming paradigm [3]. Starting with the classic algorithm of Robinson [5], the unification algor ithm was developed to become more and more efficient [4]. Even nowadays the unification theory is sti ll under development and is receiving continuous scrutiny from the scientific community [2]. The present paper presents yet another efficient unification a lgorithm centered on a data structure called Unification Table, which borrows some ideas from the data structures used by the Warren's Abstract Machine [1]. The next paragraph presents in detail the proposed unificati on algorithm, giving the C-style pseudo code. An example of application of the algorithm take n from [1] is also presented.
Estimation of linear, non-gaussian causal models in the presence of confounding latent variables
Hoyer, Patrik O., Shimizu, Shohei, Kerminen, Antti J.
The estimation of linear causal models (also known as structural equation models) from data is a well-known problem which has received much attention in the past. Most previous work has, however, made an explicit or implicit assumption of gaussianity, limiting the identifiability of the models. We have recently shown (Shimizu et al, 2005; Hoyer et al, 2006) that for non-gaussian distributions the full causal model can be estimated in the no hidden variables case. In this contribution, we discuss the estimation of the model when confounding latent variables are present. Although in this case uniqueness is no longer guaranteed, there is at most a finite set of models which can fit the data. We develop an algorithm for estimating this set, and describe numerical simulations which confirm the theoretical arguments and demonstrate the practical viability of the approach. Full Matlab code is provided for all simulations.