Goto

Collaborating Authors

 Minimum Complexity Machines


Efficient pattern-based anomaly detection in a network of multivariate devices

arXiv.org Artificial Intelligence

Many organisations manage service quality and monitor a large set devices and servers where each entity is associated with telemetry or physical sensor data series. Recently, various methods have been proposed to detect behavioural anomalies, however existing approaches focus on multivariate time series and ignore communication between entities. Moreover, we aim to support end-users in not only in locating entities and sensors causing an anomaly at a certain period, but also explain this decision. We propose a scalable approach to detect anomalies using a two-step approach. First, we recover relations between entities in the network, since relations are often dynamic in nature and caused by an unknown underlying process. Next, we report anomalies based on an embedding of sequential patterns. Pattern mining is efficient and supports interpretation, i.e. patterns represent frequent occurring behaviour in time series. We extend pattern mining to filter sequential patterns based on frequency, temporal constraints and minimum description length. We collect and release two public datasets for international broadcasting and X from an Internet company. \textit{BAD} achieves an overall F1-Score of 0.78 on 9 benchmark datasets, significantly outperforming the best baseline by 3\%. Additionally, \textit{BAD} is also an order-of-magnitude faster than state-of-the-art anomaly detection methods.


Why Oatmeal is Cheap: Kolmogorov Complexity and Procedural Generation

arXiv.org Artificial Intelligence

The Game Developer's Conference, the largest event in the games industry, has hosted over 50 talks in the last decade about procedural generation, from small-scale independent speakers to large AAA companies, covering disciplines from programming to art to writing. Correspondingly, procedural generation has been an increasingly hot topic among game AI researchers in the last two decades. The Procedural Generation Workshop at FDG, now in its twelfth year, is one of the longest-running workshops in the field of game AI, and dedicated paper tracks at conferences are a regular occurrence. Despite the huge importance of content generation, and the wealth of time invested into developing practical techniques, the analysis of procedural generators is a relatively underdeveloped area of study. A few notable techniques have emerged over the last two decades of research [7, 8], as well as studies of efficacy [4, 9], but many of the techniques used by game researchers have changed little in that time. As a result, a lot of procedural generation work is done by'feel', with postmortems shared at events such as the Roguelike Celebration


Technical Perspective: Finding Connections between One-Way Functions and Kolmogorov Complexity

Communications of the ACM

Cryptography requires "useful" sources of computational hardness for most of its constructs. For example, in the classic setting of encryption schemes, decryption should be easy when given an appropriate decryption key, while it must be infeasible without it. Fortunately, the theory of computational complexity generously provides a wide variety of sources of computational hardness, but which ones may be useful for cryptography? The long-celebrated interplay between cryptography and computational complexity has been challenged constantly with understanding what "useful" hardness means, where it may be found, and how it may be utilized. This has led the cryptography community to embark on an exciting journey initiated by the pioneering work of Whitfield Diffie and Martin Hellman back in 1976.


Chaitin-Kolmogorov Complexity and Generalization in Neural Networks

Neural Information Processing Systems

We present a unified framework for a number of different ways of failing to generalize properly. The complexity of the function computed is therefore increased, and generalization is degraded. We analyze replicated networks, in which a number of identical networks are independently trained on the same data and their results averaged. We conclude that replication almost always results in a decrease in the expected complexity of the network, and that replication therefore increases expected generalization. Simulations confirming the effect are also presented.


Developing Population Codes by Minimizing Description Length

Neural Information Processing Systems

The Minimum Description Length principle (MDL) can be used to train the hidden units of a neural network to extract a representa(cid:173) tion that is cheap to describe but nonetheless allows the input to be reconstructed accurately. We show how MDL can be used to develop highly redundant population codes. Each hidden unit has a location in a low-dimensional implicit space. If the hidden unit activities form a bump of a standard shape in this space, they can be cheaply encoded by the center ofthis bump. So the weights from the input units to the hidden units in an autoencoder are trained to make the activities form a standard bump.


Autoencoders, Minimum Description Length and Helmholtz Free Energy

Neural Information Processing Systems

An autoencoder network uses a set of recognition weights to convert an input vector into a code vector. It then uses a set of generative weights to convert the code vector into an approximate reconstruction of the input vector. We derive an objective function for training autoencoders based on the Minimum Description Length (MDL) principle. The aim is to minimize the information required to describe both the code vector and the reconstruction error. We show that this information is minimized by choosing code vectors stochastically according to a Boltzmann distri(cid:173) bution, where the generative weights define the energy of each possible code vector given the input vector.


The Use of MDL to Select among Computational Models of Cognition

Neural Information Processing Systems

How should we decide among competing explanations of a cognitive process given limited observations? The problem of model selection is at the heart of progress in cognitive science. In this paper, Minimum Description Length (MDL) is introduced as a method for selecting among computational models of cognition. We also show that differential geometry provides an intuitive understanding of what drives model selection in MDL. Finally, adequacy of MDL is demonstrated in two areas of cognitive modeling.


Transfer Learning using Kolmogorov Complexity: Basic Theory and Empirical Evaluations

Neural Information Processing Systems

In transfer learning we aim to solve new problems using fewer examples using information gained from solving related problems. Transfer learning has been successful in practice, and extensive PAC analysis of these methods has been de- veloped. However it is not yet clear how to define relatedness between tasks. This is considered as a major problem as it is conceptually troubling and it makes it unclear how much information to transfer and when and how to transfer it. In this paper we propose to measure the amount of information one task contains about another using conditional Kolmogorov complexity between the tasks.


Discrete MDL Predicts in Total Variation

Neural Information Processing Systems

The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class need to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance.


Universal coding, intrinsic volumes, and metric complexity

arXiv.org Machine Learning

We study sequential probability assignment in the Gaussian setting, where the goal is to predict, or equivalently compress, a sequence of real-valued observations almost as well as the best Gaussian distribution with mean constrained to a given subset of $\mathbf{R}^n$. First, in the case of a convex constraint set $K$, we express the hardness of the prediction problem (the minimax regret) in terms of the intrinsic volumes of $K$; specifically, it equals the logarithm of the Wills functional from convex geometry. We then establish a comparison inequality for the Wills functional in the general nonconvex case, which underlines the metric nature of this quantity and generalizes the Slepian-Sudakov-Fernique comparison principle for the Gaussian width. Motivated by this inequality, we characterize the exact order of magnitude of the considered functional for a general nonconvex set, in terms of global covering numbers and local Gaussian widths. This implies metric isomorphic estimates for the log-Laplace transform of the intrinsic volume sequence of a convex body. As part of our analysis, we also characterize the minimax redundancy for a general constraint set. We finally relate and contrast our findings with classical asymptotic results in information theory.