Country
Geometry and Topology of Deep Neural Networks' Decision Boundaries
Geometry and topology of decision regions are closely related with classification performance and robustness against adversarial attacks. In this paper, we use differential geometry and topology to explore theoretically the geometrical and topological properties of decision regions produced by deep neural networks (DNNs). The goals are to obtain some geometrical and topological properties of decision regions for given DNN models, and provide some principled guidances to designing and regularizing DNNs. At first, we give the curvatures of decision boundaries in terms of network weights. Based on the rotation index theorem and Gauss-Bonnet-Chern theorem, we then propose methods to identify the closeness and connectivity of given decision boundaries, and obtain the Euler characteristics of closed ones, all without the need to solve decision boundaries explicitly. Finally, we give necessary conditions on network architectures in order to produce closed decision boundaries, and sufficient conditions on network weights for producing zero curvature (flat or developable) decision boundaries.
Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets
We consider causal discovery from time series using conditional independence (CI) based network learning algorithms such as the PC algorithm. The PC algorithm is divided into a skeleton phase where adjacencies are determined based on efficiently selected CI tests and subsequent phases where links are oriented utilizing the Markov and Faithfulness assumptions. Here we show that autocorrelation makes the PC algorithm much less reliable with very low adjacency and orientation detection rates and inflated false positives. We propose a new algorithm, called PCMCI$^+$ that extends the PCMCI method from [Runge et al., 2019b] to also include discovery of contemporaneous links. It separates the skeleton phase for lagged and contemporaneous conditioning sets and modifies the conditioning sets for the individual CI tests. We show that this algorithm now benefits from increasing autocorrelation and yields much more adjacency detection power and especially more orientation recall for contemporaneous links while controlling false positives and having much shorter runtimes. Numerical experiments indicate that the algorithm can be of considerable use in many application scenarios for dozens of variables and large time delays.
Modeling of Spatio-Temporal Hawkes Processes with Randomized Kernels
Ilhan, Fatih, Kozat, Suleyman Serdar
We investigate spatio-temporal event analysis using point processes. Inferring the dynamics of event sequences spatiotemporally has many practical applications including crime prediction, social media analysis, and traffic forecasting. In particular, we focus on spatio-temporal Hawkes processes that are commonly used due to their capability to capture excitations between event occurrences. We introduce a novel inference framework based on randomized transformations and gradient descent to learn the process. We replace the spatial kernel calculations by randomized Fourier feature-based transformations. The introduced randomization by this representation provides flexibility while modeling the spatial excitation between events. Moreover, the system described by the process is expressed within closed-form in terms of scalable matrix operations. During the optimization, we use maximum likelihood estimation approach and gradient descent while properly handling positivity and orthonormality constraints. The experiment results show the improvements achieved by the introduced method in terms of fitting capability in synthetic and real datasets with respect to the conventional inference methods in the spatio-temporal Hawkes process literature. We also analyze the triggering interactions between event types and how their dynamics change in space and time through the interpretation of learned parameters.
High-dimensional, multiscale online changepoint detection
Chen, Yudong, Wang, Tengyao, Samworth, Richard J.
Modern technology has not only allowed the collection of data sets of unprecedented size, but has also facilitated the real-time monitoring of many types of evolving processes of interest. Wearable health devices, astronomical survey telescopes, self-driving cars and transport network load-tracking systems are just a few examples of new technologies that collect large quantities of streaming data, and that provide new challenges and opportunities for statisticians. Very often, a key feature of interest in the monitoring of a data stream is a changepoint; that is, a moment in time at which the data generating mechanism undergoes a change. Such times often represent events of interest, e.g. a change in heart function, and moreover, the accurate identification of changepoints often facilitates the decomposition of a data stream into stationary segments. Historically, it has tended to be univariate time series that have been monitored and studied, within the well-established field of statistical process control (e.g.
Prediction with Spatio-temporal Point Processes with Self Organizing Decision Trees
Karaahmetoglu, Oguzhan, Kozat, Suleyman Serdar
We study the spatio-temporal prediction problem, which has attracted attention of many researchers due to its critical real-life applications. In particular, we introduce a novel approach to this problem. Our approach is based on the Hawkes process, which is a non-stationary and self-exciting point process. We extend the formulations of a standard point process model that can represent time-series data to represent a spatio-temporal data. We model the data as nonstationary in time and space. Furthermore, we partition the spatial region we are working on into subregions via an adaptive decision tree and model the source statistics in each subregion with individual but mutually interacting point processes. We also provide a gradient based joint optimization algorithm for the point process and decision tree parameters. Thus, we introduce a model that can jointly infer the source statistics and an adaptive partitioning of the spatial region. Finally, we provide experimental results on a real-life data, which provides significant improvement due to space adaptation and joint optimization compared to standard well-known methods in the literature.
AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks
Helou, Majed El, Dümbgen, Frederike, Süsstrunk, Sabine
The large capacity of neural networks enables them to learn complex functions. To avoid overfitting, networks however require a lot of training data that can be expensive and time-consuming to collect. A common practical approach to attenuate overfitting is the use of network regularization techniques. We propose a novel regularization method that progressively penalizes the magnitude of activations during training. The combined activation signals produced by all neurons in a given layer form the representation of the input image in that feature space. We propose to regularize this representation in the last feature layer before classification layers. Our method's effect on generalization is analyzed with label randomization tests and cumulative ablations. Experimental results show the advantages of our approach in comparison with commonly-used regularizers on standard benchmark datasets.
Getting Better from Worse: Augmented Bagging and a Cautionary Tale of Variable Importance
As the size, complexity, and availability of data continues to grow, scientists are increasingly relying upon black-box learning algorithms that can often provide accurate predictions with minimal a priori model specifications. Tools like random forest have an established track record of off-the-shelf success and even offer various strategies for analyzing the underlying relationships between features and the response. Motivated by recent insights into random forest behavior, here we introduce the idea of augmented bagging (AugBagg), a procedure that operates in an identical fashion to the classical bagging and random forest counterparts but which operates on a larger space containing additional, randomly generated features. Somewhat surprisingly, we demonstrate that the simple act of adding additional random features into the model can have a dramatic beneficial effect on performance, sometimes outperforming even an optimally tuned traditional random forest. This finding that the inclusion of an additional set of features generated independently of the response can considerably improve predictive performance has crucial implications for the manner in which we consider and measure variable importance. Numerous demonstrations on both real and synthetic data are provided.
Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks
Cowen, Lenore, Devkota, Kapil, Hu, Xiaozhe, Murphy, James M., Wu, Kaiyi
Data-dependent metrics are powerful tools for learning the underlying structure of high-dimensional data. This article develops and analyzes a data-dependent metric known as diffusion state distance (DSD), which compares points using a data-driven diffusion process. Unlike related diffusion methods, DSDs incorporate information across time scales, which allows for the intrinsic data structure to be inferred in a parameter-free manner. This article develops a theory for DSD based on the multitemporal emergence of mesoscopic equilibria in the underlying diffusion process. New algorithms for denoising and dimension reduction with DSD are also proposed and analyzed. These approaches are based on a weighted spectral decomposition of the underlying diffusion process, and experiments on synthetic datasets and real biological networks illustrate the efficacy of the proposed algorithms in terms of both speed and accuracy. Throughout, comparisons with related methods are made, in order to illustrate the distinct advantages of DSD for datasets exhibiting multiscale structure.
Reinforcement Learning for Combinatorial Optimization: A Survey
Mazyavkina, Nina, Sviridov, Sergey, Ivanov, Sergei, Burnaev, Evgeny
Combinatorial optimization (CO) is the workhorse of numerous important applications in operations research, engineering and other fields and, thus, has been attracting enormous attention from the research community for over a century. Many efficient solutions to common problems involve using hand-crafted heuristics to sequentially construct a solution. Therefore, it is intriguing to see how a combinatorial optimization problem can be formulated as a sequential decision making process and whether efficient heuristics can be implicitly learned by a reinforcement learning agent to find a solution. This survey explores the synergy between CO and reinforcement learning (RL) framework, which can become a promising direction for solving combinatorial problems.
A machine learning environment for evaluating autonomous driving software
Hanhirova, Jussi, Debner, Anton, Hyyppä, Matias, Hirvisalo, Vesa
Autonomous vehicles need safe development and testing environments. Many traffic scenarios are such that they cannot be tested in the real world. We see hybrid photorealistic simulation as a viable tool for developing AI (artificial intelligence) software for autonomous driving. We present a machine learning environment for detecting autonomous vehicle corner case behavior. Our environment is based on connecting the CARLA simulation software to TensorFlow machine learning framework and custom AI client software. The AI client software receives data from a simulated world via virtual sensors and transforms the data into information using machine learning models. The AI clients control vehicles in the simulated world. Our environment monitors the state assumed by the vehicle AIs to the ground truth state derived from the simulation model. Our system can search for corner cases where the vehicle AI is unable to correctly understand the situation. In our paper, we present the overall hybrid simulator architecture and compare different configurations. We present performance measurements from real setups, and outline the main parameters affecting the hybrid simulator performance.