Püschel, Markus
Learning DAGs and Root Causes from Time-Series Data
Misiakos, Panagiotis, Püschel, Markus
Many applications produce time-series data: multi-dimensional data measured in regular time steps. Examples include temperature measurements at different sites in meteorology [Yang et al., 2022], stock prices in finance [Kleinberg, 2013, Jiang and Shimizu, 2023], and brain data in medicine [Smith et al., 2011]. A key problem in analyzing time-series data is causal structure discovery, which aims to understand the generation mechanism of such data between nodes and across time [Assaad et al., 2022b, Runge et al., 2023, Gong et al., 2023, Hasan et al., 2023]. One common structural model associates time-series data with directed acyclic graphs (DAGs) that encode how the data in one time step is obtained from prior ones. Our work specifically focuses on learning these DAGs from time-series data [Sun et al., 2023, Gao et al., 2022, Pamfil et al., 2020]. This approach simplifies the broader problem of causal discovery by abstracting away the need for true causal relationships, which often require techniques like interventions. Despite this simplification, DAG learning from time series still poses a challenge due to the complexity of temporal dependencies and the high dimensionality of the data.
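As a rough illustration of the structural model described above, the sketch below generates toy time-series data in which each time step depends on an instantaneous DAG and on the previous step. The matrices W_inst and W_lag, the lag-1 structure, and all constants are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 200                       # number of nodes, number of time steps

# Instantaneous DAG: strictly upper-triangular weights guarantee acyclicity.
W_inst = np.triu(rng.uniform(0.3, 0.8, (d, d)), k=1) * (rng.random((d, d)) < 0.4)
# One time-lagged weight matrix connecting step t-1 to step t.
W_lag = rng.uniform(0.2, 0.5, (d, d)) * (rng.random((d, d)) < 0.3)

X = np.zeros((T, d))
M = np.linalg.inv(np.eye(d) - W_inst)   # propagates instantaneous effects
for t in range(1, T):
    noise = 0.1 * rng.standard_normal(d)
    # x_t = x_t W_inst + x_{t-1} W_lag + noise, solved for x_t:
    X[t] = (X[t - 1] @ W_lag + noise) @ M

# A DAG-learning method would try to recover W_inst and W_lag from X alone.
```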
Causal Fourier Analysis on Directed Acyclic Graphs and Posets
Seifert, Bastian, Wendler, Chris, Püschel, Markus
We present a novel form of Fourier analysis, and associated signal processing concepts, for signals (or data) indexed by edge-weighted directed acyclic graphs (DAGs). This means that our Fourier basis yields an eigendecomposition of a suitable notion of shift and convolution operators that we define. DAGs are the common model to capture causal relationships between data values and in this case our proposed Fourier analysis relates data with its causes under a linearity assumption that we define. The definition of the Fourier transform requires the transitive closure of the weighted DAG for which several forms are possible depending on the interpretation of the edge weights. Examples include level of influence, distance, or pollution distribution. Our framework is different from prior GSP: it is specific to DAGs and leverages, and extends, the classical theory of Moebius inversion from combinatorics. For a prototypical application we consider DAGs modeling dynamic networks in which edges change over time. Specifically, we model the spread of an infection on such a DAG obtained from real-world contact tracing data and learn the infection signal from samples assuming sparsity in the Fourier domain.
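A minimal numerical illustration of the idea, assuming the simplest reading in which the data at each node is the weighted sum of its causes and the transitive closure sums edge weights over all directed paths; the matrix names are ours, and the paper's framework covers other forms of closure as well.

```python
import numpy as np

# Toy weighted DAG on 4 nodes: entry W[i, j] means i causally influences j.
W = np.array([[0, 0.5, 0.0, 0.0],
              [0, 0.0, 0.8, 0.3],
              [0, 0.0, 0.0, 0.4],
              [0, 0.0, 0.0, 0.0]])
n = W.shape[0]

# One possible transitive closure: sum of weights over all directed paths,
# including the empty path (the identity). Well-defined since W is nilpotent.
C = np.linalg.inv(np.eye(n) - W)

# Toy cause signal: only node 0 generates something.
causes = np.array([1.0, 0.0, 0.0, 0.0])

# Data = causes percolated through the DAG (the linearity assumption).
x = C.T @ causes

# The Fourier transform in this reading is a Moebius-style inversion
# of the closure, recovering the causes from the data.
recovered = np.linalg.inv(C.T) @ x
print(np.allclose(recovered, causes))   # True
```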
QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models
Pegolotti, Tommaso, Frantar, Elias, Alistarh, Dan, Püschel, Markus
Focusing specifically on generative inference, where the size of the weights is the main bottleneck, the currently best-performing method is GPTQ (Frantar et al., 2022), which achieves near-lossless quantization to 4-bit weights, and can even accurately support 2- and 3-bit weights by reducing the granularity to smaller weight groups, e.g., by jointly quantizing blocks of 64 weights using a shared scale and zero-point. Our approach is informed by the target architecture and a performance model, including both hardware characteristics and method-specific accuracy constraints. Results on CPU-based inference for LLaMA models show that our approach can lead to high performance and high accuracy, comparing favorably to the best existing open-source solution.
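To illustrate the weight format mentioned above (groups of weights sharing a scale and zero-point), here is a plain round-to-nearest sketch; it is not QIGen's generated kernels and omits GPTQ's error-correcting quantization procedure.

```python
import numpy as np

def quantize_blocks(w, bits=4, group=64):
    """Round-to-nearest quantization of a 1-D weight vector in groups that
    share a scale and zero-point. Illustrative only: GPTQ additionally
    corrects quantization error, and QIGen generates tuned kernels for this
    format rather than using plain NumPy."""
    qmax = 2 ** bits - 1
    w = w.reshape(-1, group)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / qmax
    zero = np.round(-lo / scale)
    q = np.clip(np.round(w / scale) + zero, 0, qmax)   # stored as integers
    dequant = (q - zero) * scale                       # used at inference
    return q.astype(np.uint8), scale, zero, dequant.reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, scale, zero, w_hat = quantize_blocks(w)
print(np.abs(w - w_hat).max())   # per-group quantization error
```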
Learning DAGs from Data with Few Root Causes
Misiakos, Panagiotis, Wendler, Chris, Püschel, Markus
We present a novel perspective and algorithm for learning directed acyclic graphs (DAGs) from data generated by a linear structural equation model (SEM). First, we show that a linear SEM can be viewed as a linear transform that, in prior work, computes the data from a dense input vector of random-valued root causes (as we will call them) associated with the nodes. Instead, we consider the case of (approximately) few root causes and also introduce noise in the measurement of the data. Intuitively, this means that the DAG data is produced by few data-generating events whose effect percolates through the DAG. We prove identifiability in this new setting and show that the true DAG is the global minimizer of the $L^0$-norm of the vector of root causes. For data with few root causes, with and without noise, we show superior performance compared to prior DAG learning methods.
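A small sketch of the data model as we read it from the abstract: a sparse matrix of root causes is pushed through the closed-form solution of the linear SEM, and measurement noise is added afterwards. Dimensions, sparsity levels, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 8, 100                          # nodes, samples

# Weighted DAG: strictly upper triangular => acyclic.
A = np.triu(rng.uniform(0.4, 0.9, (d, d)), k=1) * (rng.random((d, d)) < 0.3)

# Few root causes: a sparse input matrix C (most entries are zero).
C = rng.standard_normal((n, d)) * (rng.random((n, d)) < 0.1)

# Data: each root cause percolates through the DAG (linear SEM, closed form),
# with a small amount of measurement noise added afterwards.
X = C @ np.linalg.inv(np.eye(d) - A) + 0.01 * rng.standard_normal((n, d))

# The abstract's premise: the true DAG minimizes the L0-norm of the implied
# root causes C_hat = X (I - A); here we only check that C_hat is sparse.
C_hat = X @ (np.eye(d) - A)
print(np.mean(np.abs(C_hat) > 0.05), np.mean(C != 0))
```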
Precise Multi-Neuron Abstractions for Neural Network Certification
Müller, Mark Niklas, Makarchuk, Gleb, Singh, Gagandeep, Püschel, Markus, Vechev, Martin
Formal verification of neural networks is critical for their safe adoption in real-world applications. However, designing a verifier which can handle realistic networks in a precise manner remains an open and difficult challenge. In this paper, we take a major step in addressing this challenge and present a new framework, called PRIMA, that computes precise convex approximations of arbitrary non-linear activations. PRIMA is based on novel approximation algorithms that compute the convex hull of polytopes, leveraging concepts from computational geometry. The algorithms have polynomial complexity, yield fewer constraints, and minimize precision loss. We evaluate the effectiveness of PRIMA on challenging neural networks with ReLU, Sigmoid, and Tanh activations. Our results show that PRIMA is significantly more precise than the state-of-the-art, verifying robustness for up to 16%, 30%, and 34% more images than prior work on ReLU-, Sigmoid-, and Tanh-based networks, respectively.
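For context, the standard single-neuron ReLU relaxation below is the kind of convex approximation that multi-neuron methods refine; it is much weaker than PRIMA's joint convex hulls over groups of neurons and is shown only as a baseline illustration.

```python
def relu_triangle(l, u):
    """Single-neuron convex relaxation of y = max(x, 0) over x in [l, u].
    Returns linear lower/upper bounds as (slope, intercept) pairs. This is
    the classical baseline that multi-neuron methods such as PRIMA refine."""
    if u <= 0:                       # ReLU is identically 0 on [l, u]
        return (0.0, 0.0), (0.0, 0.0)
    if l >= 0:                       # ReLU is the identity on [l, u]
        return (1.0, 0.0), (1.0, 0.0)
    lam = u / (u - l)                # slope of the upper chord
    upper = (lam, -lam * l)          # y <= lam * (x - l)
    lower = (0.0, 0.0)               # one common sound choice: y >= 0
    return lower, upper

print(relu_triangle(-1.0, 2.0))
```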
Learning Set Functions that are Sparse in Non-Orthogonal Fourier Bases
Wendler, Chris, Amrollahi, Andisheh, Seifert, Bastian, Krause, Andreas, Püschel, Markus
Many applications of machine learning on discrete domains, such as learning preference functions in recommender systems or auctions, can be reduced to estimating a set function that is sparse in the Fourier domain. In this work, we present a new family of algorithms for learning Fourier-sparse set functions. They require at most $nk - k \log_2 k + k$ queries (set function evaluations), under mild conditions on the Fourier coefficients, where $n$ is the size of the ground set and $k$ the number of non-zero Fourier coefficients. In contrast to other work that focused on the orthogonal Walsh-Hadamard transform, our novel algorithms operate with recently introduced non-orthogonal Fourier transforms that offer different notions of Fourier-sparsity. These naturally arise when modeling, e.g., sets of items forming substitutes and complements. We demonstrate effectiveness on several real-world applications.
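One concrete notion of non-orthogonal Fourier-sparsity can be illustrated with the subset (zeta/Moebius) transform pair sketched below; this is an illustrative example in the spirit of the framework, not the paper's algorithms or its query model.

```python
import numpy as np
from itertools import combinations

n = 3
N = list(range(n))
subsets = [frozenset(c) for r in range(n + 1) for c in combinations(N, r)]

# A set function that is sparse in a non-orthogonal basis:
# s(A) = sum of coefficients of all basis subsets contained in A.
coeffs = {frozenset({0, 1}): 2.0, frozenset({2}): 1.0}
s = {A: sum(v for B, v in coeffs.items() if B <= A) for A in subsets}

def moebius(s):
    """Moebius inversion over the subset lattice: recovers the sparse
    coefficients from the set function values."""
    return {A: sum((-1) ** (len(A) - len(B)) * s[B]
                   for B in subsets if B <= A) for A in subsets}

shat = moebius(s)
print({A: v for A, v in shat.items() if abs(v) > 1e-9})
# Only the two nonzero coefficients survive: the function is 2-sparse.
```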
Neural Network Robustness Verification on GPUs
Müller, Christoph, Singh, Gagandeep, Püschel, Markus, Vechev, Martin
Certifying the robustness of neural networks against adversarial attacks is critical to their reliable adoption in real-world systems including autonomous driving and medical diagnosis. Unfortunately, state-of-the-art verifiers either do not scale to larger networks or are too imprecise to prove robustness, which limits their practical adoption. In this work, we introduce GPUPoly, a scalable verifier that can prove the robustness of significantly larger deep neural networks than possible with prior work. The key insight behind GPUPoly is the design of custom, sound polyhedra algorithms for neural network verification on a GPU. Our algorithms leverage the available GPU parallelism and the inherent sparsity of the underlying neural network verification task. GPUPoly scales to very large networks: for example, it can prove the robustness of a 1M neuron, 34-layer deep residual network in about 1 minute. We believe GPUPoly is a promising step towards the practical verification of large real-world networks.
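GPUPoly's actual algorithms are custom polyhedra transformers; as a much simpler illustration of a sound bound computation expressed as dense, GPU-friendly linear algebra, consider interval propagation through one affine layer followed by ReLU. This is far less precise than what GPUPoly implements and is shown only to convey the flavor of batched, parallelizable verification arithmetic.

```python
import numpy as np

def affine_relu_bounds(lo, hi, W, b):
    """Sound interval bounds for relu(W @ x + b) given element-wise input
    bounds lo <= x <= hi. Uses only dense matrix products, the kind of
    operation that maps well onto a GPU; much less precise than the
    polyhedra transformers GPUPoly actually uses."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    return np.maximum(out_lo, 0), np.maximum(out_hi, 0)

rng = np.random.default_rng(0)
W, b = rng.standard_normal((16, 8)), rng.standard_normal(16)
x = rng.standard_normal(8)
lo, hi = affine_relu_bounds(x - 0.1, x + 0.1, W, b)
print(np.all(lo <= hi))   # True: the bounds are consistent
```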
Powerset Convolutional Neural Networks
Wendler, Chris, Alistarh, Dan, Püschel, Markus
We present a novel class of convolutional neural networks (CNNs) for set functions, i.e., data indexed with the powerset of a finite set. The convolutions are derived as linear, shift-equivariant functions for various notions of shifts on set functions. The framework is fundamentally different from graph convolutions based on the Laplacian, as it provides not one but several basic shifts, one for each element in the ground set. Prototypical experiments with several set function classification tasks on synthetic datasets and on datasets derived from real-world hypergraphs demonstrate the potential of our new powerset CNNs.
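A rough sketch of one plausible shift on set functions and the convolution it induces; the paper defines several such models, and the exact definitions there may differ from this illustrative choice.

```python
import numpy as np
from itertools import combinations

n = 3
N = list(range(n))
subsets = [frozenset(c) for r in range(n + 1) for c in combinations(N, r)]
index = {A: i for i, A in enumerate(subsets)}

def powerset_conv(h, s):
    """One plausible powerset convolution, (h * s)(A) = sum_Q h(Q) s(A \\ Q),
    built from shifts of the form (T_q s)(A) = s(A \\ {q}), i.e., one shift
    per ground-set element. Illustrative only; the paper studies several
    shift models and builds CNN layers from the resulting convolutions."""
    out = np.zeros(len(subsets))
    for A in subsets:
        out[index[A]] = sum(h[index[Q]] * s[index[A - Q]] for Q in subsets)
    return out

s = np.arange(len(subsets), dtype=float)      # a toy set function
h = np.zeros(len(subsets))                    # a filter supported on {0}
h[index[frozenset({0}))]] = 1.0 if False else 1.0
print(powerset_conv(h, s))
```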
On Linear Learning with Manycore Processors
Wszola, Eliza, Dünner, Celestine, Jaggi, Martin, Püschel, Markus
A new generation of manycore processors is on the rise that offers dozens and more cores on a chip and, in a sense, fuses host processor and accelerator. In this paper we target the efficient training of generalized linear models on these machines. We propose a novel approach for achieving parallelism which we call Heterogeneous Tasks on Homogeneous Cores (HTHC). It divides the problem into multiple fundamentally different tasks, which themselves are parallelized. For evaluation, we design a detailed, architecture-cognizant implementation of our scheme on a recent 72-core Knights Landing processor that is adaptive to the cache, memory, and core structure. Experiments for Lasso and SVM with different data sets show a speedup of typically an order of magnitude compared to straightforward parallel implementations in C++.
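The two-task split can be sketched roughly as follows for Lasso: a scoring task picks promising coordinates and an update task runs coordinate descent on them. The selection heuristic, block size, and sequential alternation here are placeholders; HTHC runs the two task types concurrently on disjoint groups of cores with an architecture-aware implementation.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def hthc_style_lasso(A, y, lam, epochs=20, block=10):
    """Rough sketch of the two-task split: a 'scoring' task ranks
    coordinates, and an 'update' task runs coordinate descent on the chosen
    block. In HTHC the tasks run concurrently on disjoint core groups of the
    manycore chip; here they simply alternate."""
    n, d = A.shape
    x = np.zeros(d)
    r = y - A @ x                       # residual, maintained incrementally
    col_sq = (A ** 2).sum(axis=0)
    for _ in range(epochs):
        # Task B (scoring): rank coordinates by gradient magnitude.
        scores = np.abs(A.T @ r)
        top = np.argsort(-scores)[:block]
        # Task A (update): coordinate descent restricted to the block.
        for j in top:
            old = x[j]
            x[j] = soft_threshold(old + A[:, j] @ r / col_sq[j],
                                  lam / col_sq[j])
            r -= A[:, j] * (x[j] - old)
    return x

rng = np.random.default_rng(0)
A, y = rng.standard_normal((200, 50)), rng.standard_normal(200)
print(np.count_nonzero(hthc_style_lasso(A, y, lam=5.0)))
```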
Fast and Effective Robustness Certification
Singh, Gagandeep, Gehr, Timon, Mirman, Matthew, Püschel, Markus, Vechev, Martin
We present a new method and system, called DeepZ, for certifying neural network robustness based on abstract interpretation. Compared to state-of-the-art automated verifiers for neural networks, DeepZ: (i) handles ReLU, Tanh and Sigmoid activation functions, (ii) supports feedforward, convolutional, and residual architectures, (iii) is significantly more scalable and precise, and (iv) is sound with respect to floating point arithmetic. These benefits are due to carefully designed approximations tailored to the setting of neural networks. As an example, DeepZ achieves a verification accuracy of 97% on a large network with 88,500 hidden units under an $L_\infty$ attack with $\epsilon = 0.1$, with an average runtime of 133 seconds.
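DeepZ works in a zonotope abstract domain; the affine-layer transformer below is the standard exact step for zonotopes, while DeepZ's tailored activation approximations and floating-point soundness measures are the paper's contribution and are not reproduced here.

```python
import numpy as np

def zonotope_affine(center, gens, W, b):
    """Exact affine transformer for a zonotope {center + gens.T @ e, |e| <= 1}:
    both the center and every generator pass through the linear map."""
    return W @ center + b, gens @ W.T

def zonotope_bounds(center, gens):
    """Element-wise interval concretization of a zonotope."""
    radius = np.abs(gens).sum(axis=0)
    return center - radius, center + radius

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
center, gens = x, 0.1 * np.eye(8)          # L_inf ball of radius 0.1 around x
W, b = rng.standard_normal((4, 8)), rng.standard_normal(4)
c2, g2 = zonotope_affine(center, gens, W, b)
print(zonotope_bounds(c2, g2))             # sound output bounds of the layer
```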