Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models
Modeling multivariate time series is a well-established problem with a wide range of applications, from healthcare to financial markets. It is challenging, however, as it requires methods to (1) have high expressive power to represent complicated dependencies along the time axis, capturing both long-term progression and seasonal patterns, (2) capture inter-variate dependencies when they are informative, (3) dynamically model the dependencies of the variate and time dimensions, and (4) support efficient training and inference for very long sequences. Traditional State Space Models (SSMs) are classical approaches for univariate time series modeling due to their simplicity and expressive power to represent linear dependencies. They have, however, fundamentally limited expressive power for capturing non-linear dependencies, are slow in practice, and fail to model the inter-variate information flow. Despite recent attempts to improve the expressive power of SSMs using deep structured SSMs, existing methods are limited to univariate time series, fail to model complex patterns (e.g., seasonal patterns), fail to dynamically model the dependencies of the variate and time dimensions, and/or are input-independent. We present Chimera, an expressive variant of 2-dimensional SSMs with a careful parameter design that maintains high expressive power while keeping the training complexity linear. Using two SSM heads with different discretization processes and input-dependent parameters, Chimera is provably able to learn long-term progression, seasonal patterns, and desirable dynamic autoregressive processes. To improve the efficiency of the complex 2D recurrence, we present a fast training algorithm based on a new 2-dimensional parallel selective scan. Our experimental evaluation shows the superior performance of Chimera on extensive and diverse benchmarks, including ECG and speech time series classification, long-term and short-term time series forecasting, and time series anomaly detection.
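The abstract does not spell out the 2-dimensional recurrence itself. As a rough intuition, a 2D SSM propagates a hidden state along both the time axis and the variate axis. The sketch below is a naive, non-parallel rendering of such a recurrence under hypothetical parameter names (A_t, A_v, B, C); it is not Chimera's implementation and omits the input-dependent discretization, the two heads, and the parallel selective scan.

```python
# Naive 2D state-space recurrence sketch (hypothetical shapes and names, not Chimera's code).
import numpy as np

def ssm_2d(x, A_t, A_v, B, C):
    """x: (T, V) input grid; a hidden state h[t, v] of size d is propagated along both axes."""
    T, V = x.shape
    d = B.shape[0]
    h = np.zeros((T + 1, V + 1, d))  # zero-padded boundary states
    y = np.zeros((T, V))
    for t in range(T):
        for v in range(V):
            # combine the "past in time" and the "past in variates" with the driven input
            h[t + 1, v + 1] = A_t @ h[t, v + 1] + A_v @ h[t + 1, v] + B * x[t, v]
            y[t, v] = C @ h[t + 1, v + 1]
    return y

# toy usage
rng = np.random.default_rng(0)
d = 4
y = ssm_2d(rng.standard_normal((8, 3)),
           0.9 * np.eye(d), 0.5 * np.eye(d),
           rng.standard_normal(d), rng.standard_normal(d))
```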
Supplemental: Training Neural Networks is NP-Hard in Fixed Dimension. A Detailed Proof of NP-Hardness for Two Dimensions
In this section we provide the omitted details to prove Theorem 1. We start by describing the precise positions of the data points in the selection gadget. Next, we need a small ε > 0, to be chosen later in a global context. With the precise description of the selection gadget at hand, we can proceed to prove Lemma 4. Proof of Lemma 4. First, we focus on the three vertical lines; for the following argument, compare Figure 5. Observe that f restricted to any one of the three lines is a one-dimensional, continuous, piecewise linear function with at most four breakpoints. Note that the exact locations of these breakpoints and the slopes of the sloped segments are not implied by the nine data points considered so far.
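For intuition only (this is a standard representation, not specific to the construction above): a one-dimensional, continuous, piecewise linear function with at most four breakpoints $t_1 \le t_2 \le t_3 \le t_4$ can always be written as

$$ f(t) \;=\; \alpha \;+\; \beta_0\, t \;+\; \sum_{i=1}^{4} \beta_i \max(0,\; t - t_i), $$

so that the slope on the segment following $t_j$ equals $\beta_0 + \sum_{i \le j} \beta_i$. The quantities that are "not implied by the nine data points" are exactly the breakpoint locations $t_i$ and these segment slopes.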
OPUS: Occupancy Prediction Using a Sparse Set
Jiabao Wang
Occupancy prediction, which aims at predicting the occupancy status within a voxelized 3D environment, is quickly gaining momentum within the autonomous driving community. Mainstream occupancy prediction works first discretize the 3D environment into voxels and then perform classification on such dense grids. However, inspection of sample data reveals that the vast majority of voxels are unoccupied.
A Theory
In this section, we provide more details of the model implementation and experiment setup for reproducibility of the experimental results. B.1 Details of Model Implementation B.1.1 Details of the Prediction Model The prediction model f is implemented with a graph neural network-based model. Specifically, this prediction model includes the following components: three layers of graph convolutional network (GCN) [34] with learnable node masks. The prediction model uses the negative log-likelihood loss. The representation dimension is set to 32. We use the Adam optimizer, and set the learning rate to 0.001, the weight decay to 1e-5, the number of training epochs to 600, the dropout rate to 0.1, and the batch size to 500.
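A minimal sketch of a prediction model matching the stated configuration (three GCN layers, 32-dimensional representations, dropout 0.1, negative log-likelihood loss, Adam with learning rate 0.001 and weight decay 1e-5), written with PyTorch Geometric. The learnable node masks and the rest of the training pipeline are omitted, and all class and argument names here are illustrative rather than the authors' code.

```python
# Illustrative sketch only; hyperparameters follow the text, everything else is assumed.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class Predictor(torch.nn.Module):
    def __init__(self, in_dim, num_classes, hidden=32, dropout=0.1):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.conv3 = GCNConv(hidden, hidden)
        self.out = torch.nn.Linear(hidden, num_classes)
        self.dropout = dropout

    def forward(self, x, edge_index, batch):
        for conv in (self.conv1, self.conv2, self.conv3):
            x = F.relu(conv(x, edge_index))
            x = F.dropout(x, p=self.dropout, training=self.training)
        x = global_mean_pool(x, batch)             # graph-level readout
        return F.log_softmax(self.out(x), dim=-1)  # pairs with the NLL loss

model = Predictor(in_dim=16, num_classes=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
# the training loop (600 epochs, batch size 500 in the text) would compute:
#   loss = F.nll_loss(model(x, edge_index, batch), y)
```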
CLEAR: Generative Counterfactual Explanations on Graphs
Counterfactual explanations promote explainability in machine learning models by answering the question "how should an input instance be perturbed to obtain a desired predicted label?". Comparing an instance before and after the perturbation can enhance human interpretation. Most existing studies on counterfactual explanations are limited to tabular or image data. In this work, we study the problem of counterfactual explanation generation on graphs. A few studies have explored counterfactual explanations on graphs, but many challenges of this problem remain under-addressed: 1) optimizing in the discrete and disorganized space of graphs; 2) generalizing to unseen graphs; and 3) maintaining causality in the generated counterfactuals without prior knowledge of the causal model. To tackle these challenges, we propose CLEAR, a novel framework that generates counterfactual explanations on graphs for graph-level prediction models. Specifically, CLEAR leverages a graph variational autoencoder-based mechanism to facilitate its optimization and generalization, and promotes causality by leveraging an auxiliary variable to better identify the underlying causal model.
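In broad strokes, frameworks of this kind train a conditional graph VAE whose decoder produces a counterfactual graph $G'$, with an objective that balances reconstruction, a KL regularizer on the latent code, and a term pulling the prediction on the counterfactual toward the desired label $Y^*$. The generic form below is only a sketch of this idea under assumed notation; CLEAR's exact objective and the handling of its auxiliary variable are defined in the paper.

$$ \mathcal{L} \;=\; \mathbb{E}_{q_\phi(Z \mid G, Y^*)}\!\big[-\log p_\theta(G' \mid Z, G, Y^*)\big] \;+\; \mathrm{KL}\!\big(q_\phi(Z \mid G, Y^*)\,\big\|\,p(Z)\big) \;+\; \lambda\,\ell\big(f(G'),\, Y^*\big), $$

where $f$ is the fixed graph-level prediction model and $\lambda$ trades off label validity against staying close to the input graph.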
Appendix A: Datasets
Several benchmark multi-view datasets are adopted in our experiments. There are 948 news articles covering 416 different news stories. Among them, 169 stories were reported by all three sources, and each story was annotated with one of six topical labels: business, health, politics, entertainment, sport, and technology. MSRC comprises 240 images in eight classes. We select seven classes, each containing 30 images.
Supplementary Materials - Adaptive Online Replanning with Diffusion Models
In the supplementary, we first discuss the experimental details and hyperparameters in Section A. Next, we analyze the impact of different numbers of diffusion steps N on the replanning process in Section B, and further present visualizations in RLBench in Section C. Finally, we discuss how to compute the likelihood in Section D. In detail, our architecture comprises a temporal U-Net structure with six repeated residual networks. Each network consists of two temporal convolutions followed by GroupNorm [6] and a final Mish nonlinearity [4]. Additionally, we incorporate timestep and condition embeddings, both 128-dimensional vectors produced by an MLP, within each block. The probability ϵ of random actions is set to 0.03 in Stochastic Environments. The total number of diffusion steps, corresponding to the number of diffusion steps for replanning from scratch, is set to 256 in Maze2D, 200 in Stochastic Environments, and 400 in RLBench.
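A minimal sketch of one such residual block, assuming the description above (two temporal convolutions, GroupNorm, Mish, and a 128-dimensional embedding injected into the block). The kernel size, number of groups, and the exact way the embedding is added are assumptions for illustration, not taken from the paper.

```python
# Sketch of a temporal residual block in the spirit of the description above; details are assumed.
import torch
import torch.nn as nn

class TemporalResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, emb_dim=128, kernel=5, groups=8):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel, padding=kernel // 2),
            nn.GroupNorm(groups, out_ch), nn.Mish())
        self.block2 = nn.Sequential(
            nn.Conv1d(out_ch, out_ch, kernel, padding=kernel // 2),
            nn.GroupNorm(groups, out_ch), nn.Mish())
        self.emb_proj = nn.Linear(emb_dim, out_ch)  # projects the timestep/condition embedding
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x, emb):
        # x: (batch, channels, horizon); emb: (batch, emb_dim)
        h = self.block1(x) + self.emb_proj(emb).unsqueeze(-1)
        h = self.block2(h)
        return h + self.skip(x)

block = TemporalResBlock(in_ch=32, out_ch=64)
out = block(torch.randn(2, 32, 16), torch.randn(2, 128))  # -> (2, 64, 16)
```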