Azencot, Omri
A Multi-Task Learning Approach to Linear Multivariate Forecasting
Nochumsohn, Liran, Zisling, Hedi, Azencot, Omri
Accurate forecasting of multivariate time series data is important in many engineering and scientific applications. Recent state-of-the-art works ignore the inter-relations between variates, applying their model to each variate independently. This raises several research questions related to the proper modeling of multivariate data. In this work, we propose to view multivariate forecasting as a multi-task learning problem, facilitating the analysis of forecasting by considering the angle between task gradients and their balance. To do so, we analyze linear models to characterize the behavior of tasks. Our analysis suggests that tasks can be defined by grouping similar variates together, which we achieve via a simple clustering based on correlation similarities. Moreover, to balance tasks, we scale gradients with respect to their prediction error. Then, each task is solved with a linear model within our MTLinear framework. We evaluate our approach on challenging benchmarks against strong baselines, and we show that it obtains on-par or better results on multivariate forecasting problems. The implementation is available at: https://github.com/azencot-group/MTLinear
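As a rough illustration of the grouping idea (a minimal sketch, not the MTLinear implementation; see the repository above for the actual code), the snippet below clusters variates by correlation similarity and fits one linear lookback-to-horizon model per task. All function and parameter names are hypothetical.

```python
# Hypothetical sketch: correlation-based variate clustering + per-task linear models.
import numpy as np
from sklearn.cluster import AgglomerativeClustering  # sklearn >= 1.2 for `metric`
from sklearn.linear_model import LinearRegression

def cluster_variates(series, n_tasks=3):
    """series: (time, variates). Group variates by correlation-based similarity."""
    corr = np.corrcoef(series.T)          # (V, V) correlation matrix
    dist = 1.0 - np.abs(corr)             # turn similarity into a distance
    labels = AgglomerativeClustering(
        n_clusters=n_tasks, metric="precomputed", linkage="average"
    ).fit_predict(dist)
    return labels

def fit_task_models(series, labels, lookback=96, horizon=24):
    """Fit one linear map per task from lookback windows to horizon windows."""
    models = {}
    for task in np.unique(labels):
        cols = np.where(labels == task)[0]
        X, Y = [], []
        for t in range(len(series) - lookback - horizon):
            X.append(series[t:t + lookback, cols].ravel())
            Y.append(series[t + lookback:t + lookback + horizon, cols].ravel())
        models[task] = LinearRegression().fit(np.array(X), np.array(Y))
    return models
```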
Beyond Data Scarcity: A Frequency-Driven Framework for Zero-Shot Forecasting
Nochumsohn, Liran, Moshkovitz, Michal, Avner, Orly, Di Castro, Dotan, Azencot, Omri
Time series forecasting is critical in numerous real-world applications, requiring accurate predictions of future values based on observed patterns. While traditional forecasting techniques work well in in-domain scenarios with ample data, they struggle when data is scarce or unavailable altogether, motivating the emergence of zero-shot and few-shot learning settings. Recent advancements often leverage large-scale foundation models for such tasks, but these methods require extensive data and compute resources, and their performance may be hindered by ineffective learning from the available training set. This raises a fundamental question: what factors influence effective learning from data in time series forecasting? Toward addressing this, we propose using Fourier analysis to investigate how models learn from synthetic and real-world time series data. Our findings reveal that forecasters commonly suffer from poor learning from data with multiple frequencies and poor generalization to unseen frequencies, which impedes their predictive performance. To alleviate these issues, we present a novel synthetic data generation framework designed to enhance real data or replace it completely by creating task-specific frequency information, requiring only the sampling rate of the target data. Our approach, Freq-Synth, improves the robustness of both foundation and non-foundation forecasting models in zero-shot and few-shot settings, facilitating more reliable time series forecasting under limited data scenarios.
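To make the frequency-driven idea concrete, here is a hedged sketch of generating synthetic series whose frequency content is tied only to the target sampling rate. This is a hypothetical reconstruction, not the Freq-Synth code, and all names are illustrative.

```python
# Hypothetical sketch: synthesize a series from harmonics of a base frequency
# implied by the sampling rate, plus mild observation noise.
import numpy as np

def synth_series(n_steps, sampling_rate, n_components=3, seed=0):
    """sampling_rate: samples per natural period (e.g., 24 for hourly data
    with a daily cycle). Components are harmonics of that base frequency."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    base_freq = 1.0 / sampling_rate
    x = np.zeros(n_steps)
    for k in range(1, n_components + 1):
        amp = rng.uniform(0.5, 1.5)
        phase = rng.uniform(0, 2 * np.pi)
        x += amp * np.sin(2 * np.pi * k * base_freq * t + phase)
    return x + 0.1 * rng.standard_normal(n_steps)
```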
Utilizing Image Transforms and Diffusion Models for Generative Modeling of Short and Long Time Series
Naiman, Ilan, Berman, Nimrod, Pemper, Itai, Arbiv, Idan, Fadlon, Gal, Azencot, Omri
Lately, there has been a surge in interest surrounding generative modeling of time series data. Most existing approaches are designed either to process short sequences or to handle long-range sequences. This dichotomy can be attributed to gradient issues with recurrent networks, computational costs associated with transformers, and the limited expressiveness of state space models. Towards a unified generative model for varying-length time series, we propose in this work to transform sequences into images. By employing invertible transforms such as the delay embedding and the short-time Fourier transform, we unlock three main advantages: i) we can exploit advanced diffusion vision models; ii) we can process short- and long-range inputs within the same framework; and iii) we can harness recent and established tools from the time-series-to-image literature. We validate the effectiveness of our method through a comprehensive evaluation across multiple tasks, including unconditional generation, interpolation, and extrapolation. We show that our approach consistently achieves state-of-the-art results against strong baselines. In the unconditional generation tasks, we show remarkable mean improvements of 58.17% over previous diffusion models in the short discriminative score and 132.61% in the (ultra-)long classification scores. Code is at https://github.com/azencot-group/ImagenTime.
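The delay embedding mentioned above can be illustrated with a short, self-contained sketch (hypothetical names; the actual transforms live in the ImagenTime repository): a 1-D sequence is stacked into overlapping lagged windows to form an image, and the overlaps are averaged to invert the map.

```python
# Hypothetical sketch of a delay embedding and its inverse.
import numpy as np

def delay_embed(x, rows, hop=1):
    """Stack lagged windows of a 1-D series into a 2-D 'image'.
    Invertible (over the covered prefix) when hop <= rows."""
    cols = (len(x) - rows) // hop + 1
    img = np.stack([x[i * hop:i * hop + rows] for i in range(cols)], axis=1)
    return img  # shape (rows, cols), ready for a vision diffusion model

def delay_invert(img, hop=1):
    """Reconstruct the sequence by averaging overlapping entries."""
    rows, cols = img.shape
    n = (cols - 1) * hop + rows
    acc, cnt = np.zeros(n), np.zeros(n)
    for j in range(cols):
        acc[j * hop:j * hop + rows] += img[:, j]
        cnt[j * hop:j * hop + rows] += 1
    return acc / cnt
```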
Analyzing Deep Transformer Models for Time Series Forecasting via Manifold Learning
Kaufman, Ilya, Azencot, Omri
Transformer models have consistently achieved remarkable results in various domains such as natural language processing and computer vision. However, despite ongoing research efforts to better understand these models, the field still lacks a comprehensive understanding. This is particularly true for deep time series forecasting methods, where analysis and understanding work is relatively limited. Time series data, unlike image and text information, can be more challenging to interpret and analyze. To address this, we approach the problem from a manifold learning perspective, assuming that the latent representations of time series forecasting models lie near a low-dimensional manifold. In our study, we focus on analyzing the geometric features of these latent data manifolds, including intrinsic dimension and principal curvatures. Our findings reveal that deep transformer models exhibit similar geometric behavior across layers, and these geometric features are correlated with model performance. Additionally, we observe that untrained models initially have different structures, but they rapidly converge during training. By leveraging our geometric analysis and differentiable tools, we can potentially design new and improved deep forecasting neural networks. This approach complements existing analysis studies and contributes to a better understanding of transformer models in the context of time series forecasting. Code is released at https://github.com/azencot-group/GATLM.
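For intuition, the intrinsic dimension of layer activations can be estimated with the standard TwoNN estimator, sketched below. This is a generic tool rather than the paper's exact pipeline, and variable names are assumptions.

```python
# TwoNN intrinsic-dimension estimate (Facco et al.-style MLE); generic sketch.
import numpy as np
from scipy.spatial import cKDTree

def twonn_dimension(latents):
    """latents: (n_points, dim) activations from one transformer layer."""
    tree = cKDTree(latents)
    d, _ = tree.query(latents, k=3)      # d[:, 0] is the point itself
    mu = d[:, 2] / d[:, 1]               # ratio of 2nd to 1st NN distance
    mu = mu[np.isfinite(mu) & (mu > 1)]  # guard against duplicate points
    return len(mu) / np.sum(np.log(mu))  # MLE: d = n / sum(log mu)
```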
Sequential Disentanglement by Extracting Static Information From A Single Sequence Element
Berman, Nimrod, Naiman, Ilan, Arbiv, Idan, Fadlon, Gal, Azencot, Omri
One of the fundamental representation learning tasks is unsupervised sequential disentanglement, where latent codes of inputs are decomposed into a single static factor and a sequence of dynamic factors. To extract this latent information, existing methods condition the static and dynamic codes on the entire input sequence. Unfortunately, these models often suffer from information leakage, i.e., the dynamic vectors encode both static and dynamic information, or vice versa, leading to a non-disentangled representation. Attempts to alleviate this problem by reducing the dynamic dimension and adding auxiliary loss terms achieve only partial success. Instead, we propose a novel, simple architecture that mitigates information leakage via an effective subtraction inductive bias while conditioning on a single sample. Remarkably, the resulting variational framework is simpler in terms of required loss terms, hyperparameters, and data augmentation. We evaluate our method on multiple data-modality benchmarks, including general time series, video, and audio, and we show beyond state-of-the-art results on generation and prediction tasks in comparison to several strong baselines.
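A minimal sketch of the subtraction inductive bias follows (a hypothetical module, not the authors' architecture): the static code is read off a single sequence element and subtracted from the per-step codes to form the dynamic factors.

```python
# Hypothetical sketch of static extraction from one element + subtraction.
import torch
import torch.nn as nn

class SubtractDisentangler(nn.Module):
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, code_dim)

    def forward(self, x):                 # x: (batch, time, in_dim)
        h = self.enc(x)                   # per-step latent codes
        static = h[:, :1, :]              # static factor from a single element
        dynamic = h - static              # subtraction removes static content
        return static.squeeze(1), dynamic
```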
First-Order Manifold Data Augmentation for Regression Learning
Kaufman, Ilya, Azencot, Omri
Data augmentation (DA) methods tailored to specific domains generate synthetic samples by applying transformations that are appropriate for the characteristics of the underlying data domain, such as rotations on images and time warping on time series data. In contrast, domain-independent approaches, e.g., mixup, are applicable to various data modalities, and as such they are general and versatile. While regularizing classification tasks via DA is a well-explored research topic, the effect of DA on regression problems has received less attention. To bridge this gap, we study the problem of domain-independent augmentation for regression, and we introduce FOMA: a new data-driven domain-independent data augmentation method. Essentially, our approach samples new examples from the tangent planes of the train distribution. Augmenting data in this way aligns with the network's tendency toward capturing the dominant features of its input signals. We evaluate FOMA on in-distribution generalization and out-of-distribution robustness benchmarks, and we show that it improves the generalization of several neural architectures. We also find that strong baselines based on mixup are less effective in comparison to our approach. Our code is publicly available at https://github.com/azencot-group/FOMA.
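The tangent-plane sampling can be sketched as follows (a hedged, simplified reconstruction; the official implementation is in the FOMA repository above): inputs and targets are stacked, decomposed with an SVD, and the tail of the spectrum is shrunk so augmented samples stay close to the batch's dominant directions.

```python
# Hypothetical sketch: shrink small singular values of a (inputs, targets) batch.
import numpy as np

def foma_like_augment(X, y, keep=0.9, lam=0.5, rng=None):
    """X: (batch, features), y: (batch,) targets; augment them jointly."""
    rng = rng or np.random.default_rng()
    Z = np.hstack([X, y[:, None]])             # treat inputs and targets jointly
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    k = max(1, int(keep * len(s)))             # dominant directions to keep
    scale = np.ones_like(s)
    scale[k:] = lam * rng.uniform(size=len(s) - k)  # shrink the tail spectrum
    Z_aug = (U * (s * scale)) @ Vt             # rebuild the perturbed batch
    return Z_aug[:, :-1], Z_aug[:, -1]
```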
Data Augmentation Policy Search for Long-Term Forecasting
Nochumsohn, Liran, Azencot, Omri
Data augmentation serves as a popular regularization technique to combat overfitting challenges in neural networks. While automatic augmentation has demonstrated success in image classification tasks, its application to time-series problems, particularly in long-term forecasting, has received comparatively less attention. To address this gap, we introduce a time-series automatic augmentation approach named TSAA, which is both efficient and easy to implement. The solution involves tackling the associated bilevel optimization problem through a two-step process: initially training a non-augmented model for a limited number of epochs, followed by an iterative split procedure.
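The two-step process can be outlined in pseudocode-like Python; all helpers here (`train_epochs`, `evaluate`, and the policy set) are hypothetical placeholders, not the TSAA API.

```python
# Hypothetical outline: warm-start without augmentation, then iteratively
# score candidate policies on validation data and keep the best one.
import copy

def tsaa_like_search(model, train_data, val_data, candidate_policies,
                     train_epochs, evaluate, warmup_epochs=5, rounds=3):
    train_epochs(model, train_data, augment=None, epochs=warmup_epochs)
    best = None
    for _ in range(rounds):
        scored = []
        for policy in candidate_policies:
            trial = copy.deepcopy(model)        # fine-tune a copy per candidate
            train_epochs(trial, train_data, augment=policy, epochs=1)
            scored.append((evaluate(trial, val_data), policy))
        best = min(scored, key=lambda t: t[0])[1]   # lowest validation error
        train_epochs(model, train_data, augment=best, epochs=1)
    return model, best
```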
Generative Modeling of Graphs via Joint Diffusion of Node and Edge Attributes
Berman, Nimrod, Kosman, Eitan, Di Castro, Dotan, Azencot, Omri
Graph generation is integral to various engineering and scientific disciplines. Nevertheless, existing methodologies tend to overlook the generation of edge attributes. However, we identify critical applications where edge attributes are essential, making prior methods potentially unsuitable in such contexts. Moreover, while trivial adaptations are available, empirical investigations reveal their limited efficacy, as they do not properly model the interplay among graph components. To address this, we propose a joint score-based model of nodes and edges for graph generation that considers all graph components. Our approach offers two key novelties: (i) node and edge attributes are combined in an attention module that generates samples based on the two ingredients; and (ii) node, edge, and adjacency information are mutually dependent during the graph diffusion process. We evaluate our method on challenging benchmarks involving real-world and synthetic datasets in which edge features are crucial. Additionally, we introduce a new synthetic dataset that incorporates edge values. Furthermore, we propose a novel application that naturally benefits from our method: the generation of traffic scenes represented as graphs. Our method outperforms other graph generation methods, demonstrating a significant advantage in edge-related measures.
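One joint forward-diffusion step over node and edge attributes might look like the hedged sketch below (hypothetical; the paper's score network and noise schedule may differ): a shared schedule noises both components, with edge noise kept symmetric.

```python
# Hypothetical sketch of a joint node/edge forward-diffusion step.
import torch

def joint_noise_step(nodes, edges, t, T=1000):
    """nodes: (N, dn), edges: (N, N, de); noise both with a shared schedule."""
    alpha_bar = torch.cos(torch.tensor(t / T) * torch.pi / 2) ** 2  # cosine schedule
    eps_n, eps_e = torch.randn_like(nodes), torch.randn_like(edges)
    eps_e = 0.5 * (eps_e + eps_e.transpose(0, 1))    # keep edge noise symmetric
    noisy_nodes = alpha_bar.sqrt() * nodes + (1 - alpha_bar).sqrt() * eps_n
    noisy_edges = alpha_bar.sqrt() * edges + (1 - alpha_bar).sqrt() * eps_e
    return noisy_nodes, noisy_edges, (eps_n, eps_e)
```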
Data Representations' Study of Latent Image Manifolds
Kaufman, Ilya, Azencot, Omri
Deep neural networks have been demonstrated to achieve phenomenal success in many domains, and yet their inner mechanisms are not well understood. In this paper, we investigate the curvature of image manifolds, i.e., the manifold's deviation from being flat in its principal directions. We find that state-of-the-art trained convolutional neural networks for image classification have a characteristic curvature profile along layers: an initial steep increase, followed by a long plateau phase, and then another increase. In contrast, this behavior does not appear in untrained networks, in which the curvature flattens. We also show that the curvature gap between the last two layers has a strong correlation with the generalization capability of the network. Moreover, we find that the intrinsic dimension of latent codes is not necessarily indicative of curvature. Finally, we observe that common regularization methods such as mixup yield flatter representations when compared to other methods. Our experiments show consistent results over a variety of deep learning architectures and multiple data sets. Our code is publicly available at https://github.com/azencot-group/CRLM
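A loose proxy for this curvature analysis (a simplification for illustration only, not the paper's estimator; see the CRLM repository for the actual code) measures how much variance of each point's neighborhood falls outside its local PCA tangent plane; assumes tangent_dim < n_neighbors and tangent_dim < the feature dimension.

```python
# Hypothetical curvature proxy: off-tangent-plane variance of local patches.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.decomposition import PCA

def curvature_proxy(latents, n_neighbors=20, tangent_dim=10):
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(latents)
    _, idx = nn.kneighbors(latents)
    scores = []
    for i in range(len(latents)):
        patch = latents[idx[i]] - latents[idx[i]].mean(0)   # centered neighborhood
        pca = PCA(n_components=tangent_dim).fit(patch)
        scores.append(1.0 - pca.explained_variance_ratio_.sum())  # off-plane mass
    return float(np.mean(scores))
```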
Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs
Naiman, Ilan, Erichson, N. Benjamin, Ren, Pu, Mahoney, Michael W., Azencot, Omri
Generating realistic time series data is important for many engineering and scientific applications. Existing work tackles this problem using generative adversarial networks (GANs). However, GANs are often unstable during training, and they can suffer from mode collapse. While variational autoencoders (VAEs) are known to be more robust to these issues, they are (surprisingly) less often considered for time series generation. In this work, we introduce Koopman VAE (KVAE), a new generative framework that is based on a novel design for the model prior, and that can be optimized for either regular or irregular training data. Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map. Our approach enhances generative modeling with two desired features: (i) incorporating domain knowledge can be achieved by leveraging spectral tools that prescribe constraints on the eigenvalues of the linear map; and (ii) studying the qualitative behavior and stability of the system can be performed using tools from dynamical systems theory. Our results show that KVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks. Whether trained on regular or irregular data, KVAE generates time series that improve both discriminative and predictive metrics. We also present visual evidence suggesting that KVAE learns probability density functions that better approximate empirical ground truth distributions.
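A minimal sketch of such a linear latent prior is given below (hypothetical module, not the KVAE implementation): latent states are advanced by a learned linear map, and the eigenvalues of that map expose a natural hook for stability constraints.

```python
# Hypothetical sketch of a linear (Koopman-style) latent prior with a
# spectral stability penalty.
import torch
import torch.nn as nn

class LinearKoopmanPrior(nn.Module):
    def __init__(self, latent_dim):
        super().__init__()
        init = torch.eye(latent_dim) + 0.01 * torch.randn(latent_dim, latent_dim)
        self.K = nn.Parameter(init)

    def forward(self, z0, steps):
        zs, z = [z0], z0
        for _ in range(steps - 1):
            z = z @ self.K.T             # linear latent dynamics z_{t+1} = K z_t
            zs.append(z)
        return torch.stack(zs, dim=1)    # (batch, steps, latent_dim)

    def stability_penalty(self):
        eigs = torch.linalg.eigvals(self.K)         # spectral constraint hook
        return torch.relu(eigs.abs() - 1.0).sum()   # push eigenvalues into unit disk
```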