Wang, Yuyang
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
Adila, Dyah, Zhang, Shuai, Han, Boran, Wang, Yuyang
The question-answering (QA) capabilities of foundation models are highly sensitive to prompt variations, rendering their performance susceptible to superficial, non-meaning-altering changes. This vulnerability often stems from the model's preference, or bias, towards specific input characteristics, such as option position or superficial image features in multi-modal settings. We propose to rectify this bias directly in the model's internal representation. Our approach, SteerFair, finds the bias direction in the model's representation space and steers activation values away from it during inference. Specifically, we exploit the observation that bias often adheres to simple association rules, such as the spurious association between the first option and correctness likelihood. We then construct demonstrations of these rules from unlabeled samples and use them to identify the bias directions. We empirically show that SteerFair significantly reduces instruction-tuned model performance variance across prompt modifications on three benchmark tasks. Remarkably, our approach surpasses a supervised baseline with 100 labels by an average of 10.86% accuracy points and 12.95 score points, and matches the performance of a baseline with 500 labels.
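To make the steering idea concrete, below is a minimal numpy sketch of the general recipe the abstract describes: estimate a bias direction as the difference of mean activations between two groups of rule demonstrations, then project activations away from that direction at inference. The function names, the unit-norm direction, and the single steering coefficient `alpha` are illustrative assumptions, not the SteerFair implementation.

```python
import numpy as np

def bias_direction(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Unit vector from group A's mean activation to group B's."""
    d = acts_b.mean(axis=0) - acts_a.mean(axis=0)
    return d / np.linalg.norm(d)

def steer_away(h: np.ndarray, d: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Remove (alpha times) the component of activation h along direction d."""
    return h - alpha * (h @ d) * d

# Toy usage: two groups of 100 activations of width 64, e.g. prompts with the
# correct answer placed in the first vs. last option slot.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(100, 64))
acts_b = rng.normal(loc=0.5, size=(100, 64))
d = bias_direction(acts_a, acts_b)
h_debiased = steer_away(rng.normal(size=64), d)
```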
Chronos: Learning the Language of Time Series
Ansari, Abdul Fatir, Stella, Lorenzo, Turkmen, Caner, Zhang, Xiyuan, Mercado, Pedro, Shen, Huibin, Shchur, Oleksandr, Rangapuram, Syama Sundar, Arango, Sebastian Pineda, Kapoor, Shubham, Zschiegner, Jasper, Maddix, Danielle C., Wang, Hao, Mahoney, Michael W., Torkkola, Kari, Wilson, Andrew Gordon, Bohlke-Schneider, Michael, Wang, Yuyang
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrain Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generate via Gaussian processes to improve generalization. In a comprehensive benchmark of 42 datasets that includes both classical local models and deep learning methods as baselines, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) achieve comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.
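As a rough illustration of the scaling-and-quantization tokenization described above, the Python sketch below mean-scales a series and bins it onto a fixed uniform grid of token ids. The vocabulary size, bin range, and uniform grid are placeholder choices, not the exact Chronos configuration.

```python
import numpy as np

def tokenize(series: np.ndarray, n_tokens: int = 4096, low: float = -15.0, high: float = 15.0):
    """Mean-scale the series, then quantize onto a fixed uniform grid of token ids."""
    scale = np.abs(series).mean() or 1.0
    grid = np.linspace(low, high, n_tokens)
    ids = np.clip(np.digitize(series / scale, grid), 0, n_tokens - 1)
    return ids, scale

def detokenize(ids: np.ndarray, scale: float, n_tokens: int = 4096, low: float = -15.0, high: float = 15.0):
    """Map token ids back to (approximate) real values."""
    grid = np.linspace(low, high, n_tokens)
    return grid[ids] * scale

ids, s = tokenize(np.array([1.0, 2.0, 3.0, 2.5]))
print(detokenize(ids, s))   # roughly recovers the input
```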
OmniColor: A Global Camera Pose Optimization Approach of LiDAR-360Camera Fusion for Colorizing Point Clouds
Liu, Bonan, Zhao, Guoyang, Jiao, Jianhao, Cai, Guang, Li, Chengyang, Yin, Handi, Wang, Yuyang, Liu, Ming, Hui, Pan
Colored point clouds are a simple and efficient 3D representation with advantages in various fields, including robotic navigation and scene reconstruction. This representation is now commonly used in 3D reconstruction tasks relying on cameras and LiDARs. However, many existing frameworks fuse the data from these two types of sensors poorly, leading to unsatisfactory mapping results, mainly due to inaccurate camera poses. This paper presents OmniColor, a novel and efficient algorithm for colorizing point clouds using an independent 360-degree camera. Given a LiDAR-based point cloud and a sequence of panorama images with initial coarse camera poses, our objective is to jointly optimize the poses of all frames for mapping images onto geometric reconstructions. Our pipeline works in an off-the-shelf manner and does not require any feature extraction or matching process. Instead, we find optimal poses by directly maximizing the photometric consistency of LiDAR maps. In experiments, we show that our method can overcome the severe visual distortion of omnidirectional images and greatly benefits from the wide field of view (FOV) of 360-degree cameras, reconstructing various scenarios with accuracy and stability. The code will be released at https://github.com/liubonan123/OmniColor/.
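The sketch below shows, under simplifying assumptions, what "maximizing photometric consistency" can look like: project LiDAR points into each panorama with an equirectangular camera model and score how consistent the sampled intensities are across frames. The projection model, pose convention, and variance-based score are illustrative stand-ins, not the OmniColor implementation.

```python
import numpy as np

def equirect_project(pts_cam: np.ndarray, width: int, height: int) -> np.ndarray:
    """Map 3D points in the camera frame to equirectangular pixel coordinates."""
    x, y, z = pts_cam[:, 0], pts_cam[:, 1], pts_cam[:, 2]
    lon = np.arctan2(x, z)
    lat = np.arcsin(y / np.linalg.norm(pts_cam, axis=1))
    u = (lon / (2 * np.pi) + 0.5) * width
    v = (lat / np.pi + 0.5) * height
    return np.stack([u, v], axis=1)

def photometric_cost(pts_world, poses, images):
    """Variance of sampled intensities across frames; lower means more consistent."""
    h, w = images[0].shape
    per_frame = []
    for (R, t), img in zip(poses, images):
        uv = equirect_project((pts_world - t) @ R, w, h).astype(int)
        uv[:, 0] %= w
        uv[:, 1] = np.clip(uv[:, 1], 0, h - 1)
        per_frame.append(img[uv[:, 1], uv[:, 0]])
    return np.var(np.stack(per_frame), axis=0).mean()

# Toy usage: minimize this cost over the poses with any black-box optimizer.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3)) + np.array([0.0, 0.0, 5.0])
poses = [(np.eye(3), np.zeros(3)), (np.eye(3), np.array([0.1, 0.0, 0.0]))]
images = [rng.random((64, 128)) for _ in poses]
print(photometric_cost(pts, poses, images))
```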
Explainable AI for Embedded Systems Design: A Case Study of Static Redundant NVM Memory Write Prediction
Gamatié, Abdoulaye, Wang, Yuyang
This paper investigates the application of eXplainable Artificial Intelligence (XAI) in the design of embedded systems using machine learning (ML). As a case study, it addresses the challenging problem of static silent store prediction: identifying redundant memory writes based only on static program features. Eliminating such stores enhances performance and energy efficiency by reducing memory accesses and bus traffic, especially in the presence of emerging non-volatile memory technologies. To achieve this, we propose a methodology consisting of: 1) the development of relevant ML models for silent store prediction, and 2) the application of XAI to explain these models. We employ two state-of-the-art model-agnostic XAI methods to analyze the causes of silent stores. Through the case study, we evaluate the effectiveness of the methods. We find that these methods provide explanations for silent store predictions that are consistent with known causes of silent store occurrences from previous studies. For example, this allows us to confirm the prevalence of silent stores in operations that write the zero constant into memory, and the absence of silent stores in operations involving loop induction variables. This suggests the potential relevance of XAI for analyzing ML models' decisions in embedded system design. From the case study, we share valuable insights and pitfalls we encountered. More generally, this study aims to lay the groundwork for future research in the emerging field of XAI for embedded system design.
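As a hedged illustration of the two-step methodology (train an ML model on static features, then explain it with a model-agnostic XAI method), the sketch below uses scikit-learn's permutation importance as a stand-in explainer, since the abstract does not name the two methods used; the features and labels are invented placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Hypothetical static features: writes-zero-constant flag, loop-induction flag, other.
X = rng.integers(0, 2, size=(500, 3)).astype(float)
y = ((X[:, 0] == 1) & (X[:, 1] == 0)).astype(int)   # toy "silent store" label

clf = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
for name, imp in zip(["writes_zero", "loop_induction", "other"], result.importances_mean):
    print(f"{name}: {imp:.3f}")
```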
PreDiff: Precipitation Nowcasting with Latent Diffusion Models
Gao, Zhihan, Shi, Xingjian, Han, Boran, Wang, Hao, Jin, Xiaoyong, Maddix, Danielle, Zhu, Yi, Li, Mu, Wang, Yuyang
Earth system forecasting has traditionally relied on complex physical models that are computationally expensive and require significant domain expertise. In the past decade, the unprecedented increase in spatiotemporal Earth observation data has enabled data-driven forecasting models using deep learning techniques. These models have shown promise for diverse Earth system forecasting tasks, but they either struggle with handling uncertainty or neglect domain-specific prior knowledge, resulting in blurred forecasts that average over possible futures or in physically implausible predictions. To address these limitations, we propose a two-stage pipeline for probabilistic spatiotemporal forecasting: 1) We develop PreDiff, a conditional latent diffusion model capable of probabilistic forecasts. 2) We incorporate an explicit knowledge alignment mechanism to align forecasts with domain-specific physical constraints. This is achieved by estimating the deviation from imposed constraints at each denoising step and adjusting the transition distribution accordingly. We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset. Specifically, we impose the law of conservation of energy in N-body MNIST and anticipated precipitation intensity in SEVIR. Experiments demonstrate the effectiveness of PreDiff in handling uncertainty, incorporating domain-specific prior knowledge, and generating forecasts that exhibit high operational utility.
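The PyTorch sketch below illustrates, schematically, one way such a knowledge-alignment step can work: estimate the violation of an imposed constraint from the model's denoised estimate and shift the step against the violation's gradient. The stand-in denoiser, the total-intensity constraint, and the guidance scale are assumptions for illustration, not PreDiff's exact mechanism.

```python
import torch

def constraint_violation(x0_hat: torch.Tensor, target_sum: float) -> torch.Tensor:
    """Squared deviation of total predicted intensity from the imposed value."""
    return (x0_hat.sum() - target_sum) ** 2

def aligned_step(x_t: torch.Tensor, denoiser, t: int, target_sum: float, scale: float = 0.1):
    """One denoising step nudged against the constraint-violation gradient."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)                     # model's estimate of the clean sample
    grad, = torch.autograd.grad(constraint_violation(x0_hat, target_sum), x_t)
    return (x0_hat - scale * grad).detach()       # shifted (stand-in) transition mean

# Toy usage with a stand-in denoiser on an 8x8 latent.
denoiser = lambda x, t: 0.9 * x
x = torch.randn(1, 1, 8, 8)
x = aligned_step(x, denoiser, t=10, target_sum=5.0)
```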
Deep Non-Parametric Time Series Forecaster
Rangapuram, Syama Sundar, Gasthaus, Jan, Stella, Lorenzo, Flunkert, Valentin, Salinas, David, Wang, Yuyang, Januschowski, Tim
This paper presents non-parametric baseline models for time series forecasting. Unlike classical forecasting models, the proposed approach does not assume any parametric form for the predictive distribution and instead generates predictions by sampling from the empirical distribution according to a tunable strategy. By virtue of this, the model is always able to produce reasonable forecasts (i.e., predictions within the observed data range), unlike classical models, which can suffer from numerical instability on some data distributions. Moreover, we develop a global version of the proposed method that automatically learns the sampling strategy by exploiting the information across multiple related time series. The empirical evaluation shows that the proposed methods have reasonable and consistent performance across all datasets, proving them to be strong baselines to be considered in one's forecasting toolbox.
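A minimal sketch of the non-parametric idea, assuming an exponential recency weighting as the "tunable strategy": forecasts are produced by resampling past observations, so predictions always stay within the observed data range. The kernel and its rate are illustrative, not the paper's learned global strategy.

```python
import numpy as np

def npts_forecast(history: np.ndarray, horizon: int, n_samples: int = 100, rate: float = 0.05):
    """Sample forecast paths from the weighted empirical distribution of the history."""
    weights = np.exp(rate * np.arange(len(history)))   # favor recent observations
    weights /= weights.sum()
    rng = np.random.default_rng(0)
    idx = rng.choice(len(history), size=(n_samples, horizon), p=weights)
    return history[idx]

samples = npts_forecast(np.sin(np.linspace(0, 10, 200)), horizon=24)
print(samples.mean(axis=0)[:5])   # point forecast from the sample mean
```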
Generating Molecular Conformer Fields
Wang, Yuyang, Elhag, Ahmed A., Jaitly, Navdeep, Susskind, Joshua M., Bautista, Miguel Angel
We tackle the problem of generating conformers of a molecule in 3D space given its molecular graph, a setting where brute-force approaches are virtually unfeasible for even moderately small molecules. We parameterize these conformers as continuous functions that map elements from the molecular graph to points in 3D space. We then formulate the problem of learning to generate conformers as learning a distribution over these functions using a diffusion generative model, called Molecular Conformer Fields (MCF). Our approach is simple and scalable, and achieves state-of-the-art performance on challenging molecular conformer generation benchmarks while making no assumptions about the explicit structure of molecules (e.g., modeling torsional angles).
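As a highly simplified illustration of diffusion over functions from graph elements to 3D points, the PyTorch sketch below runs one training step of a denoiser on noisy atom coordinates conditioned on node features. The network, noise schedule, and features are placeholder assumptions, far smaller than anything MCF would actually use.

```python
import torch
import torch.nn as nn

class ConformerDenoiser(nn.Module):
    """Predicts per-atom noise from node features, noisy coordinates, and the step."""
    def __init__(self, node_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(node_dim + 3 + 1, hidden),
                                 nn.SiLU(),
                                 nn.Linear(hidden, 3))

    def forward(self, node_feats, noisy_xyz, t):
        t_col = torch.full((noisy_xyz.shape[0], 1), float(t))
        return self.net(torch.cat([node_feats, noisy_xyz, t_col], dim=-1))

# One training step on a toy "molecule" with 5 atoms and 8-dim node features.
model = ConformerDenoiser(node_dim=8)
feats, xyz = torch.randn(5, 8), torch.randn(5, 3)
alpha_bar = 0.5                                   # stand-in cumulative noise level
eps = torch.randn_like(xyz)
noisy = alpha_bar**0.5 * xyz + (1 - alpha_bar)**0.5 * eps
loss = ((model(feats, noisy, t=10) - eps) ** 2).mean()
loss.backward()
```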
Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting
Kollovieh, Marcel, Ansari, Abdul Fatir, Bohlke-Schneider, Michael, Zschiegner, Jasper, Wang, Hao, Wang, Yuyang
Diffusion models have achieved state-of-the-art performance in generative modeling tasks across various domains. Prior works on time series diffusion models have primarily focused on developing conditional models tailored to specific forecasting or imputation tasks. In this work, we explore the potential of task-agnostic, unconditional diffusion models for several time series applications. We propose TSDiff, an unconditionally-trained diffusion model for time series. Our proposed self-guidance mechanism enables conditioning TSDiff for downstream tasks during inference, without requiring auxiliary networks or altering the training procedure. We demonstrate the effectiveness of our method on three different time series tasks: forecasting, refinement, and synthetic data generation. First, we show that TSDiff is competitive with several task-specific conditional forecasting methods (predict). Second, we leverage the learned implicit probability density of TSDiff to iteratively refine the predictions of base forecasters with reduced computational overhead over reverse diffusion (refine). Notably, the generative performance of the model remains intact -- downstream forecasters trained on synthetic samples from TSDiff outperform forecasters that are trained on samples from other state-of-the-art generative time series models, occasionally even outperforming models trained on real data (synthesize).
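The sketch below gives one concrete (and simplified) reading of inference-time self-guidance for forecasting: at each denoising step, penalize mismatch between the denoised estimate and the observed context and follow the gradient, with no auxiliary network. The stand-in denoiser, mask convention, and guidance scale are assumptions, not TSDiff's exact procedure.

```python
import torch

def guided_step(x_t, denoiser, t, observed, mask, scale=0.1):
    """One reverse-diffusion step steered toward the observed context."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t, t)                           # estimate of the clean series
    penalty = ((x0_hat - observed)[mask] ** 2).sum()    # mismatch on observed points
    grad, = torch.autograd.grad(penalty, x_t)
    return (x0_hat - scale * grad).detach()

# Toy usage: first half of a length-48 window is observed context.
denoiser = lambda x, t: 0.9 * x                         # stand-in unconditional denoiser
observed = torch.randn(48)
mask = torch.zeros(48, dtype=torch.bool)
mask[:24] = True
x = guided_step(torch.randn(48), denoiser, t=10, observed=observed, mask=mask)
```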
Backpropagation through Back Substitution with a Backslash
Edelman, Alan, Akyurek, Ekin, Wang, Yuyang
We present a linear algebra formulation of backpropagation which allows the calculation of gradients by using a generically written ``backslash'' or Gaussian elimination on triangular systems of equations. Generally, the matrix elements are operators. This paper has three contributions: (i) it is of intellectual value to replace traditional treatments of automatic differentiation with a (left-acting) operator-theoretic, graph-based approach; (ii) as an implementation option, operators can readily be placed in matrices in software, in programming languages such as Julia; (iii) we introduce a novel notation, the ``transpose dot'' operator ``$\{\}^{T_\bullet}$'', that allows for the reversal of operators. We further demonstrate the elegance of the operator approach in a programming language with generic linear algebra operators, such as Julia \cite{bezanson2017julia}, and show that it is possible to realize this abstraction in code. Our implementation demonstrates how generic linear algebra can allow operators as elements of matrices. In contrast to ``operator overloading,'' where backslash would normally have to be rewritten to take advantage of operators, with ``generic programming'' there is no such need.
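The toy numpy/scipy example below illustrates the central observation for a two-step linear chain: collecting the states into a block lower-triangular system makes the forward pass a forward substitution, and the gradient computation a single triangular solve ("backslash") on the transposed system, matching the chain rule. Sizes and matrices are arbitrary toy choices; the paper itself works with operator-valued entries.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
n = 3
A1, A2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
x0, c = rng.normal(size=n), rng.normal(size=n)          # input and loss seed, l = c^T x2

# Stack the chain x1 = A1 x0, x2 = A2 x1 into one block lower-triangular system T s = b.
T = np.block([[np.eye(n), np.zeros((n, n))],
              [-A2,       np.eye(n)]])
b = np.concatenate([A1 @ x0, np.zeros(n)])
s = solve_triangular(T, b, lower=True)                  # forward pass by forward substitution
assert np.allclose(s[n:], A2 @ A1 @ x0)

# Adjoints g = (dl/dx1, dl/dx2) come from one "backslash" on the transposed system.
e = np.concatenate([np.zeros(n), c])
g = solve_triangular(T.T, e, lower=False)               # backpropagation = back substitution
assert np.allclose(g[:n], A2.T @ c)                     # agrees with the chain rule
```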
Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting
Hasson, Hilaf, Maddix, Danielle C., Wang, Yuyang, Gupta, Gaurav, Park, Youngsuk
Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked generalization," namely training an ML algorithm that takes the inferences from the base learners as input. While stacking has been widely applied in practice, its theoretical properties are poorly understood. In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best. Our result strengthens and significantly extends the results in Van der Laan et al. (2007). Inspired by the theoretical analysis, we further propose a particular family of stacked generalizations in the context of probabilistic forecasting, each one with a different sensitivity for how much the ensemble weights are allowed to vary across items, timestamps in the forecast horizon, and quantiles. Experimental results demonstrate the performance gain of the proposed method.
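The sketch below illustrates the selection principle on a toy family: each candidate stacked generalization is a ridge-regularized least-squares blend of two base forecasters, indexed by a regularization strength that loosely controls how freely the ensemble weights can vary, and the member with the best cross-validated loss is chosen. The family, loss, and data are placeholder assumptions, not the paper's construction.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
y = rng.normal(size=200)                                      # toy target series
base_preds = np.stack([y + rng.normal(scale=0.5, size=200),   # base forecaster 1
                       y + rng.normal(scale=1.0, size=200)])  # base forecaster 2

def fit_weights(train_idx, lam):
    """Ridge-regularized least-squares stacking weights on a training fold."""
    P = base_preds[:, train_idx].T
    return np.linalg.solve(P.T @ P + lam * np.eye(2), P.T @ y[train_idx])

def cv_loss(lam):
    losses = []
    for tr, val in KFold(n_splits=5).split(y):
        w = fit_weights(tr, lam)
        losses.append(np.mean((w @ base_preds[:, val] - y[val]) ** 2))
    return np.mean(losses)

# Pick the family member with the best cross-validated performance.
best_lam = min([0.01, 0.1, 1.0, 10.0], key=cv_loss)
print("selected regularization:", best_lam)
```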