Goto

Collaborating Authors

 Quanz, Brian


CoRe: Coherency Regularization for Hierarchical Time Series

arXiv.org Machine Learning

Hierarchical time series forecasting presents unique challenges, particularly when dealing with noisy data that may not perfectly adhere to aggregation constraints. This paper introduces a novel approach to soft coherency in hierarchical time series forecasting using neural networks. We present a network coherency regularization method, which we denote as CoRe (Coherency Regularization), a technique that trains neural networks to produce forecasts that are inherently coherent across hierarchies, without strictly enforcing aggregation constraints. Our method offers several key advantages. (1) It provides theoretical guarantees on the coherency of forecasts, even for out-of-sample data. (2) It is adaptable to scenarios where data may contain errors or missing values, making it more robust than strict coherency methods. (3) It can be easily integrated into existing neural network architectures for time series forecasting. We demonstrate the effectiveness of our approach on multiple benchmark datasets, comparing it against state-of-the-art methods in both coherent and noisy data scenarios. Additionally, our method can be used within existing generative probabilistic forecasting frameworks to generate coherent probabilistic forecasts. Our results show improved generalization and forecast accuracy, particularly in the presence of data inconsistencies. On a variety of datasets, including both strictly hierarchically coherent and noisy data, our training method has either equal or better accuracy at all levels of the hierarchy while being strictly more coherent out-of-sample than existing soft-coherency methods.


A Survey of Classical And Quantum Sequence Models

arXiv.org Artificial Intelligence

Our primary objective is to conduct a brief survey of various classical and quantum neural net sequence models, which includes self-attention and recurrent neural networks, with a focus on recent quantum approaches proposed to work with near-term quantum devices, while exploring some basic enhancements for these quantum models. We re-implement a key representative set of these existing methods, adapting an image classification approach using quantum self-attention to create a quantum hybrid transformer that works for text and image classification, and applying quantum self-attention and quantum recurrent neural networks to natural language processing tasks. We also explore different encoding techniques and introduce positional encoding into quantum self-attention neural networks leading to improved accuracy and faster convergence in text and image classification experiments. This paper also performs a comparative analysis of classical self-attention models and their quantum counterparts, helping shed light on the differences in these models and their performance.


Hierarchical Proxy Modeling for Improved HPO in Time Series Forecasting

arXiv.org Artificial Intelligence

Selecting the right set of hyperparameters is crucial in time series forecasting. The classical temporal cross-validation framework for hyperparameter optimization (HPO) often leads to poor test performance because of a possible mismatch between validation and test periods. To address this test-validation mismatch, we propose a novel technique, H-Pro to drive HPO via test proxies by exploiting data hierarchies often associated with time series datasets. Since higher-level aggregated time series often show less irregularity and better predictability as compared to the lowest-level time series which can be sparse and intermittent, we optimize the hyperparameters of the lowest-level base-forecaster by leveraging the proxy forecasts for the test period generated from the forecasters at higher levels. H-Pro can be applied on any off-the-shelf machine learning model to perform HPO. We validate the efficacy of our technique with extensive empirical evaluation on five publicly available hierarchical forecasting datasets. Our approach outperforms existing state-of-the-art methods in Tourism, Wiki, and Traffic datasets, and achieves competitive result in Tourism-L dataset, without any model-specific enhancements. Moreover, our method outperforms the winning method of the M5 forecast accuracy competition.


Math Programming based Reinforcement Learning for Multi-Echelon Inventory Management

arXiv.org Artificial Intelligence

Reinforcement learning has lead to considerable break-throughs in diverse areas such as robotics, games and many others. But the application to RL in complex real-world decision making problems remains limited. Many problems in operations management (inventory and revenue management, for example) are characterized by large action spaces and stochastic system dynamics. These characteristics make the problem considerably harder to solve for existing RL methods that rely on enumeration techniques to solve per step action problems. To resolve these issues, we develop Programmable Actor Reinforcement Learning (PARL), a policy iteration method that uses techniques from integer programming and sample average approximation. Analytically, we show that the for a given critic, the learned policy in each iteration converges to the optimal policy as the underlying samples of the uncertainty go to infinity. Practically, we show that a properly selected discretization of the underlying uncertain distribution can yield near optimal actor policy even with very few samples from the underlying uncertainty. We then apply our algorithm to real-world inventory management problems with complex supply chain structures and show that PARL outperforms state-of-the-art RL and inventory optimization methods in these settings. We find that PARL outperforms commonly used base stock heuristic by 44.7% and the best performing RL method by up to 12.1% on average across different supply chain environments.


Learning to shortcut and shortlist order fulfillment deciding

arXiv.org Artificial Intelligence

With the increase of order fulfillment options and business objectives taken into consideration in the deciding process, order fulfillment deciding is becoming more and more complex. For example, with the advent of ship from store retailers now have many more fulfillment nodes to consider, and it is now common to take into account many and varied business goals in making fulfillment decisions. With increasing complexity, efficiency of the deciding process can become a real concern. Finding the optimal fulfillment assignments among all possible ones may be too costly to do for every order especially during peak times. In this work, we explore the possibility of exploiting regularity in the fulfillment decision process to reduce the burden on the deciding system. By using data mining we aim to find patterns in past fulfillment decisions that can be used to efficiently predict most likely assignments for future decisions. Essentially, those assignments that can be predicted with high confidence can be used to shortcut, or bypass, the expensive deciding process, or else a set of most likely assignments can be used for shortlisting -- sending a much smaller set of candidates for consideration by the fulfillment deciding system.


Predicting Deep Neural Network Generalization with Perturbation Response Curves

arXiv.org Artificial Intelligence

The field of Deep Learning is rich with empirical evidence of human-like performance on a variety of prediction tasks. However, despite these successes, the recent Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition [1] suggests that there is a need for more robust and efficient measures of network generalization. In this work, we propose a new framework for evaluating the generalization capabilities of trained networks. We use perturbation response (PR) curves that capture the accuracy change of a given network as a function of varying levels of training sample perturbation. From these PR curves, we derive novel statistics that capture generalization capability. Specifically, we introduce two new measures for accurately predicting generalization gaps: the Gi-score and Pal-score, that are inspired by the Gini coefficient and Palma ratio (measures of income inequality), that accurately predict generalization gaps. Using our framework applied to intra and inter class sample mixup, we attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the PGDL competition. In addition, we show that our framework and the proposed statistics can be used to capture to what extent a trained network is invariant to a given parametric input transformation, such as rotation or translation. Therefore, these generalization gap prediction statistics also provide a useful means for selecting the optimal network architectures and hyperparameters that are invariant to a certain perturbation.


Towards creativity characterization of generative models via group-based subset scanning

arXiv.org Artificial Intelligence

Deep generative models, such as Variational Autoencoders (VAEs), have been employed widely in computational creativity research. However, such models discourage out-of-distribution generation to avoid spurious sample generation, limiting their creativity. Thus, incorporating research on human creativity into generative deep learning techniques presents an opportunity to make their outputs more compelling and human-like. As we see the emergence of generative models directed to creativity research, a need for machine learning-based surrogate metrics to characterize creative output from these models is imperative. We propose group-based subset scanning to quantify, detect, and characterize creative processes by detecting a subset of anomalous node-activations in the hidden layers of generative models. Our experiments on original, typically decoded, and "creatively decoded" (Das et al 2020) image datasets reveal that the proposed subset scores distribution is more useful for detecting creative processes in the activation space rather than the pixel space. Further, we found that creative samples generate larger subsets of anomalies than normal or non-creative samples across datasets. The node activations highlighted during the creative decoding process are different from those responsible for normal sample generation.


Temporal Latent Auto-Encoder: A Method for Probabilistic Multivariate Time Series Forecasting

arXiv.org Artificial Intelligence

A key reason Forecasting - predicting future values of time series, is a key for recent success of deep learning for forecasting is multitask component in many industries (Fildes et al. 2008). Applications univariate forecasting - sharing deep learning model parameters include forecasting supply chain and airline demand across all series, possibly with some series-specific (Fildes et al. 2008; Seeger, Salinas, and Flunkert 2016), financial scaling factors or parametric model components (Salinas, prices (Kim 2003), and energy, traffic or weather Flunkert, and Gasthaus 2019; Smyl 2020; Bandara, Bergmeir, patterns (Chatfield 2000). Forecasts are often required for and Hewamalage 2020; Li et al. 2019; Wen et al. 2017; Rangapuram large numbers of related time series, i.e., multivariate time series et al. 2018; Chen et al. 2018). E.g., the winner of forecasting, as opposed to univariate (single time series) the M4 forecasting competition (Makridakis, Spiliotis, and forecasting. For example, retailers may require sales/demand Assimakopoulos 2020) was a hybrid ES-RNN model (Smyl forecasts for millions of different products at thousands of 2020), in which a single shared univariate RNN model is used different locations - amounting to billions of sales time series.


Toward A Neuro-inspired Creative Decoder

arXiv.org Artificial Intelligence

Creativity, a process that generates novel and valuable ideas, involves increased association between task-positive (control) and task-negative (default) networks in brain. Inspired by this seminal finding, in this study we propose a creative decoder that directly modulates the neuronal activation pattern, while sampling from the learned latent space. The proposed approach is fully unsupervised and can be used as off-the-shelf. Our experiments on three different image datasets (MNIST, FMNIST, CELEBA) reveal that the co-activation between task-positive and task-negative neurons during decoding in a deep neural net enables generation of novel artifacts. We further identify sufficient conditions on several novelty metrics towards measuring the creativity of generated samples.