
Collaborating Authors

Eisenach, Carson


Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

arXiv.org Artificial Intelligence

While synthetic data, often generated by LLMs, offers a valuable complement to human-generated data, its misuse can harm performance. Bertrand et al. (2023) and Gerstgrasser et al. (2024) showed that self-training on model-generated data leads to performance degradation. To mitigate this, incorporating a "reliable" verifier to label data has shown promise in preventing such collapse (Gillman et al., 2024). A straightforward verification mechanism is to train a reward model on human-annotated data to assess the quality of synthetic data (Lightman et al., 2023; Wang et al., 2024a). However, this approach can be prohibitively expensive and may offer little signal in domains where models exhibit super-human performance. An alternative is to use a stronger model (Chang et al., 2023; Havrilla et al., 2024) for annotation, but this becomes infeasible when the model is at the frontier of current capabilities. A promising solution is to use the model to label its own generations. Motivated by the intuition that "verification is easier than generation," one can hypothesize that the model may act as a better-than-random verifier of its own outputs, enabling self-improvement (Zelikman et al., 2022).
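
The self-verification loop described above is straightforward to prototype. The sketch below is a minimal, assumed illustration of generate-verify-filter-retrain; `model.generate`, `model.score_as_verifier`, and `finetune` are hypothetical stand-ins for whatever generation, scoring, and training interfaces are actually available, not APIs from the paper.

```python
# Minimal sketch of self-improvement via self-verification.
# All model/training calls below are hypothetical placeholders.

def self_improvement_round(model, prompts, samples_per_prompt=8, threshold=0.5):
    """One round of generate -> self-verify -> filter -> finetune."""
    training_pairs = []
    for prompt in prompts:
        candidates = [model.generate(prompt) for _ in range(samples_per_prompt)]
        # The model verifies its own outputs; the hypothesis is that this
        # verdict is better than random, so filtering shifts the training
        # data toward higher-quality generations.
        scored = [(c, model.score_as_verifier(prompt, c)) for c in candidates]
        training_pairs.extend((prompt, c) for c, s in scored if s >= threshold)
    # Retrain only on generations the model itself judged acceptable.
    return finetune(model, training_pairs)
```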


Neural Coordination and Capacity Control for Inventory Management

arXiv.org Machine Learning

This paper addresses the capacitated periodic review inventory control problem, focusing on a retailer managing multiple products with limited shared resources, such as storage or inbound labor at a facility. Specifically, this paper is motivated by two questions: (1) what does it mean to backtest a capacity control mechanism, and (2) can we devise and backtest a capacity control mechanism that is compatible with recent advances in deep reinforcement learning for inventory management? First, because we only have a single historic sample path of Amazon's capacity limits, we propose a method that samples from a distribution of possible constraint paths covering a space of real-world scenarios. This novel approach allows for more robust and realistic testing of inventory management strategies. Second, we extend the exo-IDP (Exogenous Decision Process) formulation of Madeka et al. (2022) to capacitated periodic review inventory control problems and show that certain capacitated control problems are no harder than supervised learning. Third, we introduce a "neural coordinator", designed to produce forecasts of capacity prices, guiding the system to adhere to target constraints in place of a traditional model predictive controller. Finally, we apply a modified DirectBackprop algorithm for learning a deep RL buying policy and for training the neural coordinator. Our methodology is evaluated through large-scale backtests, demonstrating that RL buying policies with a neural coordinator outperform classic baselines in terms of both cumulative discounted reward and capacity adherence (we see improvements of up to 50% in some cases).
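
One way to read the coordination mechanism is as price-based decoupling: a coordinator forecasts a per-period capacity price, and the buying policies pay that price for the shared capacity they consume, so per-product decisions can be made independently while still being steered toward the joint limit. The sketch below is only an illustration of that idea; the penalized-objective form and all names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def coordinated_reward(per_product_rewards, capacity_used, capacity_limit, price):
    """Illustrative price-penalized objective for one review period.

    A shared capacity constraint is replaced by a charge at the
    coordinator's forecasted price, so per-product policies can be
    optimized separately while being pushed to respect the limit.
    """
    total_used = np.sum(capacity_used)
    penalty = price * max(total_used - capacity_limit, 0.0)
    return np.sum(per_product_rewards) - penalty

# Toy usage: when total usage exceeds the limit, a higher capacity price
# makes the penalized objective worse, nudging policies to order less.
rewards = np.array([10.0, 7.5, 3.0])
usage = np.array([40.0, 35.0, 30.0])
print(coordinated_reward(rewards, usage, capacity_limit=100.0, price=0.2))
```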


Learning an Inventory Control Policy with General Inventory Arrival Dynamics

arXiv.org Machine Learning

In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term a quantity-over-time arrivals model (QOT). We also allow order quantities to be modified as a post-processing step to meet vendor constraints such as order minimums and batch sizes -- a common practice in real supply chains. To the best of our knowledge, this is the first work to handle either arbitrary arrival dynamics or an arbitrary downstream post-processing of order quantities. Building upon recent work (Madeka et al., 2022), we similarly formulate the periodic review inventory control problem as an exogenous decision process, where most of the state is outside the control of the agent. Madeka et al. (2022) show how to construct a simulator that replays historic data to solve this class of problems. In our case, we incorporate a deep generative model for the arrivals process as part of the history replay. By formulating the problem as an exogenous decision process, we can apply results from Madeka et al. (2022) to obtain a reduction to supervised learning. Finally, we show via simulation studies that this approach yields statistically significant improvements in profitability over production baselines. Using data from an ongoing real-world A/B test, we show that Gen-QOT, the generative arrivals model, generalizes well to off-policy data.
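
The vendor-constraint post-processing mentioned above is easy to picture concretely. The sketch below shows one assumed form of such a step (zeroing orders below a minimum and rounding the rest up to a whole number of batches); the constraints actually applied in the paper's supply-chain setting may differ.

```python
import math

def postprocess_order(raw_quantity: float, min_order: int = 0, batch_size: int = 1) -> int:
    """Map a policy's raw order quantity to one a vendor will accept.

    Assumed rules: orders below the minimum are dropped to zero,
    everything else is rounded up to a whole number of batches.
    """
    if raw_quantity <= 0 or raw_quantity < min_order:
        return 0
    batches = math.ceil(raw_quantity / batch_size)
    return batches * batch_size

# Examples: a 7.3-unit order with a minimum of 10 is dropped;
# a 23-unit order with batch size 12 becomes 24.
print(postprocess_order(7.3, min_order=10, batch_size=12))   # 0
print(postprocess_order(23, min_order=10, batch_size=12))    # 24
```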


Deep Inventory Management

arXiv.org Artificial Intelligence

This work provides a Deep Reinforcement Learning approach to solving a periodic review inventory control system with stochastic vendor lead times, lost sales, correlated demand, and price matching. While this dynamic program has historically been considered intractable, our results show that several policy learning approaches are competitive with or outperform classical methods. In order to train these algorithms, we develop novel techniques to convert historical data into a simulator. On the theoretical side, we present learnability results on a subclass of inventory control problems, where we provide a provable reduction of the reinforcement learning problem to that of supervised learning. On the algorithmic side, we present a model-based reinforcement learning procedure (Direct Backprop) to solve the periodic review inventory control problem by constructing a differentiable simulator. Under a variety of metrics, Direct Backprop outperforms model-free RL and newsvendor baselines in both simulations and real-world deployments.
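
The core idea behind a differentiable simulator, as described here, is that the policy can be trained by backpropagating through the reward of a replayed trajectory instead of using a model-free gradient estimator. The PyTorch sketch below illustrates that idea on a toy single-product, lost-sales dynamic; the state features, transition, reward coefficients, and network are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn

# Toy buying policy: maps (on-hand inventory, previous demand) to an order quantity.
policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Softplus())
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def rollout(demand, price=1.0, cost=0.6, holding=0.05):
    """Differentiable replay of one demand trajectory (lost-sales dynamics)."""
    inventory = torch.zeros(())
    prev_d = torch.zeros(())
    total_reward = torch.zeros(())
    for d in demand:
        obs = torch.stack([inventory, prev_d])
        order = policy(obs).squeeze()
        inventory = inventory + order              # arrivals are immediate (toy assumption)
        sales = torch.minimum(inventory, d)        # demand beyond on-hand stock is lost
        inventory = inventory - sales
        total_reward = total_reward + price * sales - cost * order - holding * inventory
        prev_d = d
    return total_reward

# The training signal is the reward itself: gradients flow back through the simulator.
demand = torch.tensor([3.0, 5.0, 2.0, 4.0])
loss = -rollout(demand)
loss.backward()
opt.step()
```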


MQTransformer: Multi-Horizon Forecasts with Context Dependent and Feedback-Aware Attention

arXiv.org Machine Learning

Recent advances in neural forecasting have produced major improvements in accuracy for probabilistic demand prediction. In this work, we propose novel improvements to the current state of the art by incorporating changes inspired by recent advances in Transformer architectures for Natural Language Processing. We develop a novel decoder-encoder attention for context alignment, improving forecasting accuracy by allowing the network to study its own history based on the context for which it is producing a forecast. We also present a novel positional encoding that allows the neural network to learn context-dependent seasonality functions as well as arbitrary holiday distances. Finally, we show that the current state-of-the-art MQ-Forecaster (Wen et al., 2017) models display excess variability by failing to leverage previous errors in the forecast to improve accuracy. We propose a novel decoder self-attention scheme for forecasting that significantly reduces this excess variation in the forecast.
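
A minimal way to picture the decoder-encoder attention described here: per-horizon decoder queries attend over the encoder's representation of the series' own history, so each forecast context can look back at the parts of the history most relevant to it. The sketch below uses standard PyTorch multi-head attention with illustrative tensor shapes; it is a sketch of the mechanism's shape, not the paper's architecture.

```python
import torch
import torch.nn as nn

batch, hist_len, horizons, d_model = 8, 104, 12, 64

history_enc = torch.randn(batch, hist_len, d_model)       # encoder output over past observations
horizon_queries = torch.randn(batch, horizons, d_model)   # one query per forecast horizon/context

attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

# Each horizon's query attends over the encoded history, i.e. the network
# "studies its own history" for the context it is forecasting.
context, weights = attn(query=horizon_queries, key=history_enc, value=history_enc)
print(context.shape)   # torch.Size([8, 12, 64])
print(weights.shape)   # torch.Size([8, 12, 104])
```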


High-Dimensional Inference for Cluster-Based Graphical Models

arXiv.org Machine Learning

Motivated by modern applications in which one constructs graphical models based on a very large number of features, this paper introduces a new class of cluster-based graphical models. Unlike standard graphical models, variable clustering is applied as an initial step for reducing the dimension of the feature space. We employ model-assisted clustering, in which the clusters contain features that are similar to the same unobserved latent variable. Two different cluster-based Gaussian graphical models are considered: the latent variable graph, corresponding to the graphical model associated with the unobserved latent variables, and the cluster-average graph, corresponding to the vector of features averaged over clusters. We derive estimates tailored to these graphs, with the goal of pattern recovery under false discovery rate (FDR) control. Our study reveals that likelihood-based inference for the latent graph is analytically intractable, and we develop alternative estimation and inference strategies. We replace the likelihood of the data by appropriate empirical risk functions that allow for valid inference in both graphical models under study. Our main results are Berry-Esseen central limit theorems for the proposed estimators, which are proved under weaker assumptions than those employed in the existing literature on Gaussian graphical model inference. We make explicit the implications of the asymptotic approximations on graph recovery under FDR control, and show when it can be controlled asymptotically. Our analysis takes into account the uncertainty induced by the initial clustering step. We find that the errors induced by clustering are asymptotically ignorable in the follow-up analysis, under no further restrictions on the parameter space for which inference is valid. The theoretical properties of the proposed procedures are verified on simulated data and in an fMRI data analysis.
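
To make the cluster-average graph concrete, the sketch below runs a textbook-style version of the pipeline: average features within clusters, estimate partial correlations from the inverse sample covariance of the averages, test each edge with Fisher's z, and apply Benjamini-Hochberg for FDR control. This is an assumed illustration of the object being estimated, not the paper's empirical-risk estimators or its Berry-Esseen-based inference.

```python
import numpy as np
from scipy import stats

def cluster_average_graph(X, clusters, alpha=0.05):
    """Illustrative cluster-average graph recovery with FDR control.

    X: (n, p) data matrix; clusters: list of column-index arrays.
    """
    n = X.shape[0]
    K = len(clusters)
    A = np.column_stack([X[:, idx].mean(axis=1) for idx in clusters])  # cluster averages
    prec = np.linalg.inv(np.cov(A, rowvar=False))

    edges, pvals = [], []
    for i in range(K):
        for j in range(i + 1, K):
            r = -prec[i, j] / np.sqrt(prec[i, i] * prec[j, j])   # partial correlation
            z = np.arctanh(r) * np.sqrt(n - K - 1)               # Fisher transform
            edges.append((i, j))
            pvals.append(2 * stats.norm.sf(abs(z)))

    # Benjamini-Hochberg step-up procedure over all candidate edges.
    order = np.argsort(pvals)
    sorted_p = np.asarray(pvals)[order]
    m = len(pvals)
    below = np.nonzero(sorted_p <= alpha * np.arange(1, m + 1) / m)[0]
    if below.size == 0:
        return []
    return [edges[idx] for idx in order[: below.max() + 1]]
```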


Marginal Policy Gradients for Complex Control

arXiv.org Machine Learning

Many complex domains, such as robotics control and real-time strategy (RTS) games, require an agent to learn continuous control. In the former, an agent learns a policy over $\mathbb{R}^d$, and in the latter, over a discrete set of actions each of which is parametrized by a continuous parameter. Such problems are naturally solved using policy-based reinforcement learning (RL) methods, but unfortunately these often suffer from high variance, leading to instability and slow convergence. We show that in many cases a substantial portion of the variance in policy gradient estimators is completely unnecessary and can be eliminated without introducing bias. Unnecessary variance is introduced whenever a policy over a bounded action space is modeled using a distribution with unbounded support and a transformation $T$ is applied to the sampled action before execution in the environment. Recent works have studied variance-reduced policy gradients for actions in bounded intervals, but to date no variance-reduced methods exist when the action is a direction -- constrained to the unit sphere -- something often seen in RTS games. To address these challenges we: (1) introduce a stochastic policy gradient method for directional control; (2) introduce the marginal policy gradient framework, a powerful technique to obtain variance-reduced policy gradients for arbitrary $T$; (3) show that marginal policy gradients are guaranteed to reduce variance, quantifying that reduction exactly; (4) validate our framework by applying the methods to a popular RTS game and a navigation task, demonstrating improvement over a policy gradient baseline.
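
The bounded-interval instance that the abstract says prior work has studied gives a compact picture of the idea: when the executed action is $T(a) = \mathrm{clip}(a, \mathrm{lo}, \mathrm{hi})$ with $a$ Gaussian, the score function can be taken with respect to the marginal distribution of the executed action (point masses at the boundaries, Gaussian density in between) rather than the pre-clip sample. The PyTorch sketch below illustrates that interval case only; the paper's directional (unit-sphere) setting requires a different marginal density, and the names here are illustrative.

```python
import torch
from torch.distributions import Normal

def marginal_log_prob(executed, mu, sigma, lo=-1.0, hi=1.0):
    """Log-density of the executed (clipped) action under the marginal of T.

    Boundary actions get the probability mass clipped onto them; interior
    actions keep the Normal density. Using this log-prob in a score-function
    gradient is the interval instance of a marginal policy gradient.
    """
    dist = Normal(mu, sigma)
    interior = dist.log_prob(executed)
    low_mass = torch.log(dist.cdf(torch.as_tensor(lo)))          # P(a <= lo)
    high_mass = torch.log(1.0 - dist.cdf(torch.as_tensor(hi)))   # P(a >= hi)
    return torch.where(executed <= lo, low_mass,
                       torch.where(executed >= hi, high_mass, interior))

# Toy usage: gradient of the marginal log-prob w.r.t. the policy mean.
mu = torch.tensor(1.3, requires_grad=True)
raw = Normal(mu, 0.5).rsample()
executed = torch.clamp(raw.detach(), -1.0, 1.0)
marginal_log_prob(executed, mu, torch.tensor(0.5)).backward()
print(mu.grad)
```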


Efficient, Certifiably Optimal High-Dimensional Clustering

arXiv.org Machine Learning

We consider SDP relaxation methods for data and variable clustering problems, which have been shown in the literature to have good statistical properties in a variety of settings, but remain intractable to solve in practice. In particular, we propose FORCE, a new algorithm to solve the Peng-Wei $K$-means SDP. Compared to the naive interior-point method, our method reduces the computational complexity of solving the SDP from $\tilde{O}(d^7\log\epsilon^{-1})$ to $\tilde{O}(d^{6}K^{-2}\epsilon^{-1})$. Our method combines a primal first-order method with a dual optimality certificate search, which, when successful, allows for early termination of the primal method. We show under certain data-generating distributions that, with high probability, FORCE is guaranteed to find the optimal solution to the SDP relaxation and to provide a certificate of exact optimality. As verified by our numerical experiments, this allows FORCE to solve the Peng-Wei SDP with dimensions in the hundreds in only tens of seconds. We also consider a variation of the Peng-Wei SDP for the case when $K$ is not known a priori and show that a slight modification of FORCE reduces the computational complexity of solving this problem as well: from $\tilde{O}(d^7\log\epsilon^{-1})$ using a standard SDP solver to $\tilde{O}(d^{4}\epsilon^{-1})$.
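
For reference, the Peng-Wei $K$-means SDP can be written down directly with a generic convex solver; this is essentially the naive route whose cost motivates FORCE. The cvxpy sketch below uses the squared-distance form of the objective with the standard constraints (positive semidefinite, entrywise nonnegative, row sums one, trace $K$); it is an illustration of the relaxation being solved, not the FORCE algorithm itself.

```python
import cvxpy as cp
import numpy as np

def peng_wei_sdp(X, K):
    """Peng-Wei K-means SDP relaxation, solved with a generic SDP solver.

    X: (n, d) data matrix, K: number of clusters. The variable Z plays the
    role of a normalized cluster-membership matrix.
    """
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared Euclidean distances

    Z = cp.Variable((n, n), PSD=True)
    constraints = [Z >= 0, cp.sum(Z, axis=1) == 1, cp.trace(Z) == K]
    cp.Problem(cp.Minimize(cp.trace(D @ Z)), constraints).solve()
    return Z.value

# Toy usage on two well-separated blobs; Z should be close to block-diagonal.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
print(np.round(peng_wei_sdp(X, K=2), 2))
```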