lookahead model
Autoregressive Modeling with Lookahead Attention
Du, Li, Mei, Hongyuan, Eisner, Jason
To predict the next token, autoregressive models ordinarily examine the past. Could they also benefit from also examining hypothetical futures? We consider a novel Transformer-based autoregressive architecture that estimates the next-token distribution by extrapolating multiple continuations of the past, according to some proposal distribution, and attending to these extended strings. This architecture draws insights from classical AI systems such as board game players: when making a local decision, a policy may benefit from exploring possible future trajectories and analyzing them. On multiple tasks including morphological
However, those NP-hard distributions are artificial. For naturally occurring sequences, why might one expect lookahead to help autoregressive modeling? We argue that when the sequences represent an agent's behavior, an autoregressive parameterization is not always the simplest description. If the behavior is goal-directed--for example, an agent trying to achieve high reward in a Markov Decision Process--then the simplest description may include a characterization of the agent's environment and goals. Even if the agent explicitly consults an autoregressive policy p(action | state) at each step, that policy is not arbitrary: while it may appear complex, it was shaped by reinforcement learning or by natural selection so as to achieve high-reward trajectories.
The Parametric Cost Function Approximation: A new approach for multistage stochastic programming
Powell, Warren B, Ghadimi, Saeed
The most common approaches for solving multistage stochastic programming problems in the research literature have been to either use value functions ("dynamic programming") or scenario trees ("stochastic programming") to approximate the impact of a decision now on the future. By contrast, common industry practice is to use a deterministic approximation of the future which is easier to understand and solve, but which is criticized for ignoring uncertainty. We show that a parameterized version of a deterministic optimization model can be an effective way of handling uncertainty without the complexity of either stochastic programming or dynamic programming. We present the idea of a parameterized deterministic optimization model, and in particular a deterministic lookahead model, as a powerful strategy for many complex stochastic decision problems. This approach can handle complex, high-dimensional state variables, and avoids the usual approximations associated with scenario trees or value function approximations. Instead, it introduces the offline challenge of designing and tuning the parameterization. We illustrate the idea by using a series of application settings, and demonstrate its use in a nonstationary energy storage problem with rolling forecasts.
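As a rough illustration of the parameterized deterministic lookahead idea in the abstract above, the sketch below plans over a point forecast of future prices and executes only the first step of the plan. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-unit buy/hold/sell storage model, the function names `lookahead_action`, `simulate`, and `tune`, and the choice to parameterize the model by scaling the forecast with lead-time multipliers `theta`.

```python
from functools import lru_cache


def lookahead_action(level, price_now, forecast, theta, capacity=3):
    """Return the first action of a deterministic plan over the forecast horizon.

    forecast[k] is the point forecast of the price k+1 steps ahead, and
    theta[k] scales that forecast (all ones gives the plain deterministic
    model). Actions: -1 sell one unit, 0 hold, +1 buy one unit.
    """
    prices = [price_now] + [th * f for th, f in zip(theta, forecast)]

    @lru_cache(maxsize=None)
    def best(k, lvl):
        # Best (profit, action at step k) from step k onward, given storage lvl.
        if k == len(prices):
            return 0.0, 0  # energy left at the end of the horizon has no value
        options = []
        for a in (-1, 0, 1):
            nxt = lvl + a
            if 0 <= nxt <= capacity:
                options.append((-a * prices[k] + best(k + 1, nxt)[0], a))
        return max(options)

    return best(0, level)[1]


def simulate(prices, forecasts, theta, capacity=3):
    """Run the policy along one price path with rolling forecasts.

    forecasts[t] is the forecast list available at time t.
    Returns the realized profit, starting from empty storage.
    """
    level, profit = 0, 0.0
    for t, price in enumerate(prices):
        a = lookahead_action(level, price, forecasts[t], theta, capacity)
        level += a
        profit -= a * price
    return profit


def tune(sample_paths, candidates, capacity=3):
    """Offline tuning: pick the candidate theta with the highest total profit."""
    def total(theta):
        return sum(simulate(p, f, theta, capacity) for p, f in sample_paths)
    return max(candidates, key=total)
```

With `theta` set to all ones this reduces to an ordinary deterministic lookahead. Searching over `theta` on simulated paths is one simple way to let the deterministic model hedge against forecast error, which is the offline tuning burden the abstract mentions.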
Dynamic Bidding for Advance Commitments in Truckload Brokerage Markets
Wang, Yingfei, Nascimento, Juliana Martins Do, Powell, Warren
Truckload brokerages, a $100 billion/year industry in the U.S., play the critical role of matching shippers with carriers, often to move loads several days into the future. Brokerages not only have to find companies that will agree to move a load, but often also have to find a price that both the shipper and carrier will agree to. The price varies not only by shipper and carrier, but also by traffic lane and other variables such as commodity type. Brokerages have to learn about shipper and carrier response functions by offering a price and observing whether each accepts the quote. We propose a knowledge gradient policy with bootstrap aggregation for high-dimensional contextual settings to guide price experimentation by maximizing the value of information. The learning policy is tested using a newly developed, carefully calibrated fleet simulator that includes a stochastic lookahead policy that simulates fleet movements, as well as the stochastic modeling of driver assignments and the carrier's load commitment policies with advance booking.