This paper describes a purely data-driven solution to a class of sequential decision-making problems with a large number of concurrent online decisions, with applications to computing systems and operations research. We assume that while the micro-level behaviour of the system can be broadly captured by analytical expressions or simulation, the macro-level or emergent behaviour is complicated by non-linearity, constraints, and stochasticity. If we represent the set of concurrent decisions to be computed as a vector, each element of the vector is assumed to be a continuous variable, and the number of such elements is arbitrarily large and variable from one problem instance to another. We first formulate the decision-making problem as a canonical reinforcement learning (RL) problem, which can be solved using purely data-driven techniques. We modify a standard approach known as advantage actor critic (A2C) to ensure its suitability to the problem at hand, and compare its performance to that of baseline approaches on the specific instance of a multi-product inventory management task. The key modifications include a parallelised formulation of the decision-making task, and a training procedure that explicitly recognises the quantitative relationship between different decisions. We also present experimental results probing the learned policies, and their robustness to variations in the data.
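As a minimal sketch of the parallelised formulation mentioned above, and not the authors' implementation, one could apply a single shared policy to each element's local features, so that the same parameters handle any number of concurrent decisions. The feature layout, the linear-plus-sigmoid policy, and every name below are assumptions made for illustration; the one-step advantage shown is the standard estimate used in A2C.

```python
import math
import random


def shared_policy(features, weights, bias):
    """Return one continuous action per product, reusing the same parameters.

    features: list of per-product feature rows (hypothetical layout).
    """
    actions = []
    for row in features:
        logit = sum(w * x for w, x in zip(weights, row)) + bias
        # Sigmoid keeps each action bounded between 0 and 1.
        actions.append(1.0 / (1.0 + math.exp(-logit)))
    return actions


def one_step_advantage(reward, value, next_value, gamma=0.99):
    """Standard one-step advantage estimate: r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value


def random_features(n_products, n_features, rng):
    """Hypothetical per-product features, drawn at random for the demo."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for _ in range(n_products)]


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(3)]  # assumed: 3 features per product
bias = 0.0

# The same parameters serve problem instances of different sizes.
small = shared_policy(random_features(5, 3, rng), weights, bias)
large = shared_policy(random_features(200, 3, rng), weights, bias)
print(len(small), len(large))
```

Because the policy parameters do not depend on the number of products, the length of the decision vector can change between problem instances without retraining a differently shaped model.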
The years of research on robotics and multiagent systems are coming together to provide just such a disruption to the material-handling industry. While autonomous guided vehicles (AGVs) have been used to move material within warehouses since the 1950s, they have been used primarily to transport very large, very heavy objects like rolls of uncut paper or engine blocks. The confluence of inexpensive wireless communications, computational power, and robotic components is making autonomous vehicles cheaper, smaller, and more capable. In recent years, we have seen an increase in the use of autonomous vehicles in the field. Examples include teleoperated military devices like iRobot's Packbot and the pilotless Predator aircraft, both of which have seen service in Iraq and Afghanistan.
The Kiva warehouse-management system creates a new paradigm for pick-pack-and-ship warehouses that significantly improves worker productivity. The Kiva system uses movable storage shelves that can be lifted by small, autonomous robots. By bringing the product to the worker, the system increases productivity by a factor of two or more, while simultaneously improving accountability and flexibility. A Kiva installation for a large distribution center may require 500 or more vehicles. As such, the Kiva system represents the first commercially available, large-scale autonomous robot system. The first permanent installation of a Kiva system was deployed in the summer of 2006.
With machine-learning technology, retailers can address the common, and costly, problem of having too much or too little fresh food in stock. Fresh food, already a fiercely competitive arena in grocery retail, is becoming an even more crowded battleground. Discounters, convenience-store chains, and online players are recognizing the power of fresh-food categories to drive store visits, basket size, and customer loyalty. With fresh products accounting for up to 40 percent of grocers' revenue and one-third of cost of goods sold, getting fresh-food retailing right is more important than ever (Raphael Buck and Arnaud Minvielle, "A fresh take on food retailing," Perspectives on retail and consumer goods, Winter 2013/14). Fresh food is perishable, demand is highly variable, and lead times are often uncertain.
Machine learning can help retailers address the challenges of offering fresh foods, which account for up to 40% of a grocer's revenue and one-third of the cost of goods sold, according to a report by McKinsey and Company. The increasing demand for these products has led to new offerings, like exotic and hard-to-find items as well as "ultrafresh" items with a shelf life of no more than one or two days. Old processes can make it difficult to order the correct amount of food: order too much, and the food goes to waste; order too little, and you lose sales. Most traditional supply chain planning systems take a fixed, rule-based approach to forecasting and replenishment, but because local demand and conditions vary from day to day, planners have to manually enter different types of data into their replenishment systems. These manual processes are time-consuming, error-prone, and reliant on individual planners' experience and instincts.