Jones, Chris
Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation
Degirmenci, Soysal, Jones, Chris
Amazon and other e-commerce sites must employ mechanisms to protect their millions of customers from fraud, such as unauthorized use of credit cards. One such mechanism is order fraud evaluation, where systems assess orders for fraud risk and either "pass" an order or take an action to mitigate high risk. Order fraud evaluation systems typically use binary classification models that distinguish fraudulent from legitimate orders to assess risk and take action. We seek to devise a system that considers both the financial losses of fraud and long-term customer satisfaction, which may be impaired when incorrect actions are applied to legitimate customers. We propose that taking actions to optimize long-term impact can be formulated as a Reinforcement Learning (RL) problem. Standard RL methods require online interaction with an environment to learn, which is undesirable in high-stakes applications like order fraud evaluation. Offline RL algorithms instead learn from logged data collected from the environment, without the need for online interaction, making them suitable for our use case. We show that offline RL methods outperform traditional binary classification solutions in SimStore, a simplified e-commerce simulation that incorporates order fraud risk. We also propose a novel approach to training offline RL policies that adds a new loss term during training to better align policy exploration with taking correct actions.
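The abstract's auxiliary-loss idea, steering an offline RL policy toward actions known to be correct while still optimizing a standard value-based objective, can be sketched generically. This is not the paper's implementation: the functions, the softmax-over-Q alignment term, and the weight `alpha` are illustrative assumptions.

```python
import numpy as np

def td_loss(q_values, actions, rewards, next_q_values, gamma=0.99):
    """Mean squared temporal-difference error over a batch (standard Q-learning target)."""
    targets = rewards + gamma * next_q_values.max(axis=1)
    predicted = q_values[np.arange(len(actions)), actions]
    return np.mean((predicted - targets) ** 2)

def alignment_loss(q_values, correct_actions):
    """Hypothetical auxiliary term: cross-entropy between a softmax over
    Q-values and logged 'correct' action labels, nudging the policy's
    action preferences toward actions known to be right."""
    logits = q_values - q_values.max(axis=1, keepdims=True)  # subtract max for stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(correct_actions)), correct_actions])

def combined_loss(q_values, actions, rewards, next_q_values, correct_actions, alpha=0.5):
    """Offline RL objective plus the auxiliary alignment term, weighted by alpha."""
    return (td_loss(q_values, actions, rewards, next_q_values)
            + alpha * alignment_loss(q_values, correct_actions))
```

Under this sketch, a batch where the policy's preferred actions already match the logged correct actions incurs a small alignment penalty, while disagreement is penalized, which is one way to bias exploration toward correct actions.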
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Rae, Jack W., Borgeaud, Sebastian, Cai, Trevor, Millican, Katie, Hoffmann, Jordan, Song, Francis, Aslanides, John, Henderson, Sarah, Ring, Roman, Young, Susannah, Rutherford, Eliza, Hennigan, Tom, Menick, Jacob, Cassirer, Albin, Powell, Richard, Driessche, George van den, Hendricks, Lisa Anne, Rauh, Maribeth, Huang, Po-Sen, Glaese, Amelia, Welbl, Johannes, Dathathri, Sumanth, Huang, Saffron, Uesato, Jonathan, Mellor, John, Higgins, Irina, Creswell, Antonia, McAleese, Nat, Wu, Amy, Elsen, Erich, Jayakumar, Siddhant, Buchatskaya, Elena, Budden, David, Sutherland, Esme, Simonyan, Karen, Paganini, Michela, Sifre, Laurent, Martens, Lena, Li, Xiang Lorraine, Kuncoro, Adhiguna, Nematzadeh, Aida, Gribovskaya, Elena, Donato, Domenic, Lazaridou, Angeliki, Mensch, Arthur, Lespiau, Jean-Baptiste, Tsimpoukelli, Maria, Grigorev, Nikolai, Fritz, Doug, Sottiaux, Thibault, Pajarskas, Mantas, Pohlen, Toby, Gong, Zhitao, Toyama, Daniel, d'Autume, Cyprien de Masson, Li, Yujia, Terzi, Tayfun, Mikulik, Vladimir, Babuschkin, Igor, Clark, Aidan, Casas, Diego de Las, Guy, Aurelia, Jones, Chris, Bradbury, James, Johnson, Matthew, Hechtman, Blake, Weidinger, Laura, Gabriel, Iason, Isaac, William, Lockhart, Ed, Osindero, Simon, Rimell, Laura, Dyer, Chris, Vinyals, Oriol, Ayoub, Kareem, Stanway, Jeff, Bennett, Lorrayne, Hassabis, Demis, Kavukcuoglu, Koray, Irving, Geoffrey
Natural language communication is core to intelligence, as it allows ideas to be efficiently shared between humans or artificially intelligent systems. The generality of language allows us to express many intelligence tasks as taking in natural language input and producing natural language output. Autoregressive language modelling -- predicting the future of a text sequence from its past -- provides a simple yet powerful objective that admits formulation of numerous cognitive tasks. At the same time, it opens the door to plentiful training data: the internet, books, articles, code, and other writing. However, this training objective is only an approximation to any specific goal or application, since we predict everything in the sequence rather than only the aspects we care about. Yet if we treat the resulting models with appropriate caution, we believe they will be a powerful tool to capture some of the richness of human intelligence. Using language models as an ingredient towards intelligence contrasts with their original application: transferring text over a limited-bandwidth communication channel. Shannon's Mathematical Theory of Communication (Shannon, 1948) linked the statistical modelling of natural language with compression, showing that measuring the cross entropy of a language model is equivalent to measuring its compression rate.
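The cross-entropy/compression equivalence mentioned above can be made concrete: a model's average negative log2-probability on a sequence is the number of bits per symbol an ideal entropy coder using that model would need. A toy sketch (the character-level model and text are made up for illustration):

```python
import math

# Toy character-level model: probabilities assigned to each symbol.
model = {"a": 0.5, "b": 0.25, "c": 0.25}

def bits_per_symbol(text, model):
    """Cross-entropy of the model on the text, in bits per symbol.
    By Shannon's source-coding theorem this equals the per-symbol code
    length an ideal arithmetic coder would achieve with this model."""
    return -sum(math.log2(model[ch]) for ch in text) / len(text)

print(bits_per_symbol("aabc", model))  # → 1.5
```

A better-calibrated model yields lower cross-entropy, and therefore a shorter compressed encoding, than a uniform model over the same symbols, which is exactly the sense in which language modelling and compression are the same measurement.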
TF-Replicator: Distributed Machine Learning for Researchers
Buchlovsky, Peter, Budden, David, Grewe, Dominik, Jones, Chris, Aslanides, John, Besse, Frederic, Brock, Andy, Clark, Aidan, Colmenarejo, Sergio Gómez, Pope, Aedan, Viola, Fabio, Belov, Dan
We describe TF-Replicator, a framework for distributed machine learning designed for DeepMind researchers and implemented as an abstraction over TensorFlow. TF-Replicator simplifies writing data-parallel and model-parallel research code. The same models can be effortlessly deployed to different cluster architectures (i.e. one or many machines containing CPUs, GPUs or TPU accelerators) using synchronous or asynchronous training regimes. To demonstrate the generality and scalability of TF-Replicator, we implement and benchmark three very different models: (1) A ResNet-50 for ImageNet classification, (2) a SN-GAN for class-conditional ImageNet image generation, and (3) a D4PG reinforcement learning agent for continuous control. Our results show strong scalability performance without demanding any distributed systems expertise of the user. The TF-Replicator programming model will be open-sourced as part of TensorFlow 2.0 (see https://github.com/tensorflow/community/pull/25).
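TF-Replicator's actual API is not reproduced here; as a generic illustration of the synchronous data-parallel pattern the abstract describes, each replica computes gradients on its own shard of the batch and the results are averaged (an all-reduce) before one shared parameter update. A minimal NumPy sketch with a made-up scalar linear model; with equal-sized shards, the averaged per-replica gradient reproduces the full-batch gradient exactly.

```python
import numpy as np

def replica_gradient(w, x_shard, y_shard):
    """Per-replica gradient of mean squared error for the model y_hat = w * x."""
    pred = x_shard * w
    return np.mean(2 * (pred - y_shard) * x_shard)

def synchronous_step(w, x, y, num_replicas, lr=0.1):
    """One synchronous data-parallel update: shard the batch, compute a
    gradient per replica, all-reduce (average), apply one shared update."""
    x_shards = np.array_split(x, num_replicas)
    y_shards = np.array_split(y, num_replicas)
    grads = [replica_gradient(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    return w - lr * np.mean(grads)  # averaged gradient == full-batch gradient
```

In a real distributed setting the averaging is a cross-device all-reduce rather than a Python mean, but the resulting update is the same for every replica, which is what makes the training regime synchronous.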
Learning by Demonstration for a Collaborative Planning Environment
Myers, Karen (SRI International) | Kolojejchick, Jake (General Dynamics C4 Systems, Viz) | Angiolillo, Carl (General Dynamics C4 Systems, Viz) | Cummings, Tim (General Dynamics C4 Systems, Viz) | Garvey, Tom (SRI International) | Gaston, Matt (Carnegie Mellon University) | Gervasio, Melinda (SRI International) | Haines, Will (SRI International) | Jones, Chris (SRI International) | Keifer, Kellie (SRI International) | Knittel, Janette (General Dynamics C4 Systems, Viz) | Morley, David (SRI International) | Ommert, William (General Dynamics C4 Systems, Viz) | Potter, Scott (General Dynamics C4 Systems, Viz)
Task learning provides tremendous value for unit CPOF by enabling individual users and command staffs to create customized, automated information-management schemes tailored to individual preferences and the staff's standard operating procedures, without needing software engineers for extensive recoding. Task learning can reduce workload and stress, can enable managing more tasks with better effectiveness, and can facilitate standardization of products and processes. We then describe the process of getting to deployment, covering technical challenges encountered, collective engagement activities, and an Army-led assessment of the technology. Next, we discuss the fielding of the technology, including tradeoffs made to ensure deployability, the impact of the deployed technology, and lessons learned. We close with a summary of ongoing work to deploy additional functionality and to broaden the user base for task learning in CPOF.
Learning by Demonstration Technology for Military Planning and Decision Making: A Deployment Story
Myers, Karen (SRI International) | Kolojejchick, Jake (General Dynamics C4 Systems) | Angiolillo, Carl (General Dynamics C4 Systems) | Cummings, Tim (General Dynamics C4 Systems) | Garvey, Tom (SRI International) | Gervasio, Melinda (SRI International) | Haines, Will (SRI International) | Jones, Chris (SRI International) | Knittel, Janette (General Dynamics C4 Systems) | Morley, David (SRI International) | Ommert, William (General Dynamics C4 Systems) | Potter, Scott (General Dynamics C4 Systems)
Learning by demonstration technology has long held the promise to empower non-programmers to customize and extend software. We describe the deployment of a learning by demonstration capability to support user creation of automated procedures in a collaborative planning environment that is used widely by the U.S. Army. This technology, which has been in operational use since the summer of 2010, has helped to reduce user workloads by automating repetitive and time-consuming tasks. The technology has also provided the unexpected benefit of enabling standardization of products and processes.