AITopics

2003.12828

Country:

North America > United States > Montana (0.04)
Europe > Netherlands (0.04)
Europe > Ireland (0.04)
Africa > South Africa (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Health Care Providers & Services (0.86)
Health & Medicine > Diagnostic Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Mar-27-2020, 20:39:05 GMT

Uber details Fiber, a framework for distributed AI model training

A preprint paper coauthored by Uber AI scientists and Jeff Clune, a research team leader at San Francisco startup OpenAI, describes Fiber, an AI development and distributed training platform for methods including reinforcement learning (which spurs AI agents to complete goals via rewards) and population-based learning. The team says that Fiber makes large-scale parallel computation more accessible without the need for specialized hardware or equipment, enabling non-experts to reap the benefits of genetic algorithms, in which populations of agents evolve rather than individual members. Fiber, which was developed to power large-scale parallel scientific computation projects like POET, has been available as open source on GitHub as of this week. It supports Linux systems running Python 3.6 and up, as well as Kubernetes running on public cloud environments like Google Cloud, and the research team says that it can scale to hundreds or even thousands of machines. As the researchers point out, increasing computation underlies many recent advances in machine learning, with more and more algorithms relying on distributed training to process enormous amounts of data.

ai model training, fiber, reinforcement, (11 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.25)

Genre: Research Report > New Finding (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)
Information Technology > Artificial Intelligence > Systems & Languages > Distributed Architectures (0.40)

#artificialintelligence Mar-27-2020, 12:38:11 GMT

google-research/seed_rl

This repository contains an implementation of a distributed reinforcement learning agent where both training and inference are performed on the learner. Any reinforcement learning environment using the gym API can be used. For a detailed description of the architecture, please read our paper. Please cite the paper if you use the code from this repository in your work. There are a few steps you need to take before playing with SEED.
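To make "an environment using the gym API" concrete, here is a minimal sketch of a toy environment that follows the classic gym calling convention, in which `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. The class `CoinFlipEnv` and the helper `run_episode` are hypothetical names invented for this illustration. The sketch does not import gym, does not show how SEED itself connects to an environment, and newer gym releases use a slightly different return signature.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment following the classic gym interface.

    The observation is the current coin face (0 or 1); the agent earns a
    reward of 1.0 when its action matches that face.
    """

    def __init__(self, episode_length=5, seed=0):
        self.episode_length = episode_length
        self._rng = random.Random(seed)
        self._steps = 0
        self._coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self._steps = 0
        self._coin = self._rng.randint(0, 1)
        return self._coin

    def step(self, action):
        # Reward the action against the coin the agent has just observed.
        reward = 1.0 if action == self._coin else 0.0
        self._steps += 1
        done = self._steps >= self.episode_length
        self._coin = self._rng.randint(0, 1)  # next observation
        return self._coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total reward collected by `policy`."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

A policy that simply echoes the observation, `run_episode(CoinFlipEnv(), lambda obs: obs)`, collects the maximum reward in every step of the episode.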

docker image, reinforcement, repository, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.82)

#artificialintelligence Mar-27-2020, 05:25:27 GMT

Three Things to Know About Reinforcement Learning

If you are following technology news, you have likely already read about how AI programs trained with reinforcement learning beat human players in board games like Go and chess, as well as video games. As an engineer, scientist, or researcher, you may want to take advantage of this new and growing technology, but where do you start? The best place to begin is to understand what the concept is, how to implement it, and whether it's the right approach for a given problem. At its foundation, reinforcement learning is a type of machine learning that has the potential to solve tough decision-making problems: a computer learns to perform a task through repeated trial-and-error interactions with a dynamic environment.

neural network, reinforcement, training algorithm, (11 more...)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Mar-27-2020, 01:30:16 GMT

Playing Space Invaders Blind: RL & Cross Modality Transfer

In the 1975 film Tommy, the "deaf, dumb, and blind" protagonist overcomes substantial sensory limitations to capture a pinball championship. While it's difficult to imagine playing a video game without being able to see the screen, that was the challenge taken up by AI researchers from INESC-ID and Instituto Superior Técnico in Lisbon and Pittsburgh's Carnegie Mellon University. Using cross-modality transfer techniques and reinforcement learning (RL), the researchers produced an agent that can play video games with only the game audio to guide it. In some respects, an RL policy that is learned over image and sound inputs and still succeeds when only sound is available mimics the way humans naturally make use of whatever sensory data is at hand: we use touch and hearing, for example, to navigate through a dark room. The new cross-modality transfer RL approach explores how latent representations built by advanced variational autoencoder (VAE) methods might enable RL agents to learn and transfer policies over different input modalities.

modality, rl & cross modality transfer, rl agent, (9 more...)

Country: Europe > Portugal > Lisbon > Lisbon (0.26)

Industry: Leisure & Entertainment > Games (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.77)

Validation Set Evaluation can be Wrong: An Evaluator-Generator Approach for Maximizing Online Performance of Ranking in E-commerce

Huzhang, Guangda, Pang, Zhen-Jia, Gao, Yongqing, Zhou, Wen-Ji, Da, Qing, Zeng, An-Xiang, Yu, Yang

Learning-to-rank (LTR) has become a key technology in E-commerce applications. Previous LTR approaches followed the supervised learning paradigm, so that learned models should match the labeled data in a point-wise or pair-wise manner. However, we have noticed that global context information, including the total order of items in the displayed webpage, can play an important role in interactions with the customers. Therefore, to approach the best global ordering, exploration in a large combinatorial space of items is necessary, which requires evaluating orders that may not appear in the labeled data. In this scenario, we first show that the classical data-based metrics can be inconsistent with online performance, or even misleading. We then propose to learn an evaluator and search for the best model guided by the evaluator, which forms the evaluator-generator framework for training the group-wise LTR model. The evaluator is learned from the labeled data, and is enhanced by incorporating the order context information. The generator is trained with the supervision of the evaluator by reinforcement learning to generate the best order in the combinatorial space. Our experiments on one of the world's largest retail platforms show that the learned evaluator is a much better indicator than classical data-based metrics. Moreover, our LTR model achieves a significant improvement ($>2\%$) over the current industrial-level pair-wise models in terms of both Conversion Rate (CR) and Gross Merchandise Volume (GMV) in online A/B tests.
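The gap between point-wise scoring and whole-order evaluation can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three items, their scores and categories, the hand-written `evaluator` (a position discount plus a penalty for adjacent items of the same category), and the exhaustive search in `best_order`, which stands in for the reinforcement-learning generator described in the abstract and is not the authors' method.

```python
from itertools import permutations

# Hypothetical items: name -> (point-wise relevance score, category).
ITEMS = {"A": (0.9, "x"), "B": (0.8, "x"), "C": (0.5, "y")}


def evaluator(order, penalty=0.3):
    """Score a whole ordering rather than individual items.

    Each item's relevance is discounted by its position, and every pair of
    adjacent items from the same category costs `penalty`. The penalty is
    a stand-in for order context information.
    """
    score = sum(ITEMS[name][0] / (pos + 1) for pos, name in enumerate(order))
    for left, right in zip(order, order[1:]):
        if ITEMS[left][1] == ITEMS[right][1]:
            score -= penalty
    return score


def pointwise_order():
    """Rank items by their individual scores, ignoring context."""
    return tuple(sorted(ITEMS, key=lambda name: ITEMS[name][0], reverse=True))


def best_order():
    """Search all orderings for the one the evaluator scores highest.

    Exhaustive search is only feasible for tiny item sets; it replaces the
    learned generator purely to keep the sketch short.
    """
    return max(permutations(ITEMS), key=evaluator)
```

In this toy setup the point-wise ranking places the two same-category items next to each other, while the evaluator-guided search separates them and obtains a higher evaluator score.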

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2003.11941

Country:

North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
(9 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Services > e-Commerce Services (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Ertefaie, Ashkan, McKay, James R., Oslin, David, Strawderman, Robert L.

Robust Q-learning

arXiv.org Machine Learning Mar-27-2020

A dynamic treatment strategy is a sequence of decision rules that maps individual characteristics to a treatment option at each decision point (i.e., a specific point in time at which a treatment is to be considered or altered). An optimal dynamic treatment strategy seeks to make these decisions to maximize a particular expected health outcome (Lavori & Dawson, 2000; Murphy, 2005; Nahum-Shani et al., 2012a; Lei et al., 2012; Davidian et al., 2016). This is similar to clinical decision making, whereby care providers tailor the type or dose of treatment over the course of clinical care based on ongoing information regarding patient progress in treatment. The main goal of precision medicine (i.e., developing an effective dynamic treatment strategy) is to use patient characteristics to inform a personalized treatment plan as a sequence of decision rules that leads to the best possible health outcome for each patient (Nahum-Shani et al., 2012a; Chakraborty & Moodie, 2013; Moodie & Kosorok, 2015; Butler et al., 2018). Q-learning is a reinforcement learning algorithm that is widely used to estimate an optimal dynamic treatment strategy using data from multistage randomized clinical trials or observational studies (Watkins & Dayan, 1992; Nahum-Shani et al., 2012b; Laber et al., 2014).
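As a rough illustration of how Q-learning turns sequential outcomes into decision rules, the sketch below runs the tabular update of Watkins & Dayan on a made-up two-stage problem. The transition table `MODEL`, its rewards, and the function names are hypothetical. With clinical data the Q-functions are typically fitted by regression rather than stored in a table, and that variant, like the robust estimator the paper proposes, is not shown here.

```python
import random

# Hypothetical two-stage problem: (state, action) -> (reward, next_state).
# A next_state of None marks the end of the episode.
MODEL = {
    (0, 0): (0.5, 1), (0, 1): (0.0, 2),      # first decision point
    (1, 0): (1.0, None), (1, 1): (0.0, None),  # second decision point
    (2, 0): (0.0, None), (2, 1): (3.0, None),
}
ACTIONS = (0, 1)


def q_learning(episodes=500, alpha=0.5, gamma=1.0, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = {key: 0.0 for key in MODEL}
    for _ in range(episodes):
        state = 0
        while state is not None:
            action = rng.choice(ACTIONS)
            reward, nxt = MODEL[(state, action)]
            # Value of acting optimally from the next decision point onward.
            future = 0.0 if nxt is None else max(q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def greedy_rule(q):
    """Decision rule: in each state choose the action with the largest Q."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (0, 1, 2)}
```

In this toy problem the first-stage action with the larger immediate reward is not the one that maximizes the total outcome, which is why the learned rule has to account for the later decision point.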

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2003.12427

Country:

North America > United States > New York (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > California (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.87)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Amortila, Philip, Precup, Doina, Panangaden, Prakash, Bellemare, Marc G.

A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.
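A one-state toy case gives some intuition for these claims. Assume, purely for illustration, the constant step-size update $V \leftarrow (1-\alpha)V + \alpha R$ with i.i.d. rewards $R$ of mean $\mu$ and variance $\sigma^2$. Its mean and variance then follow exact recursions, the stationary mean is $\mu$, and the stationary variance is $\alpha\sigma^2/(2-\alpha)$, which shrinks with the step-size. The sketch below iterates those recursions; the function names are hypothetical, and this is a simplified special case, not the paper's proof technique.

```python
def stationary_moments(alpha, mu, sigma2, steps=10_000, v0=0.0):
    """Iterate the exact mean and variance recursions of the update
    V <- (1 - alpha) * V + alpha * R, with i.i.d. R (mean mu, variance sigma2).
    """
    mean, var = v0, 0.0
    for _ in range(steps):
        # The mean contracts toward mu at rate (1 - alpha) per step.
        mean = (1 - alpha) * mean + alpha * mu
        # Old noise is damped by (1 - alpha)^2; fresh noise adds alpha^2 * sigma2.
        var = (1 - alpha) ** 2 * var + alpha ** 2 * sigma2
    return mean, var


def closed_form_variance(alpha, sigma2):
    """Fixed point of the variance recursion: alpha * sigma2 / (2 - alpha)."""
    return alpha * sigma2 / (2 - alpha)
```

Calling `stationary_moments` with step-sizes such as 0.5, 0.1, and 0.01 shows the mean settling at `mu` in each case while the variance decreases as the step-size shrinks.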

algorithm, convergence, operator, (14 more...)

2003.12239

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Sicily > Palermo (0.04)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Adaptive Reward-Poisoning Attacks against Reinforcement Learning

Zhang, Xuezhou, Ma, Yuzhe, Singla, Adish, Zhu, Xiaojin

In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step, with the goal of forcing the RL agent to learn a nefarious policy. We categorize such attacks by the infinity-norm constraint on $\delta_t$: we provide a lower threshold below which a reward-poisoning attack is infeasible and RL is certified to be safe, and a corresponding upper threshold above which the attack is feasible. Feasible attacks can be further categorized as non-adaptive, where $\delta_t$ depends only on $(s_t,a_t, s_{t+1})$, or adaptive, where $\delta_t$ depends further on the RL agent's learning process at time $t$. Non-adaptive attacks have been the focus of prior works. However, we show that under mild conditions, adaptive attacks can achieve the nefarious policy in a number of steps polynomial in the state-space size $|S|$, whereas non-adaptive attacks require exponentially many steps. We provide a constructive proof that a Fast Adaptive Attack strategy achieves the polynomial rate. Finally, we show that empirically an attacker can find effective reward-poisoning attacks using state-of-the-art deep RL techniques.
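The distinction between the two attack classes can be sketched in a few lines. The functions below are hypothetical illustrations, not the paper's Fast Adaptive Attack. `non_adaptive_delta` looks only at the transition, `adaptive_delta` additionally reads the learner's current Q-table (an assumed stand-in for "the agent's learning process"), and both clip the perturbation to an assumed infinity-norm budget `bound`.

```python
def clip(value, bound):
    """Enforce the infinity-norm constraint |delta| <= bound."""
    return max(-bound, min(bound, value))


def non_adaptive_delta(state, action, next_state, target_action=1, bound=0.5):
    """Hypothetical non-adaptive attack: depends only on (s, a, s').

    It always rewards the target action and punishes every other action.
    """
    raw = 1.0 if action == target_action else -1.0
    return clip(raw, bound)


def adaptive_delta(state, action, next_state, q_table,
                   target_action=1, bound=0.5, margin=0.1):
    """Hypothetical adaptive attack: also inspects the learner's Q-table.

    It stops perturbing once the learner already prefers the target action
    in this state, and otherwise pushes in proportion to the remaining gap.
    """
    best_other = max(value for (s, a), value in q_table.items()
                     if s == state and a != target_action)
    gap = best_other - q_table[(state, target_action)]
    if gap < 0:
        return 0.0
    raw = gap + margin if action == target_action else -(gap + margin)
    return clip(raw, bound)


def poisoned_reward(reward, delta):
    """The learner observes r_t + delta_t instead of r_t."""
    return reward + delta
```

For example, with `q_table = {(0, 0): 1.0, (0, 1): 0.0}` the adaptive attacker perturbs rewards in state 0, and it returns a zero perturbation once the learner's Q-values favor the target action there.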

agent, attacker, target state, (15 more...)

2003.12613

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Milani, Stephanie, Topin, Nicholay, Houghton, Brandon, Guss, William H., Mohanty, Sharada P., Nakata, Keisuke, Vinyals, Oriol, Kuno, Noboru Sean

Retrospective Analysis of the 2019 MineRL Competition on Sample Efficient Reinforcement Learning

To facilitate research in the direction of sample-efficient reinforcement learning, we held the MineRL Competition on Sample-Efficient Reinforcement Learning Using Human Priors at the Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019). The primary goal of this competition was to promote the development of algorithms that use human demonstrations alongside reinforcement learning to reduce the number of samples needed to solve complex, hierarchical, and sparse environments. We describe the competition and provide an overview of the top solutions, each of which uses deep reinforcement learning and/or imitation learning. We also discuss the impact of our organizational decisions on the competition as well as future directions for improvement.

competition, participant, retrospective analysis, (12 more...)

2003.05012

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
Europe > Sweden > Skåne County > Malmö (0.05)

Genre:

Overview (0.55)
Research Report (0.54)

Industry: Leisure & Entertainment > Games > Computer Games (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)