AITopics

1906.09205

Genre: Research Report > New Finding (0.48)

Industry:

Education (0.88)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

#artificialintelligenceJun-20-2019, 20:21:18 GMT

AI Adoption in the Enterprise

While O'Reilly has identified several trends among enterprise companies for adopting artificial intelligence, we decided to drill down further to learn just how businesses worldwide are planning and prioritizing this work. In a recent survey, we asked respondents about revenue-bearing AI projects their organizations have in production. How might their AI adoption patterns change over the course of the next year? In this report, data science experts Ben Lorica and Paco Nathan examine the survey's results to reveal how (and how many) respondents are ramping up AI projects. You'll also learn how the scope of AI use among companies is quickly expanding into deep learning, human in the loop, knowledge graphs, and reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (4 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

#artificialintelligenceJun-20-2019, 04:24:17 GMT

Google's AI picks which machine learning models will produce the best results

Leave it to the folks at Google to devise AI capable of predicting which machine learning models will produce the best results. In a newly-published paper ("Off-Policy Evaluation via Off-Policy Classification") and blog post, a team of Google AI researchers propose what they call "off-policy classification," or OPC, which evaluates the performance of AI-driven agents by treating evaluation as a classification problem. The team notes that their approach -- a variant of reinforcement learning, which employs rewards to drive software policies toward goals -- works with image inputs and scales to tasks including vision-based robotic grasping. "Fully off-policy reinforcement learning is a variant in which an agent learns entirely from older data, which is appealing because it enables model iteration without requiring a physical robot," writes Robotics at Google software engineer Alexa Irpan. "With fully off-policy RL, one can train several models on the same fixed dataset collected by previous agents, then select the best one."

best result, machine learning, reinforcement learning, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Everitt, Tom, Kumar, Ramana, Krakovna, Victoria, Legg, Shane

Modeling AGI Safety Frameworks with Causal Influence Diagrams

One of the primary goals of AI research is the development of artificial agents that can exceed human performance on a wide range of cognitive tasks, in other words, artificial general intelligence (AGI). Although the development of AGI has many potential benefits, there are also many safety concerns that have been raised in the literature [Bostrom, 2014; Everitt et al., 2018; Amodei et al., 2016]. Various approaches for addressing AGI safety have been proposed [Leike et al., 2018; Christiano et al., 2018; Irving et al., 2018; Hadfield-Menell et al., 2016; Everitt, 2018], often presented as a modification of the reinforcement learning (RL) framework, or a new framework altogether. Understanding and comparing different frameworks for AGI safety can be difficult because they build on differing concepts and assumptions. For example, both reward modeling [Leike et al., 2018] and cooperative inverse RL [Hadfield-Menell et al., 2016] are frameworks for making an agent learn the preferences of a human user, but what are the key differences between them?

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1906.08663

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)

Cooperative Lane Changing via Deep Reinforcement Learning

Wang, Guan, Hu, Jianming, Li, Zhiheng, Li, Li

In this paper, we study how to learn an appropriate lane changing strategy for autonomous vehicles by using deep reinforcement learning. We show that the reward of the system should consider the overall traffic efficiency instead of the travel efficiency of an individual vehicle. In summary, cooperation leads to a more harmonic and efficient traffic system rather than competition

artificial intelligence, machine learning, reinforcement learning, (12 more...)

1906.08662

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > China > Beijing > Beijing (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
(3 more...)

Genre: Research Report (0.83)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kim, Jong Wook, Bello, Juan Pablo

Adversarial Learning for Improved Onsets and Frames Music Transcription

arXiv.org Machine LearningJun-20-2019

Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance. These approaches commonly employ supervised learning models that predict various time-frequency representations, by minimizing element-wise losses such as the cross entropy function. However, applying the loss in this manner assumes conditional independence of each label given the input, and thus cannot accurately express inter-label dependencies. To address this issue, we introduce an adversarial training scheme that operates directly on the time-frequency representations and makes the output distribution closer to the ground-truth. Through adversarial learning, we achieve a consistent improvement in both frame-level and note-level metrics over Onsets and Frames, a state-of-the-art music transcription model. Our results show that adversarial learning can significantly reduce the error rate while increasing the confidence of the model estimations. Our approach is generic and applicable to any transcription model based on multi-label predictions, which are very common in music signal analysis.

machine learning, onset and frame music transcription, reinforcement learning, (3 more...)

arXiv.org Machine Learning

1906.08512

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Addanki, Ravichandra, Venkatakrishnan, Shaileshh Bojja, Gupta, Shreyan, Mao, Hongzi, Alizadeh, Mohammad

Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning

arXiv.org Machine LearningJun-20-2019

We present Placeto, a reinforcement learning (RL) approach to efficiently find device placements for distributed neural network training. Unlike prior approaches that only find a device placement for a specific computation graph, Placeto can learn generalizable device placement policies that can be applied to any graph. We propose two key ideas in our approach: (1) we represent the policy as performing iterative placement improvements, rather than outputting a placement in one shot; (2) we use graph embeddings to capture relevant information about the structure of the computation graph, without relying on node labels for indexing. These ideas allow Placeto to train efficiently and generalize to unseen graphs. Our experiments show that Placeto requires up to 6.1x fewer training steps to find placements that are on par with or better than the best placements found by prior approaches. Moreover, Placeto is able to learn a generalizable placement policy for any given family of graphs, which can then be used without any retraining to predict optimized placements for unseen graphs from the same family. This eliminates the large overhead incurred by prior RL approaches whose lack of generalizability necessitates re-training from scratch every time a new graph is to be placed.

machine learning, natural language, reinforcement learning, (23 more...)

arXiv.org Machine Learning

1906.08879

Country:

North America > United States (0.68)
Europe (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)

Naderializadeh, Navid, Sydir, Jaroslaw, Simsek, Meryem, Nikopour, Hosein, Talwar, Shilpa

When Multiple Agents Learn to Schedule: A Distributed Radio Resource Management Framework

arXiv.org Machine LearningJun-20-2019

Interference among concurrent transmissions in a wireless network is a key factor limiting the system performance. One way to alleviate this problem is to manage the radio resources in order to maximize either the average or the worst-case performance. However, joint consideration of both metrics is often neglected as they are competing in nature. In this article, a mechanism for radio resource management using multi-agent deep reinforcement learning (RL) is proposed, which strikes the right trade-off between maximizing the average and the $5^{th}$ percentile user throughput. Each transmitter in the network is equipped with a deep RL agent, receiving partial observations from the network (e.g., channel quality, interference level, etc.) and deciding whether to be active or inactive at each scheduling interval for given radio resources, a process referred to as link scheduling. Based on the actions of all agents, the network emits a reward to the agents, indicating how good their joint decisions were. The proposed framework enables the agents to make decisions in a distributed manner, and the reward is designed in such a way that the agents strive to guarantee a minimum performance, leading to a fair resource allocation among all users across the network. Simulation results demonstrate the superiority of our approach compared to decentralized baselines in terms of average and $5^{th}$ percentile user throughput, while achieving performance close to that of a centralized exhaustive search approach. Moreover, the proposed framework is robust to mismatches between training and testing scenarios. In particular, it is shown that an agent trained on a network with low transmitter density maintains its performance and outperforms the baselines when deployed in a network with a higher transmitter density.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1906.08792

Genre: Research Report (0.70)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Lin, Baihan, Bouneffouf, Djallel, Cecchi, Guillermo

Split Q Learning: Reinforcement Learning with Two-Stream Rewards

Drawing an inspiration from behavioral studies of human decision making, we propose here a general parametric framework for a reinforcement learning problem, which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing with biases biologically associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For AI community, the development of agents that react differently to different types of rewards can enable us to understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions and user preferences in long-term recommendation systems.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1906.1235

Country: North America > United States (0.15)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Attention Deficit/Hyperactivity Disorder (0.87)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Palan, Malayandi, Landolfi, Nicholas C., Shevchuk, Gleb, Sadigh, Dorsa

Learning Reward Functions by Integrating Human Demonstrations and Preferences

Our goal is to accurately and efficiently learn reward functions for autonomous robots. Current approaches to this problem include inverse reinforcement learning (IRL), which uses expert demonstrations, and preference-based learning, which iteratively queries the user for her preferences between trajectories. In robotics however, IRL often struggles because it is difficult to get high-quality demonstrations; conversely, preference-based learning is very inefficient since it attempts to learn a continuous, high-dimensional function from binary feedback. We propose a new framework for reward learning, DemPref, that uses both demonstrations and preference queries to learn a reward function. Specifically, we (1) use the demonstrations to learn a coarse prior over the space of reward functions, to reduce the effective size of the space from which queries are generated; and (2) use the demonstrations to ground the (active) query generation process, to improve the quality of the generated queries. Our method alleviates the efficiency issues faced by standard preference-based learning methods and does not exclusively depend on (possibly low-quality) demonstrations. In numerical experiments, we find that DemPref is significantly more efficient than a standard active preference-based learning method. In a user study, we compare our method to a standard IRL method; we find that users rated the robot trained with DemPref as being more successful at learning their desired behavior, and preferred to use the DemPref system (over IRL) to train the robot.

demonstration, machine learning, reinforcement learning, (17 more...)

1906.08928

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)