AITopics

2008.05598

Country:

Europe > Netherlands > South Holland > Leiden (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(4 more...)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Iskander, Julie, Hossny, Mohammed

An ocular biomechanics environment for reinforcement learning

arXiv.org Artificial Intelligence, Aug-11-2020

Reinforcement learning has been applied to human movement through physiologically-based biomechanical models to add insights into the neural control of these movements; it is also useful in the design of prosthetics and robotics. In this paper, we extend the use of reinforcement learning to controlling an ocular biomechanical system to perform saccades, one of the fastest eye movement systems. We describe an ocular environment and an agent trained using the Deep Deterministic Policy Gradients method to perform saccades. The agent was able to match the desired eye position with a mean deviation angle of 3.5 ± 1.25 degrees. The proposed framework is a first step towards using the capabilities of deep reinforcement learning to enhance our understanding of ocular biomechanics.
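As a rough illustration of the reported metric, the sketch below computes a mean deviation angle and its spread over a few trials. It assumes the deviation is the angle between the desired and the achieved gaze direction vectors, which the abstract does not specify; the function name and the trial values are hypothetical and are not data from the paper.

```python
import math
from statistics import mean, stdev


def deviation_angle_deg(desired, actual):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(d * a for d, a in zip(desired, actual))
    norm = math.hypot(*desired) * math.hypot(*actual)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


# Hypothetical (desired, actual) gaze directions, chosen only for illustration.
trials = [
    ((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)),
    ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    ((0.0, 0.0, 1.0), (0.0, 1.0, 1.0)),
]
angles = [deviation_angle_deg(d, a) for d, a in trials]
mean_angle = mean(angles)
spread = stdev(angles)  # sample standard deviation across trials
```

A figure of the form "mean ± spread" would then be `mean_angle` and `spread`; whether the paper reports a sample or a population standard deviation is not stated in the abstract.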

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2008.05088

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Oceania > Australia (0.04)
North America > United States (0.04)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Health Care Technology (0.62)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Aug-11-2020

HEX and Neurodynamic Programming

Banerjee, Debangshu

Hex is a complex game with a high branching factor. For the first time, we attempt to solve Hex without game tree structures and their associated pruning methods. We also abstain from any heuristic information about Virtual Connections or Semi Virtual Connections, which were used in all previously known computer versions of the game. The H-search algorithm, which was the basis for finding such connections and had been used successfully in previous Hex-playing agents, has been forgone. Instead, we use reinforcement learning through self-play and approximation through neural networks to bypass the problems of a high branching factor and of maintaining large tables for state-action evaluations. Our code is based primarily on NeuroHex. The inspiration is drawn from the recent success of AlphaGo Zero.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2008.06359

Country:

North America > Canada > Alberta (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Hex (0.46)
Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

#artificialintelligence, Aug-10-2020, 14:20:28 GMT

Army advances learning capabilities of drone swarms

Army researchers developed a reinforcement learning approach that will allow swarms of unmanned aerial and ground vehicles to optimally accomplish various missions while minimizing performance uncertainty. Swarming is a method of operations where multiple autonomous systems act as a cohesive unit by actively coordinating their actions. Army researchers said future multi-domain battles will require swarms of dynamically coupled, coordinated heterogeneous mobile platforms to overmatch enemy capabilities and threats targeting U.S. forces. The Army is looking to swarming technology to be able to execute time-consuming or dangerous tasks, said Dr. Jemin George of the U.S. Army Combat Capabilities Development Command's Army Research Laboratory. "Finding optimal guidance policies for these swarming vehicles in real-time is a key requirement for enhancing warfighters' tactical situational awareness, allowing the U.S. Army to dominate in a contested environment," George said. Reinforcement learning ...

machine learning, reinforcement, reinforcement learning, (12 more...)

#artificialintelligence

Country:

North America > United States > Oklahoma (0.05)
North America > United States > North Carolina (0.05)
North America > United States > Maryland > Prince George's County > Adelphi (0.05)

Industry:

Government > Military > Army (1.00)
Government > Regional Government > North America Government > United States Government (0.79)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

AIHub, Aug-10-2020, 11:01:31 GMT

Maintaining the illusion of reality: transfer in RL by keeping agents in the DARC

Reinforcement learning (RL) is often touted as a promising approach for costly and risk-sensitive applications, yet practicing and learning in those domains directly is expensive. It costs time (e.g., OpenAI's Dota2 project used 10,000 years of experience), it costs money (e.g., "inexpensive" robotic arms used in research typically cost 10,000 to 30,000 dollars), and it could even be dangerous to humans. How can an intelligent agent learn to solve tasks in environments in which it cannot practice? For many tasks, such as assistive robotics and self-driving cars, we may have access to a different practice area, which we will call the source domain. While the source domain has different dynamics than the target domain, experience in the source domain is much cheaper to collect.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

AIHub

Genre: Research Report (0.68)

Industry: Transportation > Ground > Road (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

Ahmed, Ibrahim, Khorasgani, Hamed, Biswas, Gautam

Comparison of Model Predictive and Reinforcement Learning Methods for Fault Tolerant Control

A desirable property in fault-tolerant controllers is adaptability to system changes as they evolve during system operation. An adaptive controller does not require optimal control policies to be enumerated for possible faults. Instead, it can approximate one in real time. We present two adaptive fault-tolerant control schemes for a discrete-time system based on hierarchical reinforcement learning. We compare their performance against a model predictive controller in the presence of sensor noise and persistent faults. The controllers are tested on a fuel tank model of a C-130 plane. Our experiments demonstrate that reinforcement learning-based controllers perform more robustly than model predictive controllers under faults, partially observable system models, and varying sensor noise levels.

artificial intelligence, controller, reinforcement learning, (18 more...)

2008.04403

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ahmed, Ibrahim, Quiñones-Grueiro, Marcos, Biswas, Gautam

Fault-Tolerant Control of Degrading Systems with On-Policy Reinforcement Learning

We propose a novel adaptive reinforcement learning control approach for fault tolerant control of degrading systems that is not preceded by a fault detection and diagnosis step. Therefore, a priori knowledge of faults that may occur in the system is not required. The adaptive scheme combines online and offline learning of the on-policy control method to improve exploration and sample efficiency, while guaranteeing stable learning. The offline learning phase is performed using a data-driven model of the system, which is frequently updated to track the system's operating conditions. We conduct experiments on an aircraft fuel transfer system to demonstrate the effectiveness of our approach.

artificial intelligence, downstream oil & gas, tank, (20 more...)

2008.04407

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Downstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kovač, Grgur, Laversanne-Finot, Adrien, Oudeyer, Pierre-Yves

GRIMGEP: Learning Progress for Robust Goal Sampling in Visual Deep Reinforcement Learning

Although recent work in reinforcement learning has shown that robots can learn complex individual skills such as grasping [2], locomotion [3, 4], and manipulation tasks [5], designing reinforcement learning algorithms that perform well in sparse reward scenarios is still an open challenge in artificial intelligence. Standard reinforcement learning algorithms struggle in the sparse reward scenario because they rely on simple exploration behavior such as random actions. As a result, learning complex tasks often requires manually collecting examples [6, 7] or running learning algorithms over a long period of time, which may not be possible in real-life scenarios. Designing better exploration schemes would help agents autonomously discover interesting features that can then be used to learn the long-term objective. Developing efficient exploration algorithms would thus help create a more autonomous learning agent. Several approaches have been considered to improve the exploration performance of reinforcement learning algorithms. One approach is to reward the agent for discovering novel observations in the form of an intrinsic reward that is added to the original reward of the environment [8].
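The idea of adding an intrinsic novelty reward to the environment's reward can be sketched as follows. This is a minimal illustration using a count-based bonus, which is an assumed stand-in and not the method of any paper listed here; the class name `CountBonus`, the weight `beta`, and the tuple-shaped observations are all hypothetical.

```python
import math
from collections import defaultdict


class CountBonus:
    """Count-based novelty bonus (an illustrative choice of intrinsic reward)."""

    def __init__(self, beta=0.1):
        self.beta = beta  # hypothetical weight of the intrinsic term
        self.counts = defaultdict(int)

    def total_reward(self, observation, extrinsic_reward):
        # Observations must be hashable, e.g. a tuple of discretized features.
        self.counts[observation] += 1
        intrinsic = self.beta / math.sqrt(self.counts[observation])
        return extrinsic_reward + intrinsic


bonus = CountBonus(beta=0.1)
first = bonus.total_reward((0, 0), 0.0)   # first visit: largest bonus
second = bonus.total_reward((0, 0), 0.0)  # repeat visit: smaller bonus
other = bonus.total_reward((1, 0), 0.0)   # unseen observation: full bonus again
```

Because the bonus shrinks as an observation is revisited, an agent maximizing the summed reward is nudged towards states it has seen less often, even when the environment's own reward is zero almost everywhere.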

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2008.04388

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bilevel Learning Model Towards Industrial Scheduling

Li, Longkang, Zhen, Hui-Ling, Yuan, Mingxuan, Lu, Jiawen, Tong, Xialiang, Zeng, Jia, Wang, Jun, Schnieders, Dirk

Automatic industrial scheduling, which aims to optimize the sequence of jobs over limited resources, is widely needed in manufacturing industries. However, existing scheduling systems rely heavily on heuristic algorithms, which either generate ineffective solutions or compute inefficiently as the job scale increases. Thus, it is of great importance to develop new large-scale algorithms that are not only efficient and effective, but also capable of satisfying complex constraints in practice. In this paper, we propose a Bilevel Deep reinforcement learning Scheduler, BDS, in which the higher level is responsible for exploring an initial global sequence, whereas the lower level aims at exploitation for partial sequence refinements, and the two levels are connected by a sliding-window sampling mechanism. In the implementation, a Double Deep Q Network (DDQN) is used in the upper level and a Graph Pointer Network (GPN) lies within the lower level. After establishing a theoretical guarantee for the convergence of BDS, we evaluate it in an industrial automatic warehouse scenario, with up to 5000 jobs in each production line. It is shown that our proposed BDS significantly outperforms the two most used heuristics, three strong deep networks, and another bilevel baseline approach. In particular, compared with the most used greedy-based heuristic algorithm in the real world, which takes nearly an hour, our BDS can decrease the makespan by 27.5%, 28.6% and 22.1% for the 3 largest datasets respectively, with computational time less than 200 seconds.

machine learning, reinforcement learning, scheduling problem, (18 more...)

2008.0413

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Nair, Suraj, Savarese, Silvio, Finn, Chelsea

Goal-Aware Prediction: Learning to Model What Matters

Learned dynamics models combined with both planning and policy learning algorithms have shown promise in enabling artificial agents to learn to perform many diverse tasks with limited supervision. However, one of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model (future state reconstruction), and that of the downstream planner or policy (completing a specified task). This issue is exacerbated by vision-based control tasks in diverse real-world environments, where the complexity of the real world dwarfs model capacity. In this paper, we propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space, resulting in a learning objective that more closely matches the downstream task. Further, we do so in an entirely self-supervised manner, without the need for a reward function or image labels. We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.

machine learning, reinforcement learning, trajectory, (15 more...)

2007.0717

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)