AITopics

2002.08537

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China (0.04)
(5 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Machine LearningFeb-19-2020

From Poincar\'e Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

Perolat, Julien, Munos, Remi, Lespiau, Jean-Baptiste, Omidshafiei, Shayegan, Rowland, Mark, Ortega, Pedro, Burch, Neil, Anthony, Thomas, Balduzzi, David, De Vylder, Bart, Piliouras, Georgios, Lanctot, Marc, Tuyls, Karl

In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincar\'e recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergence guarantees in monotone games. We continue by showing how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium. Finally, we show how these insights can be directly used to build state-of-the-art model-free algorithms for zero-sum two-player Imperfect Information Games (IIG).

equilibrium, init, nash equilibrium, (13 more...)

2002.08456

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Kaspar, Manuel, Osorio, Juan David Munoz, Bock, Jürgen

Sim2Real Transfer for Reinforcement Learning without Dynamics Randomization

arXiv.org Artificial IntelligenceFeb-19-2020

In this work we show how to use the Operational Space Control framework (OSC) under joint and cartesian constraints for reinforcement learning in cartesian space. Our method is therefore able to learn fast and with adjustable degrees of freedom, while we are able to transfer policies without additional dynamics randomizations on a KUKA LBR iiwa peg in-hole task. Before learning in simulation starts, we perform a system identification for aligning the simulation environment as far as possible with the dynamics of a real robot. Adding constraints to the OSC controller allows us to learn in a safe way on the real robot or to learn a flexible, goal conditioned policy that can be easily transferred from simulation to the real robot.

randomization, robot, simulation, (16 more...)

2002.11635

Country: Europe > Germany (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Zhang, Rusheng, Zhou, Xinze, Tonguz, Ozan K.

Using AI for Mitigating the Impact of Network Delay in Cloud-based Intelligent Traffic Signal Control

arXiv.org Artificial IntelligenceFeb-19-2020

The recent advancements in cloud services, Internet of Things (IoT) and Cellular networks have made cloud computing an attractive option for intelligent traffic signal control (ITSC). Such a method significantly reduces the cost of cables, installation, number of devices used, and maintenance. ITSC systems based on cloud computing lower the cost of the ITSC systems and make it possible to scale the system by utilizing the existing powerful cloud platforms. While such systems have significant potential, one of the critical problems that should be addressed is the network delay. It is well known that network delay in message propagation is hard to prevent, which could potentially degrade the performance of the system or even create safety issues for vehicles at intersections. In this paper, we introduce a new traffic signal control algorithm based on reinforcement learning, which performs well even under severe network delay. The framework introduced in this paper can be helpful for all agent-based systems using remote computing resources where network delay could be a critical concern. Extensive simulation results obtained for different scenarios show the viability of the designed algorithm to cope with network delay.

algorithm, network delay, vehicle, (14 more...)

2002.08303

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)

Genre: Research Report (0.64)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)

arXiv.org Artificial IntelligenceFeb-19-2020

Efficient Deep Reinforcement Learning through Policy Transfer

Yang, Tianpei, Hao, Jianye, Meng, Zhaopeng, Zhang, Zongzhang, Wang, Weixun, Hu, Yujing, Cheng, Yingfeng, Fan, Changjie, Wang, Zhaodong, Peng, Jiajie

Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from past learned policies of relevant tasks. Existing transfer approaches either explicitly computes the similarity between tasks or select appropriate source policies to provide guided explorations for the target task. However, how to directly optimize the target policy by alternatively utilizing knowledge from appropriate source policies without explicitly measuring the similarity is currently missing. In this paper, we propose a novel Policy Transfer Framework (PTF) to accelerate RL by taking advantage of this idea. Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it by modeling multi-policy transfer as the option learning problem. PTF can be easily combined with existing deep RL approaches. Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods in terms of learning efficiency and final performance in both discrete and continuous action spaces.

learning, probability, source policy, (14 more...)

2002.08037

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.05)
Asia > China > Tianjin Province > Tianjin (0.04)
North America > United States > Washington (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.84)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine LearningFeb-19-2020

Value-driven Hindsight Modelling

Guez, Arthur, Viola, Fabio, Weber, Théophane, Buesing, Lars, Kapturowski, Steven, Precup, Doina, Silver, David, Heess, Nicolas

Value estimation is a critical component of the reinforcement learning (RL) paradigm. The question of how to effectively learn predictors for value from data is one of the major problems studied by the RL community, and different approaches exploit structure in the problem domain in different ways. Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function. In contrast, model-free methods directly leverage the quantity of interest from the future but have to compose with a potentially weak scalar signal (an estimate of the return). In this paper we develop an approach for representation learning in RL that sits in between these two extremes: we propose to learn what to model in a way that can directly help value prediction. To this end we determine which features of the future trajectory provide useful information to predict the associated return. This provides us with tractable prediction targets that are directly relevant for a task, and can thus accelerate learning of the value function. The idea can be understood as reasoning, in hindsight, about which aspects of the future observations could help past value prediction. We show how this can help dramatically even in simple policy evaluation settings. We then test our approach at scale in challenging domains, including on 57 Atari 2600 games.

information, machine learning, reinforcement learning, (17 more...)

2002.08329

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games (0.68)
Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

#artificialintelligenceFeb-18-2020, 08:24:21 GMT

How To Make Sure Your Robot Doesn't Drop Your Wine Glass

From microelectronics to mechanics and machine learning, the modern-day robots are a marvel of multiple engineering disciplines. They use sensors, image processing and reinforcement learning algorithms to move the objects around and move around the obstacles as well. However, this is not the case when it comes to handling objects such as glass. The surface properties of glass are transparent, and non-uniform light reflection makes it difficult for the sensors mounted on the robot to understand how to engage in a simple pick and place operation. To address this problem, researchers at Google AI along with Synthesis AI and Columbia University devised a novel machine-learning algorithm called ClearGrasp, that is capable of estimating accurate 3D data of transparent objects from RGB-D images.

algorithm, cleargrasp, sensor, (7 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

#artificialintelligenceFeb-18-2020, 04:17:10 GMT

Using Rotation, Translation, and Cropping to Boost Generalization in Deep Reinforcement Learning…

"Generalization" is an AI buzzword these days for good reason: most scientists would love to see the models they're training in simulations and video game environments evolve and expand to take on meaningful real-world challenges -- for example in safety, conservation, medicine, etc. One concerned research area is deep reinforcement learning (DRL), which implements deep learning architectures with reinforcement learning algorithms to enable AI agents to learn the best actions possible to attain their goals in virtual environments. DRL has been widely applied in games and robotics. Such DRL agents have an impressive track record on Starcraft II and Dota-2. But because they were trained in fixed environments, studies suggest DRL agents can fail to generalize to even slight variations of their training environments. In a new paper, researchers from the New York University and Modl.ai, a company applying machine learning to game developing, suggest that simple spacial processing methods such as rotation, translation and cropping could help increase model generality.

agent, generalization, translation, (7 more...)

#artificialintelligence

Country: North America > United States > New York (0.26)

Industry: Leisure & Entertainment > Games > Computer Games (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceFeb-18-2020

MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding

Yang, Chaoqi, Lu, Junwei, Gao, Xiaofeng, Liu, Haishan, Chen, Qiong, Liu, Gongshen, Chen, Guihai

Online real-time bidding (RTB) is known as a complex auction game where ad platforms seek to consider various influential key performance indicators (KPIs), like revenue and return on investment (ROI). The trade-off among these competing goals needs to be balanced on a massive scale. To address the problem, we propose a multi-objective reinforcement learning algorithm, named MoTiAC, for the problem of bidding optimization with various goals. Specifically, in MoTiAC, instead of using a fixed and linear combination of multiple objectives, we compute adaptive weights overtime on the basis of how well the current state agrees with the agent's prior. In addition, we provide interesting properties of model updating and further prove that Pareto optimality could be guaranteed. We demonstrate the effectiveness of our method on a real-world commercial dataset. Experiments show that the model outperforms all state-of-the-art baselines.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2002.07408

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Illinois > Champaign County > Champaign (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Industry: Marketing (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Su, Yi, Srinath, Pavithra, Krishnamurthy, Akshay

Adaptive Estimator Selection for Off-Policy Evaluation

arXiv.org Machine LearningFeb-18-2020

We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant factor. Via in-depth case studies in contextual bandits and reinforcement learning, we demonstrate the generality and applicability of the method. We also perform comprehensive experiments, demonstrating the empirical efficacy of our approach and comparing with related approaches. In both case studies, our method compares favorably with existing methods.

cnf, estimator, evaluation, (15 more...)

2002.07729

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)