AITopics

2009.09815

Genre: Research Report (0.82)

Industry:

Telecommunications (0.69)
Information Technology > Robotics & Automation (0.34)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Communications > Networks (0.94)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Srinivas, Aravind, Krishnamurthy, Ramnandan, Kumar, Peeyush, Ravindran, Balaraman

Option Discovery in Hierarchical Reinforcement Learning using Spatio-Temporal Clustering

arXiv.org Artificial IntelligenceSep-21-2020

This paper introduces an automated skill acquisition framework in reinforcement learning which involves identifying a hierarchical description of the given task in terms of abstract states and extended actions between abstract states. Identifying such structures present in the task provides ways to simplify and speed up reinforcement learning algorithms. These structures also help to generalize such algorithms over multiple tasks without relearning policies from scratch. We use ideas from dynamical systems to find metastable regions in the state space and associate them with abstract states. The spectral clustering algorithm PCCA+ is used to identify suitable abstractions aligned to the underlying structure. Skills are defined in terms of the sequence of actions that lead to transitions between such abstract states. The connectivity information from PCCA+ is used to generate these skills or options. These skills are independent of the learning task and can be efficiently reused across a variety of tasks defined over the same model. This approach works well even without the exact model of the environment by using sample trajectories to construct an approximate estimate. We also present our approach to scaling the skill acquisition framework to complex tasks with large state spaces for which we perform state aggregation using the representation learned from an action conditional video prediction network and use the skill acquisition framework on the aggregated state space.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1605.05359

Country: North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.64)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Bu, Fanchen, Chang, Dong Eui

Double Prioritized State Recycled Experience Replay

arXiv.org Machine LearningSep-21-2020

Experience replay enables online reinforcement learning agents to store and reuse the previous experiences of interacting with the environment. In the original method, the experiences are sampled and replayed uniformly at random. A prior work called prioritized experience replay was developed where experiences are prioritized, so as to replay experiences seeming to be more important more frequently. In this paper, we develop a method called double-prioritized state-recycled (DPSR) experience replay, prioritizing the experiences in both training stage and storing stage, as well as replacing the experiences in the memory with state recycling to make the best of experiences that seem to have low priorities temporarily. We used this method in Deep Q-Networks (DQN), and achieved a state-of-the-art result, outperforming the original method and prioritized experience replay on many Atari games.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

2007.03961

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Chandak, Yash, Theocharous, Georgios, Shankar, Shiv, White, Martha, Mahadevan, Sridhar, Thomas, Philip S.

Optimizing for the Future in Non-Stationary MDPs

arXiv.org Machine LearningSep-21-2020

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary. However, in many real-world applications, this assumption is violated, and using existing algorithms may result in a performance lag. To proactively search for a good future policy, we present a policy gradient algorithm that maximizes a forecast of future performance. This forecast is obtained by fitting a curve to the counter-factual estimates of policy performance over time, without explicitly modeling the underlying non-stationarity. The resulting algorithm amounts to a non-uniform reweighting of past data, and we observe that minimizing performance over some of the data from past episodes can be beneficial when searching for a policy that maximizes future performance. We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques, on three simulated problems motivated by real-world applications.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2005.08158

Country:

North America > Canada > Alberta (0.14)
North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Massachusetts (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Education (0.68)
Leisure & Entertainment (0.67)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

arXiv.org Artificial IntelligenceSep-21-2020

Aligning AI With Shared Human Values

Hendrycks, Dan, Burns, Collin, Basart, Steven, Critch, Andrew, Li, Jerry, Song, Dawn, Steinhardt, Jacob

We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgements, a capability that may enable us to steer chatbot outputs or eventually regularize open-ended reinforcement learning agents. With the ETHICS dataset, we find that current language models have a promising but incomplete understanding of basic ethical knowledge. Our work shows that progress can be made on machine ethics today, and it provides a steppingstone toward AI that is aligned with human values.

machine learning, natural language, reinforcement learning, (20 more...)

2008.02275

Country:

Asia > India (0.04)
North America > United States (0.04)
North America > Canada (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.81)

Industry:

Law (1.00)
Government (1.00)
Health & Medicine > Therapeutic Area (0.93)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

#artificialintelligenceSep-20-2020, 09:00:40 GMT

[D] Quality Contributions Roundup 9/14

Though the community continues to develop new algorithms, state-of-the-art results have stopped improving in the last couple of years. Since RL algorithms that use a tremendous amount of online data to learn from scratch are infeasible to apply in the real-world, much research has moved to fields such as Meta-RL, offline RL, and integrating RL with domain-knowledge, integrating RL and planning, etc. How do you unit test end-to-end ML pipelines?, by u/farmingvillein As perhaps a bit of tldr: once you've got the bare minimum data-replay testing in place ("yeah, it is probably working, because the results are pretty close to what they were before"), I'd encourage you to consider focusing your energy toward thinking of testing as outlier detection. Outliers, in real-world ML systems, tend to be harbingers of things that are wrong systematically, upstream data problems, and logic (pre-/post-processing) problems. How do you transition from a no name international college to FAIR/Brain?, by u/r-sync Coming from a no-name Indian engineering college with meh grades, you do have to get a bit creative, very persistent and build credibility for yourself. The examples above are one way to do so, but you can also maybe articulate your thoughts as really good blog posts and arxiv papers, or show great software engineering skills in open-source (i.e.

machine learning, quality contribution roundup 9 14, reinforcement learning, (15 more...)

#artificialintelligence

Industry: Media > News (0.41)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

Xu, Mengfan, Klabjan, Diego

Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

EXP-based algorithms are often used for exploration in multi-armed bandit. We revisit the EXP3.P algorithm and establish both the lower and upper bounds of regret in the Gaussian multi-armed bandit setting, as well as a more general distribution option. The analyses do not require bounded rewards compared to classical regret assumptions. We also extend EXP4 from multi-armed bandit to reinforcement learning to incentivize exploration by multiple agents. The resulting algorithm has been tested on hard-to-explore games and it shows an improvement on exploration compared to state-of-the-art.

data mining, machine learning, reinforcement learning, (20 more...)

2009.09538

Country:

North America > United States > Illinois > Cook County > Evanston (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Data Science > Data Mining > Big Data (0.74)

Blum, Tamir, Paillet, Gabin, Laine, Mickael, Yoshida, Kazuya

RL STaR Platform: Reinforcement Learning for Simulation based Training of Robots

Reinforcement learning (RL) is a promising field to enhance robotic autonomy and decision making capabilities for space robotics, something which is challenging with traditional techniques due to stochasticity and uncertainty within the environment. RL can be used to enable lunar cave exploration with infrequent human feedback, faster and safer lunar surface locomotion or the coordination and collaboration of multi-robot systems. However, there are many hurdles making research challenging for space robotic applications using RL and machine learning, particularly due to insufficient resources for traditional robotics simulators like CoppeliaSim. Our solution to this is an open source modular platform called Reinforcement Learning for Simulation based Training of Robots, or RL STaR, that helps to simplify and accelerate the application of RL to the space robotics research field. This paper introduces the RL STaR platform, and how researchers can use it through a demonstration.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2009.09595

Country:

Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.50)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.61)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Hlynsson, Hlynur Davíð, Schüler, Merlin, Schiewer, Robin, Glasmachers, Tobias, Wiskott, Laurenz

Latent Representation Prediction Networks

Deeply-learned planning methods are often based on learning representations that are optimized for unrelated tasks. For example, they might be trained on reconstructing the environment. These representations are then combined with predictor functions for simulating rollouts to navigate the environment. We find this principle of learning representations unsatisfying and propose to learn them such that they are directly optimized for the task at hand: to be maximally predictable for the predictor function. This results in representations that are by design optimal for the downstream task of planning, where the learned predictor function is used as a forward model. To this end, we propose a new way of jointly learning this representation along with the prediction function, a system we dub Latent Representation Prediction Network (LARP). The prediction function is used as a forward model for search on a graph in a viewpoint-matching task and the representation learned to maximize predictability is found to outperform a pre-trained representation. Our approach is shown to be more sample-efficient than standard reinforcement learning methods and our learned representation transfers successfully to dissimilar objects.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2009.09439

Country:

Europe > Germany (0.04)
Asia > Malaysia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Manchella, Kaushik, Umrawal, Abhishek K., Aggarwal, Vaneet

FlexPool: A Distributed Model-Free Deep Reinforcement Learning Algorithm for Joint Passengers & Goods Transportation

The growth in online goods delivery is causing a dramatic surge in urban vehicle traffic from last-mile deliveries. On the other hand, ride-sharing has been on the rise with the success of ride-sharing platforms and increased research on using autonomous vehicle technologies for routing and matching. The future of urban mobility for passengers and goods relies on leveraging new methods that minimize operational costs and environmental footprints of transportation systems. This paper considers combining passenger transportation with goods delivery to improve vehicle-based transportation. Even though the problem has been studied with a defined dynamics model of the transportation system environment, this paper considers a model-free approach that has been demonstrated to be adaptable to new or erratic environment dynamics. We propose FlexPool, a distributed model-free deep reinforcement learning algorithm that jointly serves passengers & goods workloads by learning optimal dispatch policies from its interaction with the environment. The proposed algorithm pools passengers for a ride-sharing service and delivers goods using a multi-hop transit method. These flexibilities decrease the fleet's operational cost and environmental footprint while maintaining service levels for passengers and goods. Through simulations on a realistic multi-agent urban mobility platform, we demonstrate that FlexPool outperforms other model-free settings in serving the demands from passengers & goods. FlexPool achieves 30% higher fleet utilization and 35% higher fuel efficiency in comparison to (i) model-free approaches where vehicles transport a combination of passengers & goods without the use of multi-hop transit, and (ii) model-free approaches where vehicles exclusively transport either passengers or goods.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2007.13699

Country:

North America > United States > New York (0.05)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)