AITopics

2009.14665

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > District of Columbia > Washington (0.14)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Rens, Gavin, Raskin, Jean-François, Reynouad, Raphaël, Marra, Giuseppe

Online Learning of Non-Markovian Reward Models

arXiv.org Artificial IntelligenceSep-30-2020

There are situations in which an agent should receive rewards only after having accomplished a series of previous tasks, that is, rewards are non-Markovian. One natural and quite general way to represent history-dependent rewards is via a Mealy machine, a finite state automaton that produces output sequences from input sequences. In our formal setting, we consider a Markov decision process (MDP) that models the dynamics of the environment in which the agent evolves and a Mealy machine synchronized with this MDP to formalize the non-Markovian reward function. While the MDP is known by the agent, the reward function is unknown to the agent and must be learned. Our approach to overcome this challenge is to use Angluin's $L^*$ active learning algorithm to learn a Mealy machine representing the underlying non-Markovian reward machine (MRM). Formal methods are used to determine the optimal strategy for answering so-called membership queries posed by $L^*$. Moreover, we prove that the expected reward achieved will eventually be at least as much as a given, reasonable value provided by a domain expert. We evaluate our framework on three problems. The results show that using $L^*$ to learn an MRM in a non-Markovian reward decision process is effective.

logic & formal reasoning, machine learning, reinforcement learning, (21 more...)

2009.126

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Energy (0.46)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.48)
(2 more...)

#artificialintelligenceSep-29-2020, 20:00:28 GMT

Reinforcement Learning - How does it work? (Part I)

Today's post is about a Machine Learning area, the Reinforcement Learning (RL). This article seeks to summarise the principal types of algo...

artificial intelligence, machine learning, reinforcement learning, (2 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

#artificialintelligenceSep-29-2020, 06:05:21 GMT

Reinforcement Learning frameworks

This is the post number 20 in the "Deep Reinforcement Learning Explained" series devoted to Reinforcement Learning frameworks. So far, in previous posts, we have been looking at a basic representation of the corpus of RL algorithms (although we have skipped several) that have been relatively easy to program. But from now on, we need to consider both the scale and complexity of the RL algorithms. In this scenario, programming a Reinforcement Learning implementation from scratch can become tedious work with a high risk of programming errors. To address this, the RL community began to build frameworks and libraries to simplify the development of RL algorithms, both by creating new pieces and especially by involving the combination of various algorithmic components.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

#artificialintelligence

Country: North America > United States > California (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Wang, Xing, Vinel, Alexander

Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network

Existing exploration strategies in reinforcement learning (RL) often either ignore the history or feedback of search, or are complicated to implement. There is also a very limited literature showing their effectiveness over diverse domains. We propose an algorithm based on the idea of reannealing, that aims at encouraging exploration only when it is needed, for example, when the algorithm detects that the agent is stuck in a local optimum. The approach is simple to implement. We perform an illustrative case study showing that it has potential to both accelerate training and obtain a better policy.

artificial intelligence, exploration, upstream oil & gas, (18 more...)

2009.14297

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Patil, Vihang P., Hofmarcher, Markus, Dinu, Marius-Constantin, Dorfer, Matthias, Blies, Patrick M., Brandstetter, Johannes, Arjona-Medina, Jose A., Hochreiter, Sepp

Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution

Reinforcement Learning algorithms require a large number of samples to solve complex tasks with sparse and delayed rewards. Complex tasks can often be hierarchically decomposed into sub-tasks. A step in the Q-function can be associated with solving a sub-task, where the expectation of the return increases. RUDDER has been introduced to identify these steps and then redistribute reward to them, thus immediately giving reward if sub-tasks are solved. Since the problem of delayed rewards is mitigated, learning is considerably sped up. However, for complex tasks, current exploration strategies as deployed in RUDDER struggle with discovering episodes with high rewards. Therefore, we assume that episodes with high rewards are given as demonstrations and do not have to be discovered by exploration. Typically the number of demonstrations is small and RUDDER's LSTM model as a deep learning method does not learn well. Hence, we introduce Align-RUDDER, which is RUDDER with two major modifications. First, Align-RUDDER assumes that episodes with high rewards are given as demonstrations, replacing RUDDER's safe exploration and lessons replay buffer. Second, we replace RUDDER's LSTM model by a profile model that is obtained from multiple sequence alignment of demonstrations. Profile models can be constructed from as few as two demonstrations as known from bioinformatics. Align-RUDDER inherits the concept of reward redistribution, which considerably reduces the delay of rewards, thus speeding up learning. Align-RUDDER outperforms competitors on complex artificial tasks with delayed reward and few demonstrations. On the MineCraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently. Github: https://github.com/ml-jku/align-rudder, YouTube: https://youtu.be/HO-_8ZUl-UY

demonstration, machine learning, reinforcement learning, (17 more...)

2009.14108

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials > Metals & Mining (0.93)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Benhamou, Eric, Saltiel, David, Ungari, Sandrine, Mukhopadhyay, Abhishek, Atif, Jamal

AAMDRL: Augmented Asset Management with Deep Reinforcement Learning

Can an agent learn efficiently in a noisy and self adapting environment with sequential, non-stationary and non-homogeneous observations? Through trading bots, we illustrate how Deep Reinforcement Learning (DRL) can tackle this challenge. Our contributions are threefold: (i) the use of contextual information also referred to as augmented state in DRL, (ii) the impact of a one period lag between observations and actions that is more realistic for an asset management environment, (iii) the implementation of a new repetitive train test method called walk forward analysis, similar in spirit to cross validation for time series. Although our experiment is on trading bots, it can easily be translated to other bot environments that operate in sequential environment with regime changes and noisy data. Our experiment for an augmented asset manager interested in finding the best portfolio for hedging strategies shows that AAMDRL achieves superior returns and lower risk.

deep reinforcement learning, information, portfolio, (10 more...)

2010.08497

Country:

Europe > France (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

AUBER: Automated BERT Regularization

Lee, Hyun Dong, Lee, Seongmin, Kang, U

How can we effectively regularize BERT? Although BERT proves its effectiveness in various downstream natural language processing tasks, it often overfits when there are only a small number of training instances. A promising direction to regularize BERT is based on pruning its attention heads based on a proxy score for head importance. However, heuristic-based methods are usually suboptimal since they predetermine the order by which attention heads are pruned. In order to overcome such a limitation, we propose AUBER, an effective regularization method that leverages reinforcement learning to automatically prune attention heads from BERT. Instead of depending on heuristics or rule-based policies, AUBER learns a pruning policy that determines which attention heads should or should not be pruned for regularization. Experimental results show that AUBER outperforms existing pruning methods by achieving up to 10% better accuracy. In addition, our ablation study empirically demonstrates the effectiveness of our design choices for AUBER.

machine learning, natural language, reinforcement learning, (16 more...)

2009.14409

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)

Mozaffar, Mojtaba, Ebrahimi, Ablodghani, Cao, Jian

Toolpath design for additive manufacturing using deep reinforcement learning

Additive Manufacturing (AM) processes offer unique capabilities to build low-volume parts with complex geometries and fast prototyping from a variety of materials. Metal-based AM has become increasingly more popular over the last decade for manufacturing and repairing functional parts in automotive, medical and aerospace industries. Despite the great potential in metal-based AM market, the state-of-the-art practices involve rigorous trial and errors before achieving consistent parts with the desired geometric and material properties, which is mainly due to the sensitivity of the build on process parameters. While the influence of process parameters such as laser power, powder parameters, and scan speed on the microstructure and final properties of the AM build are extensively studied in the literature, the influence of toolpath strategies yet to be fully investigated. Authors in [Steuben et al., 2016] considered three different toolpath patterns for building a part using a fused deposition modeling process and demonstrated that the pattern has a significant effect on the ultimate strength and elastic modulus of the build. Akram et al. [Akram et al., 2018] formulated a microstructure model using a Cellular Automata (CA) and demonstrated a strong correlation between the toolpath pattern (i.e., unidirectional and bidirectional) and the grain orientations.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2009.14365

Country: North America > United States > Illinois > Cook County > Evanston (0.05)

Genre: Research Report > New Finding (0.46)

Industry:

Aerospace & Defense (0.74)
Machinery > Industrial Machinery (0.64)
Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Gupta, Saurabh, Bhambri, Siddhant, Dhingra, Karan, Buduru, Arun Balaji, Kumaraguru, Ponnurangam

Multi-objective Reinforcement Learning based approach for User-Centric Power Optimization in Smart Home Environments

Smart homes require every device inside them to be connected with each other at all times, which leads to a lot of power wastage on a daily basis. As the devices inside a smart home increase, it becomes difficult for the user to control or operate every individual device optimally. Therefore, users generally rely on power management systems for such optimization but often are not satisfied with the results. In this paper, we present a novel multi-objective reinforcement learning framework with two-fold objectives of minimizing power consumption and maximizing user satisfaction. The framework explores the trade-off between the two objectives and converges to a better power management policy when both objectives are considered while finding an optimal policy. We experiment on real-world smart home data, and show that the multi-objective approaches: i) establish trade-off between the two objectives, ii) achieve better combined user satisfaction and power consumption than single-objective approaches. We also show that the devices that are used regularly and have several fluctuations in device modes at regular intervals should be targeted for optimization, and the experiments on data from other smart homes fetch similar results, hence ensuring transfer-ability of the proposed framework.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2009.13854

Country:

North America > United States (0.46)
Asia > India > NCT > Delhi (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Smart Houses & Appliances (1.00)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)