AITopics

Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL). In the tabular case, all provably efficient model-free algorithms rely on it. However, model-free deep RL algorithms do not use optimistic initialisation despite taking inspiration from these provably efficient tabular algorithms. In particular, in scenarios with only positive rewards, Q-values are initialised at their lowest possible values due to commonly used network initialisation schemes, a pessimistic initialisation. Merely initialising the network to output optimistic Q-values is not enough, since we cannot ensure that they remain optimistic for novel state-action pairs, which is crucial for exploration. We propose a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network. We show that this scheme is provably efficient in the tabular setting and extend it to the deep RL setting. Our algorithm, Optimistic Pessimistically Initialised Q-Learning (OPIQ), augments the Q-value estimates of a DQN-based agent with count-derived bonuses to ensure optimism during both action selection and bootstrapping. We show that OPIQ outperforms non-optimistic DQN variants that utilise a pseudocount-based intrinsic motivation in hard exploration tasks, and that it predicts optimistic estimates for novel state-action pairs.
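The count-based augmentation described above can be illustrated with a small tabular sketch. The abstract does not state the bonus formula, so the form `C / (N(s, a) + 1) ** M` used here, the hyperparameter names `c` and `m`, and all function names are assumptions made for illustration rather than the authors' implementation. The sketch keeps the stored Q-values pessimistic (zero) and adds optimism only through visit counts, both when choosing an action and when bootstrapping.

```python
from collections import defaultdict

def optimistic_q(q, counts, state, action, c=1.0, m=2.0):
    """Stored (pessimistic) estimate plus a bonus that shrinks with the visit count.

    The bonus form c / (N + 1) ** m is an assumed, illustrative choice.
    """
    return q[(state, action)] + c / (counts[(state, action)] + 1) ** m

def greedy_action(q, counts, state, actions, c=1.0, m=2.0):
    """Pick the action with the highest count-augmented value."""
    return max(actions, key=lambda a: optimistic_q(q, counts, state, a, c, m))

def q_update(q, counts, s, a, r, s_next, actions,
             alpha=0.5, gamma=0.9, c=1.0, m=2.0):
    """One Q-learning step that bootstraps from augmented next-state values."""
    counts[(s, a)] += 1
    target = r + gamma * max(
        optimistic_q(q, counts, s_next, b, c, m) for b in actions
    )
    q[(s, a)] += alpha * (target - q[(s, a)])

q = defaultdict(float)     # pessimistic initialisation: every Q-value starts at 0
counts = defaultdict(int)  # visit counts N(s, a)
actions = [0, 1]

# Visit (s0, 0) once with zero reward; action 1 in s0 remains unvisited.
q_update(q, counts, "s0", 0, 0.0, "s1", actions)
chosen = greedy_action(q, counts, "s0", actions)
```

After the single update, the unvisited action still carries the full bonus, so the augmented greedy choice in `s0` prefers it even though its stored Q-value is zero.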

action selection, exploration, state-action pair, (16 more...)

2002.12174

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Improving the Performance of Stochastic Local Search for Maximum Vertex Weight Clique Problem Using Programming by Optimization

Chu, Yi, Luo, Chuan, Hoos, Holger H., Lin, Qingwei, You, Haihang

The maximum vertex weight clique problem (MVWCP) is an important generalization of the maximum clique problem (MCP) that has a wide range of real-world applications. In situations where rigorous guarantees regarding the optimality of solutions are not required, MVWCP is usually solved using stochastic local search (SLS) algorithms, which also define the state of the art for solving this problem. However, no single SLS algorithm gives the best performance across all classes of MVWCP instances, and it is challenging to identify the most suitable algorithm for each class of MVWCP instances. In this work, we follow the paradigm of Programming by Optimization (PbO) to develop a new, flexible and highly parametric SLS framework for solving MVWCP, combining, for the first time, a broad range of effective heuristic mechanisms. By automatically configuring this PbO-MWC framework, we achieve substantial advances in the state of the art in solving MVWCP over a broad range of prominent benchmarks, including two derived from real-world applications in transplantation medicine (kidney exchange) and assessment of research excellence.
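To make the problem statement concrete, the following is a minimal sketch of the MVWCP objective on a tiny graph. It is an exhaustive search, not a stochastic local search and not the PbO-MWC framework; the function names, the example graph, and the assumption of positive vertex weights are all hypothetical choices for illustration. It shows that the heaviest clique need not be the largest one, which is what distinguishes MVWCP from MCP.

```python
from itertools import combinations

def is_clique(vertices, edges):
    """True if every pair of the given vertices is joined by an edge."""
    return all(frozenset((u, v)) in edges for u, v in combinations(vertices, 2))

def max_weight_clique(weights, edge_list):
    """Exhaustively find a clique of maximum total vertex weight.

    Exponential in the number of vertices, so only suitable for tiny
    illustrative graphs. Assumes all weights are positive.
    """
    edges = {frozenset(e) for e in edge_list}
    best, best_weight = (), 0
    nodes = sorted(weights)
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            if is_clique(subset, edges):
                total = sum(weights[v] for v in subset)
                if total > best_weight:
                    best, best_weight = subset, total
    return set(best), best_weight

# Triangle b-c-d has three vertices but total weight 3;
# the edge a-b has only two vertices but total weight 5.
weights = {"a": 4, "b": 1, "c": 1, "d": 1}
edge_list = [("b", "c"), ("c", "d"), ("b", "d"), ("a", "b")]
clique, weight = max_weight_clique(weights, edge_list)
```

Exhaustive search of this kind does not scale beyond small graphs, which is why heuristic methods such as SLS are used when optimality guarantees are not required.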

algorithm, benchmark, pbo-mwc, (17 more...)

2002.11909

Country:

Europe > Netherlands > South Holland > Leiden (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Nguyen, Ngoc Duy, Nguyen, Thanh Thi, Nguyen, Hai, Nahavandi, Saeid

Review, Analyze, and Design a Comprehensive Deep Reinforcement Learning Framework

Reinforcement learning (RL) has emerged as a standard approach for building intelligent systems in which multiple self-operated agents collectively accomplish a designated task. More importantly, RL has attracted great attention since the introduction of deep learning, which makes RL feasible in high-dimensional environments. However, current research interests have diverged in different directions, such as multi-agent and multi-objective learning, and human-machine interaction. Therefore, in this paper, we propose a comprehensive software architecture that not only plays a vital role in designing a connect-the-dots deep RL architecture but also provides a guideline for developing a realistic RL application in a short time span. By inheriting the proposed architecture, software managers can foresee challenges when designing a deep RL-based system. As a result, they can expedite the design process and actively control every stage of software development, which is especially critical in agile development environments. For this reason, we designed a deep RL-based framework that strictly ensures flexibility, robustness, and scalability. Finally, to enforce generalization, the proposed architecture does not depend on a specific RL algorithm, a network configuration, the number of agents, or the type of agents.

agent, learning, reinforcement learning, (11 more...)

doi: 10.13140/RG.2.2.16789.06883

2002.11883

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (1.00)
Overview (0.68)

Industry:

Information Technology (1.00)
Transportation > Ground > Road (0.46)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Nguyen, Ngoc Duy, Nguyen, Thanh Thi, Nahavandi, Saeid

A Visual Communication Map for Multi-Agent Deep Reinforcement Learning

Multi-agent learning poses significant challenges in the effort to allocate a concealed communication medium. Agents receive thorough knowledge from the medium to determine subsequent actions in a distributed manner. The goal is to leverage the cooperation of multiple agents to achieve a designated objective efficiently. Recent studies typically combine a specialized neural network with reinforcement learning to enable communication between agents. This approach, however, limits the number of agents or requires the system to be homogeneous. In this paper, we propose a more scalable approach that not only handles a large number of agents but also enables collaboration between functionally dissimilar agents and can be combined with any deep reinforcement learning method. Specifically, we create a global communication map to represent the status of each agent in the system visually. The visual map and the environmental state are fed to a shared-parameter network to train multiple agents concurrently. Finally, we select the Asynchronous Advantage Actor-Critic (A3C) algorithm to demonstrate our proposed scheme, namely Visual communication map for Multi-agent A3C (VMA3C). Simulation results show that the use of a visual communication map improves the performance of A3C in terms of learning speed, reward achievement, and robustness in multi-agent problems.

agent, milk factory, robot, (12 more...)

doi: 10.13140/RG.2.2.13433.62563

2002.11882

Country: Oceania > Australia > Victoria > Melbourne (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Gamma-Reward: A Novel Multi-Agent Reinforcement Learning Method for Traffic Signal Control

Liu, Junjia, Zhang, Huimin, Fu, Zhuang, Wang, Yao

The intelligent control of traffic signals is critical to the optimization of transportation systems. To solve the problem in large-scale road networks, recent research has focused on interactions among intersections, with promising results. However, existing studies pay more attention to the sharing of sensations among agents and neglect the results after each action is taken. In this paper, we propose a novel multi-agent interaction mechanism for traffic signal control based on deep reinforcement learning, called Gamma-Reward, which includes both the original Gamma-Reward and Gamma-Attention-Reward and uses the space-time information in the replay buffer to amend the reward of each action. We give a detailed theoretical foundation and prove that the proposed method can converge to a Nash equilibrium. By extending the idea of a Markov chain to the road network, this interaction mechanism replaces the graph attention method and decouples the road network, which is more in line with practical applications. Simulation and experimental results demonstrate that, by amending the reward, the proposed model achieves better performance than previous studies. To the best of our knowledge, our work is the first to treat the road network itself as a Markov chain.

agent, intersection, road network, (13 more...)

2002.11874

Country:

Asia > China > Zhejiang Province > Hangzhou (0.05)
Asia > China > Shanghai > Shanghai (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

Sarkar, Anurag, Yang, Zhihan, Cooper, Seth

Controllable Level Blending between Games using Variational Autoencoders

Previous work explored blending levels from existing games to create levels for a new game that mixes properties of the original games. In this paper, we use Variational Autoencoders (VAEs) for improving upon such techniques. VAEs are artificial neural networks that learn and use latent representations of datasets to generate novel outputs. We train a VAE on level data from Super Mario Bros. and Kid Icarus, enabling it to capture the latent space spanning both games. We then use this space to generate level segments that combine properties of levels from both games. Moreover, by applying evolutionary search in the latent space, we evolve level segments satisfying specific constraints. We argue that these affordances make the VAE-based approach especially suitable for co-creative level design and compare its performance with similar generative models like the GAN and the VAE-GAN.

latent space, proceedings, vae, (15 more...)

2002.11869

Country:

North America > United States > Minnesota > Rice County > Northfield (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Policy Evaluation Networks

Harb, Jean, Schaul, Tom, Precup, Doina, Bacon, Pierre-Luc

Many reinforcement learning algorithms use value functions to guide the search for better policies. These methods estimate the value of a single policy while generalizing across many states. The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states. This approach opens up the possibility of performing direct gradient ascent in policy space without seeing any new data. The main challenge for this approach is finding a way to represent complex policies that facilitates learning and generalization. To address this problem, we introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding. Our empirical results demonstrate that combining these three elements (learned Policy Evaluation Network, policy fingerprints, gradient ascent) can produce policies that outperform those that generated the training data, in a zero-shot manner.

gradient ascent, pvn, value function, (13 more...)

2002.11833

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > Middle East > Jordan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Hanika, Tom, Hirth, Johannes

Knowledge Cores in Large Formal Contexts

Knowledge computation tasks are often infeasible for large data sets. This is particularly true when deriving knowledge bases in formal concept analysis (FCA). Hence, it is essential to come up with techniques to cope with this problem. Many successful methods are based on random processes to reduce the size of the investigated data set. This, however, makes them hard to interpret with respect to the discovered knowledge. Other approaches restrict themselves to highly supported subsets and omit rare and interesting patterns. An essentially different approach, called $k$-cores, is used in network science. These are able to reflect rare patterns if they are well connected in the data set. In this work, we study $k$-cores in the realm of FCA by exploiting the natural correspondence to bi-partite graphs. This structurally motivated approach leads to a comprehensible extraction of knowledge cores from large formal context data sets.
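The correspondence between a formal context and a bipartite graph can be sketched as follows: objects and attributes are the two vertex sets, and each incidence pair is an edge. The standard $k$-core is then the largest subgraph in which every remaining vertex has degree at least $k$, obtained by repeatedly discarding low-degree vertices. The function name, the toy context, and the use of a single threshold `k` for both sides are assumptions for illustration; the paper's exact core definition may differ.

```python
def k_core_context(incidence, k):
    """Return the incidence pairs that survive k-core peeling.

    incidence: iterable of (object, attribute) pairs of a formal context,
    read as the edges of a bipartite graph. An edge survives only if both
    its object and its attribute keep at least k edges.
    """
    pairs = set(incidence)
    while True:
        degree = {}
        for obj, attr in pairs:
            # Tag the two sides so an object and an attribute with the
            # same label are never confused.
            degree[("obj", obj)] = degree.get(("obj", obj), 0) + 1
            degree[("attr", attr)] = degree.get(("attr", attr), 0) + 1
        kept = {
            (o, a) for o, a in pairs
            if degree[("obj", o)] >= k and degree[("attr", a)] >= k
        }
        if kept == pairs:  # fixed point: every remaining vertex has degree >= k
            return pairs
        pairs = kept

# Toy context: attribute "c" is held by one object only, so it is peeled
# first; this drops o3 to degree 1, and o3 is removed in the next round.
context = {
    ("o1", "a"), ("o1", "b"),
    ("o2", "a"), ("o2", "b"),
    ("o3", "b"), ("o3", "c"),
}
core = k_core_context(context, 2)
```

The example shows the cascading nature of peeling: removing one sparse attribute can push an object below the threshold, so the loop runs until no further removal is possible.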

concept lattice, int, lattice, (17 more...)

2002.11776

Country:

South America > French Guiana > Guyane > Cayenne (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.48)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Thierry, Constance, Dubois, Jean-Christophe, Gall, Yolande Le, Martin, Arnaud

Modelling the Uncertainty and Imprecision of Crowdsourcing Data: MONITOR [original title: Modelisation de l'incertitude et de l'imprecision de donnees de crowdsourcing : MONITOR]

Crowdsourcing is defined as the outsourcing of tasks to a crowd of contributors. The crowd is very diverse on these platforms and includes malicious contributors attracted by the remuneration of tasks and not conscientiously performing them. It is essential to identify these contributors in order to avoid considering their responses. As not all contributors have the same aptitude for a task, it seems appropriate to give weight to their answers according to their qualifications. This paper, published at the ICTAI 2019 conference, proposes a method, MONITOR, for estimating the profile of the contributor and aggregating the responses using belief function theory.

contributor, function, response, (17 more...)

2002.11717

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > France > Brittany > Côtes-d'Armor (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence (0.94)
Information Technology > Communications > Social Media > Crowdsourcing (0.75)

Seth, Taniya, Muhuri, Pranab K.

Type-2 Fuzzy Set based Hesitant Fuzzy Linguistic Term Sets for Linguistic Decision Making

Approaches based on computing with words are well suited to decision-making systems. Most of them rest on type-1 fuzzy sets, which they employ as the semantics of the linguistic terms. However, type-2 fuzzy sets have been proven to be scientifically more appropriate to represent linguistic information in practical systems. They take into account both the intra-uncertainty and the inter-uncertainty in cases where the linguistic information comes from a group of experts. Hence, in this paper, we propose to introduce linguistic terms whose semantics are denoted by interval type-2 fuzzy sets within the hesitant fuzzy linguistic term set framework, resulting in type-2 fuzzy set based hesitant fuzzy linguistic term sets. We also introduce a novel method of computing type-2 fuzzy envelopes out of multiple interval type-2 fuzzy sets with trapezoidal membership functions. Furthermore, the proposed framework with interval type-2 fuzzy sets is applied to a supplier performance evaluation scenario. Since humans are predominantly involved in the entire supply chain process, their feedback is crucial when deciding many factors. Towards the end of the paper, we compare our presented model with various existing models and demonstrate the advantages of the former.

fuzzy envelope, hfltss, linguistic term, (14 more...)

2002.11714

Country:

Asia > India > NCT > Delhi (0.04)
Asia > India > NCT > New Delhi (0.04)
Europe > Switzerland (0.04)
Asia > Bangladesh (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)