Hierarchical Deep Q-Network with Forgetting from Imperfect Demonstrations in Minecraft
Skrynnik, Alexey, Staroverov, Aleksey, Aitygulov, Ermek, Aksenov, Kirill, Davydov, Vasilii, Panov, Aleksandr I.
We present Hierarchical Deep Q-Network with Forgetting (HDQF), which took first place in the MineRL competition. HDQF works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories, extracting an effective sequence of meta-actions and subgoals. We introduce a structured, task-dependent replay buffer and a forgetting technique that allow the HDQF agent to gradually erase poor-quality expert data from the buffer. In this paper, we present the details of the HDQF algorithm and give experimental results in the Minecraft domain.
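The forgetting idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "forgetting" is modelled as a linearly decaying share of expert transitions in each sampled batch, and the names `ForgettingReplayBuffer`, `forget_steps`, and `expert_ratio` are hypothetical.

```python
import random


class ForgettingReplayBuffer:
    """Replay buffer that keeps expert and agent transitions apart.

    Assumption (not from the paper): the share of expert data in a batch
    decays linearly from 1.0 to 0.0 over `forget_steps` sampling calls.
    """

    def __init__(self, expert_transitions, forget_steps, seed=0):
        self.expert = list(expert_transitions)
        self.agent = []
        self.forget_steps = forget_steps
        self.step = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store a transition collected by the agent itself."""
        self.agent.append(transition)

    def expert_ratio(self):
        """Fraction of the next batch that is drawn from expert data."""
        return max(0.0, 1.0 - self.step / self.forget_steps)

    def sample(self, batch_size):
        """Draw a batch (with replacement) and advance the forgetting step."""
        n_expert = round(batch_size * self.expert_ratio())
        if not self.agent:
            # Nothing collected yet: fall back to expert data only.
            n_expert = batch_size
        if not self.expert:
            n_expert = 0
        batch = []
        if n_expert > 0:
            batch += self.rng.choices(self.expert, k=n_expert)
        if batch_size - n_expert > 0 and self.agent:
            batch += self.rng.choices(self.agent, k=batch_size - n_expert)
        self.step += 1
        return batch


if __name__ == "__main__":
    buf = ForgettingReplayBuffer([("expert", i) for i in range(10)], forget_steps=4)
    for i in range(10):
        buf.add(("agent", i))
    for _ in range(5):
        batch = buf.sample(8)
        print(sum(1 for source, _ in batch if source == "expert"), "expert samples")
```

A decaying sampling share is only one way to phase out demonstrations; physically deleting low-quality expert transitions from the buffer would be another.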
Semantic integration of disease-specific knowledge
Nentidis, Anastasios, Bougiatiotis, Konstantinos, Krithara, Anastasia, Paliouras, Georgios
Motivation: Biomedical researchers working on a specific disease need up-to-date and unified access to knowledge relevant to the disease of their interest. Knowledge is continuously accumulated in scientific literature and other resources such as biomedical ontologies. Identifying the specific information needed is a challenging task and computational tools can be valuable. In this study, we propose a pipeline to automatically retrieve and integrate relevant knowledge based on a semantic graph representation, the iASiS Open Data Graph. Results: The disease-specific semantic graph can provide easy access to resources relevant to specific concepts and individual aspects of these concepts, in the form of concept relations and attributes. The proposed approach is applied to three different case studies: two prevalent diseases, Lung Cancer and Dementia, for which a lot of knowledge is available, and one rare disease, Duchenne Muscular Dystrophy, for which knowledge is less abundant and difficult to locate. Results from exemplary queries are presented, investigating the potential of this approach in integrating and accessing knowledge as an automatically generated semantic graph.
Relational Mimic for Visual Adversarial Imitation Learning
Blondé, Lionel, Tang, Yichuan Charlie, Zhang, Jian, Webb, Russ
In this work, we introduce a new method for imitation learning from video demonstrations. Our method, Relational Mimic (RM), improves on previous visual imitation learning methods by combining generative adversarial networks and relational learning. RM is flexible and can be used in conjunction with other recent advances in generative adversarial imitation learning to better address the need for more robust and sample-efficient approaches. In addition, we introduce a new neural network architecture that improves upon the previous state-of-the-art in reinforcement learning and illustrate how increasing the relational reasoning capabilities of the agent enables the latter to achieve increasingly higher performance in a challenging locomotion task with pixel inputs. Finally, we study the effects and contributions of relational learning in policy evaluation, policy improvement and reward learning through ablation studies.
MALA: Cross-Domain Dialogue Generation with Action Learning
Huang, Xinting, Qi, Jianzhong, Sun, Yu, Zhang, Rui
Response generation for task-oriented dialogues involves two basic components: dialogue planning and surface realization. These two components, however, have a discrepancy in their objectives, i.e., task completion and language quality. To deal with such discrepancy, conditioned response generation has been introduced, where the generation process is factorized into action decision and language generation via explicit action representations. To obtain action representations, recent studies learn latent actions in an unsupervised manner based on the lexical similarity of utterances. Such an action learning approach is sensitive to the diversity of language surfaces, which may impinge on task completion and language quality. To address this issue, we propose multistage adaptive latent action learning (MALA) that learns semantic latent actions by distinguishing the effects of utterances on dialogue progress. We model the utterance effect using the transition of dialogue states caused by the utterance and develop a semantic similarity measurement that estimates whether utterances have similar effects. For learning semantic actions on domains without dialogue states, MALA extends the semantic similarity measurement across domains progressively, i.e., from aligning shared actions to learning domain-specific actions. Experiments using multi-domain datasets, SMD and MultiWOZ, show that our proposed model achieves consistent improvements over the baseline models in terms of both task completion and language quality. 1 Introduction Task-oriented dialogue systems complete tasks for users, such as making a restaurant reservation or scheduling a meeting, in a multi-turn conversation (Gao, Galley, and Li 2018; Sun et al. 2016; Sun et al. 2017).
Multi-channel Reverse Dictionary Model
Zhang, Lei, Qi, Fanchao, Liu, Zhiyuan, Wang, Yasheng, Liu, Qun, Sun, Maosong
A reverse dictionary takes the description of a target word as input and outputs the target word together with other words that match the description. Inspired by the description-to-word inference process of humans, we propose the multi-channel reverse dictionary model, which can mitigate the two problems simultaneously. Our model comprises a sentence encoder and multiple predictors. The predictors are expected to identify different characteristics of the target word from the input query. We evaluate our model on English and Chinese datasets including both dictionary definitions and human-written descriptions. Experimental results show that our model achieves the state-of-the-art performance, and even outperforms the most popular commercial reverse dictionary system on the human-written description dataset. We also conduct quantitative analyses and a case study to demonstrate the effectiveness and robustness of our model. All the code and data of this work can be obtained on https://github.com/thunlp/MultiRD. Introduction A regular (forward) dictionary maps words to definitions while a reverse dictionary (Sierra 2000) does the opposite and maps descriptions to corresponding words. In Figure 1, for example, a regular dictionary tells you that "expressway" is "a wide road that allows traffic to travel fast", and when you input "a road where cars go very quickly without stopping" to a reverse dictionary, it might return "expressway" together with other semantically similar words like "freeway". Reverse dictionaries have great practical value.
Collective Embedding-based Entity Alignment via Adaptive Features
Zeng, Weixin, Zhao, Xiang, Tang, Jiuyang, Lin, Xuemin
Entity alignment (EA) identifies entities that refer to the same real-world object but are located in different knowledge graphs (KGs), and has been harnessed for KG construction and integration. When generating EA results, current embedding-based solutions treat entities independently and fail to take into account the interdependence between entities. In addition, most embedding-based EA methods either fuse different features at the representation level and generate a unified entity embedding for alignment, which potentially causes information loss, or aggregate features at the outcome level with hand-tuned weights, which is not practical with increasing numbers of features. To tackle these deficiencies, we propose a collective embedding-based EA framework with an adaptive feature fusion mechanism. We first employ three representative features, i.e., structural, semantic and string signals, for capturing different aspects of the similarity between entities in heterogeneous KGs. These features are then integrated at the outcome level, with dynamically assigned weights generated by our carefully devised adaptive feature fusion strategy. Eventually, in order to make collective EA decisions, we formulate EA as the classical stable matching problem between entities to be aligned, with preference lists constructed using the fused feature matrix. It is further effectively solved by the deferred acceptance algorithm. Our proposal is evaluated on both cross-lingual and monolingual EA benchmarks against state-of-the-art solutions, and the empirical results verify its effectiveness and superiority. We also perform an ablation study to gain insights into the framework modules. INTRODUCTION A knowledge graph (KG) plays an increasingly important role in intelligent information services, e.g., information retrieval [27], automatic question answering [14] and recommendation systems [3]. Although a large number of KGs have been constructed over recent years, none of them can reach full coverage. These KGs, however, usually contain complementary contents, making it compelling to study the integration of heterogeneous KGs. To incorporate the knowledge from target KGs into the source KG, an indispensable step would be entity alignment (EA). EA aims to discover entities that have the same meaning but are located in different KGs.
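The stable matching step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a square fused similarity matrix `sim` (equal numbers of source and target entities), lets source entities propose, and ranks both sides by the same similarity scores. The function name `deferred_acceptance` is hypothetical.

```python
def deferred_acceptance(sim):
    """One-to-one stable matching from a square similarity matrix.

    sim[i][j] is the (assumed) fused similarity between source entity i
    and target entity j. Returns `match`, where match[i] is the target
    assigned to source i.
    """
    n = len(sim)
    # Each source ranks all targets by descending similarity.
    prefs = [sorted(range(n), key=lambda j: -sim[i][j]) for i in range(n)]
    next_choice = [0] * n   # position in prefs[i] of the next proposal
    holder = [None] * n     # holder[j]: source tentatively held by target j
    free = list(range(n))   # sources that still need a partner

    while free:
        i = free.pop()
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        current = holder[j]
        if current is None:
            holder[j] = i                   # target was unmatched: accept
        elif sim[i][j] > sim[current][j]:
            holder[j] = i                   # target prefers the new proposer
            free.append(current)
        else:
            free.append(i)                  # rejected: propose again later

    match = [None] * n
    for j, i in enumerate(holder):
        match[i] = j
    return match


if __name__ == "__main__":
    # Both sources rank target 0 first, so independent decisions would collide.
    sim = [[0.90, 0.80],
           [0.85, 0.10]]
    print(deferred_acceptance(sim))
```

In the usage example, choosing the best target for each source independently would assign target 0 twice; the collective procedure resolves the conflict by giving target 0 to the source it scores higher and moving the other source to its next choice.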
Balancing the Tradeoff between Profit and Fairness in Rideshare Platforms During High-Demand Hours
Nanda, Vedant, Xu, Pan, Sankararaman, Karthik Abinav, Dickerson, John P., Srinivasan, Aravind
Rideshare platforms, when assigning requests to drivers, tend to maximize profit for the system and/or minimize waiting time for riders. Such platforms can exacerbate biases that drivers may have over certain types of requests. We consider the case of peak hours when the demand for rides is more than the supply of drivers. Drivers are well aware of their advantage during the peak hours and can choose to be selective about which rides to accept. Moreover, if in such a scenario, the assignment of requests to drivers (by the platform) is made only to maximize profit and/or minimize wait time for riders, requests of a certain type (e.g. from a non-popular pickup location, or to a non-popular drop-off location) might never be assigned to a driver. Such a system can be highly unfair to riders. However, increasing fairness might come at a cost of the overall profit made by the rideshare platform. To balance these conflicting goals, we present a flexible, non-adaptive algorithm, \lpalg, that allows the platform designer to control the profit and fairness of the system via parameters $\alpha$ and $\beta$ respectively. We model the matching problem as an online bipartite matching where the set of drivers is offline and requests arrive online. Upon the arrival of a request, we use \lpalg to assign it to a driver (the driver might then choose to accept or reject it) or reject the request. We formalize the measures of profit and fairness in our setting and show that by using \lpalg, the competitive ratios for profit and fairness measures would be no worse than $\alpha/e$ and $\beta/e$ respectively. Extensive experimental results on both real-world and synthetic datasets confirm the validity of our theoretical lower bounds. Additionally, they show that \lpalg under some choice of $(\alpha, \beta)$ can beat two natural heuristics, Greedy and Uniform, on both fairness and profit.
Artificial Agents Learn Flexible Visual Representations by Playing a Hiding Game
Weihs, Luca, Kembhavi, Aniruddha, Han, Winson, Herrasti, Alvaro, Kolve, Eric, Schwenk, Dustin, Mottaghi, Roozbeh, Farhadi, Ali
The ubiquity of embodied gameplay, observed in a wide variety of animal species including turtles and ravens, has led researchers to question what advantages play provides to the animals engaged in it. Mounting evidence suggests that play is critical in developing the neural flexibility for creative problem solving and socialization, and can improve the plasticity of the medial prefrontal cortex. Comparatively little is known regarding the impact of gameplay upon embodied artificial agents. While recent work has produced artificial agents proficient in abstract games, the environments these agents act within are far removed from the real world, and thus these agents provide little insight into the advantages of embodied play. Hiding games have arisen in multiple cultures and species, and provide a rich ground for studying the impact of embodied gameplay on representation learning in the context of perspective taking, secret keeping, and false belief understanding. Here we are the first to show that embodied adversarial reinforcement learning agents playing cache, a variant of hide-and-seek, in a high-fidelity, interactive environment, learn representations of their observations encoding information such as occlusion, object permanence, free space, and containment, on par with representations learnt by the most popular modern paradigm for visual representation learning, which requires large datasets independently labeled for each new task. Our representations are enhanced by intent and memory, through interaction and play, moving closer to biologically motivated learning strategies. These results serve as a model for studying how facets of vision and perspective taking develop through play, provide an experimental framework for assessing what is learned by artificial agents, and suggest that representation learning should move from static datasets and towards experiential, interactive learning.
From Reinforcement Learning to Optimal Control: A unified framework for sequential decisions
There are over 15 distinct communities that work in the general area of sequential decisions and information, often referred to as decisions under uncertainty or stochastic optimization. We focus on two of the most important fields: stochastic optimal control, with its roots in deterministic optimal control, and reinforcement learning, with its roots in Markov decision processes. Building on prior work, we describe a unified framework that covers all 15 different communities, and note the strong parallels with the modeling framework of stochastic optimal control. By contrast, we make the case that the modeling framework of reinforcement learning, inherited from discrete Markov decision processes, is quite limited. Our framework (and that of stochastic control) is based on the core problem of optimizing over policies. We describe four classes of policies that we claim are universal, and show that each of these two fields has, in its own way, evolved to include examples of each of these four classes.
Taming an autonomous surface vehicle for path following and collision avoidance using deep reinforcement learning
Meyer, Eivind, Robinson, Haakon, Rasheed, Adil, San, Omer
Eivind Meyer is currently working on his Master's thesis, completing his five-year integrated Master's degree in Cybernetics and Robotics at the Norwegian University of Science and Technology (NTNU) in Trondheim. Having specialized in Real Time Systems, his research interests focus on adopting state-of-the-art Artificial Intelligence methods for Autonomous Vehicle Control. Haakon Robinson is a PhD candidate at the Norwegian University of Science and Technology (NTNU). He received a Bachelor's degree in Physics in 2015 and completed a Master's degree in Cybernetics and Robotics in 2019, both at NTNU. His current work investigates the overlap between modern machine learning techniques and established methods within modelling and control, with a focus on improving the interpretability and behavioural guarantees of hybrid models that combine first-principles models and data-driven components. (E. Meyer et al., preprint submitted to Elsevier.)