AITopics

1905.11591

Country: Asia (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial Intelligence, May-27-2019

AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

Clune, Jeff

Perhaps the most ambitious scientific quest in human history is the creation of general artificial intelligence, which roughly means AI that is as smart as or smarter than humans. The dominant approach in the machine learning community is to attempt to discover each of the pieces required for intelligence, with the implicit assumption that some future group will complete the Herculean task of figuring out how to combine all of those pieces into a complex thinking machine. I call this the "manual AI approach." This paper describes another exciting path that ultimately may be more successful at producing general AI. It is based on the clear trend in machine learning that hand-designed solutions eventually are replaced by more effective, learned solutions. The idea is to create an AI-generating algorithm (AI-GA), which automatically learns how to produce general AI. Three Pillars are essential for the approach: (1) meta-learning architectures, (2) meta-learning the learning algorithms themselves, and (3) generating effective learning environments. I argue that either approach could produce general AI first, and both are scientifically worthwhile irrespective of which is the fastest path. Because both are promising, yet the ML community is currently committed to the manual approach, I argue that our community should increase its research investment in the AI-GA approach. To encourage such research, I describe promising work in each of the Three Pillars. I also discuss AI-GA-specific safety and ethical considerations. Because it may be the fastest path to general AI and because it is inherently scientifically interesting to understand the conditions in which a simple algorithm can produce general AI (as happened on Earth where Darwinian evolution produced human intelligence), I argue that the pursuit of AI-GAs should be considered a new grand challenge of computer science research.

evolutionary algorithm, machine learning, reinforcement learning, (14 more...)

1905.10985

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Peru > Puno Department (0.04)
South America > Peru > Madre de Dios Department (0.04)
(5 more...)

Genre:

Research Report (0.81)
Instructional Material > Course Syllabus & Notes (0.48)

Industry:

Education (1.00)
Leisure & Entertainment > Games (0.92)
Leisure & Entertainment > Sports (0.92)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

arXiv.org Machine Learning, May-27-2019

Policy Search by Target Distribution Learning for Continuous Control

Zhang, Chuheng, Li, Yuanqi, Li, Jian

We observe that several existing policy gradient methods (such as vanilla policy gradient, PPO, A2C) may suffer from overly large gradients when the current policy is close to deterministic (even in some very simple environments), leading to an unstable training process. To address this issue, we propose a new method, called target distribution learning (TDL), for policy improvement in reinforcement learning. TDL alternates between proposing a target distribution and training the policy network to approach the target distribution. TDL is more effective in constraining the KL divergence between updated policies, and hence leads to more stable policy improvements over iterations. Our experiments show that TDL algorithms perform comparably to (or better than) state-of-the-art algorithms for most continuous control tasks in the MuJoCo environment while being more stable in training.
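The large-gradient issue described in this abstract can be illustrated with a one-dimensional Gaussian policy. This is a minimal sketch and is not taken from the paper. For an action a drawn from N(mu, sigma^2), the gradient of the log-density with respect to the mean is (a - mu) / sigma^2, so a sampled action a = mu + sigma * eps gives a score of eps / sigma, which grows as sigma shrinks. The function names `score_wrt_mean` and `mean_abs_score`, the seed, and the sigma values are illustrative assumptions.

```python
import random
import statistics


def score_wrt_mean(action, mu, sigma):
    """Gradient of log N(action; mu, sigma^2) with respect to mu."""
    return (action - mu) / sigma ** 2


def mean_abs_score(sigma, n_samples=10_000, mu=0.0, seed=0):
    """Monte Carlo estimate of E|d log pi / d mu| with actions drawn from the policy."""
    rng = random.Random(seed)
    scores = [
        abs(score_wrt_mean(rng.gauss(mu, sigma), mu, sigma))
        for _ in range(n_samples)
    ]
    return statistics.fmean(scores)


if __name__ == "__main__":
    # The expected magnitude is sqrt(2 / pi) / sigma, so it scales as 1 / sigma.
    for sigma in (1.0, 0.1, 0.01):
        print(f"sigma={sigma:<5} mean |score| = {mean_abs_score(sigma):.2f}")
```

The score magnitude scales inversely with sigma, which is one way to see why a near-deterministic Gaussian policy can produce very large policy gradient updates.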

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1905.11041

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

arXiv.org Machine Learning, May-27-2019

Disentangling Dynamics and Returns: Value Function Decomposition with Future Prediction

Tang, Hongyao, Hao, Jianye, Chen, Guangyong, Chen, Pengfei, Meng, Zhaopeng, Yang, Yaodong, Wang, Li

Value functions are crucial for model-free Reinforcement Learning (RL) to obtain a policy implicitly or guide the policy updates. Value estimation heavily depends on the stochasticity of environmental dynamics and the quality of reward signals. In this paper, we propose a two-step understanding of value estimation from the perspective of future prediction, by decomposing the value function into a reward-independent future dynamics part and a policy-independent trajectory return part. We then derive a practical deep RL algorithm from the above decomposition, consisting of a convolutional trajectory representation model, a conditional variational dynamics model that predicts the expected representation of the future trajectory, and a convex trajectory return model that maps a trajectory representation to its return. Our algorithm is evaluated in MuJoCo continuous control tasks and shows superior results under both common settings and delayed reward settings.

algorithm, conditional vae, representation, (13 more...)

1905.111

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.67)
Media > Television (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tossou, Aristide, Basu, Debabrota, Dimitrakakis, Christos

Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

arXiv.org Artificial Intelligence, May-27-2019

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance-based confidence interval. We show that the proposed algorithm, UCRL-V, achieves the optimal regret $\tilde{\mathcal{O}}(\sqrt{DSAT})$ up to logarithmic factors, and so our work closes a gap with the lower bound without additional assumptions on the MDP. We perform experiments in a variety of environments that validate the theoretical bounds and show UCRL-V to be better than the state-of-the-art algorithms.
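To make the idea of a variance-based confidence interval concrete, the sketch below compares a Hoeffding radius with an empirical Bernstein radius for samples bounded in [0, 1]. It uses the standard one-sided form due to Maurer and Pontil (2009) as an assumption; the exact interval used by UCRL-V may differ, and the function names are hypothetical.

```python
import math
import statistics


def hoeffding_radius(n, delta):
    """One-sided Hoeffding deviation bound for n samples in [0, 1]."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def empirical_bernstein_radius(samples, delta):
    """One-sided empirical Bernstein bound for samples in [0, 1].

    Assumed form: sqrt(2 * V * ln(2/delta) / n) + 7 * ln(2/delta) / (3 * (n - 1)),
    where V is the unbiased sample variance.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    variance = statistics.variance(samples)  # unbiased sample variance
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * variance * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


if __name__ == "__main__":
    delta = 0.05
    low_variance = [0.49, 0.51] * 500   # tightly concentrated samples
    high_variance = [0.0, 1.0] * 500    # maximally spread samples
    print("Hoeffding:            ", hoeffding_radius(1000, delta))
    print("Bernstein (low var):  ", empirical_bernstein_radius(low_variance, delta))
    print("Bernstein (high var): ", empirical_bernstein_radius(high_variance, delta))
```

When the observed variance is small, the Bernstein radius is much tighter than the Hoeffding radius, which is the general motivation for variance-aware optimism; when the variance is near its maximum, the Hoeffding radius can be the smaller of the two.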

algorithm, mdp, probability, (16 more...)

1905.12425

Country:

North America > United States > Illinois > Champaign County > Champaign (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Chuck, Caleb, Chockchowwat, Supawit, Niekum, Scott

Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning

arXiv.org Artificial Intelligence, May-27-2019

Deep reinforcement learning encompasses many versatile tools for designing learning agents that can perform well on a variety of high-dimensional visual tasks, ranging from video games to robotic manipulation. However, these methods typically suffer from poor sample efficiency, partially because they strive to be largely problem-agnostic. In this work, we demonstrate the utility of a different approach that is extremely sample efficient, but limited to object-centric tasks that (approximately) obey basic physical laws. Specifically, we propose the Hypothesis Proposal and Evaluation (HyPE) algorithm, which utilizes a small set of intuitive assumptions about the behavior of objects in the physical world (or in games that mimic physics) to automatically define and learn hierarchical skills in a highly efficient manner. HyPE does this by discovering objects from raw pixel data, generating hypotheses about the controllability of observed changes in object state, and learning a hierarchy of skills that can test these hypotheses and control increasingly complex interactions with objects. We demonstrate that HyPE can dramatically improve sample efficiency when learning a high-quality pixels-to-actions policy; in the popular benchmark task, Breakout, HyPE learns an order of magnitude faster than common baseline reinforcement learning and evolutionary strategies for policy learning.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

1906.01408

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Madumal, Prashan, Miller, Tim, Sonenberg, Liz, Vetere, Frank

Explainable Reinforcement Learning Through a Causal Lens

arXiv.org Artificial Intelligence, May-26-2019

Prevalent theories in cognitive science propose that humans understand and represent the knowledge of the world through causal relationships. In making sense of the world, we build causal models in our mind to encode cause-effect relations of events and use these to explain why new events happen. In this paper, we use causal models to derive causal explanations of behaviour of reinforcement learning agents. We present an approach that learns a structural causal model during reinforcement learning and encodes causal relationships between variables of interest. This model is then used to generate explanations of behaviour based on counterfactual analysis of the causal model. We report on a study with 120 participants who observe agents playing a real-time strategy game (Starcraft II) and then receive explanations of the agents' behaviour. We investigated: 1) participants' understanding gained by explanations through task prediction; 2) explanation satisfaction and 3) trust. Our results show that causal model explanations perform better on these measures compared to two other baseline explanation models.

explanation, machine learning, reinforcement learning, (18 more...)

1905.10958

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
(2 more...)

arXiv.org Machine Learning, May-26-2019

Transcribing Content from Structural Images with Spotlight Mechanism

Yin, Yu, Huang, Zhenya, Chen, Enhong, Liu, Qi, Zhang, Fuzheng, Xie, Xing, Hu, Guoping

Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task because the content objects must be recognized and the internal structure must also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but are not capable of identifying ones with more complex content (e.g., structured symbols), which often follow a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework followed by a two-stage "where-to-what" solution. Specifically, we first decide "where-to-look" through a novel spotlight mechanism to focus on different areas of the original image following its structure. Then, we decide "what-to-write" by developing a GRU-based network with the spotlight areas for transcribing the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and Recurrent modeling, respectively. We also design a reinforcement method to refine the framework by self-improving the spotlight mechanism. We conduct extensive experiments on many structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework.

machine learning, pattern recognition, reinforcement learning, (21 more...)

doi: 10.1145/3219819.3219962

1905.10954

Country:

Europe > United Kingdom > England > Greater London > London (0.05)
Asia > China > Anhui Province (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.88)
Media > Music (0.67)
Education > Educational Setting (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

arXiv.org Machine Learning, May-26-2019

Selective Transfer with Reinforced Transfer Network for Partial Domain Adaptation

Chen, Zhihong, Chen, Chao, Cheng, Zhaowei, Fang, Ke, Jin, Xinyu

Partial domain adaptation (PDA) extends standard domain adaptation to a more realistic scenario where the target domain only has a subset of classes from the source domain. The key challenge of PDA is how to select the relevant samples in the shared classes for knowledge transfer. Previous PDA methods tackle this problem by re-weighting the source samples based on the predictions of a classifier or discriminator, thus discarding the pixel-level information. In this paper, to utilize both high-level and pixel-level information, we propose a reinforced transfer network (RTNet), which is the first work to apply reinforcement learning to address the PDA problem. The RTNet simultaneously mitigates negative transfer by adopting a reinforced data selector to filter out outlier source classes, and promotes positive transfer by employing a domain adaptation model to minimize the distribution discrepancy in the shared label space. Extensive experiments indicate that RTNet can achieve state-of-the-art performance for partial domain adaptation tasks on several benchmark datasets. Code and datasets will be available online.

machine learning, natural language, reinforcement learning, (16 more...)

1905.10756

Country:

Europe > United Kingdom > England (0.28)
Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

#artificialintelligence, May-25-2019, 00:47:47 GMT

AWS DeepRacer TV - Ep 1 Amsterdam

AWS DeepRacer TV follows the world's first autonomous racing league, an AWS / Intel-sponsored reinforcement learning competition that features developers of every background and skill level hoping to qualify for a chance to win the Championship Cup at AWS re:Invent 2019.

amsterdam, aws deepracer tv, ep 1

#artificialintelligence

Country: Europe > Netherlands > North Holland > Amsterdam (0.40)

Technology:

Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)