AITopics

1903.03216

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Dialog-based Interactive Image Retrieval

Guo, Xiaoxiao, Wu, Hui, Cheng, Yu, Rennie, Steven, Tesauro, Gerald, Feris, Rogerio

Existing methods for interactive image retrieval have demonstrated the merit of integrating user feedback, improving retrieval results. However, most current systems rely on restricted forms of user feedback, such as binary relevance responses, or feedback based on a fixed set of relative attributes, which limits their impact. In this paper, we introduce a new approach to interactive image search that enables users to provide feedback via natural language, allowing for more natural and effective interaction. We formulate the task of dialog-based interactive image retrieval as a reinforcement learning problem, and reward the dialog system for improving the rank of the target image during each dialog turn. To mitigate the cumbersome and costly process of collecting human-machine conversations as the dialog system learns, we train our system with a user simulator, which is itself trained to describe the differences between target and candidate images. The efficacy of our approach is demonstrated in a footwear retrieval application. Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.

deep learning, image retrieval, neural network, (18 more...)

Country: North America > Canada (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Learning Abstract Options

Riemer, Matthew, Liu, Miao, Tesauro, Gerald

Building systems that autonomously create temporal abstractions from data is a key challenge in scaling learning and planning in reinforcement learning. One popular approach for addressing this challenge is the options framework (Sutton et al., 1999). However, only recently in (Bacon et al., 2017) was a policy gradient theorem derived for online learning of general purpose options in an end to end fashion. In this work, we extend previous work on this topic that only focuses on learning a two-level hierarchy including options and primitive actions to enable learning simultaneously at multiple resolutions in time. We achieve this by considering an arbitrarily deep hierarchy of options where high level temporally extended options are composed of lower level options with finer resolutions in time. We extend results from (Bacon et al., 2017) and derive policy gradient theorems for a deep hierarchy of options. Our proposed hierarchical option-critic architecture is capable of learning internal policies, termination conditions, and hierarchical compositions over options without the need for any intrinsic rewards or subgoals. Our empirical results in both discrete and continuous environments demonstrate the efficiency of our framework.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country:

North America > United States > Massachusetts (0.14)
North America > Canada (0.14)

Industry:

Education (0.66)
Leisure & Entertainment > Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Learning Abstract Options

Riemer, Matthew, Liu, Miao, Tesauro, Gerald

Building systems that autonomously create temporal abstractions from data is a key challenge in scaling learning and planning in reinforcement learning. One popular approach for addressing this challenge is the options framework [29]. However, only recently in [1] was a policy gradient theorem derived for online learning of general purpose options in an end to end fashion. In this work, we extend previous work on this topic that only focuses on learning a two-level hierarchy including options and primitive actions to enable learning simultaneously at multiple resolutions in time. We achieve this by considering an arbitrarily deep hierarchy of options where high level temporally extended options are composed of lower level options with finer resolutions in time. We extend results from [1] and derive policy gradient theorems for a deep hierarchy of options. Our proposed hierarchical option-critic architecture is capable of learning internal policies, termination conditions, and hierarchical compositions over options without the need for any intrinsic rewards or subgoals. Our empirical results in both discrete and continuous environments demonstrate the efficiency of our framework.

architecture, artificial intelligence, computer game, (16 more...)

Country:

North America > United States > Massachusetts (0.14)
North America > Canada (0.14)

Industry:

Education (0.66)
Leisure & Entertainment > Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Dialog-based Interactive Image Retrieval

Guo, Xiaoxiao, Wu, Hui, Cheng, Yu, Rennie, Steven, Tesauro, Gerald, Feris, Rogerio

Existing methods for interactive image retrieval have demonstrated the merit of integrating user feedback, improving retrieval results. However, most current systems rely on restricted forms of user feedback, such as binary relevance responses, or feedback based on a fixed set of relative attributes, which limits their impact. In this paper, we introduce a new approach to interactive image search that enables users to provide feedback via natural language, allowing for more natural and effective interaction. We formulate the task of dialog-based interactive image retrieval as a reinforcement learning problem, and reward the dialog system for improving the rank of the target image during each dialog turn. To mitigate the cumbersome and costly process of collecting human-machine conversations as the dialog system learns, we train our system with a user simulator, which is itself trained to describe the differences between target and candidate images. The efficacy of our approach is demonstrated in a footwear retrieval application. Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.

deep learning, image retrieval, neural network, (17 more...)

Country: North America > Canada (0.14)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.70)

arXiv.org Artificial IntelligenceDec-2-2018

Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference

Riemer, Matthew, Cases, Ignacio, Ajemian, Robert, Liu, Miao, Rish, Irina, Tu, Yuhai, Tesauro, Gerald

Lack of performance when it comes to continual learning over non-stationary distributions of data remains a major challenge in scaling neural network learning to more human realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization based meta-learning. This method learns parameters that make interference based on future gradients less likely and transfer based on future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments demonstrating that our approach consistently outperforms recently proposed baselines for continual learning. Our experiments show that the gap between the performance of MER and baseline algorithms grows both as the environment gets more non-stationary and as the fraction of the total experiences stored gets smaller.

educational setting, learning, neural network, (20 more...)

1810.1191

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

arXiv.org Artificial IntelligenceNov-6-2018

Learning Abstract Options

Riemer, Matthew, Liu, Miao, Tesauro, Gerald

Building systems that autonomously create temporal abstractions from data is a key challenge in scaling learning and planning in reinforcement learning. One popular approach for addressing this challenge is the options framework (Sutton et al., 1999). However, only recently in (Bacon et al., 2017) was a policy gradient theorem derived for online learning of general purpose options in an end to end fashion. In this work, we extend previous work on this topic that only focuses on learning a two-level hierarchy including options and primitive actions to enable learning simultaneously at multiple resolutions in time. We achieve this by considering an arbitrarily deep hierarchy of options where high level temporally extended options are composed of lower level options with finer resolutions in time. We extend results from (Bacon et al., 2017) and derive policy gradient theorems for a deep hierarchy of options. Our proposed hierarchical option-critic architecture is capable of learning internal policies, termination conditions, and hierarchical compositions over options without the need for any intrinsic rewards or subgoals. Our empirical results in both discrete and continuous environments demonstrate the efficiency of our framework.

deep learning, neural network, terminate, (17 more...)

1810.11583

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games (0.47)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceMay-22-2018

Learning to Teach in Cooperative Multiagent Reinforcement Learning

Omidshafiei, Shayegan, Kim, Dong-Ki, Liu, Miao, Tesauro, Gerald, Riemer, Matthew, Amato, Christopher, Campbell, Murray, How, Jonathan P.

We present a framework and algorithm for peer-to-peer teaching in cooperative multiagent reinforcement learning. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), trains advising policies by using students' learning progress as a teaching reward. Agents using LeCTR learn to assume the role of a teacher or student at the appropriate moments, exchanging action advice to accelerate the entire learning process. Our algorithm supports teaching heterogeneous teammates, advising under communication constraints, and learns both what and when to advise. LeCTR is demonstrated to outperform the final performance and rate of learning of prior teaching methods on multiple benchmark domains. To our knowledge, this is the first approach for learning to teach in a multiagent setting.

agent, artificial intelligence, survey article, (17 more...)

1805.0783

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

arXiv.org Artificial IntelligenceApr-26-2018

Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering

Wang, Shuohang, Yu, Mo, Jiang, Jing, Zhang, Wei, Guo, Xiaoxiao, Chang, Shiyu, Wang, Zhiguo, Klinger, Tim, Tesauro, Gerald, Campbell, Murray

A popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from single passages independently. But some questions require a combination of evidence from across different sources to answer correctly. In this paper, we propose two models which make use of multiple passages to generate their answers. Both use an answer-reranking approach which reorders the answer candidates generated by an existing state-of-the-art QA model. We propose two methods, namely, strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer. Our models have achieved state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain version of TriviaQA, with about 8 percentage points of improvement over the former two datasets.

answer candidate, deep learning, neural network, (23 more...)

1711.05116

Country:

Europe > Germany (0.14)
Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Information Management (0.93)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.84)