AITopics | Agarwal, Rishabh

Collaborating Authors

Agarwal, Rishabh

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning

Kumar, Aviral, Agarwal, Rishabh, Ghosh, Dibya, Levine, Sergey

arXiv.org Machine Learning, Oct-27-2020

We identify an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: when value functions, approximated using deep neural networks, are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network. We characterize this loss of expressivity in terms of a drop in the rank of the learned value network features, and show that this corresponds to a drop in performance. We demonstrate this phenomenon on widely studied domains, including Atari and Gym benchmarks, in both offline and online RL settings. We formally analyze this phenomenon and show that it results from a pathological interaction between bootstrapping and gradient-based optimization. We further show that mitigating implicit under-parameterization by controlling rank collapse improves performance.
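As a rough illustration of the rank measurement the abstract refers to, the sketch below computes a threshold-based effective rank from the singular values of a feature matrix. The function name `effective_rank`, the threshold `delta`, and the sample singular values are assumptions made for this example; the listing itself does not give the exact definition used in the paper.

```python
def effective_rank(singular_values, delta=0.01):
    """Smallest k whose top-k singular values hold at least (1 - delta) of the total.

    Assumption: this threshold-based definition is one common choice;
    the listing does not specify which definition the paper uses.
    """
    values = sorted((abs(s) for s in singular_values), reverse=True)
    total = sum(values)
    if total == 0:
        return 0  # an all-zero feature matrix has no effective directions
    running = 0.0
    for k, value in enumerate(values, start=1):
        running += value
        if running / total >= 1.0 - delta:
            return k
    return len(values)


# Hypothetical singular-value spectra, chosen only for illustration.
healthy = [5.0, 4.0, 3.0, 2.0, 1.0]            # mass spread over all directions
collapsed = [10.0, 0.05, 0.02, 0.01, 0.01]     # mass concentrated in one direction
print(effective_rank(healthy), effective_rank(collapsed))  # prints "5 1"
```

Under this definition, a spectrum dominated by a single singular value yields a low effective rank even when the feature matrix is nominally full rank, which is the kind of collapse the abstract describes.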

effective rank, neural network, singular value

2010.14498

Country: North America (0.28)

Genre: Research Report > New Finding (0.92)

Industry:

Leisure & Entertainment > Games (0.47)
Telecommunications > Networks (0.34)
Information Technology > Networks (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

IITK at SemEval-2020 Task 10: Transformers for Emphasis Selection

Singhal, Vipul, Dhull, Sahil, Agarwal, Rishabh, Modi, Ashutosh

arXiv.org Artificial Intelligence, Jul-21-2020

This paper describes the system proposed for addressing the research problem posed in Task 10 of SemEval-2020: Emphasis Selection For Written Text in Visual Media. We propose an end-to-end model that takes the text as input and outputs, for each word, the probability that the word should be emphasized. Our results show that transformer-based models are particularly effective in this task. We achieved a best Match_m score (described in section 2.2) of 0.810 and were ranked third on the leaderboard.

arxiv preprint arxiv, deep learning, neural network

2007.1082

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

RL Unplugged: Benchmarks for Offline Reinforcement Learning

Gulcehre, Caglar, Wang, Ziyu, Novikov, Alexander, Paine, Tom Le, Colmenarejo, Sergio Gomez, Zolna, Konrad, Agarwal, Rishabh, Merel, Josh, Mankowitz, Daniel, Paduraru, Cosmin, Dulac-Arnold, Gabriel, Li, Jerry, Norouzi, Mohammad, Hoffman, Matt, Nachum, Ofir, Tucker, George, Heess, Nicolas, de Freitas, Nando

arXiv.org Machine Learning, Jul-21-2020

Offline methods for reinforcement learning have the potential to help bridge the gap between reinforcement learning research and real-world applications. They make it possible to learn policies from offline datasets, thus overcoming concerns associated with online data collection in the real world, including cost, safety, or ethical concerns. In this paper, we propose a benchmark called RL Unplugged to evaluate and compare offline RL methods. RL Unplugged includes data from a diverse range of domains including games (e.g., Atari benchmark) and simulated motor control problems (e.g., DM Control Suite). The datasets include domains that are partially or fully observable, use continuous or discrete actions, and have stochastic or deterministic dynamics. We propose detailed evaluation protocols for each domain in RL Unplugged and provide an extensive analysis of supervised learning and offline RL methods using these protocols. We will release data for all our tasks and open-source all algorithms presented in this paper. We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community. Moving forward, we view RL Unplugged as a living benchmark suite that will evolve and grow with datasets contributed by the research community and ourselves. Our project page is available at https://git.io/JJUhd.

dataset, deep learning, neural network

2006.13888

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Revisiting Fundamentals of Experience Replay

Fedus, William, Ramachandran, Prajit, Agarwal, Rishabh, Bengio, Yoshua, Larochelle, Hugo, Rowland, Mark, Dabney, Will

arXiv.org Machine Learning, Jul-13-2020

Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding. We therefore present a systematic and extensive analysis of experience replay in Q-learning methods, focusing on two fundamental properties: the replay capacity and the ratio of learning updates to experience collected (replay ratio). Our additive and ablative studies upend conventional wisdom around experience replay: greater capacity is found to substantially increase the performance of certain algorithms, while leaving others unaffected. Counterintuitively, we show that theoretically ungrounded, uncorrected n-step returns are uniquely beneficial, while other techniques confer limited benefit for sifting through larger memory. Separately, by directly controlling the replay ratio we contextualize previous observations in the literature and empirically measure its importance across a variety of deep RL algorithms. Finally, we conclude by testing a set of hypotheses on the nature of these performance benefits.
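To make two terms from the abstract concrete, the sketch below computes an uncorrected n-step return (discounted rewards plus a bootstrapped value, with no off-policy correction) and a replay ratio (learning updates per environment step). The function names, the discount `gamma`, and the numbers are assumptions chosen for illustration, not values from the paper.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Uncorrected n-step return: r_0 + gamma*r_1 + ... + gamma^n * bootstrap_value.

    "Uncorrected" here means no importance-sampling correction is applied,
    even though the rewards may come from an older behaviour policy.
    """
    target = bootstrap_value
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


def replay_ratio(gradient_updates, environment_steps):
    """Learning updates performed per unit of experience collected."""
    return gradient_updates / environment_steps


# Hypothetical 3-step rollout with a bootstrap value of 10 and gamma = 0.5.
print(n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))  # prints 3.0

# Hypothetical schedule: one gradient update for every four environment steps.
print(replay_ratio(gradient_updates=1, environment_steps=4))  # prints 0.25
```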

artificial intelligence, reinforcement learning, replay capacity

2007.067

Country: Europe > Austria (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Striving for Simplicity in Off-policy Deep Reinforcement Learning

Agarwal, Rishabh, Schuurmans, Dale, Norouzi, Mohammad

arXiv.org Artificial Intelligence, Jul-10-2019

Reflecting on the advances of off-policy deep reinforcement learning (RL) algorithms since the development of DQN in 2013, it is important to ask: are the complexities of recent off-policy methods really necessary? In an attempt to isolate the contributions of various factors of variation in off-policy deep RL and to help design simpler algorithms, this paper investigates a set of related questions: First, can effective policies be learned given only access to logged offline experience? Second, how much of the benefit of recent distributional RL algorithms is attributable to improvements in exploration versus exploitation behavior? Third, can simpler off-policy RL algorithms outperform distributional RL without learning explicit distributions over returns? This paper uses a batch RL experimental setup on Atari 2600 games to investigate these questions. Unexpectedly, we find that batch RL algorithms trained solely on logged experiences of a DQN agent are able to significantly outperform online DQN. Our experiments suggest that the benefits of distributional RL mainly stem from better exploitation. We present a simple and novel variant of ensemble Q-learning called Random Ensemble Mixture (REM), which enforces optimal Bellman consistency on random convex combinations of the Q-heads of a multi-head Q-network. The batch REM agent trained offline on DQN data outperforms the batch QR-DQN and online C51 algorithms.
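The REM idea described in the abstract, Bellman consistency on a random convex combination of Q-heads, can be sketched for a single transition as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names, the way the simplex weights are sampled, and the use of plain Python lists in place of network outputs are all hypothetical, and `next_q_heads` merely stands in for whatever network produces next-state estimates.

```python
import random


def sample_simplex_weights(num_heads, rng):
    """Draw positive weights that sum to 1 (one simple way to get a convex combination)."""
    raw = [1.0 - rng.random() for _ in range(num_heads)]  # values in (0, 1]
    total = sum(raw)
    return [r / total for r in raw]


def mix_heads(head_q_values, weights):
    """Combine per-head Q-values; head_q_values[k][a] is head k's estimate for action a."""
    num_actions = len(head_q_values[0])
    return [
        sum(w * head[a] for w, head in zip(weights, head_q_values))
        for a in range(num_actions)
    ]


def rem_td_error(q_heads, next_q_heads, action, reward, gamma, weights, done=False):
    """TD error of the mixed Q-function, using the same weights on both sides."""
    q_mix = mix_heads(q_heads, weights)
    next_mix = mix_heads(next_q_heads, weights)
    target = reward + (0.0 if done else gamma * max(next_mix))
    return q_mix[action] - target


# Hypothetical two-head, two-action example.
rng = random.Random(0)
weights = sample_simplex_weights(2, rng)
error = rem_td_error(
    q_heads=[[1.0, 2.0], [3.0, 4.0]],
    next_q_heads=[[0.0, 1.0], [2.0, 3.0]],
    action=1,
    reward=1.0,
    gamma=0.5,
    weights=weights,
)
print(weights, error)
```

In a training loop, fresh weights would be drawn for each update and the squared (or Huber) TD error minimized with respect to the head parameters; those details are left out of this sketch.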

agent, artificial intelligence, reinforcement learning

1907.04543

Country: North America > Canada > Alberta (0.14)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning to Generalize from Sparse and Underspecified Rewards

Agarwal, Rishabh, Liang, Chen, Schuurmans, Dale, Norouzi, Mohammad

arXiv.org Machine Learning, Feb-19-2019

We consider the problem of learning from sparse and underspecified rewards, where an agent receives a complex input, such as a natural language instruction, and needs to generate a complex response, such as an action sequence, while only receiving binary success-failure feedback. Such success-failure rewards are often underspecified: they do not distinguish between purposeful and accidental success. Generalization from underspecified rewards hinges on discounting spurious trajectories that attain accidental success, while learning from sparse feedback requires effective exploration. We address exploration by using a mode-covering direction of KL divergence to collect a diverse set of successful trajectories, followed by a mode-seeking KL divergence to train a robust policy. We propose Meta Reward Learning (MeRL) to construct an auxiliary reward function that provides more refined feedback for learning. The parameters of the auxiliary reward function are optimized with respect to the validation performance of a trained policy. The MeRL approach outperforms our alternative reward learning technique based on Bayesian Optimization, and achieves state-of-the-art results on weakly-supervised semantic parsing. It improves on previous work by 1.2% and 2.4% on the WikiTableQuestions and WikiSQL datasets, respectively.
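The distinction between the two KL directions mentioned in the abstract can be seen on a toy discrete example. In the sketch below, `p` plays the role of a bimodal target distribution and the two candidates are a spread-out approximation and a single-mode approximation; all distributions and names are hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0.0:
            return math.inf  # q assigns no mass where p does
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical bimodal target and two candidate approximations.
p = [0.49, 0.01, 0.01, 0.49]
q_cover = [0.25, 0.25, 0.25, 0.25]   # spreads mass over every outcome
q_seek = [0.97, 0.01, 0.01, 0.01]    # commits to a single mode

# Mode-covering direction, KL(p || q): penalizes q for missing mass that p has.
print(kl_divergence(p, q_cover) < kl_divergence(p, q_seek))   # prints True

# Mode-seeking direction, KL(q || p): penalizes q for placing mass where p has little.
print(kl_divergence(q_seek, p) < kl_divergence(q_cover, p))   # prints True
```

The first comparison favors the candidate that covers both modes, which matches the abstract's use of the mode-covering direction for collecting diverse trajectories; the second favors the candidate that concentrates on one mode, which matches its use for training a focused policy.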

artificial intelligence, educational setting, trajectory

1902.07198

Country: North America > Canada > Alberta (0.14)

Genre: Research Report (0.64)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Evaluation Function Approximation for Scrabble

Agarwal, Rishabh

arXiv.org Artificial Intelligence, Jan-24-2019

The current state-of-the-art Scrabble agents are not learning-based but depend on truncated Monte Carlo simulations, and the quality of such agents is contingent upon the time available for running the simulations. This thesis takes steps towards building a learning-based Scrabble agent using self-play. Specifically, we try to find a better function approximation for the static evaluation function used in Scrabble, which determines the goodness of a move at a given board configuration. In this work, we experimented with evolutionary algorithms and Bayesian Optimization to learn the weights for an approximate feature-based evaluation function. However, these optimization methods were not quite effective, which led us to explore the given problem from an Imitation Learning point of view. We also tried to imitate the ranking of moves produced by the Quackle simulation agent using supervised learning with a neural network function approximator which takes the raw representation of the Scrabble board as the input instead of using only a fixed number of handcrafted features.

evaluation function, fuzzy logic, neural network

1901.08728

Country: Asia > India (0.14)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Scrabble (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.60)
