Vaidyanath, Skanda
WILT: A Multi-Turn, Memorization-Robust Inductive Logic Benchmark for LLMs
Banatt, Eryk, Cheng, Jonathan, Vaidyanath, Skanda, Hwu, Tiffany
While large language models (LLMs) have shown impressive capabilities across a wide range of domains, they still encounter significant challenges in reasoning tasks that require gathering evidence over multiple turns and drawing logical conclusions from this evidence. These challenges present significant obstacles for LLM chat user interfaces, which rely on multi-turn interactions to facilitate effective collaboration. This limitation leads to real-world issues; for example, service chatbots must gather necessary information from customers over multiple turns to diagnose and resolve problems effectively. Despite the multi-turn nature of many real-world LLM use cases, most existing benchmarks rely on carefully curated single-turn tests, which often blur the line between memorization and genuine reasoning. To address this, we introduce the Wason Inductive Logic Test (WILT), a simple yet challenging multi-turn reasoning benchmark designed to resist memorization. WILT is inspired by the Wason 2-4-6 task (Wason, 1960), where participants must infer a basic boolean function involving three variables (e.g., x < y < z) by proposing test cases (such as (2, 4, 6)). In WILT, each test starts from a clean slate, with only the initial instructions provided, preventing models from relying on pre-learned responses. Over several turns, models must interact with the environment by suggesting test cases to narrow the possible hypotheses and ultimately infer the hidden function based on the outcomes. Our findings reveal that LLMs struggle with this task, exhibiting distinct strengths and weaknesses: some are better at narrowing down the hypothesis space by proposing valuable test cases, while others are more adept at deducing the hidden function from observed cases. Despite these variations, the best-performing model achieves only 28% accuracy, highlighting a significant gap in LLM performance on complex multi-turn reasoning tasks.

Large language models (LLMs) powered by the transformer architecture (Vaswani, 2017) have enabled a new computing paradigm driven by natural language. These models are increasingly integrated into day-to-day life beyond the machine learning research space, where they help many people with common tasks. They interact with users through multi-turn conversations, a capability of next-token-prediction models bolstered via instruction-tuning (Mishra et al., 2021) and alignment post-training phases (Ouyang et al., 2022).
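To make the task setup concrete, the following is a minimal sketch of a WILT-style episode. It is purely illustrative: the helper names, the turn budget, and the example hidden rule are assumptions for exposition, and the real benchmark prompts an LLM rather than calling Python callbacks.

```python
# Illustrative sketch of a WILT-style episode (hypothetical helper names).
from typing import Callable, List, Tuple

Triple = Tuple[float, float, float]
History = List[Tuple[Triple, bool]]

def run_episode(hidden_rule: Callable[[Triple], bool],
                propose_test: Callable[[History], Triple],
                guess_rule: Callable[[History], str],
                max_turns: int = 30) -> str:
    history: History = []
    for _ in range(max_turns):
        test = propose_test(history)                # model suggests a test case, e.g. (2, 4, 6)
        history.append((test, hidden_rule(test)))   # environment replies True/False
    return guess_rule(history)                      # model states its final hypothesis

# A hidden rule in the spirit of Wason's 2-4-6 task: the triple is strictly increasing.
strictly_increasing = lambda t: t[0] < t[1] < t[2]
```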
Differentiable Weight Masks for Domain Transfer
Khanna, Samar, Vaidyanath, Skanda, Velu, Akash
One of the major drawbacks of deep learning models for computer vision has been their inability to retain multiple sources of information in a modular fashion. For instance, given a network that has been trained on a source task, we would like to re-train this network on a similar, yet different, target task while maintaining its performance on the source task. Simultaneously, researchers have extensively studied modularization of network weights to localize and identify the set of weights responsible for the observed performance on a given task. One line of work studies the modularization induced in the weights of a neural network by learning and analysing weight masks. In this work, we combine these fields to study three such weight-masking methods and analyse their ability to mitigate "forgetting" on the source task while also allowing for efficient finetuning on the target task. We find that the different masking techniques involve trade-offs between retaining knowledge of the source task and preserving performance on the target task.
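As a rough illustration of the kind of weight masking studied here, the sketch below gates a frozen linear layer with a learned, differentiable mask; only the mask parameters are updated on the target task. The class name and the sigmoid gating are assumptions for exposition and do not correspond to any specific method compared in the paper.

```python
# Minimal sketch of a differentiable weight mask over frozen source-task weights
# (illustrative only; the masking methods analysed in the paper may differ).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    def __init__(self, linear: nn.Linear):
        super().__init__()
        # Freeze the source-task weights so they cannot be overwritten ("forgotten").
        self.register_buffer("weight", linear.weight.detach().clone())
        self.register_buffer("bias",
                             linear.bias.detach().clone() if linear.bias is not None else None)
        # Real-valued mask logits, learned on the target task.
        self.mask_logits = nn.Parameter(torch.zeros_like(self.weight))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = torch.sigmoid(self.mask_logits)   # soft gate in (0, 1) per weight
        return F.linear(x, self.weight * mask, self.bias)
```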
Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning
Velu, Akash, Vaidyanath, Skanda, Arumugam, Dilip
Reinforcement learning is the classic paradigm for addressing sequential decision-making problems [47]. Naturally, while inheriting the fundamental challenge of generalization across novel states and actions from supervised learning, general-purpose reinforcement-learning agents must also contend with the additional challenges of exploration and credit assignment. While much initial progress in the field was driven largely by innovative machinery for tackling credit assignment [45, 46, 44] alongside simple exploration heuristics (ε-greedy exploration, for example), recent years have seen a reversal with the bulk of attention focused on a broad array of exploration methods (spanning additional heuristics as well as more principled approaches) [51, 38, 15], and relatively little consideration given to issues of credit assignment. This lack of interest in solution concepts, however, has not stopped the proliferation of reinforcement learning into novel application areas characterized by long problem horizons and sparse reward signals; indeed, the current reinforcement learning from human feedback (RLHF) paradigm [28] is now a widely popularized example of an environment that operates in perhaps the harshest setting where a single feedback signal is only obtained after the completion of a long trajectory.
PushWorld: A benchmark for manipulation planning with tools and movable obstacles
Kansky, Ken, Vaidyanath, Skanda, Swingle, Scott, Lou, Xinghua, Lazaro-Gredilla, Miguel, George, Dileep
While recent advances in artificial intelligence have achieved human-level performance in environments like Starcraft and Go, many physical reasoning tasks remain challenging for modern algorithms. To date, few algorithms have been evaluated on physical tasks that involve manipulating objects when movable obstacles are present and when tools must be used to perform the manipulation. To promote research on such tasks, we introduce PushWorld, an environment with simplistic physics that requires manipulation planning with both movable obstacles and tools. We provide a benchmark of more than 200 PushWorld puzzles in PDDL and in an OpenAI Gym environment. We evaluate state-of-the-art classical planning and reinforcement learning algorithms on this benchmark, and we find that these baseline results are below human-level performance. We then provide a new classical planning heuristic that solves the most puzzles among the baselines, and although it is 40 times faster than the best baseline planner, it remains below human-level performance.
LISA: Learning Interpretable Skill Abstractions from Language
Garg, Divyansh, Vaidyanath, Skanda, Kim, Kuno, Song, Jiaming, Ermon, Stefano
Learning policies that effectively utilize language instructions in complex, multi-task environments is an important problem in sequential decision-making. While it is possible to condition on the entire language instruction directly, such an approach could suffer from generalization issues. In our work, we propose Learning Interpretable Skill Abstractions (LISA), a hierarchical imitation learning framework that can learn diverse, interpretable primitive behaviors or skills from language-conditioned demonstrations to better generalize to unseen instructions. LISA uses vector quantization to learn discrete skill codes that are highly correlated with language instructions and the behavior of the learned policy. In navigation and robotic manipulation environments, LISA outperforms a strong non-hierarchical Decision Transformer baseline in the low data regime and is able to compose learned skills to solve tasks containing unseen long-range instructions. Our method demonstrates a more natural way to condition on language in sequential decision-making problems and achieve interpretable and controllable behavior with the learned skills.
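A minimal sketch of the vector-quantization step that produces discrete skill codes might look like the following; the class name, codebook size, and straight-through gradient trick are illustrative assumptions rather than LISA's exact architecture or training losses.

```python
# Illustrative vector-quantization bottleneck for discrete skill codes
# (a simplified sketch, not LISA's actual implementation).
import torch
import torch.nn as nn

class SkillCodebook(nn.Module):
    def __init__(self, num_skills: int = 20, dim: int = 128):
        super().__init__()
        self.codebook = nn.Embedding(num_skills, dim)

    def forward(self, z: torch.Tensor):
        # z: (batch, dim) continuous latents derived from the language instruction.
        dists = torch.cdist(z, self.codebook.weight)   # distance to every code vector
        idx = dists.argmin(dim=-1)                      # nearest code index = discrete skill
        quantized = self.codebook(idx)
        # Straight-through estimator: copy gradients from the quantized output to z.
        quantized = z + (quantized - z).detach()
        return quantized, idx
```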
Using Reinforcement Learning to Manage Communications Between Humans and Artificial Agents in an Evacuation Scenario
Vaidyanath, Skanda (Birla Institute of Technology and Science, Pilani-Hyderabad Campus) | Georgila, Kallirroi (University of Southern California) | Traum, David (University of Southern California)
In search and rescue missions, robots can potentially help save survivors faster than human emergency responders alone would. In our experimental virtual reality simulation environment, we have a system comprising a swarm of unmanned aerial vehicles (UAVs) and a virtual "spokesperson". The system and a human operator work together to locate survivors and guide them to safety, away from an active wildfire encroaching on a small town. The UAVs and the spokesperson are equipped with natural language capabilities through which they can communicate with the survivors to convince them to evacuate. If they fail to do so, they can ask the human operator to intervene. We use reinforcement learning to automatically learn a policy to be followed when a UAV has located survivors. The system learns the best course of action to help the survivors evacuate, i.e., warn them through the UAV or the spokesperson, ask the human operator to intervene if needed, guide them to safety via their preferred method of transportation, or just wait for more information. We vary the distance of the fire, the level of cooperativeness of the survivors, and how busy the human operator is, and we report results in terms of the percentage of survivors saved in each condition.