AITopics | Sridhar, Abishek

Collaborating Authors

Sridhar, Abishek

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

Ou, Tianyue, Xu, Frank F., Madaan, Aman, Liu, Jiarui, Lo, Robert, Sridhar, Abishek, Sengupta, Sudipta, Roth, Dan, Neubig, Graham, Zhou, Shuyan

arXiv.org Artificial IntelligenceNov-27-2024

LLMs can now act as autonomous agents that interact with digital environments and complete specific objectives (e.g., arranging an online meeting). However, accuracy is still far from satisfactory, partly due to a lack of large-scale, direct demonstrations for digital tasks. Obtaining supervised data from humans is costly, and automatic data collection through exploration or reinforcement learning relies on complex environmental and content setup, resulting in datasets that lack comprehensive coverage of various scenarios. On the other hand, there is abundant knowledge that may indirectly assist task completion, such as online tutorials that were created for human consumption. In this work, we present Synatra, an approach that effectively transforms this indirect knowledge into direct supervision at scale. We define different types of indirect knowledge, and carefully study the available sources to obtain it, methods to encode the structure of direct demonstrations, and finally methods to transform indirect knowledge into direct demonstrations. We use 100k such synthetically-created demonstrations to finetune a 7B CodeLlama, and demonstrate that the resulting agent surpasses all comparably sized models on three web-based task benchmarks Mind2Web, MiniWoB++ and WebArena, as well as surpassing GPT-3.5 on WebArena and Mind2Web. In addition, while synthetic demonstrations prove to be only 3% the cost of human demonstrations (at $0.031 each), we show that the synthetic demonstrations can be more effective than an identical number of human demonstrations collected from limited domains.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2409.15637

Country:

North America > United States (0.28)
Europe > Austria > Vienna (0.14)
Asia > Middle East > UAE (0.14)

Genre:

Workflow (1.00)
Instructional Material (0.92)
Research Report > New Finding (0.45)

Industry:

Information Technology (0.68)
Retail > Online (0.46)
Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Hierarchical Prompting Assists Large Language Model on Web Navigation

Sridhar, Abishek, Lo, Robert, Xu, Frank F., Zhu, Hao, Zhou, Shuyan

arXiv.org Artificial IntelligenceOct-29-2023

Large language models (LLMs) struggle on processing complicated observations in interactive decision making tasks. To alleviate this issue, we propose a simple hierarchical prompting approach. Diverging from previous prompting approaches that always put the full observation (e.g. a web page) to the prompt, we propose to first construct an action-aware observation which is more condensed and relevant with a dedicated SUMMARIZER prompt. The ACTOR prompt then predicts the next action based on the summarized observation. While our method has broad applicability, we particularly demonstrate its efficacy in the complex domain of web navigation where a full observation often contains redundant and irrelevant information. Our approach outperforms the previous state-of-the-art prompting mechanics by 6.2% on task success rate, demonstrating its potential on interactive decision making tasks with long observation traces.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.14257

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

WebArena: A Realistic Web Environment for Building Autonomous Agents

Zhou, Shuyan, Xu, Frank F., Zhu, Hao, Zhou, Xuhui, Lo, Robert, Sridhar, Abishek, Cheng, Xianyi, Ou, Tianyue, Bisk, Yonatan, Fried, Daniel, Alon, Uri, Neubig, Graham

arXiv.org Artificial IntelligenceOct-24-2023

With advances in generative AI, there is now potential for autonomous agents to manage daily tasks via natural language commands. However, current agents are primarily created and tested in simplified synthetic environments, leading to a disconnect with real-world scenarios. In this paper, we build an environment for language-guided agents that is highly realistic and reproducible. Specifically, we focus on agents that perform tasks on the web, and create an environment with fully functional websites from four common domains: e-commerce, social forum discussions, collaborative software development, and content management. Our environment is enriched with tools (e.g., a map) and external knowledge bases (e.g., user manuals) to encourage human-like task-solving. Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions. The tasks in our benchmark are diverse, long-horizon, and designed to emulate tasks that humans routinely perform on the internet. We experiment with several baseline agents, integrating recent techniques such as reasoning before acting. The results demonstrate that solving complex tasks is challenging: our best GPT-4-based agent only achieves an end-to-end task success rate of 14.41%, significantly lower than the human performance of 78.24%. These results highlight the need for further development of robust agents, that current state-of-the-art large language models are far from perfect performance in these real-life tasks, and that WebArena can be used to measure such progress.

building autonomous agent, large language model, machine learning, (6 more...)

arXiv.org Artificial Intelligence

2307.13854

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.60)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

Exploring the Role of the Bottleneck in Slot-Based Models Through Covariance Regularization

Stange, Andrew, Lo, Robert, Sridhar, Abishek, Rajesh, Kousik

arXiv.org Artificial IntelligenceJun-5-2023

In this project we attempt to make slot-based models with an image reconstruction objective competitive with those that use a feature reconstruction objective on real world datasets. We propose a loss-based approach to constricting the bottleneck of slot-based models, allowing larger-capacity encoder networks to be used with Slot Attention without producing degenerate stripe-shaped masks. We find that our proposed method offers an improvement over the baseline Slot Attention model but does not reach the performance of \dinosaur on the COCO2017 dataset. Throughout this project, we confirm the superiority of a feature reconstruction objective over an image reconstruction objective and explore the role of the architectural bottleneck in slot-based models.

artificial intelligence, bottleneck, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2306.02577

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.88)

Add feedback

LIC-GAN: Language Information Conditioned Graph Generative GAN Model

Lo, Robert, Datar, Arnhav, Sridhar, Abishek

arXiv.org Artificial IntelligenceJun-2-2023

Deep generative models for Natural Language data offer a new angle on the problem of graph synthesis: by optimizing differentiable models that directly generate graphs, it is possible to side-step expensive search procedures in the discrete and vast space of possible graphs. We introduce LIC-GAN, an implicit, likelihood-free generative model for small graphs that circumvents the need for expensive graph matching procedures. Our method takes as input a natural language query and using a combination of language modelling and Generative Adversarial Networks (GANs) and returns a graph that closely matches the description of the query. We combine our approach with a reward network to further enhance the graph generation with desired properties. Our experiments, show that LIC-GAN does well on metrics such as PropMatch and Closeness getting scores of 0.36 and 0.48. We also show that LIC-GAN performs as good as ChatGPT, with ChatGPT getting scores of 0.40 and 0.42. We also conduct a few experiments to demonstrate the robustness of our method, while also highlighting a few interesting caveats of the model.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2306.01937

Country:

Europe (0.68)
North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback