
Collaborating Authors

 Zhang, Jack


Dynamic Embeddings with Task-Oriented prompting

arXiv.org Artificial Intelligence

Recent progress in machine learning (ML) and natural language processing (NLP) underscores the pivotal role of embeddings in enhancing model performance. Embeddings typically map discrete elements to fixed continuous vectors that remain unchanged across tasks, a practice that can restrict their versatility and efficiency, particularly in contexts that demand intricate and nuanced data representations [1, 14, 10]. Dynamic Embeddings with Task-Oriented prompting (DETOT) addresses these limitations by incorporating flexibility into the embedding process, allowing real-time adjustments based on specific task demands and performance feedback. This document explores DETOT's ability to tailor embeddings to each task, significantly improving model accuracy and computational efficiency [25, 8].
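The contrast between static and task-conditioned embeddings can be illustrated with a minimal sketch. The abstract does not describe DETOT's mechanism in detail, so the per-task affine adapters below are an assumption for illustration only; the task names and dimensions are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM = 100, 16
# A conventional static embedding table: one fixed vector per token.
base_embeddings = rng.normal(size=(VOCAB, DIM))

# Hypothetical lightweight affine adapter per task (near-identity init),
# standing in for whatever task-oriented adjustment DETOT performs.
task_adapters = {
    "sentiment": (np.eye(DIM) + rng.normal(scale=0.1, size=(DIM, DIM)), np.zeros(DIM)),
    "retrieval": (np.eye(DIM) + rng.normal(scale=0.1, size=(DIM, DIM)), np.zeros(DIM)),
}

def embed(token_ids, task):
    """Look up static vectors, then apply the task-selected transform."""
    W, b = task_adapters[task]
    return base_embeddings[token_ids] @ W + b

# The same tokens yield different representations under different tasks.
v_sent = embed([3, 7], "sentiment")
v_retr = embed([3, 7], "retrieval")
```

In this toy setup, performance feedback would be used to update the adapter weights for the relevant task while the base table stays shared, which is one plausible way to get per-task flexibility without duplicating the whole embedding matrix.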


TurkingBench: A Challenge Benchmark for Web Agents

arXiv.org Artificial Intelligence

Recent chatbots have demonstrated an impressive ability to understand and communicate in raw-text form. However, there is more to the world than raw text. For example, humans spend long hours on web pages, where text is intertwined with other modalities and tasks are accomplished through various complex interactions. Can state-of-the-art multi-modal models generalize to such complex domains? To address this question, we introduce TurkingBench, a benchmark of tasks formulated as web pages containing textual instructions with multi-modal context. Unlike existing work that employs artificially synthesized web pages, here we use natural HTML pages originally designed for crowdsourcing workers for various annotation purposes. The HTML instructions of each task are also instantiated with various values (obtained from the crowdsourcing tasks) to form new instances of the task. This benchmark contains 32.2K instances distributed across 158 tasks. Additionally, to facilitate evaluation on TurkingBench, we develop an evaluation framework that connects the responses of chatbots to modifications on web pages (modifying a text box, checking a radio button, etc.). We evaluate the performance of state-of-the-art models, including language-only, vision-only, and layout-only models, and their combinations, on this benchmark. Our findings reveal that these models perform significantly better than random chance, yet considerable room for improvement remains. We hope this benchmark will help facilitate the evaluation and development of web-based agents.
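The evaluation framework described above must turn a chatbot's free-text response into concrete page modifications. A minimal sketch of that kind of mapping is below; the `field = value` response format, the field names, and the dict-based page state are illustrative assumptions, not TurkingBench's actual protocol.

```python
import re

# Hypothetical chatbot response: one "field = value" assignment per line.
RESPONSE = """rating = 4
explanation = The summary is faithful.
agree = checked"""

def parse_modifications(text):
    """Extract (field, value) pairs from a model's free-text answer."""
    mods = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+)\s*=\s*(.+)", line)
        if m:
            mods[m.group(1)] = m.group(2).strip()
    return mods

def apply_to_page(mods, page_state):
    """Apply the parsed edits to a dict standing in for the page's form widgets."""
    for field, value in mods.items():
        if page_state.get(field) == "radio":
            page_state[field] = "checked"   # check the radio button
        else:
            page_state[field] = value       # fill the text box
    return page_state
```

Grading can then compare the resulting page state against the crowdworker's gold annotations field by field, rather than string-matching the raw model output.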


Case Study: Testing Model Capabilities in Some Reasoning Tasks

arXiv.org Artificial Intelligence

In the rapidly evolving field of artificial intelligence [30], Large Language Models (LLMs) have emerged as a cornerstone of technological advancement, revolutionizing the way we interact with machines and process information. With their unparalleled ability to generate human-like text, LLMs have found applications across a broad spectrum of domains, from automating customer service interactions to aiding in the creative process of writing and design. Their proficiency in generating personalized content and facilitating interactive dialogues has underscored their versatility and adaptability, making them indispensable tools in the modern digital landscape [4, 5, 6]. Despite these significant achievements, LLMs are not without their shortcomings. One of the critical areas where LLMs still face challenges is in their reasoning abilities and the provision of explainable outputs.


Improving Agent Interactions in Virtual Environments with Language Models

arXiv.org Artificial Intelligence

Enhancing AI systems with efficient communication skills for effective human assistance necessitates proactive initiatives from the system side to discern specific circumstances and interact aptly. This research focuses on a collective building assignment in the Minecraft dataset, employing language modeling to enhance task understanding through state-of-the-art methods. These models focus on grounding multi-modal understanding and task-oriented dialogue comprehension tasks, providing insights into their …

Figure 1: Within the ambit of a collaborative construction endeavor, it is incumbent upon the builder to adhere scrupulously to the directives issued by the architect.