AITopics | task step

task step

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TaskBench: Benchmarking Large Language Models for Task Automation

Neural Information Processing Systems Feb-7-2026, 13:26:16 GMT

To address this, we introduce TASKBENCH, a comprehensive framework to evaluate the capability of LLMs in task automation. Specifically, task automation can be divided into three critical stages: task decomposition, tool selection, and parameter prediction. To tackle the complexities inherent in these stages, we introduce the concept of Tool Graph to represent decomposed tasks and adopt a back-instruct method to generate high-quality user instructions. We propose TASKEVAL, a multi-faceted evaluation methodology that assesses LLM performance across these three stages.
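As a rough illustration of how a decomposed task could be held as a graph and scored stage by stage, here is a minimal sketch. It is not the TaskBench or TASKEVAL implementation: the names `ToolCall`, `ToolGraph`, `f1`, and `score`, the example tools, and the choice of set-overlap F1 as the metric are all assumptions made for this example, and the task decomposition stage is not scored here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """One node of a (hypothetical) tool graph: a tool name plus its arguments."""
    tool: str
    params: tuple = ()  # (name, value) pairs, kept as a tuple so the node is hashable


@dataclass
class ToolGraph:
    """A decomposed task: nodes are tool calls, edges are dependencies between tools."""
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)  # (producer_tool, consumer_tool)


def f1(predicted: set, reference: set) -> float:
    """Set-overlap F1; returns 0.0 when the two sets share no element."""
    hits = len(predicted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


def score(pred: ToolGraph, ref: ToolGraph) -> dict:
    """Score tool selection (nodes and edges) and parameter prediction separately."""
    return {
        "node_f1": f1({n.tool for n in pred.nodes}, {n.tool for n in ref.nodes}),
        "edge_f1": f1(pred.edges, ref.edges),
        "param_f1": f1(
            {(n.tool, p) for n in pred.nodes for p in n.params},
            {(n.tool, p) for n in ref.nodes for p in n.params},
        ),
    }
```

Keeping the three scores separate means a model that picks the right tools but fills in a wrong argument loses points only on the parameter score, which mirrors the stage-wise view described in the abstract.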

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > China (0.04)

Genre: Research Report (0.46)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence

Sharrock, Callum, Petersson, Lukas, Petersson, Hanna, Backlund, Axel, Wennström, Axel, Nordström, Kristoffer, Aronsson, Elias

arXiv.org Artificial Intelligence Oct-28-2025

We present Butter-Bench, a benchmark evaluating large language model (LLM) controlled robots for practical intelligence, defined as the ability to navigate the messiness of the physical world. Current state-of-the-art robotic systems use a hierarchical architecture with LLMs in charge of high-level reasoning, and a Vision Language Action (VLA) model for low-level control. Butter-Bench evaluates the LLM part in isolation from the VLA. Although LLMs have repeatedly surpassed humans in evaluations requiring analytical intelligence, we find humans still outperform LLMs on Butter-Bench. The best LLMs score 40% on Butter-Bench, while the mean human score is 95%. LLMs struggled the most with multi-step spatial planning and social understanding. We also evaluate LLMs that are fine-tuned for embodied reasoning and conclude that this training does not improve their score on Butter-Bench. Language models (LMs) were initially intended for narrow text understanding tasks. The first Transformer-based LM (Vaswani et al., 2017) was explicitly trained for translation. However, large-scale training runs of LMs eventually resulted in emergent behaviour: model capabilities that were not explicitly trained for (Brown et al., 2020). For example, LLMs are not trained to be robots, yet companies such as Figure (Helix, 2025) and Google DeepMind (Gemini Robotics 1.5, 2025) use LLMs in their robotic stack.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.2186

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.93)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LIT: Large Language Model Driven Intention Tracking for Proactive Human-Robot Collaboration -- A Robot Sous-Chef Application

Huang, Zhe, Pohovey, John, Yammanuru, Ananya, Driggs-Campbell, Katherine

arXiv.org Artificial Intelligence Jun-19-2024

Large Language Models (LLMs) and Vision Language Models (VLMs) enable robots to ground natural language prompts into control actions to achieve tasks in an open world. However, when applied to a long-horizon collaborative task, this formulation results in excessive prompting for initiating or clarifying robot actions at every step of the task. We propose Language-driven Intention Tracking (LIT), leveraging LLMs and VLMs to model the human user's long-term behavior and to predict the next human intention to guide the robot for proactive collaboration. We demonstrate smooth coordination between a LIT-based collaborative robot and the human user in collaborative cooking tasks.

human user, intention, robot, (15 more...)

arXiv.org Artificial Intelligence

2406.13787

Country: North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre:

Workflow (0.68)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Agile and versatile bipedal robot tracking control through reinforcement learning

Li, Jiayi, Ye, Linqi, Cheng, Yi, Liu, Houde, Liang, Bin

arXiv.org Artificial Intelligence Apr-12-2024

The remarkable athletic intelligence displayed by humans in complex dynamic movements such as dancing and gymnastics suggests that the balance mechanism in biological beings is decoupled from specific movement patterns. This decoupling allows for the execution of both learned and unlearned movements under certain constraints while maintaining balance through minor whole-body coordination. To replicate this balance ability and body agility, this paper proposes a versatile controller for bipedal robots. This controller achieves ankle and body trajectory tracking across a wide range of gaits using a single small-scale neural network, which is based on a model-based IK solver and reinforcement learning. We consider a single step as the smallest control unit and design a universally applicable control input form suitable for any single-step variation. Highly flexible gait control can be achieved by combining these minimal control units with high-level policy through our extensible control interface. To enhance the trajectory-tracking capability of our controller, we utilize a three-stage training curriculum. After training, the robot can move freely between target footholds at varying distances and heights. The robot can also maintain static balance without repeated stepping to adjust posture. Finally, we evaluate the tracking accuracy of our controller on various bipedal tasks, and the effectiveness of our control framework is verified in the simulation environment.

bipedal robot, robot, trajectory, (14 more...)

arXiv.org Artificial Intelligence

2404.08246

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report (0.50)
Instructional Material > Course Syllabus & Notes (0.34)

Industry: Leisure & Entertainment (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance

Zhang, Jesse, Zhang, Jiahui, Pertsch, Karl, Liu, Ziyi, Ren, Xiang, Chang, Minsuk, Sun, Shao-Hua, Lim, Joseph J.

arXiv.org Artificial Intelligence Oct-17-2023

We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by growing a learned skill library with minimal supervision. Prior work in reinforcement learning requires expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. Instead, our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models (LLMs) that inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills. We demonstrate through experiments in realistic household environments that agents trained with our LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping as well as prior unsupervised skill acquisition methods on zero-shot execution of unseen, long-horizon tasks in new environments. Website at clvrai.com/boss.

agent, international conference, llm, (16 more...)

arXiv.org Artificial Intelligence

2310.10021

Country:

North America > United States > California (0.14)
Asia > Taiwan (0.04)

Genre:

Research Report (0.82)
Instructional Material (0.67)

Industry:

Education (1.00)
Leisure & Entertainment > Sports > Tennis (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

DiLogics: Creating Web Automation Programs With Diverse Logics

Pu, Kevin, Yang, Jim, Yuan, Angel, Ma, Minyi, Dong, Rui, Wang, Xinyu, Chen, Yan, Grossman, Tovi

arXiv.org Artificial Intelligence Aug-18-2023

Knowledge workers frequently encounter repetitive web data entry tasks, like updating records or placing orders. Web automation increases productivity, but translating tasks to web actions accurately and extending to new specifications is challenging. Existing tools can automate tasks that perform the same logical trace of UI actions (e.g., input text in each field in order), but do not support tasks requiring different executions based on varied input conditions. We present DiLogics, a programming-by-demonstration system that utilizes NLP to assist users in creating web automation programs that handle diverse specifications. DiLogics first semantically segments input data into structured task steps. By recording user demonstrations for each step, DiLogics generalizes the web macros to novel but semantically similar task requirements. Our evaluation showed that non-experts can effectively use DiLogics to create automation programs that fulfill diverse input instructions. DiLogics provides an efficient, intuitive, and expressive method for developing web automation programs satisfying diverse specifications.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3586183.3606822

2308.05828

Country:

North America > Canada > Ontario > Toronto (0.28)
North America > United States > California > San Francisco County > San Francisco (0.16)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Workflow (1.00)
Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Improving Proactive Dialog Agents Using Socially-Aware Reinforcement Learning

Kraus, Matthias, Wagner, Nicolas, Riekenbrauck, Ron, Minker, Wolfgang

arXiv.org Artificial Intelligence Jun-22-2023

The next step for intelligent dialog agents is to escape their role as silent bystanders and become proactive. Well-defined proactive behavior may improve human-machine cooperation, as the agent takes a more active role during interaction and relieves the user of responsibility. However, proactivity is a double-edged sword because poorly executed pre-emptive actions may have a devastating effect not only on the task outcome but also on the relationship with the user. For designing adequate proactive dialog strategies, we propose a novel approach that includes both social and task-relevant features in the dialog. Here, the primary goal is to optimize proactive behavior so that it is task-oriented (implying high task success and efficiency) while also being socially effective by fostering user trust. Including both aspects in the reward function for training a proactive dialog agent using reinforcement learning showed the benefit of our approach for more successful human-machine cooperation.
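To make the idea of a reward that mixes task-oriented and social terms concrete, the sketch below combines task success, an efficiency penalty, and a trust estimate into one scalar. It is only an assumed form: the abstract does not give the reward definition, and `TurnOutcome`, `proactive_reward`, the weights, and the trust scale in [0, 1] are hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class TurnOutcome:
    """Hypothetical summary of a finished dialog used to compute the reward."""
    task_success: bool
    num_turns: int
    trust: float  # assumed estimate of user trust in [0, 1]


def proactive_reward(outcome: TurnOutcome,
                     w_task: float = 1.0,
                     w_eff: float = 0.1,
                     w_trust: float = 1.0) -> float:
    """Weighted sum of a task term, an efficiency penalty, and a social term."""
    task_term = w_task * (1.0 if outcome.task_success else -1.0)
    efficiency_term = -w_eff * outcome.num_turns  # longer dialogs cost more
    social_term = w_trust * outcome.trust         # higher trust is rewarded
    return task_term + efficiency_term + social_term
```

With this shape, an agent that finishes the task quickly but erodes trust scores lower than one that reaches the same outcome while keeping trust high, which is the trade-off the abstract describes.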

machine learning, natural language, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3565472.3595611

2211.15359

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > Massachusetts (0.04)
(4 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Development of a Trust-Aware User Simulator for Statistical Proactive Dialog Modeling in Human-AI Teams

Kraus, Matthias, Riekenbrauck, Ron, Minker, Wolfgang

arXiv.org Artificial Intelligence Jun-18-2023

A human-AI team (HAIT) requires close coordination between humans and AI teammates to work together towards a common goal [40]. Effective communication, prediction of teammates' actions, and high-level coordination are essential components of this collaborative effort. In this regard, the proactive behavior of AI-based systems and the communication thereof during collaboration is an important research topic concerning HAITs, e.g., see Horvitz et al. [8]. Proactivity can be defined as an AI's self-initiating, anticipatory behavior for contributing to effective and efficient task completion. It has been shown to be essential for human teamwork as it leads to higher job and team performance and is associated with leadership and innovation [3]. However, the design of adequate proactivity for AI-based systems to support humans is still an open question and a challenging topic. It is essential to study the impact of proactive system actions on the human-agent trust relationship and how to use information about an AI agent's perceived trustworthiness to model appropriate proactive dialog strategies for forming effective HAITs.

machine learning, natural language, task step, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3563359.3597403

2304.11913

Country:

Europe > Middle East > Cyprus > Limassol > Limassol (0.06)
North America > United States > New York > New York County > New York City (0.05)
Europe > Germany (0.05)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Abstract Demonstrations and Adaptive Exploration for Efficient and Stable Multi-step Sparse Reward Reinforcement Learning

Yang, Xintong, Ji, Ze, Wu, Jing, Lai, Yu-kun

arXiv.org Artificial Intelligence Jul-19-2022

Although Deep Reinforcement Learning (DRL) has been popular in many disciplines including robotics, state-of-the-art DRL algorithms still struggle to learn long-horizon, multi-step and sparse reward tasks, such as stacking several blocks given only a task-completion reward signal. To improve learning efficiency for such tasks, this paper proposes a DRL exploration technique, termed A^2, which integrates two components inspired by human experiences: Abstract demonstrations and Adaptive exploration. A^2 starts by decomposing a complex task into subtasks, and then provides the correct order in which to learn them. During training, the agent explores the environment adaptively, acting more deterministically for well-mastered subtasks and more stochastically for ill-learnt subtasks. Ablation and comparative experiments are conducted on several grid-world tasks and three robotic manipulation tasks. We demonstrate that A^2 helps popular DRL algorithms (DQN, DDPG, and SAC) learn more efficiently and stably in these environments.
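The adaptive exploration idea (less randomness on mastered subtasks, more on poorly learnt ones) can be sketched for a discrete-action agent as a per-subtask epsilon-greedy rule. This is an assumed simplification, not the A^2 algorithm itself: the class `AdaptiveExplorer`, the linear mapping from success rate to epsilon, and the window size are all hypothetical choices for illustration. For continuous-action methods such as DDPG or SAC, the analogous knob would be a noise or entropy scale rather than epsilon.

```python
import random
from collections import deque


class AdaptiveExplorer:
    """Per-subtask epsilon-greedy whose epsilon shrinks as the success rate grows."""

    def __init__(self, num_subtasks, window=100, eps_min=0.05, eps_max=1.0):
        self.eps_min = eps_min
        self.eps_max = eps_max
        # One bounded history of recent outcomes (1.0 or 0.0) per subtask.
        self.history = [deque(maxlen=window) for _ in range(num_subtasks)]

    def record(self, subtask, success):
        """Store whether the latest attempt at this subtask succeeded."""
        self.history[subtask].append(1.0 if success else 0.0)

    def success_rate(self, subtask):
        """Recent success rate; 0.0 before any attempt has been recorded."""
        outcomes = self.history[subtask]
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def epsilon(self, subtask):
        """Linear interpolation: eps_max at 0% success, eps_min at 100% success."""
        rate = self.success_rate(subtask)
        return self.eps_max - (self.eps_max - self.eps_min) * rate

    def select(self, subtask, greedy_action, num_actions, rng=random):
        """Take a uniformly random action with probability epsilon, else the greedy one."""
        if rng.random() < self.epsilon(subtask):
            return rng.randrange(num_actions)
        return greedy_action
```

A training loop would call `record` after each subtask attempt and `select` at each step, so exploration for a subtask tapers off on its own as that subtask is mastered while later subtasks in the ordering keep exploring.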

demonstration, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICAC55051.2022.9911100

2207.09243

Country: Europe > United Kingdom > Wales > Cardiff (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
