task flow
Foresee and Act Ahead: Task Prediction and Pre-Scheduling Enabled Efficient Robotic Warehousing
Cao, B., Liu, Z., Han, X., Zhou, S., Zhang, H., Wang, H.
In warehousing systems, to enhance logistical efficiency amid surging demand volumes, much focus is placed on how to reasonably allocate tasks to robots. However, the robots labor is still inevitably wasted to some extent. In response to this, we propose a pre-scheduling enhanced warehousing framework that predicts task flow and acts in advance. It consists of task flow prediction and hybrid tasks allocation. For task prediction, we notice that it is possible to provide a spatio-temporal representation of task flow, so we introduce a periodicity-decoupled mechanism tailored for the generation patterns of aggregated orders, and then further extract spatial features of task distribution with novel combination of graph structures. In hybrid tasks allocation, we consider the known tasks and predicted future tasks simultaneously and optimize the allocation dynamically. In addition, we consider factors such as predicted task uncertainty and sector-level efficiency evaluation in warehousing to realize more balanced and rational allocations. We validate our task prediction model across actual datasets derived from real factories, achieving SOTA performance. Furthermore, we implement our compelte scheduling system in a real-world robotic warehouse for months of lifelong validation, demonstrating large improvements in key metrics of warehousing, such as empty running rate, by more than 50%.
DFlow: Diverse Dialogue Flow Simulation with Large Language Models
Du, Wanyu, Feng, Song, Gung, James, Sun, Lijia, Zhang, Yi, Mansour, Saab, Qi, Yanjun
Developing language model-based dialogue agents requires effective data to train models that can follow specific task logic. However, most existing data augmentation methods focus on increasing diversity in language, topics, or dialogue acts at the utterance level, largely neglecting a critical aspect of task logic diversity at the dialogue level. This paper proposes a novel data augmentation method designed to enhance the diversity of synthetic dialogues by focusing on task execution logic. Our method uses LLMs to generate decision tree-structured task plans, which enables the derivation of diverse dialogue trajectories for a given task. Each trajectory, referred to as a "dialog flow", guides the generation of a multi-turn dialogue that follows a unique trajectory. We apply this method to generate a task-oriented dialogue dataset comprising 3,886 dialogue flows across 15 different domains. We validate the effectiveness of this dataset using the next action prediction task, where models fine-tuned on our dataset outperform strong baselines, including GPT-4. Upon acceptance of this paper, we plan to release the code and data publicly.
Flow as the Cross-Domain Manipulation Interface
Xu, Mengda, Xu, Zhenjia, Xu, Yinghao, Chi, Cheng, Wetzstein, Gordon, Veloso, Manuela, Song, Shuran
We present Im2Flow2Act, a scalable learning framework that enables robots to acquire manipulation skills from diverse data sources. The key idea behind Im2Flow2Act is to use object flow as the manipulation interface, bridging domain gaps between different embodiments (i.e., human and robot) and training environments (i.e., real-world and simulated). Im2Flow2Act comprises two components: a flow generation network and a flow-conditioned policy. The flow generation network, trained on human demonstration videos, generates object flow from the initial scene image, conditioned on the task description. The flow-conditioned policy, trained on simulated robot play data, maps the generated object flow to robot actions to realize the desired object movements. By using flow as input, this policy can be directly deployed in the real world with a minimal sim-to-real gap. By leveraging real-world human videos and simulated robot play data, we bypass the challenges of teleoperating physical robots in the real world, resulting in a scalable system for diverse tasks. We demonstrate Im2Flow2Act's capabilities in a variety of real-world tasks, including the manipulation of rigid, articulated, and deformable objects.
Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture
Deng, Jinyi, Tang, Xinru, Yue, Zhiheng, Lu, Guangyang, Yang, Qize, Zhang, Jiahao, Li, Jinxi, Li, Chao, Wei, Shaojun, Hu, Yang, Yin, Shouyi
Given the increasing complexity of AI applications, traditional spatial architectures frequently fall short. Our analysis identifies a pattern of interconnected, multi-faceted tasks encompassing both AI and general computational processes. In response, we have conceptualized "Orchestrated AI Workflows," an approach that integrates various tasks with logic-driven decisions into dynamic, sophisticated workflows. Specifically, we find that the intrinsic Dual Dynamicity of Orchestrated AI Workflows, namely dynamic execution times and frequencies of Task Blocks, can be effectively represented using the Orchestrated Workflow Graph. Furthermore, the intrinsic Dual Dynamicity poses challenges to existing spatial architecture, namely Indiscriminate Resource Allocation, Reactive Load Rebalancing, and Contagious PEA Idleness. To overcome these challenges, we present Octopus, a scale-out spatial architecture and a suite of advanced scheduling strategies optimized for executing Orchestrated AI Workflows, such as the Discriminate Dual-Scheduling Mechanism, Adaptive TBU Scheduling Strategy, and Proactive Cluster Scheduling Strategy. Our evaluations demonstrate that Octopus significantly outperforms traditional architectures in handling the dynamic demands of Orchestrated AI Workflows, and possesses robust scalability in large scale hardware such as wafer-scale chip.
LEGOEval: An Open-Source Toolkit for Dialogue System Evaluation via Crowdsourcing
Li, Yu, Arnold, Josh, Yan, Feifan, Shi, Weiyan, Yu, Zhou
We present LEGOEval, an open-source toolkit that enables researchers to easily evaluate dialogue systems in a few lines of code using the online crowdsource platform, Amazon Mechanical Turk. Compared to existing toolkits, LEGOEval features a flexible task design by providing a Python API that maps to commonly used React.js interface components. Researchers can personalize their evaluation procedures easily with our built-in pages as if playing with LEGO blocks. Thus, LEGOEval provides a fast, consistent method for reproducing human evaluation results. Besides the flexible task design, LEGOEval also offers an easy API to review collected data.