Zhang, Jiahui
VidSketch: Hand-drawn Sketch-Driven Video Generation with Diffusion Control
Jiang, Lifan, Chen, Shuang, Wu, Boxi, Guan, Xiaotong, Zhang, Jiahui
With the advancement of generative artificial intelligence, previous studies have achieved the generation of aesthetic images from hand-drawn sketches, meeting the public's demand for accessible drawing tools. However, these methods are limited to static images and cannot control video animation generation with hand-drawn sketches. To address this gap, we propose VidSketch, the first method capable of generating high-quality video animations directly from any number of hand-drawn sketches and simple text prompts, bridging the divide between ordinary users and professional artists. Specifically, our method introduces a Level-Based Sketch Control Strategy that automatically adjusts the guidance strength of sketches during generation, accommodating users with varying drawing skills. Furthermore, a TempSpatial Attention mechanism is designed to enhance the spatiotemporal consistency of generated video animations, significantly improving coherence across frames. More detailed examples are available on our official website.
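The abstract does not specify how TempSpatial Attention is implemented; a minimal NumPy sketch of the general idea, joint self-attention over all frame-patch tokens so every patch can attend across both space and time, is shown below. The function name, shapes, and the shared input projection are all illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tempspatial_attention(x):
    """Joint space-time self-attention (schematic).

    x: (T, P, D) video latents -- T frames, P spatial patches,
    D channels. Flattening (T, P) into a single token axis lets
    every patch attend to every other patch in every frame, which
    is one generic way to encourage cross-frame coherence.
    """
    T, P, D = x.shape
    tokens = x.reshape(T * P, D)
    # For this sketch, queries, keys, and values all reuse the
    # input features instead of learned projections.
    attn = softmax(tokens @ tokens.T / np.sqrt(D))
    out = attn @ tokens
    return out.reshape(T, P, D)
```

In a real diffusion backbone this would use learned Q/K/V projections and multiple heads; the sketch only shows the token-flattening step that couples the temporal and spatial axes.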
Learning an Adaptive Fall Recovery Controller for Quadrupeds on Complex Terrains
Lu, Yidan, Dong, Yinzhao, Ma, Ji, Zhang, Jiahui, Lu, Peng
Legged robots have made significant strides in locomotion capabilities, demonstrating impressive performance in tasks such as dynamic walking, running, and even complex maneuvers like backflips [8], [2]. However, the ability to recover from falls, especially on challenging and unpredictable terrains, remains a critical challenge in the field of legged robotics. While substantial progress has been made in recovery strategies for flat or moderately uneven surfaces [7], [13], the problem of robust recovery on highly irregular terrains - such as rocky landscapes, steep inclines, or complex gaps - has received limited attention. In extreme or complex natural environments, robots still face the inevitability of falling. A major challenge in current research lies in developing adaptive controllers for robots to effectively recover from falls, allowing them to resume movement or efficiently complete tasks. However, model-based methods are often inadequate for these dynamic tasks. For example, Mordatch et al. [12] proposed a framework that optimizes automatic recovery through contact invariance, but the reliance on predefined potential contact points limits the exploration of flexible behaviors.
SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling
Zhang, Jesse, Pertsch, Karl, Zhang, Jiahui, Lim, Joseph J.
Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks. Prior works have defined pre-training tasks via natural language instructions, but doing so requires tedious human annotation of hundreds of thousands of instructions. Thus, we propose SPRINT, a scalable offline policy pre-training approach which substantially reduces the human effort needed for pre-training a diverse set of skills. Our method uses two core ideas to automatically expand a base set of pre-training tasks: instruction relabeling via large language models and cross-trajectory skill chaining through offline reinforcement learning. As a result, SPRINT pre-training equips robots with a much richer repertoire of skills. Experimental results in a household simulator and on a real robot kitchen manipulation task show that SPRINT leads to substantially faster learning of new long-horizon tasks than previous pre-training approaches. Website at https://clvrai.com/sprint.
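The abstract describes expanding a base task set via LLM instruction relabeling; a schematic Python sketch of that idea, merging consecutive language-annotated sub-trajectories into longer tasks with summarized instructions, is given below. The function names and the trivial string-joining stand-in for the LLM call are assumptions for illustration, not SPRINT's actual pipeline.

```python
def summarize_stub(instructions):
    # Stand-in for an LLM call that would merge sub-task
    # instructions (e.g., "pick up the mug" + "put the mug in the
    # sink") into one higher-level instruction. Here we just join.
    return " and then ".join(instructions)

def relabel_trajectory(segments, summarize=summarize_stub):
    """Aggregate adjacent language-annotated sub-trajectories.

    segments: list of (instruction, transitions) pairs in temporal
    order. Returns the original tasks plus every contiguous merge of
    two or more segments, each labeled with a summarized instruction,
    expanding the pre-training task set without extra human labels.
    """
    tasks = list(segments)
    n = len(segments)
    for i in range(n):
        for j in range(i + 2, n + 1):  # merge >= 2 consecutive segments
            instrs = [ins for ins, _ in segments[i:j]]
            transitions = [t for _, ts in segments[i:j] for t in ts]
            tasks.append((summarize(instrs), transitions))
    return tasks
```

The cross-trajectory skill-chaining component, which the abstract attributes to offline RL, is not covered by this sketch.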
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Zhang, Jesse, Zhang, Jiahui, Pertsch, Karl, Liu, Ziyi, Ren, Xiang, Chang, Minsuk, Sun, Shao-Hua, Lim, Joseph J.
We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by growing a learned skill library with minimal supervision. Prior work in reinforcement learning requires expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. Instead, our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models (LLMs) that inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills. We demonstrate through experiments in realistic household environments that agents trained with our LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping as well as prior unsupervised skill acquisition methods on zero-shot execution of unseen, long-horizon tasks in new environments. Website at clvrai.com/boss.
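The skill-bootstrapping loop described above can be caricatured in a few lines: repeatedly pick a known skill, ask an LLM for a meaningful follow-up, attempt the chain in the environment, and add successful chains to the library as new longer-horizon skills. Everything here, the function names, the string encoding of chains, and the stubbed LLM and environment, is a hypothetical sketch, not the paper's implementation.

```python
import random

def bootstrap_skills(skill_library, propose_next, try_chain,
                     rounds=100, seed=0):
    """Grow a skill library by LLM-guided chaining (schematic).

    skill_library: names of skills the agent can already execute.
    propose_next(skill): stand-in for an LLM suggesting a meaningful
    follow-up skill, or None if nothing sensible comes next.
    try_chain(chain): environment-rollout stub returning True if the
    chained behavior succeeds when practiced.
    """
    rng = random.Random(seed)
    library = set(skill_library)
    for _ in range(rounds):
        first = rng.choice(sorted(library))
        nxt = propose_next(first)
        if nxt is None:
            continue
        chain = f"{first} -> {nxt}"
        # Successful chains become new, longer-horizon skills, so
        # later rounds can chain onto them in turn.
        if chain not in library and try_chain(chain):
            library.add(chain)
    return library
```

Note how no task reward appears anywhere in the loop: the only learning signal is whether a proposed chain can be executed, matching the abstract's claim of practicing without reward feedback.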
An Intelligent Self-driving Truck System For Highway Transportation
Wang, Dawei, Gao, Lingping, Lan, Ziquan, Li, Wei, Ren, Jiaping, Zhang, Jiahui, Zhang, Peng, Zhou, Pei, Wang, Shengao, Pan, Jia, Manocha, Dinesh, Yang, Ruigang
Recently, there have been many advances in the autonomous driving community, attracting much attention from academia and industry. However, existing works mainly focus on cars; extra development is still required for self-driving truck algorithms and models. In this paper, we introduce an intelligent self-driving truck system. Our presented system consists of three main components: 1) a realistic traffic simulation module for generating realistic traffic flow in testing scenarios; 2) a high-fidelity truck model, designed and evaluated to mimic real truck responses in real-world deployment; and 3) an intelligent planning module with a learning-based decision-making algorithm and a multi-mode trajectory planner that accounts for the truck's constraints, road slope changes, and the surrounding traffic flow. We provide quantitative evaluations for each component individually to demonstrate its fidelity and performance. We also deploy our proposed system on a real truck and conduct real-world experiments, which show our system's capability to mitigate the sim-to-real gap. Our code is available at https://github.com/InceptioResearch/IITS
Awakening Latent Grounding from Pretrained Language Models for Semantic Parsing
Liu, Qian, Yang, Dejian, Zhang, Jiahui, Guo, Jiaqi, Zhou, Bin, Lou, Jian-Guang
In recent years, pretrained language models (PLMs) have achieved success on several downstream tasks, demonstrating their power in modeling language. To better understand and leverage what PLMs have learned, several techniques have emerged to probe the syntactic structures entailed by PLMs. However, few efforts have been made to explore the grounding capabilities of PLMs, which are also essential. In this paper, we highlight the ability of PLMs to discover which token should be grounded to which concept when combined with our proposed erasing-then-awakening approach. Empirical studies on four datasets demonstrate that our approach can awaken latent grounding that is understandable to human experts, even though it is not exposed to such labels during training. More importantly, our approach shows great potential to benefit downstream semantic parsing models. Taking text-to-SQL as a case study, we successfully couple our approach with two off-the-shelf parsers, obtaining an absolute improvement of up to 9.8%.
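The erasing step can be illustrated generically: erase one token at a time and measure how much a concept's relevance score drops, yielding a token-to-concept grounding matrix without any grounding labels. The sketch below uses a caller-supplied scoring function as a stand-in for PLM-derived concept scores; the function name and interface are assumptions for illustration, not the paper's exact formulation (which also includes an "awakening" training stage not shown here).

```python
def erasing_grounding(tokens, concepts, score):
    """Attribute tokens to concepts by erasure (schematic).

    score(tokens, concept): stand-in for a PLM-derived relevance
    score of a concept (e.g., a SQL column) given the utterance.
    The drop in score when a token is erased serves as that token's
    grounding strength for the concept.
    """
    grounding = {}
    for c in concepts:
        base = score(tokens, c)
        for i, tok in enumerate(tokens):
            erased = tokens[:i] + tokens[i + 1:]
            grounding[(tok, c)] = base - score(erased, c)
    return grounding
```

For a text-to-SQL utterance, a token whose erasure sharply lowers a column's score is a candidate grounding for that column, which is the kind of signal a downstream parser could consume.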