Think-Then-React: Towards Unconstrained Human Action-to-Reaction Generation

Tan, Wenhui, Li, Boyuan, Jin, Chuhao, Huang, Wenbing, Wang, Xiting, Song, Ruihua

arXiv.org Artificial Intelligence

Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China

Figure 1: Given a human action as input, our Think-Then-React model first thinks by generating an action description and reasoning out a reaction prompt. It then reacts to the action based on the results of this thinking process. TTR reacts in real time at every timestep and periodically re-thinks at fixed intervals (every two timesteps in the illustration) to mitigate accumulated errors.

Modeling human-like action-to-reaction generation has significant real-world applications, such as human-robot interaction and games. Despite recent advances in single-person motion generation, action-to-reaction generation remains challenging, due to the difficulty of directly predicting a reaction from an action sequence without prompts and the absence of a unified representation that effectively encodes multi-person motion. To address these challenges, we introduce Think-Then-React (TTR), a large-language-model-based framework designed to generate human-like reactions. First, with our fine-grained multimodal training strategy, TTR unifies two processes during inference: a thinking process that explicitly infers action intentions and reasons out a corresponding reaction description, which serves as a semantic prompt, and a reacting process that predicts reactions based on the input action and the inferred semantic prompt. Second, to effectively represent multi-person motion in language models, we propose a unified motion tokenizer that decouples egocentric pose and absolute space features, representing action and reaction motion with the same encoding.
Extensive experiments demonstrate that TTR outperforms existing baselines, achieving significant improvements in evaluation metrics, such as reducing FID from 3.988 to 1.942. Predicting human reactions to human actions in real-world scenarios is an online and unconstrained task, i.e., future states and text prompts are inaccessible, and it has broad applications in virtual reality, human-robot interaction, and gaming. Furthermore, Large Language Models (LLMs) have been applied to human motion generation, demonstrating superior performance (Jiang et al., 2023; Zhang et al., 2024).
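The think-then-react loop described in the abstract can be sketched as below. The `think`/`react_step` interface and the stub model are illustrative assumptions for exposition, not the authors' actual API.

```python
# Hypothetical sketch of Think-Then-React inference: react at every
# timestep, and periodically re-think to refresh the semantic prompt.

RETHINK_INTERVAL = 2  # re-think every two timesteps, as in Figure 1


def generate_reaction(model, action_tokens):
    """Generate reaction tokens online, re-thinking periodically."""
    reaction_tokens = []
    prompt = None
    for t, _ in enumerate(action_tokens):
        if t % RETHINK_INTERVAL == 0:
            # Thinking: infer an action description and reason out a
            # reaction prompt from the action observed so far.
            prompt = model.think(action_tokens[: t + 1])
        # Reacting: predict the next reaction token from the input
        # action prefix and the inferred semantic prompt.
        reaction_tokens.append(
            model.react_step(action_tokens[: t + 1], reaction_tokens, prompt)
        )
    return reaction_tokens


class StubTTR:
    """Toy stand-in for the (hypothetical) TTR model interface."""

    def think(self, action_prefix):
        # Pretend prompt; a real model would generate text here.
        return f"prompt@{len(action_prefix)}"

    def react_step(self, action_prefix, reaction_so_far, prompt):
        # Pretend reaction token conditioned on the latest action token.
        return (prompt, action_prefix[-1])
```

Note how the prompt computed at one re-think step is reused until the next one, which is what lets the model react at every timestep without paying the thinking cost each step.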


Constructing Behavior Trees from Temporal Plans for Robotic Applications

Zapf, Josh, Roveri, Marco, Martin, Francisco, Manzanares, Juan Carlos

arXiv.org Artificial Intelligence

Executing temporal plans in the real and open world requires adapting to uncertainty both in the environment and in the plan actions. A plan executor must therefore be flexible enough to dispatch actions based on the actual execution conditions. In general, this involves considering both event-based and time-based constraints between the actions in the plan. A simple temporal network (STN) is a convenient framework for specifying the constraints between actions in the plan. Likewise, a behavior tree (BT) is a convenient framework for controlling the execution flow of the actions in the plan. The principal contributions of this paper are i) an algorithm for transforming a plan into an STN, and ii) an algorithm for transforming an STN into a BT. When combined, these algorithms define a systematic approach for executing total-order (time-triggered) plans in robots operating in the real world. Our approach is based on creating a graph describing a deordered (state-triggered) plan and then creating a BT representing a partial-order (determined at runtime) plan. This approach ensures the correct execution of plans, including those with required concurrency. We demonstrate the validity of our approach within the PlanSys2 framework on real robots.
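As a toy illustration of the plan-to-STN-to-BT pipeline the abstract outlines: the data structures and the ordering rule below are simplifying assumptions for exposition, not the paper's actual algorithms. Actions separated by a positive time gap become an ordered Sequence; actions scheduled at the same time (required concurrency) are grouped under a Parallel node.

```python
# Simplified sketch: time-triggered plan -> STN edges -> behavior tree.


def plan_to_stn(timed_plan):
    """Build simple temporal constraints from a time-triggered plan.

    timed_plan: list of (action, start_time) pairs in dispatch order.
    Returns {(a, b): gap} meaning action b starts `gap` after action a.
    """
    return {
        (a, b): tb - ta
        for (a, ta), (b, tb) in zip(timed_plan, timed_plan[1:])
    }


def stn_to_bt(timed_plan, edges):
    """Derive a BT (nested tuples) from the STN constraints.

    A zero gap marks required concurrency, so those actions are grouped
    into a Parallel child; everything else is sequenced in order.
    """
    children = []
    group = [timed_plan[0][0]]
    for (a, _), (b, _) in zip(timed_plan, timed_plan[1:]):
        if edges[(a, b)] == 0:
            group.append(b)  # must run concurrently with the group
        else:
            children.append(group[0] if len(group) == 1 else ("Parallel", group))
            group = [b]
    children.append(group[0] if len(group) == 1 else ("Parallel", group))
    return ("Sequence", children)
```

For example, a plan where `grasp` and `signal` are dispatched at the same instant yields a Sequence whose middle child is a Parallel node over those two actions.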


Google DeepMind's new generative model makes Super Mario–like games from scratch

MIT Technology Review

Genie often adds this effect to the games it generates. While Genie is an in-house research project and won't be released, Guzdial notes that the Google DeepMind team says it could one day be turned into a game-making tool--something he's working on too. "I'm definitely interested to see what they build," he says.