skateboard
GM3: A General Physical Model for Micro-Mobility Vehicles

Cai, Grace, Parepally, Nithin, Zheng, Laura, Lin, Ming C.

arXiv.org Artificial Intelligence

Modeling the dynamics of micro-mobility vehicles (MMVs) is becoming increasingly important for training autonomous vehicle systems and building urban traffic simulations. However, mainstream tools rely on variants of the Kinematic Bicycle Model (KBM) or mode-specific physics that miss tire slip, load transfer, and rider/vehicle lean. To our knowledge, no unified, physics-based model captures these dynamics across the full range of common MMVs and wheel layouts. We propose the "Generalized Micro-mobility Model" (GM3), a tire-level formulation based on the tire brush representation that supports arbitrary wheel configurations, including single/double-track and multi-wheel platforms. We introduce an interactive, model-agnostic simulation framework that decouples vehicle/layout specification from the dynamics to compare the GM3 with the KBM and other models; it provides fixed-step RK4 integration, human-in-the-loop and scripted control, and real-time trajectory traces and logging for analysis. We also empirically validate the GM3 on the Stanford Drone Dataset's deathCircle (roundabout) scene for the biker, skater, and cart classes.
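To make the comparison concrete, here is a minimal sketch of the kind of KBM baseline and fixed-step RK4 integrator such a framework would pit the GM3 against. The state layout, wheelbase value, and control inputs are illustrative assumptions, not the paper's implementation.

```python
import math

def kbm_derivatives(state, v, delta, L):
    """Kinematic bicycle model: state = (x, y, theta).
    v = speed, delta = steering angle, L = wheelbase."""
    x, y, theta = state
    return (v * math.cos(theta),
            v * math.sin(theta),
            v * math.tan(delta) / L)

def rk4_step(state, v, delta, L, dt):
    """One fixed-step RK4 update of the KBM state."""
    def add(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = kbm_derivatives(state, v, delta, L)
    k2 = kbm_derivatives(add(state, k1, dt / 2), v, delta, L)
    k3 = kbm_derivatives(add(state, k2, dt / 2), v, delta, L)
    k4 = kbm_derivatives(add(state, k3, dt), v, delta, L)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Simulate a scooter-like vehicle (wheelbase 1.0 m) at 3 m/s with
# constant steering for one second (100 steps of 10 ms).
state = (0.0, 0.0, 0.0)
for _ in range(100):
    state = rk4_step(state, v=3.0, delta=0.1, L=1.0, dt=0.01)
```

A tire-level model like the GM3 would replace `kbm_derivatives` with per-wheel slip and load computations while keeping the same integrator loop.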


Quadrupedal Robot Skateboard Mounting via Reverse Curriculum Learning

Belov, Danil, Erkhov, Artem, Pestova, Elizaveta, Osokin, Ilya, Tsetserukou, Dzmitry, Osinenko, Pavel

arXiv.org Artificial Intelligence

The aim of this work is to enable quadrupedal robots to mount skateboards using Reverse Curriculum Reinforcement Learning. Although prior work has demonstrated skateboarding for quadrupeds that are already positioned on the board, the initial mounting phase still poses a significant challenge. A goal-oriented methodology was adopted, beginning with the terminal phases of the task and progressively increasing the complexity of the problem definition to approximate the desired objective. The learning process was initiated with the skateboard rigidly fixed within the global coordinate frame and the robot positioned directly above it. Through gradual relaxation of these initial conditions, the learned policy demonstrated robustness to variations in skateboard position and orientation, ultimately transferring successfully to scenarios involving a mobile skateboard. Legged locomotion has a number of advantages over other motion types.


Discrete-Time Hybrid Automata Learning: Legged Locomotion Meets Skateboarding

Liu, Hang, Teng, Sangli, Liu, Ben, Zhang, Wei, Ghaffari, Maani

arXiv.org Artificial Intelligence

This paper introduces Discrete-time Hybrid Automata Learning (DHAL), a framework that uses on-policy reinforcement learning to identify and execute mode switching without trajectory segmentation or event-function learning. Hybrid dynamical systems, which combine continuous flow with discrete mode switching, can model robotics tasks such as legged locomotion. Model-based methods usually depend on predefined gaits, while model-free approaches lack explicit mode-switching knowledge. Current methods identify discrete modes via segmentation before regressing the continuous flow, but learning high-dimensional, complex rigid-body dynamics without trajectory labels or segmentation remains a challenging open problem. Our approach incorporates a Beta policy distribution and a multi-critic architecture to model contact-guided motions, exemplified by a challenging quadrupedal skateboarding task; the resulting controller performs smooth, natural skateboarding motions with reliable mode identification and transitions under disturbances.

I. INTRODUCTION. Legged robots are often regarded as the ideal embodiment of robotic systems, designed to perform a wide range of tasks and navigate diverse destinations. Many of these tasks, such as skateboarding and boxing, are inherently contact-guided, involving complex sequences of contact events [1]. Designing and executing such contact-guided control is highly non-trivial due to two major challenges: (1) the hybrid-dynamics problem arising from the abrupt transitions introduced by contact events [2], and (2) the sparsity of contact events, which poses significant difficulties for both model-based and model-free control strategies. In model-based control, Hybrid Automata have been proposed as a powerful framework to model systems with both discrete and continuous dynamics [3, 4]. This framework has been widely applied to behavior planning [5] and legged locomotion.
However, due to the combinatorial nature of hybrid dynamics, finding optimal policies for hybrid systems through model-based optimization is computationally challenging, especially for tasks with high-dimensional state and action spaces. Model-free RL requires minimal assumptions and can be applied to a diverse range of tasks across different dynamic systems [6, 7]. However, RL policies, often represented by deep neural networks, lack interpretability and fail to explicitly model hybrid dynamics [8].
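One ingredient mentioned above, the Beta policy distribution, is easy to illustrate: a Beta draw on [0, 1] rescaled to the actuator range has bounded support by construction, so no clipping or squashing is needed at the limits, which matters for contact-rich control. This is a sketch of the sampling step only, not DHAL's network head.

```python
import random

def beta_policy_action(alpha, beta, low, high, rng=random):
    """Sample a bounded action: a Beta(alpha, beta) draw on [0, 1] is
    rescaled to the actuator range [low, high]. Mean of Beta(a, b) is
    a / (a + b), so Beta(2, 2) concentrates actions mid-range."""
    u = rng.betavariate(alpha, beta)
    return low + (high - low) * u

random.seed(0)
samples = [beta_policy_action(2.0, 2.0, -1.0, 1.0) for _ in range(2000)]
```

In a full policy, `alpha` and `beta` would be positive outputs of the policy network, and the log-probability of this distribution would feed the on-policy RL objective.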


Microsoft plans to restart the Three Mile Island nuclear plant that narrowly avoided disaster

Engadget

Microsoft is in the midst of a deal that would bring the infamous Three Mile Island nuclear power plant back to life, according to reporting by The Washington Post. If the name sounds familiar, it's because the Pennsylvania plant was home to a partial meltdown of one of its reactors back in 1979. The deal would make Microsoft the plant's sole customer for 20 years, meaning it'll hoover up 100 percent of the power all for itself. Why does the company need so much juice? It's for AI, which is notoriously power hungry.


Learning Skateboarding for Humanoid Robots through Massively Parallel Reinforcement Learning

Thibault, William, Rajendran, Vidyasagar, Melek, William, Mombaur, Katja

arXiv.org Artificial Intelligence

Learning-based methods have proven useful at generating complex motions for robots, including humanoids. Reinforcement learning (RL) has been used to learn locomotion policies, some of which leverage a periodic reward formulation. This work extends the periodic reward formulation of locomotion to skateboarding for the REEM-C robot. Brax/MJX is used to implement the RL problem to achieve fast training. Initial results in simulation are presented, with hardware experiments in progress.
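The periodic reward idea can be sketched as phase-dependent reward weights over a normalized cycle: during the "push" fraction, ground contact of the pushing foot is rewarded; during the "glide" fraction, keeping the foot on the board is rewarded instead. The ratio and the two terms below are illustrative assumptions, not REEM-C's actual reward.

```python
def phase_weights(phase, push_ratio=0.3):
    """Periodic reward weights over a normalized gait phase in [0, 1).
    Positive weight rewards the behaviour; negative weight penalizes it.
    Phases outside [0, 1) wrap around, making the reward periodic."""
    in_push = (phase % 1.0) < push_ratio
    return {
        "ground_contact": 1.0 if in_push else -1.0,
        "foot_on_board": -1.0 if in_push else 1.0,
    }
```

A locomotion variant of the same formulation would use swing/stance phases per leg; the extension to skateboarding amounts to redefining what each phase rewards.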


Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning

Crosbie, J., Shutova, E.

arXiv.org Artificial Intelligence

Large language models have shown a remarkable ability to learn and perform complex tasks through in-context learning (ICL) (Brown et al., 2020; Touvron et al., 2023b). In ICL, the model receives a demonstration context and a query question as a prompt for prediction. Unlike supervised learning, ICL utilises the pretrained model's capabilities to recognise and replicate patterns within the demonstration context, thereby enabling accurate predictions for the query without the use of gradient updates. A significant milestone in this area was reached when Elhage et al. (2021) demonstrated the existence of induction heads in Transformer LMs. These heads scan the context for previous instances of the current token using a prefix matching mechanism, which identifies if and where a token has appeared before. If a matching token is found, the head employs a copying mechanism to increase the probability of the subsequent token, facilitating exact or approximate repetition of sequences.
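The prefix-matching and copying mechanism described above can be caricatured at the sequence level: scan backwards for the most recent earlier occurrence of the current token, then predict the token that followed it. This is a toy of the head's behaviour, not a Transformer implementation.

```python
def induction_head_prediction(tokens):
    """Toy induction-head behaviour: prefix matching scans backwards for
    the most recent earlier occurrence of the current (last) token;
    copying then predicts the token that followed that occurrence.
    Returns None when no earlier match exists."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

# Given "A B C A", an induction head pushes probability toward "B".
pred = induction_head_prediction(["A", "B", "C", "A"])
```

Real induction heads implement this soft-matching in attention space, which is why they also support approximate (fuzzy) repetition rather than only exact copies.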


Closed-loop Teaching via Demonstrations to Improve Policy Transparency

Lee, Michael S., Simmons, Reid, Admoni, Henny

arXiv.org Artificial Intelligence

Demonstrations are a powerful way of increasing the transparency of AI policies. Though informative demonstrations may be selected a priori through the machine teaching paradigm, student learning may deviate from the preselected curriculum in situ. This paper thus explores augmenting a curriculum with a closed-loop teaching framework inspired by principles from the education literature, such as the zone of proximal development and the testing effect. We accordingly use tests to close the loop and maintain a novel particle filter model of human beliefs throughout the learning process, allowing us to provide demonstrations that are targeted to the human's current understanding in real time. A user study finds that our proposed closed-loop teaching framework reduces the regret in human test responses by 43% over a baseline.
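The particle filter over human beliefs boils down to a standard Bayes update: each particle is a hypothesis about the human's current understanding, reweighted by the likelihood of the observed test response and then resampled. The hypothesis encoding and likelihood below are illustrative assumptions, not the paper's belief representation.

```python
import random

def belief_update(particles, likelihood, rng=random):
    """One particle-filter update: reweight each particle (a hypothesis
    about the human's understanding) by the likelihood of the observed
    test response, normalize, and resample with replacement."""
    weights = [likelihood(p) for p in particles]
    total = sum(weights)
    probs = [w / total for w in weights]
    return rng.choices(particles, weights=probs, k=len(particles))

# A correct test answer is far more likely if the human holds hypothesis "B",
# so the posterior should concentrate on "B" after observing it.
random.seed(0)
particles = ["A"] * 50 + ["B"] * 50
posterior = belief_update(particles, lambda h: 0.9 if h == "B" else 0.1)
```

The teacher would then pick the next demonstration to target whichever misunderstanding the surviving particles indicate.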


Kinematic Characterization of Micro-Mobility Vehicles During Evasive Maneuvers

Terranova, Paolo, Liu, Shu-Yuan, Jain, Sparsh, Engstrom, Johan, Perez, Miguel

arXiv.org Artificial Intelligence

There is an increasing need to comprehensively characterize the kinematic performance of different micromobility vehicles (MMVs). This study aims to: 1) characterize the kinematic behaviors of different MMVs during emergency maneuvers; 2) explore the influence of different MMV power sources on device performance; and 3) investigate whether piecewise linear models are suitable for modeling MMV trajectories. In a test track experiment, 40 frequent riders performed emergency braking and swerving maneuvers riding a subset of electric MMVs and their traditional counterparts, and, in some cases, behaving as running pedestrians. A second experiment was conducted to determine the MMVs' lower swerving boundaries. Device power source had a statistically significant influence on the kinematic capabilities of the MMVs: while e-MMVs displayed superior braking capabilities compared to their traditional counterparts, the opposite was observed in terms of swerving performance. Furthermore, performance varied significantly across the different MMV typologies, with handlebar-based devices consistently outperforming handlebar-less devices across the metrics considered. The piecewise linear models used for braking profiles fit well for most MMVs, except for skateboards and pedestrians, due to foot-ground engagement. These findings underscore that the effectiveness of steering or braking in preventing collisions may vary depending on the type and power source of the device. This study also demonstrates the applicability of piecewise linear models for generating parameterized functions that accurately model braking trajectories, providing a valuable resource for developers of automated systems. The model, however, also reveals that the single brake-ramp assumption does not hold for certain types of MMVs or for pedestrians, indicating the need for further refinement.
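The single brake-ramp model the study fits is simple enough to state directly: constant speed until brake onset, then a linear deceleration down to a stop. The parameter names below are illustrative; the paper's fitted values and exact parameterization may differ.

```python
def brake_ramp_speed(t, v0, t_onset, decel):
    """Single-ramp piecewise linear braking profile: constant speed v0
    (m/s) until brake onset t_onset (s), then linear deceleration at
    decel (m/s^2) until the speed reaches zero."""
    if t <= t_onset:
        return v0
    return max(v0 - decel * (t - t_onset), 0.0)
```

For skateboards and running pedestrians, repeated foot-ground engagement produces stepped, multi-ramp speed profiles, which is why this single-ramp form fits them poorly.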


Efficient End-to-End Visual Document Understanding with Rationale Distillation

Zhu, Wang, Agarwal, Alekh, Joshi, Mandar, Jia, Robin, Thomason, Jesse, Toutanova, Kristina

arXiv.org Artificial Intelligence

Understanding visually situated language requires recognizing text and visual elements, and interpreting complex layouts. State-of-the-art methods commonly use specialized pre-processing tools, such as optical character recognition (OCR) systems, that map document image inputs to extracted information in the space of textual tokens, and sometimes also employ large language models (LLMs) to reason in text token space. However, the gains from external tools and LLMs come at the cost of increased computational and engineering complexity. In this paper, we ask whether small pretrained image-to-text models can learn selective text or layout recognition and reasoning as an intermediate inference step in an end-to-end model for pixel-level visual language understanding. We incorporate the outputs of such OCR tools, LLMs, and larger multimodal models as intermediate "rationales" on training data, and train a small student model to predict both rationales and answers for input questions based on those training examples. A student model based on Pix2Struct (282M parameters) achieves consistent improvements on three visual document understanding benchmarks representing infographics, scanned documents, and figures, with improvements of more than 4% absolute over a comparable Pix2Struct model that predicts answers directly.
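The distillation setup amounts to changing the student's training target: instead of the answer alone, the student learns to emit the intermediate rationale first, so the external tools are only needed at training time and inference stays end-to-end. The separator and serialization below are illustrative assumptions, not the paper's exact format.

```python
def rationale_target(rationale, answer, sep=" ; "):
    """Build the student's training target for rationale distillation:
    the intermediate rationale (e.g. text selected by an OCR tool or a
    larger model on the training set) is serialized before the final
    answer, and the student is trained to generate both."""
    return rationale + sep + answer

example = rationale_target("Total revenue: 42 million", "42 million")
```

At inference, the student generates the rationale and answer in one pass, and only the answer span is scored.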


Benchmarking Spatial Relationships in Text-to-Image Generation

Gokhale, Tejas, Palangi, Hamid, Nushi, Besmira, Vineet, Vibhav, Horvitz, Eric, Kamar, Ece, Baral, Chitta, Yang, Yezhou

arXiv.org Artificial Intelligence

Spatial understanding is a fundamental aspect of computer vision and integral for human-level reasoning about images, making it an important component for grounded language understanding. While recent text-to-image synthesis (T2I) models have shown unprecedented improvements in photorealism, it is unclear whether they have reliable spatial understanding capabilities. We investigate the ability of T2I models to generate correct spatial relationships among objects and present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image. To benchmark existing models, we introduce a dataset, SR2D, that contains sentences describing two or more objects and the spatial relationships between them. We construct an automated evaluation pipeline to recognize objects and their spatial relationships, and employ it in a large-scale evaluation of T2I models. Our experiments reveal a surprising finding that, although state-of-the-art T2I models exhibit high image quality, they are severely limited in their ability to generate multiple objects or the specified spatial relations between them. Our analyses demonstrate several biases and artifacts of T2I models such as the difficulty with generating multiple objects, a bias towards generating the first object mentioned, spatially inconsistent outputs for equivalent relationships, and a correlation between object co-occurrence and spatial understanding capabilities. We conduct a human study that shows the alignment between VISOR and human judgement about spatial understanding. We offer the SR2D dataset and the VISOR metric to the community in support of T2I reasoning research.
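A VISOR-style evaluation pipeline reduces to two steps after object detection: check the described relation between the detected bounding boxes, then aggregate across generated images. The centroid-comparison rule and the aggregation below are simplified sketches of such a pipeline, not the paper's exact metric definition.

```python
def relation_correct(box_a, box_b, relation):
    """Check a 2D spatial relation between two detected bounding boxes
    (x1, y1, x2, y2) by comparing centroids. Image coordinates: y grows
    downward, so 'above' means a smaller y centroid."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return {"left of": ax < bx, "right of": ax > bx,
            "above": ay < by, "below": ay > by}[relation]

def visor_score(evaluations):
    """Fraction of generated images in which both objects were detected
    and the described spatial relation holds; each evaluation is a pair
    (both_objects_detected, relation_correct)."""
    return sum(1 for both, rel in evaluations if both and rel) / len(evaluations)
```

The paper's finding that models often fail to generate the second object at all shows up here as the `both_objects_detected` flag dragging the score down independently of the relation check.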