Wang, Ziyun
EqNIO: Subequivariant Neural Inertial Odometry
Jayanth, Royina Karegoudra, Xu, Yinshuang, Wang, Ziyun, Chatzipantazis, Evangelos, Gehrig, Daniel, Daniilidis, Kostas
Neural networks are seeing rapid adoption in purely inertial odometry, where accelerometer and gyroscope measurements from commodity inertial measurement units (IMUs) are used to regress displacements and associated uncertainties. They can learn informative displacement priors, which can be fused directly with the raw IMU data using off-the-shelf non-linear filters. Nevertheless, these networks do not consider the physical roto-reflective symmetries inherent in IMU data, so they must memorize the same priors for every possible motion direction, which hinders generalization. In this work, we characterize these symmetries and show that the IMU data and the resulting displacement and covariance transform equivariantly when rotated around the gravity vector and reflected with respect to arbitrary planes parallel to gravity. We design a neural network that respects these symmetries by construction through equivariant processing in three steps: First, it estimates an equivariant gravity-aligned frame from equivariant vectors and invariant scalars derived from the IMU data, leveraging expressive linear and non-linear layers tailored to commute with the underlying symmetry transformation. We then map the IMU data into this frame, thereby achieving an invariant canonicalization that can be used directly with off-the-shelf inertial odometry networks. Finally, we map these network outputs back into the original frame, thereby obtaining equivariant covariances and displacements. We demonstrate the generality of our framework by applying it to the filter-based approach of TLIO and the end-to-end RONIN architecture, and show better performance than existing methods on the TLIO, Aria, RIDI and OxIOD datasets.
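The canonicalize-predict-decanonicalize idea described in the abstract can be sketched in a few lines. The following is a minimal illustration only: `estimate_yaw_frame` and `base_odometry_net` are hypothetical stand-ins for the paper's learned equivariant frame predictor and its TLIO/RONIN-style backbone, not the actual method.

```python
# Minimal sketch of the three-step pipeline from the abstract, under simplified
# assumptions. Both helper functions below are toy placeholders.
import numpy as np

def estimate_yaw_frame(acc, gyr):
    """Toy stand-in for the learned equivariant frame: a rotation about gravity
    (z-axis) built from the mean horizontal acceleration direction."""
    h = acc[:, :2].mean(axis=0)
    theta = np.arctan2(h[1], h[0])
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def base_odometry_net(acc_canon, gyr_canon):
    """Placeholder for an off-the-shelf inertial odometry network that returns
    a 3D displacement and a 3x3 covariance in the canonical frame."""
    disp = acc_canon.mean(axis=0)   # dummy regression
    cov = np.eye(3) * 0.01          # dummy uncertainty
    return disp, cov

def equivariant_inertial_odometry(acc, gyr):
    R = estimate_yaw_frame(acc, gyr)   # 1) equivariant gravity-aligned frame
    acc_c, gyr_c = acc @ R, gyr @ R    # 2) canonicalize the IMU data (invariant input)
    disp_c, cov_c = base_odometry_net(acc_c, gyr_c)
    disp = R @ disp_c                  # 3) map outputs back: equivariant displacement
    cov = R @ cov_c @ R.T              #    and covariance
    return disp, cov

acc = np.random.randn(200, 3) + np.array([0.0, 0.0, 9.81])
gyr = np.random.randn(200, 3) * 0.01
print(equivariant_inertial_odometry(acc, gyr))
```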
Motion-prior Contrast Maximization for Dense Continuous-Time Motion Estimation
Hamann, Friedhelm, Wang, Ziyun, Asmanis, Ioannis, Chaney, Kenneth, Gallego, Guillermo, Daniilidis, Kostas
Current optical flow and point-tracking methods rely heavily on synthetic datasets. Event cameras are novel vision sensors with advantages in challenging visual conditions, but state-of-the-art frame-based methods cannot be easily adapted to event data due to the limitations of current event simulators. We introduce a novel self-supervised loss combining the Contrast Maximization framework with a non-linear motion prior in the form of pixel-level trajectories, and propose an efficient solution to the high-dimensional assignment problem between non-linear trajectories and events. We demonstrate its effectiveness in two scenarios: In dense continuous-time motion estimation, our method improves the zero-shot performance of a synthetically trained model on the real-world dataset EVIMO2 by 29%. In optical flow estimation, our method elevates a simple UNet to achieve state-of-the-art performance among self-supervised methods on the DSEC optical flow benchmark. Our code is available at https://github.com/tub-rip/MotionPriorCMax.
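To make the Contrast Maximization idea concrete, here is a minimal sketch of the objective, assuming a simple per-event constant flow as a stand-in for the paper's non-linear pixel-level trajectory prior; the event layout, `contrast_loss` name, and parameters are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a contrast-maximization objective: warp events to a
# reference time along candidate trajectories (here straight lines), accumulate
# an image of warped events (IWE), and score its sharpness via variance.
import numpy as np

def contrast_loss(events, flow, t_ref, img_size):
    """events: (N, 3) array of (x, y, t); flow: (N, 2) per-event flow in px/s."""
    x, y, t = events[:, 0], events[:, 1], events[:, 2]
    dt = t_ref - t
    xw = np.clip(np.round(x + flow[:, 0] * dt), 0, img_size[1] - 1).astype(int)
    yw = np.clip(np.round(y + flow[:, 1] * dt), 0, img_size[0] - 1).astype(int)
    iwe = np.zeros(img_size)
    np.add.at(iwe, (yw, xw), 1.0)   # image of warped events
    return -np.var(iwe)             # sharper IWE -> higher variance -> lower loss

# Usage: evaluate the loss for a candidate (here constant) flow field.
events = np.column_stack([np.random.rand(1000) * 64,
                          np.random.rand(1000) * 48,
                          np.random.rand(1000) * 0.05])
flow = np.tile([10.0, 0.0], (1000, 1))
print(contrast_loss(events, flow, t_ref=0.0, img_size=(48, 64)))
```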
EV-Catcher: High-Speed Object Catching Using Low-latency Event-based Neural Networks
Wang, Ziyun, Ojeda, Fernando Cladera, Bisulco, Anthony, Lee, Daewon, Taylor, Camillo J., Daniilidis, Kostas, Hsieh, M. Ani, Lee, Daniel D., Isler, Volkan
Event-based sensors have recently drawn increasing interest in robotic perception due to their lower latency, higher dynamic range, and lower bandwidth requirements compared to standard CMOS-based imagers. These properties make them ideal tools for real-time perception tasks in highly dynamic environments. In this work, we demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects. We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency, as well as a learning-based approach that allows real-time inference of a confidence-enabled control signal to the robot. To validate our approach, we present an experimental catching system in which we catch fast-flying ping-pong balls. We show that the system achieves an 81% success rate in catching balls targeted at different locations and moving at up to 13 m/s, even on compute-constrained embedded platforms such as the Nvidia Jetson NX.
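A minimal sketch of one plausible construction of a Binary Event History Image follows, assuming it marks every pixel that received at least one event within a time window; the exact definition in the paper may differ, and the function name and sensor resolution used here are assumptions.

```python
# Minimal sketch: build a binary image over an event history window.
import numpy as np

def build_behi(events, t_start, t_end, img_size):
    """events: (N, 3) array of (x, y, t). Returns a uint8 image that is 1
    wherever any event fell inside [t_start, t_end]."""
    x, y, t = events[:, 0].astype(int), events[:, 1].astype(int), events[:, 2]
    mask = (t >= t_start) & (t <= t_end)
    behi = np.zeros(img_size, dtype=np.uint8)
    behi[y[mask], x[mask]] = 1
    return behi

# Usage with synthetic events on an assumed 346x260 sensor.
events = np.column_stack([np.random.randint(0, 346, 5000),
                          np.random.randint(0, 260, 5000),
                          np.sort(np.random.rand(5000) * 0.02)])
print(build_behi(events, 0.0, 0.01, (260, 346)).sum(), "active pixels")
```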
FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation
Han, Xu, Zhu, Hao, Yu, Pengfei, Wang, Ziyun, Yao, Yuan, Liu, Zhiyuan, Sun, Maosong
We present a Few-Shot Relation Classification Dataset (FewRel), consisting of 70,000 sentences on 100 relations derived from Wikipedia and annotated by crowdworkers. The relation of each sentence is first recognized by distant supervision methods, and then filtered by crowdworkers. We adapt the most recent state-of-the-art few-shot learning methods for relation classification and conduct a thorough evaluation of these methods. Empirical results show that even the most competitive few-shot learning models struggle on this task, especially when compared with humans. We also show that a range of different reasoning skills are needed to solve our task. These results indicate that few-shot relation classification remains an open problem and still requires further research. Our detailed analysis points to multiple directions for future research. All details and resources about the dataset and baselines are released on http://zhuhao.me/fewrel.
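For readers unfamiliar with the evaluation setting, here is a minimal sketch of sampling an N-way K-shot episode for relation classification; the flat relation-to-sentences dictionary and the `sample_episode` helper are simplifying assumptions, not FewRel's actual data format or tooling.

```python
# Minimal sketch of N-way K-shot episode sampling for relation classification.
import random

def sample_episode(data, n_way=5, k_shot=1, n_query=1):
    """data: dict mapping relation name -> list of annotated sentences.
    Returns a labeled support set of N*K examples and a query set."""
    relations = random.sample(list(data), n_way)
    support, query = [], []
    for rel in relations:
        picks = random.sample(data[rel], k_shot + n_query)
        support += [(s, rel) for s in picks[:k_shot]]
        query += [(s, rel) for s in picks[k_shot:]]
    return support, query

# Usage on a toy corpus of 100 relations with 10 sentences each.
toy = {f"relation_{i}": [f"sentence {j} about relation {i}" for j in range(10)]
       for i in range(100)}
support, query = sample_episode(toy, n_way=5, k_shot=5, n_query=2)
print(len(support), "support /", len(query), "query examples")
```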