 sonic


SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control

Luo, Zhengyi, Yuan, Ye, Wang, Tingwu, Li, Chenran, Chen, Sirui, Castañeda, Fernando, Cao, Zi-Ang, Li, Jiefeng, Minor, David, Ben, Qingwei, Da, Xingye, Ding, Runyu, Hogg, Cyrus, Song, Lina, Lim, Edy, Jeong, Eugene, He, Tairan, Xue, Haoru, Xiao, Wenli, Wang, Zi, Yuen, Simon, Kautz, Jan, Chang, Yan, Iqbal, Umar, Fan, Linxi "Jim", Zhu, Yuke

arXiv.org Artificial Intelligence

Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited set of behaviors, and are trained on a handful of GPUs over several days. We show that scaling up model capacity, data, and compute yields a generalist humanoid controller capable of creating natural and robust whole-body movements. Specifically, we posit motion tracking as a natural and scalable task for humanoid control, leveraging dense supervision from diverse motion-capture data to acquire human motion priors without manual reward engineering. We build a foundation model for motion tracking by scaling along three axes: network size (from 1.2M to 42M parameters), dataset volume (over 100M frames, 700 hours of high-quality motion data), and compute (9k GPU hours). Beyond demonstrating the benefits of scale, we show the practical utility of our model through two mechanisms: (1) a real-time universal kinematic planner that bridges motion tracking to downstream task execution, enabling natural and interactive control, and (2) a unified token space that supports various motion input interfaces, such as VR teleoperation devices, human videos, and vision-language-action (VLA) models, all using the same policy. Scaling motion tracking exhibits favorable properties: performance improves steadily with increased compute and data diversity, and learned representations generalize to unseen motions, establishing motion tracking at scale as a practical foundation for humanoid control.


SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game

Wang, Hao, Hou, Chengkai, Li, Xianglong, Fu, Yankai, Li, Chenxuan, Chen, Ning, Dai, Gaole, Liu, Jiaming, Huang, Tiejun, Zhang, Shanghang

arXiv.org Artificial Intelligence

Learning to control high-speed objects in the real world remains a challenging frontier in robotics. Table tennis serves as an ideal testbed for this problem, demanding both rapid interception of fast-moving balls and precise adjustment of their trajectories. This task presents two fundamental challenges: it requires a high-precision vision system capable of accurately predicting ball trajectories, and it necessitates intelligent strategic planning to ensure precise ball placement to target regions. The dynamic nature of table tennis, coupled with its real-time response requirements, makes it particularly well-suited for advancing robotic control capabilities in fast-paced, precision-critical domains. In this paper, we present SpikePingpong, a novel system that integrates spike-based vision with imitation learning for high-precision robotic table tennis. Our approach introduces two key components that directly address the aforementioned challenges: SONIC, a spike camera-based module that achieves millimeter-level precision in ball-racket contact prediction by compensating for real-world uncertainties such as air resistance and friction; and IMPACT, a strategic planning module that enables accurate ball placement to targeted table regions. The system harnesses a 20 kHz spike camera for high-temporal-resolution ball tracking, combined with efficient neural network models for real-time trajectory correction and stroke planning. Experimental results demonstrate that SpikePingpong achieves a 91% success rate on the 30 cm accuracy target area and 71% on the more challenging 20 cm accuracy task, surpassing previous state-of-the-art approaches by 38% and 37%, respectively. These performance improvements enable the robust implementation of sophisticated tactical gameplay strategies, providing a new research perspective for robotic control in high-speed dynamic tasks.


Shinobi is the latest video game to get the big screen treatment

Engadget

Back in the old days, there was no sure-fire indicator of box office poison more than a video game adaptation. That has changed in recent years and now all kinds of gaming mascots are getting their chance to appear in a major motion picture or, at the very least, a streaming series. They're now making a movie based on Shinobi, as reported by Deadline. For the uninitiated, Shinobi is a famous hack-and-slash game developed by Sega in which you play as a ninja. There have been plenty of sequels throughout the years, though they mostly share the same basic story.


Sega's ninja game Shinobi to get the movie treatment

The Japan Times

One of Sega's most popular games, Shinobi, will be made into a movie in a joint project with Universal Pictures, the Japanese gamemaker announced Wednesday, aiming to emulate the success of "The Super Mario Bros. Movie." Sega did not give a target date for the release but said it had "started the development of a film production" with the Hollywood behemoth. Shinobi was originally created for Japanese arcades in 1987 and features a ninja character who fights to stop a criminal organization that kidnaps child ninjas. It is the latest effort to cash in on a video-game adaptation craze after "The Super Mario Bros. Movie" became the second-highest grossing film of 2023, following a 2020 adaptation of Sega's "Sonic the Hedgehog." "Shinobi is one of Sega's most popular series worldwide, along with Sonic the Hedgehog," Sega said on Wednesday.


SONICS: Synthetic Or Not -- Identifying Counterfeit Songs

Rahman, Md Awsafur, Hakim, Zaber Ibn Abdul, Sarker, Najibul Haque, Paul, Bishmoy, Fattah, Shaikh Anowarul

arXiv.org Artificial Intelligence

The recent surge in AI-generated songs presents exciting possibilities and challenges. While these tools democratize music creation, they also necessitate the ability to distinguish between human-composed and AI-generated songs for safeguarding artistic integrity and content curation. Existing research and datasets in fake song detection only focus on singing voice deepfake detection (SVDD), where the vocals are AI-generated but the instrumental music is sourced from real songs. However, this approach is inadequate for contemporary end-to-end AI-generated songs where all components (vocals, lyrics, music, and style) could be AI-generated. Additionally, existing datasets lack lyrics-music diversity, long-duration songs, and open fake songs. To address these gaps, we introduce SONICS, a novel dataset for end-to-end Synthetic Song Detection (SSD), comprising over 97k songs with over 49k synthetic songs from popular platforms like Suno and Udio. Furthermore, we highlight the importance of modeling long-range temporal dependencies in songs for effective authenticity detection, an aspect overlooked in existing methods. To capture these patterns, we propose a novel model, SpecTTTra, that is up to 3 times faster and 6 times more memory efficient compared to popular CNN and Transformer-based models while maintaining competitive performance. Finally, we offer both AI-based and Human evaluation benchmarks, addressing another deficiency in current research.


Sonic: Fast and Transferable Data Poisoning on Clustering Algorithms

Villani, Francesco, Lazzaro, Dario, Cinà, Antonio Emanuele, Dell'Amico, Matteo, Biggio, Battista, Roli, Fabio

arXiv.org Artificial Intelligence

Data poisoning attacks on clustering algorithms have received limited attention, with existing methods struggling to scale efficiently as dataset sizes and feature counts increase. These attacks typically require re-clustering the entire dataset multiple times to generate predictions and assess the attacker's objectives, significantly hindering their scalability. This paper addresses these limitations by proposing Sonic, a novel genetic data poisoning attack that leverages incremental and scalable clustering algorithms, e.g., FISHDBC, as surrogates to accelerate poisoning attacks against graph-based and density-based clustering methods, such as HDBSCAN. We empirically demonstrate the effectiveness and efficiency of Sonic in poisoning the target clustering algorithms. We then conduct a comprehensive analysis of the factors affecting the scalability and transferability of poisoning attacks against clustering algorithms, and we conclude by examining the robustness of hyperparameters in our attack strategy Sonic.


SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning

Yao, Jianpeng, Zhang, Xiaopan, Xia, Yu, Wang, Zejin, Roy-Chowdhury, Amit K., Li, Jiachen

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) has enabled social robots to generate trajectories without human-designed rules or interventions, which makes it more effective than hard-coded systems for generalizing to complex real-world scenarios. However, social navigation is a safety-critical task that requires robots to avoid collisions with pedestrians, and previous RL-based solutions fall short in safety performance in complex environments. To enhance the safety of RL policies, we propose SoNIC, which is, to the best of our knowledge, the first algorithm that integrates adaptive conformal inference (ACI) with constrained reinforcement learning (CRL) to learn safe policies for social navigation. More specifically, our method augments RL observations with ACI-generated nonconformity scores and provides explicit guidance for agents to leverage the uncertainty metrics to avoid safety-critical areas by incorporating safety constraints with spatial relaxation. Our method outperforms state-of-the-art baselines in terms of both safety and adherence to social norms by a large margin and demonstrates much stronger robustness to out-of-distribution scenarios. Our code and video demos are available on our project website: https://sonic-social-nav.github.io/.
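
For readers unfamiliar with adaptive conformal inference, the following is a minimal sketch of a generic ACI update step, not the SoNIC implementation. It assumes the standard formulation in which the miscoverage level is nudged after each observation; the function name `aci_step`, the step size `gamma`, and the use of a plain empirical quantile are illustrative assumptions that do not come from the abstract.

```python
import math

def aci_step(alpha_t, target_alpha, gamma, past_scores, new_score):
    """One generic ACI update (illustrative; all names are hypothetical).

    alpha_t:      current miscoverage level
    target_alpha: desired long-run miscoverage rate
    gamma:        assumed step size for the online update
    past_scores:  previously observed nonconformity scores
    new_score:    nonconformity score of the newest observation
    """
    # Radius of the prediction region: empirical (1 - alpha_t) quantile.
    level = min(max(1.0 - alpha_t, 0.0), 1.0)
    ordered = sorted(past_scores)
    idx = min(max(math.ceil(level * len(ordered)) - 1, 0), len(ordered) - 1)
    radius = ordered[idx]

    # err is 1 when the new observation falls outside the region.
    err = 1.0 if new_score > radius else 0.0

    # A miss lowers alpha (wider regions next time); a hit raises it slightly.
    alpha_next = alpha_t + gamma * (target_alpha - err)
    return radius, alpha_next

scores = [0.1 * k for k in range(1, 11)]
radius, alpha_after_miss = aci_step(0.1, 0.1, 0.05, scores, new_score=2.0)
_, alpha_after_hit = aci_step(0.1, 0.1, 0.05, scores, new_score=0.0)
```

In this sketch a missed prediction shrinks the miscoverage level, so the next region grows, which is the kind of uncertainty signal the abstract describes feeding into the policy's observations.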


SONIC: Sonar Image Correspondence using Pose Supervised Learning for Imaging Sonars

Gode, Samiran, Hinduja, Akshay, Kaess, Michael

arXiv.org Artificial Intelligence

In this paper, we address the challenging problem of data association for underwater SLAM through a novel method for sonar image correspondence using learned features. We introduce SONIC (SONar Image Correspondence), a pose-supervised network designed to yield robust feature correspondence capable of withstanding viewpoint variations. The inherent complexity of the underwater environment stems from the dynamic and frequently limited visibility conditions, restricting vision to a few meters of often featureless expanses. This makes camera-based systems suboptimal in most open water application scenarios. Consequently, multibeam imaging sonars emerge as the preferred choice for perception sensors. However, they too are not without their limitations. While imaging sonars offer superior long-range visibility compared to cameras, their measurements can appear different from varying viewpoints. This inherent variability presents formidable challenges in data association, particularly for feature-based methods. Our method demonstrates significantly better performance in generating correspondences for sonar images which will pave the way for more accurate loop closure constraints and sonar-based place recognition. Code as well as simulated and real-world datasets will be made public to facilitate further development in the field.


'Pizza Tower' is the 'Wario Land' 'Sonic' crossover I didn't know I wanted

Engadget

My favorite video game of 2023 involves a portly, balding pizza chef named Peppino Spaghetti scaling a medieval tower to defeat a sentient floating pie threatening to blow up his pizzeria. It was developed by a small independent studio named Tour de Pizza, led by a designer named McPig. Its soundtrack was largely composed by a first-time composer and a high school student. Its art style is at once expressive and grotesque. It's called Pizza Tower, and it is, in all seriousness, one of the best 2D platformers I've played in a long time. I'm late here, as Pizza Tower arrived on PC in January.


Contrastive Decoding: Open-ended Text Generation as Optimization

Li, Xiang Lisa, Holtzman, Ari, Fried, Daniel, Liang, Percy, Eisner, Jason, Hashimoto, Tatsunori, Zettlemoyer, Luke, Lewis, Mike

arXiv.org Artificial Intelligence

Given a language model (LM), maximum probability is a poor decoding objective for open-ended generation, because it produces short and repetitive text. On the other hand, sampling can often produce incoherent text that drifts from the original topics. We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint. The contrastive objective returns the difference between the likelihood under a large LM (called the expert, e.g., OPT-13B) and a small LM (called the amateur, e.g., OPT-125M), and the constraint ensures that the outputs are plausible. CD is inspired by the fact that the failures of larger LMs (e.g., repetition, incoherence) are even more prevalent in smaller LMs, and that this difference signals which texts should be preferred. CD requires zero additional training, and produces higher-quality text than decoding from the larger LM alone. It also works across model scales (OPT-13B and GPT2-1.5B) and significantly outperforms four strong decoding algorithms (e.g., nucleus, top-k) in automatic and human evaluations across Wikipedia, news, and story domains.
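
The contrastive objective and plausibility constraint described above can be illustrated for a single decoding step. This is a minimal sketch, not the authors' code: it assumes the objective is a difference of log-probabilities and that the constraint keeps only tokens whose expert probability is at least a fraction `alpha` of the expert's top probability. The function names, the value of `alpha`, and the toy distributions are all hypothetical.

```python
import math

def contrastive_scores(expert_probs, amateur_probs, alpha=0.1):
    """Score candidate next tokens for one step (illustrative sketch).

    expert_probs / amateur_probs: dicts mapping token -> probability.
    alpha: assumed plausibility cutoff relative to the expert's best token.
    """
    cutoff = alpha * max(expert_probs.values())
    scores = {}
    for token, p_exp in expert_probs.items():
        if p_exp < cutoff:
            continue  # plausibility constraint: drop implausible tokens
        # Floor the amateur probability to avoid log(0).
        p_ama = max(amateur_probs.get(token, 0.0), 1e-12)
        # Contrastive objective: expert log-prob minus amateur log-prob.
        scores[token] = math.log(p_exp) - math.log(p_ama)
    return scores

def pick_token(expert_probs, amateur_probs, alpha=0.1):
    """Return the plausible token with the highest contrastive score."""
    scores = contrastive_scores(expert_probs, amateur_probs, alpha)
    return max(scores, key=scores.get)

# Toy next-token distributions (made up for illustration).
expert = {"the": 0.50, "Paris": 0.30, "a": 0.19, "zzz": 0.01}
amateur = {"the": 0.60, "Paris": 0.05, "a": 0.3499, "zzz": 0.0001}

choice = pick_token(expert, amateur)
unconstrained_choice = pick_token(expert, amateur, alpha=0.0)
```

With these toy numbers the constrained choice is "Paris", which the expert favors far more than the amateur does. Setting `alpha=0.0` disables the constraint and selects "zzz", a token both models consider unlikely, which shows why the contrastive objective needs the plausibility constraint.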