sonic
SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control
Luo, Zhengyi, Yuan, Ye, Wang, Tingwu, Li, Chenran, Chen, Sirui, Castañeda, Fernando, Cao, Zi-Ang, Li, Jiefeng, Minor, David, Ben, Qingwei, Da, Xingye, Ding, Runyu, Hogg, Cyrus, Song, Lina, Lim, Edy, Jeong, Eugene, He, Tairan, Xue, Haoru, Xiao, Wenli, Wang, Zi, Yuen, Simon, Kautz, Jan, Chang, Yan, Iqbal, Umar, Fan, Linxi "Jim", Zhu, Yuke
Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited set of behaviors, and are trained on a handful of GPUs over several days. We show that scaling up model capacity, data, and compute yields a generalist humanoid controller capable of creating natural and robust whole-body movements. Specifically, we posit motion tracking as a natural and scalable task for humanoid control, leveraging dense supervision from diverse motion-capture data to acquire human motion priors without manual reward engineering. We build a foundation model for motion tracking by scaling along three axes: network size (from 1.2M to 42M parameters), dataset volume (over 100M frames, 700 hours of high-quality motion data), and compute (9k GPU hours). Beyond demonstrating the benefits of scale, we show the practical utility of our model through two mechanisms: (1) a real-time universal kinematic planner that bridges motion tracking to downstream task execution, enabling natural and interactive control, and (2) a unified token space that supports various motion input interfaces, such as VR teleoperation devices, human videos, and vision-language-action (VLA) models, all using the same policy. Scaling motion tracking exhibits favorable properties: performance improves steadily with increased compute and data diversity, and learned representations generalize to unseen motions, establishing motion tracking at scale as a practical foundation for humanoid control.
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Asia > Middle East > Jordan (0.04)
SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game
Wang, Hao, Hou, Chengkai, Li, Xianglong, Fu, Yankai, Li, Chenxuan, Chen, Ning, Dai, Gaole, Liu, Jiaming, Huang, Tiejun, Zhang, Shanghang
Learning to control high-speed objects in the real world remains a challenging frontier in robotics. Table tennis serves as an ideal testbed for this problem, demanding both rapid interception of fast-moving balls and precise adjustment of their trajectories. This task presents two fundamental challenges: it requires a high-precision vision system capable of accurately predicting ball trajectories, and it necessitates intelligent strategic planning to ensure precise ball placement to target regions. The dynamic nature of table tennis, coupled with its real-time response requirements, makes it particularly well-suited for advancing robotic control capabilities in fast-paced, precision-critical domains. In this paper, we present SpikePingpong, a novel system that integrates spike-based vision with imitation learning for high-precision robotic table tennis. Our approach introduces two key components that directly address these challenges: SONIC, a spike camera-based module that achieves millimeter-level precision in ball-racket contact prediction by compensating for real-world uncertainties such as air resistance and friction; and IMPACT, a strategic planning module that enables accurate ball placement to targeted table regions. The system harnesses a 20 kHz spike camera for high-temporal-resolution ball tracking, combined with efficient neural network models for real-time trajectory correction and stroke planning. Experimental results demonstrate that SpikePingpong achieves a 91% success rate on the 30 cm accuracy target-area task and 71% on the more challenging 20 cm accuracy task, surpassing previous state-of-the-art approaches by 38% and 37%, respectively. These performance improvements enable the robust implementation of sophisticated tactical gameplay strategies, providing a new research perspective for robotic control in high-speed dynamic tasks.
Shinobi is the latest video game to get the big screen treatment
Back in the old days, there was no sure-fire indicator of box office poison more than a video game adaptation. That has changed in recent years and now all kinds of gaming mascots are getting their chance to appear in a major motion picture or, at the very least, a streaming series. They're now making a movie based on Shinobi, as reported by Deadline. For the uninitiated, Shinobi is a famous hack-and-slash game developed by Sega in which you play as a ninja. There have been plenty of sequels throughout the years, though they mostly share the same basic story.
- Media > Film (1.00)
- Leisure & Entertainment > Games > Computer Games (0.96)
Sega's ninja game Shinobi to get the movie treatment
One of Sega's most popular games, Shinobi, will be made into a movie in a joint project with Universal Pictures, the Japanese gamemaker announced Wednesday, aiming to emulate the success of "The Super Mario Bros. Movie." Sega did not give a target date for the release but said it had "started the development of a film production" with the Hollywood behemoth. Shinobi was originally created for Japanese arcades in 1987 and features a ninja character who fights to stop a criminal organization that kidnaps child ninjas. It is the latest effort to cash in on a video-game adaptation craze after "The Super Mario Bros. Movie" became the second-highest grossing film of 2023, following a 2020 adaptation of Sega's "Sonic the Hedgehog." "Shinobi is one of Sega's most popular series worldwide, along with Sonic the Hedgehog," Sega said on Wednesday.
- Media > Film (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
SONICS: Synthetic Or Not -- Identifying Counterfeit Songs
Rahman, Md Awsafur, Hakim, Zaber Ibn Abdul, Sarker, Najibul Haque, Paul, Bishmoy, Fattah, Shaikh Anowarul
The recent surge in AI-generated songs presents exciting possibilities and challenges. While these tools democratize music creation, they also necessitate the ability to distinguish between human-composed and AI-generated songs for safeguarding artistic integrity and content curation. Existing research and datasets in fake song detection only focus on singing voice deepfake detection (SVDD), where the vocals are AI-generated but the instrumental music is sourced from real songs. However, this approach is inadequate for contemporary end-to-end AI-generated songs where all components (vocals, lyrics, music, and style) could be AI-generated. Additionally, existing datasets lack lyrics-music diversity, long-duration songs, and open fake songs. To address these gaps, we introduce SONICS, a novel dataset for end-to-end Synthetic Song Detection (SSD), comprising over 97k songs with over 49k synthetic songs from popular platforms like Suno and Udio. Furthermore, we highlight the importance of modeling long-range temporal dependencies in songs for effective authenticity detection, an aspect overlooked in existing methods. To capture these patterns, we propose a novel model, SpecTTTra, that is up to 3 times faster and 6 times more memory efficient compared to popular CNN and Transformer-based models while maintaining competitive performance. Finally, we offer both AI-based and Human evaluation benchmarks, addressing another deficiency in current research.
- North America > United States > Indiana (0.04)
- Europe > United Kingdom > England > Greater London > London > Wimbledon (0.04)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
- Media > Music (1.00)
- Media > Film (1.00)
- Leisure & Entertainment > Sports (1.00)
Sonic: Fast and Transferable Data Poisoning on Clustering Algorithms
Villani, Francesco, Lazzaro, Dario, Cinà, Antonio Emanuele, Dell'Amico, Matteo, Biggio, Battista, Roli, Fabio
Data poisoning attacks on clustering algorithms have received limited attention, with existing methods struggling to scale efficiently as dataset sizes and feature counts increase. These attacks typically require re-clustering the entire dataset multiple times to generate predictions and assess the attacker's objectives, significantly hindering their scalability. This paper addresses these limitations by proposing Sonic, a novel genetic data poisoning attack that leverages incremental and scalable clustering algorithms, e.g., FISHDBC, as surrogates to accelerate poisoning attacks against graph-based and density-based clustering methods, such as HDBSCAN. We empirically demonstrate the effectiveness and efficiency of Sonic in poisoning the target clustering algorithms. We then conduct a comprehensive analysis of the factors affecting the scalability and transferability of poisoning attacks against clustering algorithms, and we conclude by examining the robustness of hyperparameters in our attack strategy Sonic.
- Europe > Italy > Sardinia > Cagliari (0.04)
- North America > United States > New York (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Finland > North Karelia > Joensuu (0.04)
SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning
Yao, Jianpeng, Zhang, Xiaopan, Xia, Yu, Wang, Zejin, Roy-Chowdhury, Amit K., Li, Jiachen
Reinforcement Learning (RL) has enabled social robots to generate trajectories without human-designed rules or interventions, which makes it more effective than hard-coded systems for generalizing to complex real-world scenarios. However, social navigation is a safety-critical task that requires robots to avoid collisions with pedestrians, and previous RL-based solutions fall short in safety performance in complex environments. To enhance the safety of RL policies, we propose SoNIC, which is, to the best of our knowledge, the first algorithm that integrates adaptive conformal inference (ACI) with constrained reinforcement learning (CRL) to learn safe policies for social navigation. More specifically, our method augments RL observations with ACI-generated nonconformity scores and provides explicit guidance for agents to leverage the uncertainty metrics to avoid safety-critical areas by incorporating safety constraints with spatial relaxation. Our method outperforms state-of-the-art baselines in terms of both safety and adherence to social norms by a large margin and demonstrates much stronger robustness to out-of-distribution scenarios. Our code and video demos are available on our project website: https://sonic-social-nav.github.io/.
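As a rough illustration of the adaptive conformal inference step mentioned in the abstract, the sketch below shows the commonly used ACI update, in which the working miscoverage level is nudged after each prediction depending on whether the last prediction region covered the observed outcome. This is the generic update from the conformal inference literature and is an assumption here; the abstract does not state SoNIC's exact formulation. The function name `aci_update` and the step size `gamma` are hypothetical.

```python
def aci_update(alpha_t, target_alpha, covered, gamma=0.05):
    """One step of the generic adaptive conformal inference update.

    alpha_t      : current working miscoverage level
    target_alpha : desired long-run miscoverage level (e.g. 0.1 for 90% coverage)
    covered      : True if the last prediction region contained the true outcome
    gamma        : step size (hypothetical default, chosen for illustration)
    """
    err = 0.0 if covered else 1.0
    # A miss (err = 1) lowers alpha, which widens future prediction regions;
    # a hit (err = 0) raises alpha slightly, which tightens them.
    return alpha_t + gamma * (target_alpha - err)


if __name__ == "__main__":
    alpha = 0.1
    # Hypothetical coverage outcomes for five consecutive predictions.
    for covered in [True, True, False, True, False]:
        alpha = aci_update(alpha, target_alpha=0.1, covered=covered)
        print(round(alpha, 4))
```

The design point is that the level adapts online, so coverage can be maintained even when pedestrian behavior drifts away from the data the predictor was fit on.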
SONIC: Sonar Image Correspondence using Pose Supervised Learning for Imaging Sonars
Gode, Samiran, Hinduja, Akshay, Kaess, Michael
In this paper, we address the challenging problem of data association for underwater SLAM through a novel method for sonar image correspondence using learned features. We introduce SONIC (SONar Image Correspondence), a pose-supervised network designed to yield robust feature correspondence capable of withstanding viewpoint variations. The inherent complexity of the underwater environment stems from the dynamic and frequently limited visibility conditions, restricting vision to a few meters of often featureless expanses. This makes camera-based systems suboptimal in most open water application scenarios. Consequently, multibeam imaging sonars emerge as the preferred choice for perception sensors. However, they too have limitations. While imaging sonars offer superior long-range visibility compared to cameras, their measurements can appear different from varying viewpoints. This inherent variability presents formidable challenges in data association, particularly for feature-based methods. Our method demonstrates significantly better performance in generating correspondences for sonar images, which will pave the way for more accurate loop closure constraints and sonar-based place recognition. Code as well as simulated and real-world datasets will be made public to facilitate further development in the field.
'Pizza Tower' is the 'Wario Land' 'Sonic' crossover I didn't know I wanted
My favorite video game of 2023 involves a portly, balding pizza chef named Peppino Spaghetti scaling a medieval tower to defeat a sentient floating pie threatening to blow up his pizzeria. It was developed by a small independent studio named Tour de Pizza, led by a designer named McPig. Its soundtrack was largely composed by a first-time composer and a high school student. Its art style is at once expressive and grotesque. It's called Pizza Tower, and it is, in all seriousness, one of the best 2D platformers I've played in a long time. I'm late here, as Pizza Tower arrived on PC in January.
- Leisure & Entertainment > Games > Computer Games (0.70)
- Education > Educational Setting > K-12 Education > Secondary School (0.55)
Contrastive Decoding: Open-ended Text Generation as Optimization
Li, Xiang Lisa, Holtzman, Ari, Fried, Daniel, Liang, Percy, Eisner, Jason, Hashimoto, Tatsunori, Zettlemoyer, Luke, Lewis, Mike
Given a language model (LM), maximum probability is a poor decoding objective for open-ended generation, because it produces short and repetitive text. On the other hand, sampling can often produce incoherent text that drifts from the original topics. We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint. The contrastive objective returns the difference between the likelihood under a large LM (called the expert, e.g., OPT-13B) and a small LM (called the amateur, e.g., OPT-125M), and the constraint ensures that the outputs are plausible. CD is inspired by the fact that the failures of larger LMs (e.g., repetition, incoherence) are even more prevalent in smaller LMs, and that this difference signals which texts should be preferred. CD requires zero additional training, and produces higher quality text than decoding from the larger LM alone. It also works across model scales (OPT-13B and GPT2-1.5B) and significantly outperforms four strong decoding algorithms (e.g., nucleus, top-k) in automatic and human evaluations across Wikipedia, news, and story domains.
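The contrastive objective and plausibility constraint described above can be sketched for a single decoding step. This is a minimal illustration under stated assumptions, not the authors' implementation: the constraint is taken to be "keep tokens whose expert probability is at least `alpha` times the expert's top probability", the function names are hypothetical, and the toy probabilities are invented for the example.

```python
import math


def contrastive_scores(expert_logp, amateur_logp, alpha=0.1):
    """Score each candidate token for one decoding step.

    expert_logp / amateur_logp: dicts mapping token -> log-probability
    alpha: plausibility threshold relative to the expert's best token
           (assumed form of the constraint; value chosen for illustration)
    """
    # Plausibility cutoff in log space: log(alpha * max_prob).
    cutoff = math.log(alpha) + max(expert_logp.values())
    scores = {}
    for token, lp in expert_logp.items():
        if lp >= cutoff:
            # Contrastive objective: expert log-likelihood minus amateur's.
            scores[token] = lp - amateur_logp[token]
        else:
            # Implausible under the expert: never selected.
            scores[token] = -math.inf
    return scores


def pick_next_token(expert_logp, amateur_logp, alpha=0.1):
    """Greedy choice of the token with the highest contrastive score."""
    scores = contrastive_scores(expert_logp, amateur_logp, alpha)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy next-token distributions (made up for this example).
    expert = {t: math.log(p) for t, p in
              {"the": 0.5, "a": 0.3, "cat": 0.199, "zebra": 0.001}.items()}
    amateur = {t: math.log(p) for t, p in
               {"the": 0.6, "a": 0.3, "cat": 0.0999, "zebra": 0.0001}.items()}
    print(pick_next_token(expert, amateur))
```

In the toy example, "zebra" has the largest expert-to-amateur ratio but is filtered out by the plausibility cutoff, which is the role the abstract assigns to the constraint.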
- North America > United States > Montana (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
- Personal (0.46)
- Research Report (0.40)
- Transportation > Air (1.00)
- Media > Film (1.00)
- Transportation > Infrastructure & Services (0.93)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)