

The 19 Most Exciting Cars at the Beijing Auto Show 2026

WIRED

The cars that debuted at the Beijing Auto Show demonstrate that the Chinese market is now at the forefront of electrification and intelligence. These are the 19 most intriguing models we saw. The newest concept car from Lynk & Co was revealed at the 2026 Beijing Auto Show. While major motor shows in Europe and the United States are being forced to downsize or change their format, those in China continue to expand. With 1,451 vehicles on display, including 181 world premieres, the 2026 Beijing International Automotive Exhibition (also known as Auto China 2026) has become the largest auto show in history, in terms of both exhibition space and the number of vehicles on display. That fact alone reflects a shift in the automotive industry's center of gravity, but it is not the whole story: a much larger structural transformation is under way in China today. Where the focus was once on low-priced electric vehicle models, price is no longer the primary point of competition.



50ee6db59fca8643dc625829d4a0eab9-Paper-Conference.pdf

Neural Information Processing Systems

To uncover its factual basis, we delve into this ambiguity and, guided by experimental insight, break it down into two flaws. The first flaw is that SAM's predictions are sensitive to slightly different prompt variants.



VTC-LFC: Vision Transformer Compression with Low-Frequency Components

Neural Information Processing Systems

However, compression in the spatial domain alone suffers a dramatic performance drop without fine-tuning and is not robust to noise, as noise in the spatial domain can easily confound the pruning criteria, causing some parameters/channels to be pruned incorrectly.
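The frequency-domain intuition behind this can be sketched in a few lines of numpy: if a channel is scored only by the energy of its lowest spatial frequencies, high-frequency noise contributes little to the score. This is an illustrative assumption, not the paper's actual criterion; the function names and the single-corner FFT heuristic are hypothetical.

```python
import numpy as np

def low_frequency_energy(feature_map: np.ndarray, keep: int = 4) -> float:
    """Score one channel by the energy of its lowest spatial frequencies.

    Only the `keep` x `keep` low-frequency corner of the 2-D FFT contributes,
    so high-frequency noise is largely ignored. (The symmetric corners holding
    negative frequencies are dropped for simplicity in this sketch.)
    """
    spectrum = np.fft.fft2(feature_map)
    low = spectrum[:keep, :keep]  # lowest-frequency corner of the spectrum
    return float(np.sum(np.abs(low) ** 2))

def rank_channels(features: np.ndarray, keep: int = 4) -> np.ndarray:
    """Rank channels (axis 0) from most to least important."""
    scores = np.array([low_frequency_energy(c, keep) for c in features])
    return np.argsort(-scores)
```

A pruning pass would then keep only the top-ranked channels; a smooth (low-frequency) channel outranks a channel consisting purely of high-frequency oscillation, even though both may have the same total energy in the spatial domain.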



Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning

Neural Information Processing Systems

Learning generalizable policies that can adapt to unseen environments remains challenging in visual Reinforcement Learning (RL). Existing approaches try to acquire a robust representation via diversifying the appearances of in-domain observations for better generalization. Limited by the specific observations of the environment, these methods ignore the possibility of exploring diverse real-world image datasets. In this paper, we investigate how a visual RL agent would benefit from the off-the-shelf visual representations. Surprisingly, we find that the early layers in an ImageNet pre-trained ResNet model could provide rather generalizable representations for visual RL. Hence, we propose Pre-trained Image Encoder for Generalizable visual reinforcement learning (PIE-G), a simple yet effective framework that can generalize to the unseen visual scenarios in a zero-shot manner. Extensive experiments are conducted on DMControl Generalization Benchmark, DMControl Manipulation Tasks, Drawer World, and CARLA to verify the effectiveness of PIE-G. Empirical evidence suggests PIE-G improves sample efficiency and significantly outperforms previous state-of-the-art methods in terms of generalization performance. In particular, PIE-G boasts a 55% generalization performance gain on average in the challenging video background setting.


Language Models as Hierarchy Encoders

Neural Information Processing Systems

Interpreting hierarchical structures latent in language is a key limitation of current language models (LMs). While previous research has implicitly leveraged these hierarchies to enhance LMs, approaches for their explicit encoding are yet to be explored. To address this, we introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs), harnessing the expansive nature of hyperbolic space. Our method situates the output embedding space of pre-trained LMs within a Poincaré ball with a curvature that adapts to the embedding dimension, followed by re-training on hyperbolic clustering and centripetal losses. These losses are designed to effectively cluster related entities (input as texts) and organise them hierarchically. We evaluate HiTs against pre-trained LMs, standard fine-tuned LMs, and several hyperbolic embedding baselines, focusing on their capabilities in simulating transitive inference, predicting subsumptions, and transferring knowledge across hierarchies. The results demonstrate that HiTs consistently outperform all baselines in these tasks, underscoring the effectiveness and transferability of our re-trained hierarchy encoders.
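The "expansive nature of hyperbolic space" the abstract refers to can be seen directly in the standard Poincaré-ball distance. Below is a minimal, hypothetical numpy sketch of that distance only; it is not the paper's implementation (HiT additionally adapts the curvature to the embedding dimension and trains clustering and centripetal losses on top), and the variable names are illustrative.

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Geodesic distance between two points inside the unit Poincare ball.

    d(u, v) = arccosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))

    Distances blow up near the boundary, giving hyperbolic space its
    tree-like geometry: room "grows" exponentially toward the rim.
    """
    sq_dist = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_dist / denom))

# Toy hierarchy: a general (parent) concept near the origin, two specific
# (child) concepts near the rim. A centripetal-style loss would keep the
# parent's norm smaller than its children's.
parent = np.array([0.1, 0.0])
child_a = np.array([0.8, 0.1])
child_b = np.array([0.8, -0.1])
```

Note how two near-boundary points are much farther apart hyperbolically than their Euclidean gap suggests, which is what lets sibling subtrees stay separated while both remain close to a shared parent near the origin.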


Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

Neural Information Processing Systems

Existing benchmark datasets for recommender systems (RS) either are created at a small scale or involve very limited forms of user feedback. RS models evaluated on such datasets often lack practical values for large-scale real-world applications. In this paper, we describe Tenrec, a novel and publicly available data collection for RS that records various user feedback from four different recommendation scenarios. To be specific, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it has not only positive user feedback, but also true negative feedback (vs.


Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper

Zhu, Xinyue, Huang, Binghao, Li, Yunzhu

arXiv.org Artificial Intelligence

Handheld grippers are increasingly used to collect human demonstrations due to their ease of deployment and versatility. However, most existing designs lack tactile sensing, despite the critical role of tactile feedback in precise manipulation. We present a portable, lightweight gripper with integrated tactile sensors that enables synchronized collection of visual and tactile data in diverse, real-world, and in-the-wild settings. Building on this hardware, we propose a cross-modal representation learning framework that integrates visual and tactile signals while preserving their distinct characteristics. The learning procedure allows the emergence of interpretable representations that consistently focus on contacting regions relevant for physical interactions. When used for downstream manipulation tasks, these representations enable more efficient and effective policy learning, supporting precise robotic manipulation based on multimodal feedback. We validate our approach on fine-grained tasks such as test tube insertion and pipette-based fluid transfer, demonstrating improved accuracy and robustness under external disturbances. Our project page is available at https://binghao-huang.github.io/touch_in_the_wild/.