Collaborating Authors

 yuan



50ee6db59fca8643dc625829d4a0eab9-Paper-Conference.pdf

Neural Information Processing Systems

To uncover the factual basis, we delve into this ambiguity and detail it into two flaws according to experimental insight. Specifically, the first flaw lies in that SAM prediction is sensitive to slightly different prompt variants.



VTC-LFC: Vision Transformer Compression with Low-Frequency Components

Neural Information Processing Systems

However, the compression only in the spatial domain suffers from a dramatic performance drop without fine-tuning and is not robust to noise, as the noise in the spatial domain can easily confuse the pruning criteria, leading to some parameters/channels being pruned incorrectly.



Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning

Neural Information Processing Systems

Learning generalizable policies that can adapt to unseen environments remains challenging in visual Reinforcement Learning (RL). Existing approaches try to acquire a robust representation via diversifying the appearances of in-domain observations for better generalization. Limited by the specific observations of the environment, these methods ignore the possibility of exploring diverse real-world image datasets. In this paper, we investigate how a visual RL agent would benefit from the off-the-shelf visual representations. Surprisingly, we find that the early layers in an ImageNet pre-trained ResNet model could provide rather generalizable representations for visual RL. Hence, we propose Pre-trained Image Encoder for Generalizable visual reinforcement learning (PIE-G), a simple yet effective framework that can generalize to the unseen visual scenarios in a zero-shot manner. Extensive experiments are conducted on DMControl Generalization Benchmark, DMControl Manipulation Tasks, Drawer World, and CARLA to verify the effectiveness of PIE-G. Empirical evidence suggests PIE-G improves sample efficiency and significantly outperforms previous state-of-the-art methods in terms of generalization performance. In particular, PIE-G boasts a 55% generalization performance gain on average in the challenging video background setting.


Language Models as Hierarchy Encoders

Neural Information Processing Systems

Interpreting hierarchical structures latent in language is a key limitation of current language models (LMs). While previous research has implicitly leveraged these hierarchies to enhance LMs, approaches for their explicit encoding are yet to be explored. To address this, we introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs), harnessing the expansive nature of hyperbolic space. Our method situates the output embedding space of pre-trained LMs within a Poincaré ball with a curvature that adapts to the embedding dimension, followed by re-training on hyperbolic clustering and centripetal losses. These losses are designed to effectively cluster related entities (input as texts) and organise them hierarchically. We evaluate HiTs against pre-trained LMs, standard fine-tuned LMs, and several hyperbolic embedding baselines, focusing on their capabilities in simulating transitive inference, predicting subsumptions, and transferring knowledge across hierarchies. The results demonstrate that HiTs consistently outperform all baselines in these tasks, underscoring the effectiveness and transferability of our re-trained hierarchy encoders.
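To make the geometry in this abstract concrete, the following is a minimal sketch of the distance in a Poincaré ball together with one plausible form of the two losses. It is not the authors' implementation: the function names, the triplet-style form of the clustering loss, the margin-based form of the centripetal loss, and the default margins are all assumptions made for illustration. The parameter `c` is the (positive) curvature magnitude; how it is tied to the embedding dimension is left to the caller.

```python
import math


def poincare_distance(u, v, c=1.0):
    """Geodesic distance in a Poincaré ball of curvature -c (radius 1/sqrt(c))."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1 - c * sq_u) * (1 - c * sq_v)
    return math.acosh(1 + 2 * c * sq_diff / denom) / math.sqrt(c)


def hyperbolic_norm(u, c=1.0):
    """Hyperbolic distance from the origin of the ball to u."""
    return poincare_distance(u, [0.0] * len(u), c)


def clustering_loss(anchor, positive, negative, c=1.0, margin=1.0):
    """Assumed triplet form: pull related entities together, push unrelated apart."""
    d_pos = poincare_distance(anchor, positive, c)
    d_neg = poincare_distance(anchor, negative, c)
    return max(d_pos - d_neg + margin, 0.0)


def centripetal_loss(child, parent, c=1.0, margin=0.1):
    """Assumed margin form: a parent should sit closer to the origin than its child."""
    return max(hyperbolic_norm(parent, c) - hyperbolic_norm(child, c) + margin, 0.0)
```

In this sketch the centripetal term is zero once a parent entity lies closer to the origin than its child by at least the margin, which is one simple way to encode "more general concepts sit nearer the centre" in hyperbolic space.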


Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender Systems

Neural Information Processing Systems

Existing benchmark datasets for recommender systems (RS) either are created at a small scale or involve very limited forms of user feedback. RS models evaluated on such datasets often lack practical values for large-scale real-world applications. In this paper, we describe Tenrec, a novel and publicly available data collection for RS that records various user feedback from four different recommendation scenarios. To be specific, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it has not only positive user feedback, but also true negative feedback (vs.



MemOrb: A Plug-and-Play Verbal-Reinforcement Memory Layer for E-Commerce Customer Service

Huang, Yizhe, Liu, Yang, Zhao, Ruiyu, Zhong, Xiaolong, Yue, Xingming, Jiang, Ling

arXiv.org Artificial Intelligence

Large Language Model-based agents (LLM-based agents) are increasingly deployed in customer service, yet they often forget across sessions, repeat errors, and lack mechanisms for continual self-improvement. This makes them unreliable in dynamic settings where stability and consistency are critical. To address the limitations of existing approaches, we propose MemOrb, a lightweight and plug-and-play verbal reinforcement memory layer that distills multi-turn interactions into compact strategy reflections. These reflections are stored in a shared memory bank and retrieved to guide decision-making, without requiring any fine-tuning. Experiments show that MemOrb significantly improves both success rate and stability, achieving up to a 63 percentage-point gain in multi-turn success rate and delivering more consistent performance across repeated trials. Our results demonstrate that structured reflection is a powerful mechanism for enhancing long-term reliability of frozen LLM agents in customer service scenarios.

Large Language Model-based agents (LLM-based agents) are increasingly adopted in large-scale customer service systems, where they act as interactive assistants for diverse users (Brown et al., 2020). Despite their rapid deployment, these agents face persistent challenges: they often lose critical information across sessions, repeat errors without systematic correction, and struggle to adapt to rapidly changing product catalogs. Such limitations undermine their reliability in dynamic environments such as e-commerce. Existing memory solutions typically rely on short-term caching or user-specific profiles (Chhikara et al., 2025; Zhong et al., 2023). Consequently, purely per-user or short-horizon memories are insufficient for robust long-term improvement.
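The store-and-retrieve loop described above can be sketched as a small shared memory bank. This is only an illustration under stated assumptions, not MemOrb's actual design: the class and function names are hypothetical, and retrieval here uses simple word-overlap (Jaccard) scoring because the source does not say how reflections are matched to a new conversation.

```python
from dataclasses import dataclass, field


@dataclass
class Reflection:
    task: str      # short description of the past interaction (hypothetical field)
    strategy: str  # distilled lesson to reuse in later sessions (hypothetical field)


@dataclass
class MemoryBank:
    reflections: list = field(default_factory=list)

    def add(self, task, strategy):
        """Store one distilled reflection in the shared bank."""
        self.reflections.append(Reflection(task, strategy))

    def retrieve(self, query, k=2):
        """Return up to k reflections whose task text overlaps the query (assumed Jaccard scoring)."""
        q = set(query.lower().split())

        def score(r):
            t = set(r.task.lower().split())
            union = q | t
            return len(q & t) / len(union) if union else 0.0

        ranked = sorted(self.reflections, key=score, reverse=True)
        return [r for r in ranked[:k] if score(r) > 0]


def build_prompt(bank, user_message):
    """Prepend retrieved strategies to the prompt; the underlying model stays frozen."""
    lines = [f"- {r.strategy}" for r in bank.retrieve(user_message)]
    header = "Lessons from past sessions:\n" + "\n".join(lines) + "\n\n" if lines else ""
    return header + "Customer: " + user_message
```

Because the reflections only change the prompt, a layer of this shape can sit in front of any agent without touching its weights, which matches the "no fine-tuning" property claimed in the abstract.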