retr
- North America > United States (0.04)
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
- Asia > Japan (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Automobiles & Trucks (0.68)
- Information Technology (0.68)
- Health & Medicine (0.46)
ReTR: Modeling Rendering Via Transformer for Generalizable Neural Surface Reconstruction
Generalizable neural surface reconstruction techniques have attracted great attention in recent years. However, they encounter limitations of low confidence depth distribution and inaccurate surface reasoning due to the oversimplified volume rendering process employed. In this paper, we present Reconstruction TRansformer (ReTR), a novel framework that leverages the transformer architecture to redesign the rendering process, enabling complex render interaction modeling. It introduces a learnable $\textit{meta-ray token}$ and utilizes the cross-attention mechanism to simulate the interaction of rendering process with sampled points and render the observed color. Meanwhile, by operating within a high-dimensional feature space rather than the color space, ReTR mitigates sensitivity to projected colors in source views. Such improvements result in accurate surface assessment with high confidence. We demonstrate the effectiveness of our approach on various datasets, showcasing how our method outperforms the current state-of-the-art approaches in terms of reconstruction quality and generalization ability.
RETR: Multi-View Radar Detection Transformer for Indoor Perception
Indoor radar perception has seen rising interest due to affordable costs driven by emerging automotive imaging radar developments and the benefits of reduced privacy concerns and reliability under hazardous conditions (e.g., fire and smoke). However, existing radar perception pipelines fail to account for distinctive characteristics of the multi-view radar setting. In this paper, we propose Radar dEtection TRansformer (RETR), an extension of the popular DETR architecture, tailored for multi-view radar perception. RETR inherits the advantages of DETR, eliminating the need for hand-crafted components for object detection and segmentation in the image plane. More importantly, RETR incorporates carefully designed modifications such as 1) depth-prioritized feature similarity via a tunable positional encoding (TPE); 2) a tri-plane loss from both radar and camera coordinates; and 3) a learnable radar-to-camera transformation via reparameterization, to account for the unique multi-view radar setting. Evaluated on two indoor radar perception datasets, our approach outperforms existing state-of-the-art methods by a margin of 15.38+ AP for object detection and 11.91+ IoU for instance segmentation, respectively.
- North America > United States (0.04)
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
- Asia > Japan (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Information Technology (0.93)
- Automobiles & Trucks (0.68)
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
Kim, Jang-Hyun, Kim, Jinuk, Kwon, Sangwoo, Lee, Jae W., Yun, Sangdoo, Song, Hyun Oh
Transformer-based large language models (LLMs) cache context as key-value (KV) pairs during inference. As context length grows, KV cache sizes expand, leading to substantial memory overhead and increased attention latency. This paper introduces KVzip, a query-agnostic KV cache eviction method enabling effective reuse of compressed KV caches across diverse queries. KVzip quantifies the importance of a KV pair using the underlying LLM to reconstruct original contexts from cached KV pairs, subsequently evicting pairs with lower importance. Extensive empirical evaluations demonstrate that KVzip reduces KV cache size by $3$-$4\times$ and FlashAttention decoding latency by approximately $2\times$, with negligible performance loss in question-answering, retrieval, reasoning, and code comprehension tasks. Evaluations include various models such as LLaMA3.1, Qwen2.5, and Gemma3, with context lengths reaching up to 170K tokens. KVzip significantly outperforms existing query-aware KV eviction methods, which suffer from performance degradation even at a 90% cache budget ratio under multi-query scenarios.
Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions
Kim, Junbeom, Kim, Kyuyoung, Tack, Jihoon, Lim, Dongha, Shin, Jinwoo
Language models trained on web-scale corpora risk memorizing and exposing sensitive information, prompting the need for effective machine unlearning. Prior methods mainly focus on input queries to suppress sensitive outputs, yet this often fails to eliminate the underlying knowledge and limits scalability. To address this, we propose Corrective Unlearning with Retrieved Exclusions (CURE), a novel unlearning framework that verifies model outputs for leakage and revises them into safe responses. Specifically, CURE employs a lightweight corrector that is applied to the original model to verify whether outputs contain target knowledge and to rewrite them if any leakage is detected. To efficiently handle large-scale unlearning requests, CURE retrieves unlearning targets that are relevant to the initial response and provides them as in-context references to the corrector for detection and conditional revision. By leveraging this retrieval augmentation, the corrector can adapt to new unlearning requests without additional training. Extensive evaluations demonstrate that CURE substantially reduces information leakage, even from indirect queries where prior works fall short, while maintaining response quality and general utility. Moreover, it demonstrates robustness under continual unlearning scenarios, making it practical for real-world applications.
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- North America > United States > Virginia (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Security & Privacy (0.88)
RETR: Multi-View Radar Detection Transformer for Indoor Perception
Indoor radar perception has seen rising interest due to affordable costs driven by emerging automotive imaging radar developments and the benefits of reduced privacy concerns and reliability under hazardous conditions (e.g., fire and smoke). However, existing radar perception pipelines fail to account for distinctive characteristics of the multi-view radar setting. In this paper, we propose Radar dEtection TRansformer (RETR), an extension of the popular DETR architecture, tailored for multi-view radar perception. RETR inherits the advantages of DETR, eliminating the need for hand-crafted components for object detection and segmentation in the image plane. More importantly, RETR incorporates carefully designed modifications such as 1) depth-prioritized feature similarity via a tunable positional encoding (TPE); 2) a tri-plane loss from both radar and camera coordinates; and 3) a learnable radar-to-camera transformation via reparameterization, to account for the unique multi-view radar setting. Evaluated on two indoor radar perception datasets, our approach outperforms existing state-of-the-art methods by a margin of 15.38 AP for object detection and 11.91 IoU for instance segmentation, respectively.
ReTR: Modeling Rendering Via Transformer for Generalizable Neural Surface Reconstruction
Generalizable neural surface reconstruction techniques have attracted great attention in recent years. However, they encounter limitations of low confidence depth distribution and inaccurate surface reasoning due to the oversimplified volume rendering process employed. In this paper, we present Reconstruction TRansformer (ReTR), a novel framework that leverages the transformer architecture to redesign the rendering process, enabling complex render interaction modeling. It introduces a learnable \textit{meta-ray token} and utilizes the cross-attention mechanism to simulate the interaction of rendering process with sampled points and render the observed color. Meanwhile, by operating within a high-dimensional feature space rather than the color space, ReTR mitigates sensitivity to projected colors in source views.