MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory
Wang, Bo, Lin, Jiehong, Liu, Chenzhi, Hu, Xinting, Yu, Yifei, Liu, Tianjia, Wang, Zhongrui, Qi, Xiaojuan
We present MG-Nav (Memory-Guided Navigation), a dual-scale framework for zero-shot visual navigation that unifies global memory-guided planning with local geometry-enhanced control. At its core is the Sparse Spatial Memory Graph (SMG), a compact, region-centric memory where each node aggregates multi-view keyframe and object semantics, capturing both appearance and spatial structure while preserving viewpoint diversity. At the global level, the agent is localized on SMG and a goal-conditioned node path is planned via an image-to-instance hybrid retrieval, producing a sequence of reachable waypoints for long-horizon guidance. At the local level, a navigation foundation policy executes these waypoints in point-goal mode with obstacle-aware control, and switches to image-goal mode when navigating from the final node towards the visual target. To further enhance viewpoint alignment and goal recognition, we introduce VGGT-adapter, a lightweight geometric module built on the pre-trained VGGT model, which aligns observation and goal features in a shared 3D-aware space. MG-Nav operates global planning and local control at different frequencies, using periodic re-localization to correct errors. Experiments on HM3D Instance-Image-Goal and MP3D Image-Goal benchmarks demonstrate that MG-Nav achieves state-of-the-art zero-shot performance and remains robust under dynamic rearrangements and unseen scene conditions.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.56)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.46)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
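The region-centric memory described in the MG-Nav abstract above can be pictured as a small graph data structure: nodes aggregate keyframes and object semantics for a region, edges connect traversable regions, and global planning returns a sequence of waypoint nodes. The sketch below is an illustration under assumptions, not the paper's implementation; the node fields, class names, and the plain BFS planner are all hypothetical stand-ins for MG-Nav's hybrid retrieval.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class RegionNode:
    """One region of the map (fields are illustrative assumptions)."""
    node_id: int
    keyframes: list = field(default_factory=list)   # multi-view keyframe features
    objects: set = field(default_factory=set)       # object semantics seen in this region
    neighbors: set = field(default_factory=set)     # traversable adjacent regions

class SparseSpatialMemoryGraph:
    """Toy region-centric memory graph in the spirit of MG-Nav's SMG."""
    def __init__(self):
        self.nodes = {}

    def add_node(self, node_id):
        self.nodes.setdefault(node_id, RegionNode(node_id))

    def add_edge(self, a, b):
        self.add_node(a); self.add_node(b)
        self.nodes[a].neighbors.add(b)
        self.nodes[b].neighbors.add(a)

    def plan(self, start, goal):
        """BFS over region nodes -> sequence of waypoint node ids."""
        prev, frontier = {start: None}, deque([start])
        while frontier:
            cur = frontier.popleft()
            if cur == goal:
                path = []
                while cur is not None:
                    path.append(cur)
                    cur = prev[cur]
                return path[::-1]
            for nxt in self.nodes[cur].neighbors:
                if nxt not in prev:
                    prev[nxt] = cur
                    frontier.append(nxt)
        return None  # goal not reachable in the memory graph

smg = SparseSpatialMemoryGraph()
for a, b in [(0, 1), (1, 2), (2, 3), (1, 4)]:
    smg.add_edge(a, b)
print(smg.plan(0, 3))  # → [0, 1, 2, 3]
```

In the actual system the goal node would be selected by image-to-instance retrieval rather than given by id, and each returned waypoint would be executed by the local point-goal policy.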
Multi-Robot Navigation in Social Mini-Games: Definitions, Taxonomy, and Algorithms
Chandra, Rohan, Singh, Shubham, Luo, Wenhao, Sycara, Katia
The "Last Mile Challenge" has long been considered an important, yet unsolved, challenge for autonomous vehicles, public service robots, and delivery robots. A central issue in this challenge is the ability of robots to navigate constrained and cluttered environments that have high agency (e.g., doorways, hallways, corridor intersections), often while competing for space with other robots and humans. We refer to these environments as "Social Mini-Games" (SMGs). Traditional approaches designed for multi-robot navigation (MRN) do not perform well in SMGs, which has led to focused research on dedicated SMG solvers. However, publications on SMG navigation research make different assumptions (centralized versus decentralized, observability, communication, cooperation, etc.) and pursue different objective functions (safety versus liveness). These assumptions and objectives are sometimes implicit or described only informally. This makes it difficult to establish appropriate baselines for comparison in research papers, and difficult for practitioners to find the papers relevant to their concrete applications. Such an ad-hoc representation of the field also presents a barrier to new researchers wanting to start research in this area. SMG navigation research requires its own taxonomy, definitions, and evaluation protocols to guide effective research moving forward. This survey is the first to catalog SMG solvers using a well-defined and unified taxonomy and to classify existing methods accordingly. It also discusses the essential properties of SMG solvers, defines what SMGs are and how they appear in practice, outlines how to evaluate SMG solvers, and highlights the differences between SMG solvers and general navigation systems. The survey concludes with an overview of future directions and open challenges in the field. Our project is open-sourced at https://socialminigames.github.io/.
- Overview (0.68)
- Research Report (0.50)
- Transportation (1.00)
- Leisure & Entertainment > Games (0.67)
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- (3 more...)
Focus On What Matters: Separated Models For Visual-Based RL Generalization
Zhang, Di, Lv, Bowen, Zhang, Hai, Yang, Feifan, Zhao, Junqiao, Yu, Hang, Huang, Chang, Zhou, Hongtu, Ye, Chen, Jiang, Changjun
A primary challenge for visual-based Reinforcement Learning (RL) is to generalize effectively across unseen environments. Although previous studies have explored different auxiliary tasks to enhance generalization, few adopt image reconstruction due to concerns about exacerbating overfitting to task-irrelevant features during training. Recognizing the pre-eminence of image reconstruction in representation learning, we propose SMG (Separated Models for Generalization), a novel approach that exploits image reconstruction for generalization. SMG introduces two model branches to extract task-relevant and task-irrelevant representations separately from visual observations via cooperative reconstruction. Built upon this architecture, we further emphasize the importance of task-relevant features for generalization. Specifically, SMG incorporates two additional consistency losses to guide the agent's focus toward task-relevant areas across different scenarios, thereby avoiding overfitting. Extensive experiments in DMC demonstrate the state-of-the-art generalization performance of SMG, particularly excelling in video-background settings. Evaluations on robotic manipulation tasks further confirm the robustness of SMG in real-world applications. Source code is available at https://anonymous.4open.science/r/SMG/.
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
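The two-branch separation in the SMG abstract above can be sketched in a few lines: one encoder produces the task-relevant features fed to the policy, a second encoder absorbs task-irrelevant content, and a shared decoder reconstructs the observation from both. This is a rough illustration under assumptions, not the authors' implementation: the linear maps stand in for their convolutional networks, and the single consistency term stands in for the paper's two consistency losses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-branch setup: task-relevant encoder (used by the
# policy), task-irrelevant encoder (reconstruction only), joint decoder.
obs_dim, feat_dim = 32, 8
W_rel = rng.normal(size=(feat_dim, obs_dim)) * 0.1      # task-relevant branch
W_irr = rng.normal(size=(feat_dim, obs_dim)) * 0.1      # task-irrelevant branch
W_dec = rng.normal(size=(obs_dim, 2 * feat_dim)) * 0.1  # cooperative decoder

def encode_and_reconstruct(obs):
    z_rel = np.tanh(W_rel @ obs)   # fed to the policy head
    z_irr = np.tanh(W_irr @ obs)   # absorbs background / distractors
    recon = W_dec @ np.concatenate([z_rel, z_irr])  # joint reconstruction
    return z_rel, recon

obs_a = rng.normal(size=obs_dim)                  # an observation
obs_b = obs_a + 0.05 * rng.normal(size=obs_dim)   # perturbed view, same underlying state

z_a, recon_a = encode_and_reconstruct(obs_a)
z_b, _ = encode_and_reconstruct(obs_b)

# Reconstruction trains both branches; consistency pulls the task-relevant
# features of the two views together so they ignore the perturbation.
recon_loss = np.mean((recon_a - obs_a) ** 2)
consistency_loss = np.mean((z_a - z_b) ** 2)
```

Because only `z_rel` reaches the policy, the irrelevant branch can soak up background variation without the reconstruction objective forcing distractors into the policy's representation.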
Hybrid Robotic Grasping with a Soft Multimodal Gripper and a Deep Multistage Learning Scheme
Liu, Fukang, Sun, Fuchun, Fang, Bin, Li, Xiang, Sun, Songyu, Liu, Huaping
Grasping has long been considered an important and practical task in robotic manipulation. Yet achieving robust and efficient grasps of diverse objects is challenging, since it involves gripper design, perception, control, and learning. Recent learning-based approaches have shown excellent performance in grasping a variety of novel objects. However, these methods are typically limited to a single grasping mode, or require additional end effectors to grasp various objects. In addition, gripper design and learning methods are commonly developed separately, which may not adequately exploit the capabilities of a multimodal gripper. In this paper, we present a deep reinforcement learning (DRL) framework to achieve multistage hybrid robotic grasping with a new soft multimodal gripper. A soft gripper with three grasping modes (i.e., enveloping, sucking, and enveloping_then_sucking) can both deal with objects of different shapes and grasp more than one object simultaneously. We propose a novel hybrid grasping method integrated with the multimodal gripper to optimize the number of grasping actions. We evaluate the DRL framework under different scenarios (i.e., with different ratios of objects of the two grasp types). The proposed algorithm is shown to reduce the number of grasping actions, increasing grasping efficiency by up to 161% in simulations and 154% in real-world experiments, compared to single grasping modes.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Asia > China > Beijing > Beijing (0.04)
- (2 more...)
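At the decision level, the hybrid grasping scheme above reduces to picking one of the gripper's three modes per grasp attempt from learned value estimates. The snippet below is a generic epsilon-greedy DRL sketch of that choice, under assumptions: the function name, the dict of Q-values, and the exploration scheme are illustrative, not the paper's actual multistage algorithm.

```python
import random

# The three gripper modes named in the abstract.
MODES = ["enveloping", "sucking", "enveloping_then_sucking"]

def select_grasp_mode(q_values, epsilon=0.1):
    """Epsilon-greedy selection over gripper modes.

    q_values: dict mapping mode -> value estimated by a learned critic
    (a hypothetical stand-in for the paper's multistage scheme).
    """
    if random.random() < epsilon:
        return random.choice(MODES)  # explore an alternative mode
    return max(MODES, key=lambda m: q_values[m])  # exploit the best estimate

# Example: the critic currently rates the combined mode highest.
q = {"enveloping": 0.4, "sucking": 0.7, "enveloping_then_sucking": 0.9}
print(select_grasp_mode(q, epsilon=0.0))  # → enveloping_then_sucking
```

In a full system the Q-values would come from the DRL critic conditioned on the scene observation, so the preferred mode shifts with the mix of object types in the bin.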
Seoul launches VR simulator to test autonomous driving
The Seoul Metropolitan Government (SMG) has announced it is building a pilot driving zone for autonomous cars. Forming part of the cooperative intelligent transport system (C-ITS) construction project, the virtual reality autonomous driving simulator will reflect road, traffic, and weather conditions by using digital twin technologies. According to SMG, expanding the virtual territory to Gangnam and the city centre will enable Seoul to "leap forward" as a city of commercialised self-driving vehicles. The autonomous driving simulator will be open to the public, and anyone from companies to research institutes, start-ups, and universities can use it free of charge. SMG's rationale is that the more developers test the simulator, the more opportunities there are to improve their technologies and help the industry advance.
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (1.00)
A Look Inside the Evolution of Customer Experience
Customer experience has evolved in exciting ways in the past few years. Michael Iacobucci, CEO at Interactions Corporation, cites two major reasons for this industry-wide evolution: the increasingly critical need for better customer experiences as a brand differentiator, and the AI advancement to enable those experiences. According to a recent survey, 75% of customers say they will pay more to do business with a company that provides good CX. "Exceptional customer service is not optional for today's businesses," Iacobucci said. "In order to succeed, CX must be a top priority."