navigate
Creating Multi-Level Skill Hierarchies in Reinforcement Learning S
They had four primitive actions: north, south, east, and west. Multi-Floor Office is an extension of Office to multiple floors. Pick-up and put-down have the intended effect when appropriate; otherwise they do not change the state. Towers of Hanoi contains four discs of different sizes, placed on three poles. Options generated using alternative methods called primitive actions directly.
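The action semantics described in this excerpt (four movement primitives, plus pick-up and put-down that leave the state unchanged when they do not apply) can be sketched as a small transition function. This is only an illustration and not the paper's environment: the 5x5 grid, the `State` fields, the `step` function, the rule that moving off the grid is a no-op, and the rule that put-down only succeeds at a goal cell are all assumptions.

```python
from dataclasses import dataclass, replace

# Hypothetical layout: a 5x5 grid. The size and coordinates are assumptions.
GRID_SIZE = 5
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


@dataclass(frozen=True)
class State:
    agent: tuple            # (x, y) position of the agent
    item: tuple             # (x, y) position of the item while it is on the ground
    goal: tuple             # (x, y) cell where put-down is allowed (assumed rule)
    carrying: bool = False  # True while the agent holds the item


def step(state: State, action: str) -> State:
    """Return the next state; actions that do not apply return the state unchanged."""
    if action in MOVES:
        dx, dy = MOVES[action]
        x, y = state.agent[0] + dx, state.agent[1] + dy
        # Assumption: a move that would leave the grid is a no-op.
        if 0 <= x < GRID_SIZE and 0 <= y < GRID_SIZE:
            return replace(state, agent=(x, y))
        return state
    if action == "pick-up":
        # Applies only when the agent stands on the item and is not carrying it.
        if not state.carrying and state.agent == state.item:
            return replace(state, carrying=True)
        return state
    if action == "put-down":
        # Applies only when the agent carries the item and stands on the goal cell.
        if state.carrying and state.agent == state.goal:
            return replace(state, carrying=False, item=state.agent)
        return state
    raise ValueError(f"unknown action: {action}")
```

For example, calling `step(state, "pick-up")` while the agent is not on the item's cell returns a state equal to the input, which matches the "otherwise they do not change the state" behaviour in the excerpt.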
Paper: Generalization of Reinforcement Learners with Working and Episodic Memory
We thank the reviewers for their thoughtful and constructive feedback on our manuscript. This should help both contextualize each task's difficulty and illustrate what it involves. Reviewer 3 noted the Section 2 task descriptions could be better presented. We have reformatted it so that "the order We also changed our description of IMPALA to match Reviewer 5's suggestion. Regarding the task suite, Reviewer 4 raised a thoughtful consideration on whether "most of the findings translate when Some 3D tasks in the suite already have '2D-like' semi-counterparts that do not require navigation, '2D-like' because everything is fully observable and the agent has a first-person point of view from a fixed point, without Spot the Difference level, was overall harder than Change Detection for our ablation models.
- North America > United States (0.04)
- North America > Dominican Republic (0.04)
- Asia > China > Hong Kong (0.04)
Learning Active Camera for Multi-Object Navigation
Getting robots to navigate to multiple objects autonomously is essential yet difficult in robot applications. One of the key challenges is how to explore environments efficiently with camera sensors only. Existing navigation methods mainly focus on fixed cameras and few attempts have been made to navigate with active cameras. As a result, the agent may take a very long time to perceive the environment due to limited camera scope. In contrast, humans typically gain a larger field of view by looking around for a better perception of the environment. How to make robots perceive the environment as efficiently as humans is a fundamental problem in robotics.
Frequency-Enhanced Data Augmentation for Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) is a challenging task that requires an agent to navigate through complex environments based on natural language instructions. In contrast to conventional approaches, which primarily focus on the spatial domain exploration, we propose a paradigm shift toward the Fourier domain. This alternative perspective aims to enhance visual-textual matching, ultimately improving the agent's ability to understand and execute navigation tasks based on the given instructions. In this study, we first explore the significance of high-frequency information in VLN and provide evidence that it is instrumental in bolstering visual-textual matching processes. Building upon this insight, we further propose a sophisticated and versatile Frequency-enhanced Data Augmentation (FDA) technique to improve the VLN model's capability of capturing critical high-frequency information. Specifically, this approach requires the agent to navigate in environments where only a subset of high-frequency visual information corresponds with the provided textual instructions, ultimately fostering the agent's ability to selectively discern and capture pertinent high-frequency features according to the given instructions. Promising results on R2R, RxR, CVDN and REVERIE demonstrate that our FDA can be readily integrated with existing VLN approaches, improving performance without adding extra parameters, and keeping models simple and efficient. The code is available at https://github.com/hekj/FDA.
See-Control: A Multimodal Agent Framework for Smartphone Interaction with a Robotic Arm
Zhao, Haoyu, Ding, Weizhong, Yang, Yuhao, Tian, Zheng, Yang, Linyi, Shao, Kun, Wang, Jun
Recent advances in Multimodal Large Language Models (MLLMs) have enabled their use as intelligent agents for smartphone operation. However, existing methods depend on the Android Debug Bridge (ADB) for data transmission and action execution, limiting their applicability to Android devices. In this work, we introduce the novel Embodied Smartphone Operation (ESO) task and present See-Control, a framework that enables smartphone operation via direct physical interaction with a low-DoF robotic arm, offering a platform-agnostic solution. See-Control comprises three key components: (1) an ESO benchmark with 155 tasks and corresponding evaluation metrics; (2) an MLLM-based embodied agent that generates robotic control commands without requiring ADB or system back-end access; and (3) a richly annotated dataset of operation episodes, offering valuable resources for future research. By bridging the gap between digital agents and the physical world, See-Control provides a concrete step toward enabling home robots to perform smartphone-dependent tasks in realistic environments.
- Europe > United Kingdom > England > Greater London > London (0.50)
- Europe > Spain (0.04)
- Europe > Italy > Veneto > Venice (0.04)
- Information Technology > Services (1.00)
- Consumer Products & Services (1.00)
- Leisure & Entertainment > Sports (0.93)
- Media > Music (0.68)
- Information Technology > Communications > Mobile (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Australia's beloved weather website got a makeover - and infuriated users
It was an unseasonably warm spring day in Sydney on 22 October, with a forecast of 39C (99F) - a real scorcher. The day before, the state of New South Wales had reported its hottest day in over a century, a high of 44.8C in the outback town of Bourke. But little did the team at the national Bureau of Meteorology foresee that they, in particular, would soon be feeling the heat. Affectionately known by Australians as the Bom, the agency's long-awaited website redesign went live that morning, more than a decade after the last update. Within hours, the Bom was flooded with a deluge of complaints.
- North America > United States (0.29)
- Oceania > Australia > New South Wales (0.25)
- North America > Central America (0.14)
- Leisure & Entertainment (0.71)
- Information Technology > Security & Privacy (0.48)
- Government > Regional Government (0.47)
SweeperBot: Making 3D Browsing Accessible through View Analysis and Visual Question Answering
Chen, Chen, Nguyen, Cuong, Siu, Alexa, Li, Dingzeyu, Weibel, Nadir
Accessing 3D models remains challenging for Screen Reader (SR) users. While some existing 3D viewers allow creators to provide alternative text, they often lack sufficient detail about the 3D models. Grounded on a formative study, this paper introduces SweeperBot, a system that enables SR users to leverage visual question answering to explore and compare 3D models. SweeperBot answers SR users' visual questions by combining an optimal view selection technique with the strength of generative- and recognition-based foundation models. An expert review with 10 Blind and Low-Vision (BLV) users with SR experience demonstrated the feasibility of using SweeperBot to assist BLV users in exploring and comparing 3D models. The quality of the descriptions generated by SweeperBot was validated by a second survey study with 30 sighted participants.
- North America > United States > New York > New York County > New York City (0.15)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Overview (0.87)
- Research Report > Experimental Study (0.67)
- Information Technology > Services (0.67)
- Health & Medicine > Therapeutic Area (0.46)
- North America > United States > New York (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Communications (0.97)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.95)
Learning to Navigate in Cities Without a Map
Piotr Mirowski, Matt Grimes, Mateusz Malinowski, Karl Moritz Hermann, Keith Anderson, Denis Teplyashin, Karen Simonyan, Koray Kavukcuoglu, Andrew Zisserman, Raia Hadsell
The majority of algorithms involve building an explicit map during an exploration phase and then planning and acting via that representation. In this work, we are interested in pushing the limits of end-to-end deep reinforcement learning for navigation by proposing new methods and demonstrating their performance in large-scale, real-world environments.
- North America > United States > New York > New York County > New York City (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)