 dcon


When Engineering Outruns Intelligence: Rethinking Instruction-Guided Navigation

arXiv.org Artificial Intelligence

Recent ObjectNav systems credit large language models (LLMs) for sizable zero-shot gains, yet it remains unclear how much comes from language versus geometry. We conduct a controlled study on HM3D and MP3D that revisits language-for-navigation through the lens of geometry-first exploration. Beyond ObjectNav, large foundation models are increasingly being employed in various other embodied tasks. ObjectNav asks an agent to reach any instance of a named object category (e.g., Find a ... At each time step, RGB-D and pose are fused into a 2D navigability map; free space vs. obstacles ... Islands will later serve as anchor sets for scoring or selection. InstructNav (Long et al., 2024) turns the instruction and ... When the named goal object is observed, InstructNav's ... LFG (Shah et al., 2023) is a complementary paradigm: instead of composing multiple value maps, it ... LFG does not assume open-vocabulary detectors or a VLM "intuition" map; its only learned ... SHF's prompt templates are included in Appendix B. All experiments run in Habitat (release 3) with default navigation mesh and physics (Puig et al., ... Success is declared when the goal object is visible and the agent is within 0.25 m.


InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment

arXiv.org Artificial Intelligence

Enabling robots to navigate following diverse language instructions in unexplored environments is an attractive goal for human-robot interaction. However, this goal is challenging because different navigation tasks require different strategies. The scarcity of instruction navigation data hinders training an instruction navigation model with varied strategies. Therefore, previous methods are all constrained to one specific type of navigation instruction. In this work, we propose InstructNav, a generic instruction navigation system. InstructNav makes the first endeavor to handle various instruction navigation tasks without any navigation training or pre-built maps. To reach this goal, we introduce Dynamic Chain-of-Navigation (DCoN) to unify the planning process for different types of navigation instructions. Furthermore, we propose Multi-sourced Value Maps to model key elements in instruction navigation so that linguistic DCoN planning can be converted into robot actionable trajectories. With InstructNav, we complete the R2R-CE task in a zero-shot way for the first time and outperform many task-training methods. Besides, InstructNav also surpasses the previous SOTA method by 10.48% on the zero-shot Habitat ObjNav and by 86.34% on demand-driven navigation DDN. Real robot experiments on diverse indoor scenes further demonstrate our method's robustness in coping with the environment and instruction variations.


A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions

arXiv.org Artificial Intelligence

We propose an algorithm for optimizing the parameters of single hidden layer neural networks. Specifically, we derive a blockwise difference-of-convex (DC) functions representation of the objective function. Based on the latter, we propose a block coordinate descent (BCD) approach that we combine with a tailored difference-of-convex functions algorithm (DCA). We prove global convergence of the proposed algorithm. Furthermore, we mathematically analyze the convergence rate of parameters and the convergence rate in value (i.e., the training loss). We give conditions under which our algorithm converges linearly or even faster depending on the local shape of the loss function. We confirm our theoretical derivations numerically and compare our algorithm against state-of-the-art gradient-based solvers in terms of both training loss and test loss.
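
As a rough illustration of the difference-of-convex functions algorithm (DCA) mentioned in this abstract, the sketch below applies generic DCA to a one-dimensional toy problem. It is not the paper's blockwise method for neural networks. The objective f(x) = x^4 - 2x^2, its split into g(x) = x^4 and h(x) = 2x^2, and all function names are assumptions chosen so that each convex subproblem has a closed-form solution.

```python
import math

# Toy DC decomposition (an assumption for illustration, not from the paper):
# f(x) = g(x) - h(x), with g(x) = x**4 and h(x) = 2 * x**2, both convex.

def f(x):
    return x**4 - 2.0 * x**2

def h_grad(x):
    # Gradient of the convex part h that is linearized at each step.
    return 4.0 * x

def dca_step(x_k):
    # DCA replaces h by its linearization at x_k and minimizes the convex
    # surrogate g(x) - h_grad(x_k) * x. Setting its derivative to zero gives
    # 4 * x**3 = h_grad(x_k), so the minimizer is a real cube root.
    s = h_grad(x_k) / 4.0
    return math.copysign(abs(s) ** (1.0 / 3.0), s)

def dca(x0, tol=1e-10, max_iter=200):
    # Iterate until successive iterates differ by less than tol.
    x = x0
    for _ in range(max_iter):
        x_next = dca_step(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x
```

On this toy problem the iterates move toward the critical point x = 1 from any positive start and toward x = -1 from any negative start, and the objective value does not increase from one step to the next. A start at exactly 0 stays at 0, which is a critical point of f but not a minimizer.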


A Semantic Infrastructure for Personalizable Context-Aware Environments

AI Magazine

Although a number of initiatives provide personalized context-aware guidance for niche use cases, a standard framework for context awareness remains lacking. This article explains how semantic technology has been exploited to generate a centralized repository of personal activity context. This data drives advanced features such as personal situation recognition and customizable rules for the context-sensitive management of personal devices and data sharing. As a proof of concept, we demonstrate how an innovative context-aware system has successfully adopted such an infrastructure. By treating these devices as part of a personal sensor network and analyzing the generated information collectively, valuable context information can be gathered and interpreted in an endless number of scenarios.


DCON: Interoperable Context Representation for Pervasive Environments

AAAI Conferences

Efforts by the pervasive, context-aware system development community have over the years produced a wide variety of context-aware techniques and frameworks. However, the bulk of this technology tends to be strictly tied to a native system, which largely limits its external adoption. In addressing this limitation, we introduce an interoperable context representation format, in the form of an ontology, which models core context-aware concepts for re-use within pervasive computing environments. The DCON Context Ontology is proposed as a novel vocabulary for the representation of activity context as experienced by a user, and sensed through one or more of their devices. We demonstrate how, combined with other domain ontologies, DCON provides for richer representations of multi-level context interpretations that are integrated with other known background information about a user.