AITopics | Gu, Hongjian

Collaborating Authors

Gu, Hongjian

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation

Zhang, Lingfeng, Liu, Yuecheng, Zhang, Zhanguang, Aghaei, Matin, Hu, Yaochen, Gu, Hongjian, Alomrani, Mohammad Ali, Bravo, David Gamaliel Arcos, Karimi, Raika, Hamidizadeh, Atia, Xu, Haoping, Huang, Guowei, Zhang, Zhanpeng, Cao, Tongtong, Qiu, Weichao, Quan, Xingyue, Hao, Jianye, Zhuang, Yuzheng, Zhang, Yingxue

arXiv.org Artificial IntelligenceFeb-19-2025

Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have made them powerful tools in embodied navigation, enabling agents to leverage commonsense and spatial reasoning for efficient exploration in unfamiliar environments. Existing LLM-based approaches convert global memory, such as semantic or topological maps, into language descriptions to guide navigation. While this improves efficiency and reduces redundant exploration, the loss of geometric information in language-based representations hinders spatial reasoning, especially in intricate environments. To address this, VLM-based approaches directly process ego-centric visual inputs to select optimal directions for exploration. However, relying solely on a first-person perspective makes navigation a partially observed decision-making problem, leading to suboptimal decisions in complex environments. In this paper, we present a novel vision-language model (VLM)-based navigation framework that addresses these challenges by adaptively retrieving task-relevant cues from a global memory module and integrating them with the agent's egocentric observations. By dynamically aligning global contextual information with local perception, our approach enhances spatial reasoning and decision-making in long-horizon tasks. Experimental results demonstrate that the proposed method surpasses previous state-of-the-art approaches in object navigation tasks, providing a more effective and scalable solution for embodied navigation.

large language model, natural language, navigation, (18 more...)

arXiv.org Artificial Intelligence

2502.14254

Genre:

Overview (0.88)
Research Report > New Finding (0.48)

Industry: Health & Medicine > Consumer Health (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Path-of-Thoughts: Extracting and Following Paths for Robust Relational Reasoning with Large Language Models

Zhang, Ge, Alomrani, Mohammad Ali, Gu, Hongjian, Zhou, Jiaming, Hu, Yaochen, Wang, Bin, Liu, Qun, Coates, Mark, Zhang, Yingxue, Hao, Jianye

arXiv.org Artificial IntelligenceDec-23-2024

Large language models (LLMs) possess vast semantic knowledge but often struggle with complex reasoning tasks, particularly in relational reasoning problems such as kinship or spatial reasoning. In this paper, we present Path-of-Thoughts (PoT), a novel framework designed to tackle relation reasoning by decomposing the task into three key stages: graph extraction, path identification, and reasoning. Unlike previous approaches, PoT efficiently extracts a task-agnostic graph that identifies crucial entities, relations, and attributes within the problem context. Subsequently, PoT identifies relevant reasoning chains within the graph corresponding to the posed question, facilitating inference of potential answers. Experimental evaluations on four benchmark datasets, demanding long reasoning chains, demonstrate that PoT surpasses state-of-the-art baselines by a significant margin (maximum 21.3%) without necessitating fine-tuning or extensive LLM calls. Furthermore, as opposed to prior neuro-symbolic methods, PoT exhibits improved resilience against LLM errors by leveraging the compositional nature of graphs.

large language model, natural language, relation, (15 more...)

arXiv.org Artificial Intelligence

2412.17963

Country: North America > Canada > Quebec (0.27)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

ET-Plan-Bench: Embodied Task-level Planning Benchmark Towards Spatial-Temporal Cognition with Foundation Models

Zhang, Lingfeng, Wang, Yuening, Gu, Hongjian, Hamidizadeh, Atia, Zhang, Zhanguang, Liu, Yuecheng, Wang, Yutong, Bravo, David Gamaliel Arcos, Dong, Junyi, Zhou, Shunbo, Cao, Tongtong, Zhuang, Yuzheng, Zhang, Yingxue, Hao, Jianye

arXiv.org Artificial IntelligenceOct-2-2024

Recent advancements in Large Language Models (LLMs) have spurred numerous attempts to apply these technologies to embodied tasks, particularly focusing on high-level task planning and task decomposition. To further explore this area, we introduce a new embodied task planning benchmark, ET-Plan-Bench, which specifically targets embodied task planning using LLMs. It features a controllable and diverse set of embodied tasks varying in different levels of difficulties and complexities, and is designed to evaluate two critical dimensions of LLMs' application in embodied task understanding: spatial (relation constraint, occlusion for target objects) and temporal & causal understanding of the sequence of actions in the environment. By using multi-source simulators as the backend simulator, it can provide immediate environment feedback to LLMs, which enables LLMs to interact dynamically with the environment and re-plan as necessary. We evaluated the state-of-the-art open source and closed source foundation models, including GPT-4, LLAMA and Mistral on our proposed benchmark. While they perform adequately well on simple navigation tasks, their performance can significantly deteriorate when faced with tasks that require a deeper understanding of spatial, temporal, and causal relationships. Thus, our benchmark distinguishes itself as a large-scale, quantifiable, highly automated, and fine-grained diagnostic framework that presents a significant challenge to the latest foundation models. We hope it can spark and drive further research in embodied task planning using foundation models.

constraint, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2410.14682

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Preference and Concurrence Aware Bayesian Graph Neural Networks for Recommender Systems

Gu, Hongjian, Hu, Yaochen, Zhang, Yingxue

arXiv.org Artificial IntelligenceJan-22-2024

Graph-based collaborative filtering methods have prevailing performance for recommender systems since they can capture high-order information between users and items, in which the graphs are constructed from the observed user-item interactions that might miss links or contain spurious positive interactions in industrial scenarios. The Bayesian Graph Neural Network framework approaches this issue with generative models for the interaction graphs. The critical problem is to devise a proper family of graph generative models tailored to recommender systems. We propose an efficient generative model that jointly considers the preferences of users, the concurrence of items and some important graph structure information. Experiments on four popular benchmark datasets demonstrate the effectiveness of our proposed graph generative methods for recommender systems.

artificial intelligence, graph, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2312.11486

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback