Collaborating Authors

Xiang Fang


Hierarchical Semantic-Augmented Navigation: Optimal Transport and Graph-Driven Reasoning for Vision-Language Navigation

Neural Information Processing Systems

Vision-Language Navigation in Continuous Environments (VLN-CE) poses a formidable challenge for autonomous agents, requiring seamless integration of natural language instructions and visual observations to navigate complex 3D indoor spaces. Existing approaches often falter in long-horizon tasks due to limited scene understanding, inefficient planning, and lack of robust decision-making frameworks. We introduce the Hierarchical Semantic-Augmented Navigation (HSAN) framework, a groundbreaking approach that redefines VLN-CE through three synergistic innovations. First, HSAN constructs a dynamic hierarchical semantic scene graph, leveraging vision-language models to capture multi-level environmental representations (from objects to regions to zones), enabling nuanced spatial reasoning. Second, it employs an optimal transport-based topological planner, grounded in Kantorovich's duality, to select long-term goals by balancing semantic relevance and spatial accessibility with theoretical guarantees of optimality. Third, a graph-aware reinforcement learning policy ensures precise low-level control, navigating subgoals while robustly avoiding obstacles. By integrating spectral graph theory, optimal transport, and advanced multi-modal learning, HSAN addresses the shortcomings of static maps and heuristic planners prevalent in prior work. Extensive experiments on multiple challenging VLN-CE datasets demonstrate that HSAN achieves state-of-the-art performance, with significant improvements in navigation success and generalization to unseen environments.
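
For reference, the discrete Kantorovich problem and its dual, which the abstract alludes to, can be written as below. This is the standard textbook formulation and not necessarily the paper's planner: the marginals \mu (over instruction-derived semantic targets) and \nu (over candidate graph nodes), and the cost c_{ij} that would combine semantic relevance with spatial accessibility, are illustrative assumptions that the abstract does not define.

```latex
% Primal: cheapest coupling \pi between marginals \mu and \nu (assumed symbols)
\min_{\pi \ge 0} \; \sum_{i,j} c_{ij}\, \pi_{ij}
\quad \text{s.t.} \quad
\sum_{j} \pi_{ij} = \mu_i, \qquad \sum_{i} \pi_{ij} = \nu_j

% Dual: potentials f, g bounded by the cost
\max_{f,\, g} \; \sum_{i} f_i\, \mu_i + \sum_{j} g_j\, \nu_j
\quad \text{s.t.} \quad
f_i + g_j \le c_{ij} \quad \forall\, i, j
```

In the finite discrete case with marginals of equal total mass, both problems are linear programs and their optimal values coincide, which is the kind of optimality guarantee a duality-based planner can appeal to.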


#AAAI2026 social media round up: part 2

AIHub

The 40th AAAI Conference on Artificial Intelligence took place in Singapore from 20-27 January, the first time that the event has been held outside of North America. In our first social media round up we had a peek at the first half of the conference, which hosted the tutorials, the bridge programme, and the doctoral and undergraduate consortia, as well as the start of the technical programme. Now, we pick some highlights from the second half, which saw a number of invited talks, technical sessions, posters, and the workshops. Do VLMs actually 'see' or just rely on priors? He showed how models fail to count stripes on a shoe simply because they recognize the 'Adidas' logo and hallucinate the standard 3 stripes.





Interview with Xiang Fang: Multi-modal learning and embodied intelligence

AIHub

His research focuses on multi-modal learning, specifically advancing large vision-language models, embodied intelligence, and out-of-distribution detection. Xiang has published over 40 papers in top-tier venues, including CVPR, NeurIPS, ICML, AAAI, and ACM MM. He is the recipient of multiple awards, including the NTU Research Excellence Award and Best Student Paper at MIPR 2024, and serves as a reviewer for major AI conferences.


Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation

arXiv.org Artificial Intelligence

Intrusion Detection Systems (IDS) play a crucial role in network security defense. However, a significant challenge for IDS in training detection models is the shortage of adequately labeled malicious samples. To address this issue, this paper introduces a novel semi-supervised framework, GANGRL-LLM, which integrates Generative Adversarial Networks (GANs) with Large Language Models (LLMs) to enhance malicious code generation and SQL Injection (SQLi) detection capabilities in few-sample learning scenarios. Specifically, our framework adopts a collaborative training paradigm where: (1) the GAN-based discriminator improves malicious pattern recognition through adversarial learning with generated samples and limited real samples; and (2) the LLM-based generator refines the quality of malicious code synthesis using reward signals from the discriminator. The experimental results demonstrate that even with a limited number of labeled samples, our training framework is highly effective in enhancing both malicious code generation and detection capabilities. This dual enhancement capability offers a promising solution for developing adaptive defense systems capable of countering evolving cyber threats.