Chiu, Te-Chuan
Fine-Grained Alignment in Vision-and-Language Navigation through Bayesian Optimization
Song, Yuhang, Gianni, Mario, Yang, Chenguang, Lin, Kunyang, Chiu, Te-Chuan, Nguyen, Anh, Lee, Chun-Yi
This paper addresses the challenge of fine-grained alignment in Vision-and-Language Navigation (VLN) tasks, where robots navigate realistic 3D environments based on natural language instructions. Current approaches use contrastive learning to align language with visual trajectory sequences, but they struggle with fine-grained vision negatives. To enhance cross-modal embeddings, we introduce a novel Bayesian Optimization-based adversarial optimization framework for creating fine-grained contrastive vision samples. To validate the proposed methodology, we conduct a series of experiments assessing the effectiveness of the enriched embeddings on fine-grained vision negatives. Experiments on two common VLN benchmarks, R2R and REVERIE, demonstrate that these embeddings benefit navigation and can lead to a promising performance enhancement. Our source code and trained models are available at: https://anonymous.4open.science/r/FGVLN.
Reducing Non-IID Effects in Federated Autonomous Driving with Contrastive Divergence Loss
Do, Tuong, Nguyen, Binh X., Nguyen, Hien, Tjiputra, Erman, Tran, Quang D., Chiu, Te-Chuan, Nguyen, Anh
Federated learning has been widely applied in autonomous driving since it enables training a learning model among vehicles without sharing users' data. In this paper, we propose a new contrastive divergence loss to address the non-IID problem in autonomous driving by reducing the impact of divergence factors from transmitted models during the local learning process of each silo. We also analyze the effects of contrastive divergence in various autonomous driving scenarios, under multiple network infrastructures, and with different centralized/distributed learning schemes. Autonomous driving is an emerging field that enables vehicles to operate without a human driver by using a combination of vision, learning, and control algorithms to observe and respond to changes in the environment [1]. On the other hand, DFL does not require a server and uses a fully distributed network. In autonomous driving, several works have explored both DFL and SFL to address different [...]