Yap, Kim-Hui
CL-HOI: Cross-Level Human-Object Interaction Distillation from Vision Large Language Models
Gao, Jianjun, Cai, Chen, Wang, Ruoyu, Liu, Wenyang, Yap, Kim-Hui, Garg, Kratika, Han, Boon-Siew
Human-object interaction (HOI) detection has seen advancements with Vision Language Models (VLMs), but these methods often depend on extensive manual annotations. Vision Large Language Models (VLLMs) can inherently recognize and reason about interactions at the image level but are computationally heavy and not designed for instance-level HOI detection. To overcome these limitations, we propose a Cross-Level HOI distillation (CL-HOI) framework, which distills instance-level HOIs from VLLMs' image-level understanding without the need for manual annotations. Our approach involves two stages: context distillation, where a Visual Linguistic Translator (VLT) converts visual information into linguistic form, and interaction distillation, where an Interaction Cognition Network (ICN) reasons about spatial, visual, and context relations. We design contrastive distillation losses to transfer image-level context and interaction knowledge from the teacher to the student model, enabling instance-level HOI detection. Evaluations on the HICO-DET and V-COCO datasets demonstrate that CL-HOI surpasses existing weakly supervised and VLLM-supervised methods, showing its efficacy in detecting HOIs without manual labels.
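A minimal sketch of a contrastive distillation loss in the spirit of CL-HOI, assuming a symmetric InfoNCE form in which pooled student features are pulled toward the frozen teacher's image-level embedding of the matching image and pushed away from the other images in a batch; the function names, dimensions, and temperature are illustrative assumptions rather than the authors' exact formulation.

import torch
import torch.nn.functional as F

def contrastive_distillation_loss(student_feats, teacher_feats, temperature=0.07):
    # student_feats, teacher_feats: (B, D) pooled embeddings for the same B images.
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    logits = s @ t.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)    # matching pairs lie on the diagonal
    # Symmetric InfoNCE: match student-to-teacher and teacher-to-student.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Example: distill a frozen teacher's image-level knowledge into the student.
student = torch.randn(8, 256, requires_grad=True)   # student HOI features, pooled per image
teacher = torch.randn(8, 256)                        # embeddings produced by the frozen VLLM
loss = contrastive_distillation_loss(student, teacher)
loss.backward()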
Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting
Cai, Chen, Wang, Zheng, Gao, Jianjun, Liu, Wenyang, Lu, Ye, Zhang, Runzhong, Yap, Kim-Hui
In recent years, the rapid increase in online video content has underscored the limitations of static Video Question Answering (VideoQA) models trained on fixed datasets, as they struggle to adapt to new questions or tasks posed by newly available content. In this paper, we explore the novel challenge of VideoQA within a continual learning framework, and empirically identify a critical issue: fine-tuning a large language model (LLM) for a sequence of tasks often results in catastrophic forgetting. To address this, we propose Collaborative Prompting (ColPro), which integrates specific question constraint prompting, knowledge acquisition prompting, and visual temporal awareness prompting. These prompts aim to capture textual question context, visual content, and video temporal dynamics in VideoQA, a perspective underexplored in prior research. Experimental results on the NExT-QA and DramaQA datasets show that ColPro achieves superior performance compared to existing approaches, achieving 55.14% accuracy on NExT-QA and 71.24% accuracy on DramaQA, highlighting its practical relevance and effectiveness.
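A minimal sketch of the prompt-tuning idea behind ColPro, assuming three groups of learnable prompt tokens (question-constraint, knowledge-acquisition, and visual-temporal) prepended to the input of a frozen backbone, with only the prompts receiving gradients; the toy TransformerEncoder stands in for the frozen LLM, and all names and sizes are illustrative assumptions.

import torch
import torch.nn as nn

d_model, n_prompts = 64, 4
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
for p in backbone.parameters():      # freeze the backbone standing in for the LLM
    p.requires_grad_(False)

# One learnable prompt group per cue; only these parameters are trained.
question_prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)
knowledge_prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)
temporal_prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)

def forward(token_embeds):           # token_embeds: (B, T, d_model) question/video tokens
    B = token_embeds.size(0)
    prompts = torch.cat([question_prompts, knowledge_prompts, temporal_prompts], dim=0)
    prompts = prompts.unsqueeze(0).expand(B, -1, -1)
    return backbone(torch.cat([prompts, token_embeds], dim=1))

out = forward(torch.randn(2, 10, d_model))   # output shape: (2, 3 * 4 + 10, 64)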
CM2-Net: Continual Cross-Modal Mapping Network for Driver Action Recognition
Wang, Ruoyu, Cai, Chen, Wang, Wenqian, Gao, Jianjun, Lin, Dan, Liu, Wenyang, Yap, Kim-Hui
Driver action recognition has advanced significantly in enhancing driver-vehicle interaction and ensuring driving safety by integrating multiple modalities, such as infrared and depth. Nevertheless, compared with the RGB modality alone, collecting extensive data for all types of non-RGB modalities in car-cabin environments is laborious and costly. Previous works have therefore suggested learning each non-RGB modality independently by fine-tuning a model pre-trained on RGB videos, but these methods are less effective at extracting informative features from newly incoming modalities due to large domain gaps. In contrast, we propose a Continual Cross-Modal Mapping Network (CM2-Net) that continually learns each newly incoming modality with instructive prompts from the previously learned modalities. Specifically, we develop Accumulative Cross-modal Mapping Prompting (ACMP) to map the discriminative and informative features learned from previous modalities into the feature space of newly incoming modalities. When a new modality arrives, these mapped features provide effective prompts indicating which features should be extracted and prioritized. The prompts accumulate throughout the continual learning process, further boosting recognition performance. Extensive experiments on the Drive&Act dataset demonstrate the superiority of CM2-Net in both uni- and multi-modal driver action recognition.
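A minimal sketch of the cross-modal mapping prompting idea, assuming features from a frozen, previously learned RGB encoder are projected into the feature space of a newly arriving modality (e.g. infrared) and concatenated as a prompt for its classification head; the module names, the fusion by concatenation, and the class count are illustrative assumptions rather than the exact ACMP design.

import torch
import torch.nn as nn

feat_dim = 128

class MappingPrompt(nn.Module):
    # Projects a previously learned modality's feature into the new modality's space.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, prev_feat):
        return self.proj(prev_feat)

rgb_feat = torch.randn(4, feat_dim)            # from a frozen, previously learned RGB encoder
ir_feat = torch.randn(4, feat_dim)             # from the newly incoming infrared encoder

mapper = MappingPrompt(feat_dim)
prompt = mapper(rgb_feat)                      # RGB knowledge mapped into the infrared space
fused = torch.cat([ir_feat, prompt], dim=-1)   # mapped feature acts as a prompt for the new modality
classifier = nn.Linear(2 * feat_dim, 34)       # class count is a placeholder
logits = classifier(fused)                     # (4, 34) driver-action logits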
Video sentence grounding with temporally global textual knowledge
Cai, Chen, Zhang, Runzhong, Gao, Jianjun, Wu, Kejun, Yap, Kim-Hui, Wang, Yi
Temporal sentence grounding aims to retrieve the video moment described by a natural language query. Many existing works directly combine the given video and temporally localized query for grounding, overlooking the inherent domain gap between the two modalities. In this paper, we utilize pseudo-query features containing extensive temporally global textual knowledge, sourced from the same video-query pair, to bridge the domain gap and achieve greater similarity between multi-modal features. Specifically, we propose a Pseudo-query Intermediary Network (PIN) that better aligns visual and comprehensive pseudo-query features in the feature space through contrastive learning. Subsequently, we utilize learnable prompts to encapsulate the knowledge of pseudo-queries, propagating them into the textual encoder and the multi-modal fusion module to further enhance the alignment between visual and language features for better temporal grounding. Extensive experiments on the Charades-STA and ActivityNet-Captions datasets demonstrate the effectiveness of our method.
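A minimal sketch of propagating pseudo-query knowledge through learnable prompts, assuming the prompts are conditioned on a pseudo-query feature and prepended to the query tokens fed to the text encoder; the conditioning scheme, encoder, and sizes are illustrative assumptions rather than PIN's exact design.

import torch
import torch.nn as nn

d_model, n_prompts = 64, 4
text_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)

prompts = nn.Parameter(torch.randn(n_prompts, d_model) * 0.02)  # learnable prompt tokens
condition = nn.Linear(d_model, d_model)                         # injects pseudo-query knowledge

def encode(query_tokens, pseudo_query_feat):
    # query_tokens: (B, T, d_model); pseudo_query_feat: (B, d_model) from the pseudo-query branch
    p = prompts.unsqueeze(0) + condition(pseudo_query_feat).unsqueeze(1)  # (B, n_prompts, d_model)
    return text_encoder(torch.cat([p, query_tokens], dim=1))

out = encode(torch.randn(2, 12, d_model), torch.randn(2, d_model))  # (2, 4 + 12, 64)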
OccluTrack: Rethinking Awareness of Occlusion for Enhancing Multiple Pedestrian Tracking
Gao, Jianjun, Wang, Yi, Yap, Kim-Hui, Garg, Kratika, Han, Boon-Siew
Multiple pedestrian tracking faces the challenge of following pedestrians through occlusions. Under occlusion, existing methods suffer from inaccurate motion estimation, appearance feature extraction, and association, leading to an inadequate Identification F1-Score (IDF1), excessive ID switches (IDSw), and insufficient association accuracy and recall (AssA and AssR). We found that the main cause is abnormal detections produced under partial occlusion. In this paper, we argue that the key lies in explicit motion estimation, reliable appearance features, and fair association in occlusion scenes. Specifically, we propose an adaptive occlusion-aware multiple pedestrian tracker, OccluTrack. We first introduce an abnormal motion suppression mechanism into the Kalman filter to adaptively detect and suppress outlier motions caused by partial occlusion. Second, we propose a pose-guided re-ID module to extract discriminative part features for partially occluded pedestrians. Last, we design a new occlusion-aware association method for fair IoU and appearance-embedding distance measurement of occluded pedestrians. Extensive evaluations demonstrate that OccluTrack outperforms state-of-the-art methods on the MOT-Challenge datasets. In particular, the improvements in IDF1, IDSw, AssA, and AssR demonstrate the effectiveness of OccluTrack in tracking and association performance.
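A minimal sketch of the abnormal-motion-suppression idea, assuming the Kalman update is gated on the Mahalanobis distance of the innovation so that detections deviating strongly from the prediction (e.g. boxes shrunk by partial occlusion) are damped rather than trusted fully; the gating rule, damping factor, and threshold are illustrative assumptions, not OccluTrack's exact mechanism.

import numpy as np

def gated_kalman_update(x, P, z, H, R, gate=9.49, damp=0.1):
    # x: state mean, P: state covariance, z: measurement, H/R: measurement model.
    y = z - H @ x                                # innovation (measurement residual)
    S = H @ P @ H.T + R                          # innovation covariance
    maha = float(y @ np.linalg.inv(S) @ y)       # Mahalanobis distance of the residual
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    if maha > gate:                              # abnormal motion: suppress the update
        K = damp * K                             # trust the prediction more than the outlier
    x_new = x + K @ y
    P_new = (np.eye(P.shape[0]) - K @ H) @ P
    return x_new, P_new

# Example with a 2-D position state and a measurement corrupted by partial occlusion.
x, P = np.zeros(2), np.eye(2)
H, R = np.eye(2), 0.1 * np.eye(2)
x, P = gated_kalman_update(x, P, z=np.array([5.0, 5.0]), H=H, R=R)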
Empirical Analysis of Overfitting and Mode Drop in GAN Training
Yazici, Yasin, Foo, Chuan-Sheng, Winkler, Stefan, Yap, Kim-Hui, Chandrasekhar, Vijay
We examine two key questions in GAN training, namely overfitting and mode drop, from an empirical perspective. We show that when stochasticity is removed from the training procedure, GANs can overfit and exhibit almost no mode drop. Our results shed light on important characteristics of the GAN training procedure, and provide evidence against the prevailing intuitions that GANs do not memorize the training set and that mode dropping is mainly due to properties of the GAN objective rather than to how it is optimized during training.
Venn GAN: Discovering Commonalities and Particularities of Multiple Distributions
Yazıcı, Yasin, Lecouat, Bruno, Foo, Chuan-Sheng, Winkler, Stefan, Yap, Kim-Hui, Piliouras, Georgios, Chandrasekhar, Vijay
We propose a GAN design that models multiple distributions effectively and discovers their commonalities and particularities. Each data distribution is modeled with a mixture of K generator distributions. As the generators are partially shared between the modeling of different true data distributions, the shared generators capture the commonalities of the distributions, while the non-shared ones capture their particularities. We show the effectiveness of our method on various datasets (MNIST, Fashion-MNIST, CIFAR-10, Omniglot, CelebA) with compelling results.
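A minimal sketch of the mixture-of-generators idea for two data distributions, assuming one generator is shared between them (to capture commonalities) while each distribution also has its own particular generator; the architectures, mixture weights, and two-distribution setting are illustrative assumptions rather than the paper's exact configuration.

import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2

def make_generator():
    return nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))

g_shared = make_generator()                     # shared between both modeled distributions
g_a, g_b = make_generator(), make_generator()   # particular to distribution A / B

def sample(distribution, n):
    # Each modeled distribution is a uniform mixture of the shared and its particular generator.
    z = torch.randn(n, latent_dim)
    particular = g_a if distribution == "A" else g_b
    pick_shared = torch.rand(n, 1) < 0.5        # mixture component chosen per sample
    return torch.where(pick_shared, g_shared(z), particular(z))

# Fakes for each distribution are then scored by that distribution's discriminator.
fake_a, fake_b = sample("A", 32), sample("B", 32)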
The Unusual Effectiveness of Averaging in GAN Training
Yazıcı, Yasin, Foo, Chuan-Sheng, Winkler, Stefan, Yap, Kim-Hui, Piliouras, Georgios, Chandrasekhar, Vijay
We show empirically that the optimal strategy of parameter averaging in a min-max convex-concave game setting is also strikingly effective in the non-convex-concave GAN setting, specifically alleviating the convergence issues associated with the cycling behavior observed in GANs. We show that averaging over generator parameters outside of the training loop consistently improves Inception and FID scores across different architectures and GAN objectives. We provide comprehensive experimental results on bilinear games, mixtures of Gaussians, CIFAR-10, STL-10, CelebA, and ImageNet to demonstrate its effectiveness. We achieve state-of-the-art results on CIFAR-10 and produce clean CelebA face images, demonstrating that averaging is one of the most effective techniques for training highly performant GANs.
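A minimal sketch of parameter averaging outside the training loop, assuming an exponential moving average of the generator's weights is kept in a separate copy used only for evaluation and sampling; the decay value and the toy generator are illustrative, and a uniform running average could be substituted.

import copy
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784))
avg_generator = copy.deepcopy(generator)        # averaged copy, untouched by adversarial updates

@torch.no_grad()
def update_average(avg_model, model, decay=0.999):
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.mul_(decay).add_(p, alpha=1.0 - decay)   # exponential moving average of weights

# Schematic training loop: after each generator step, refresh the averaged copy.
# for step in range(total_steps):
#     ...update the discriminator and generator as usual...
#     update_average(avg_generator, generator)
# Evaluation (Inception score, FID, sampling) then uses avg_generator instead of generator.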