AITopics | Kot, Alex

Collaborating Authors

Kot, Alex

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LOBG:Less Overfitting for Better Generalization in Vision-Language Model

Ding, Chenhao, Gao, Xinyuan, Dong, Songlin, He, Yuhang, Wang, Qiang, Kot, Alex, Gong, Yihong

arXiv.org Artificial IntelligenceOct-27-2024

Existing prompt learning methods in Vision-Language Models (VLM) have effectively enhanced the transfer capability of VLM to downstream tasks, but they suffer from a significant decline in generalization due to severe overfitting. To address this issue, we propose a framework named LOBG for vision-language models. Specifically, we use CLIP to filter out fine-grained foreground information that might cause overfitting, thereby guiding prompts with basic visual concepts. To further mitigate overfitting, we devel oped a structural topology preservation (STP) loss at the feature level, which endows the feature space with overall plasticity, allowing effective reshaping of the feature space during optimization. Additionally, we employed hierarchical logit distilation (HLD) at the output level to constrain outputs, complementing STP at the output end. Extensive experimental results demonstrate that our method significantly improves generalization capability and alleviates overfitting compared to state-of-the-art approaches.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.10247

Country: Europe > Switzerland (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning

Guo, Xiaobao, Selvaraj, Nithish Muthuchamy, Yu, Zitong, Kong, Adams Wai-Kin, Shen, Bingquan, Kot, Alex

arXiv.org Artificial IntelligenceAug-3-2023

Deception detection in conversations is a challenging yet important task, having pivotal applications in many fields such as credibility assessment in business, multimedia anti-frauds, and custom security. Despite this, deception detection research is hindered by the lack of high-quality deception datasets, as well as the difficulties of learning multimodal features effectively. To address this issue, we introduce DOLOS\footnote {The name ``DOLOS" comes from Greek mythology.}, the largest gameshow deception detection dataset with rich deceptive conversations. DOLOS includes 1,675 video clips featuring 213 subjects, and it has been labeled with audio-visual feature annotations. We provide train-test, duration, and gender protocols to investigate the impact of different factors. We benchmark our dataset on previously proposed deception detection approaches. To further improve the performance by fine-tuning fewer parameters, we propose Parameter-Efficient Crossmodal Learning (PECL), where a Uniform Temporal Adapter (UT-Adapter) explores temporal attention in transformer-based architectures, and a crossmodal fusion module, Plug-in Audio-Visual Fusion (PAVF), combines crossmodal information from audio-visual features. Based on the rich fine-grained audio-visual annotations on DOLOS, we also exploit multi-task learning to enhance performance by concurrently predicting deception and audio-visual features. Experimental results demonstrate the desired quality of the DOLOS dataset and the effectiveness of the PECL. The DOLOS dataset and the source codes are available at https://github.com/NMS05/Audio-Visual-Deception-Detection-DOLOS-Dataset-and-Parameter-Efficient-Crossmodal-Learning/tree/main.

artificial intelligence, image understanding, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2303.12745

Country: Europe (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.75)

Add feedback

Feature Distillation With Guided Adversarial Contrastive Learning

Bai, Tao, Chen, Jinnan, Zhao, Jun, Wen, Bihan, Jiang, Xudong, Kot, Alex

arXiv.org Machine LearningSep-21-2020

Deep learning models are shown to be vulnerable to adversarial examples. Though adversarial training can enhance model robustness, typical approaches are computationally expensive. Recent works proposed to transfer the robustness to adversarial attacks across different tasks or models with soft labels.Compared to soft labels, feature contains rich semantic information and holds the potential to be applied to different downstream tasks. In this paper, we propose a novel approach called Guided Adversarial Contrastive Distillation (GACD), to effectively transfer adversarial robustness from teacher to student with features. We first formulate this objective as contrastive learning and connect it with mutual information. With a well-trained teacher model as an anchor, students are expected to extract features similar to the teacher. Then considering the potential errors made by teachers, we propose sample reweighted estimation to eliminate the negative effects from teachers. With GACD, the student not only learns to extract robust features, but also captures structural knowledge from the teacher. By extensive experiments evaluating over popular datasets such as CIFAR-10, CIFAR-100 and STL-10, we demonstrate that our approach can effectively transfer robustness across different models and even different tasks, and achieve comparable or better results than existing methods. Besides, we provide a detailed analysis of various methods, showing that students produced by our approach capture more structural knowledge from teachers and learn more robust features under adversarial attacks.

deep learning, distillation, neural network, (16 more...)

arXiv.org Machine Learning

2009.09922

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback