AITopics | Cho, Jae Won

Collaborating Authors

Cho, Jae Won

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality

Oh, Youngtaek, Cho, Jae Won, Kim, Dong-Jin, Kweon, In So, Kim, Junmo

arXiv.org Artificial IntelligenceOct-7-2024

In this paper, we propose a new method to enhance compositional understanding in pre-trained vision and language models (VLMs) without sacrificing performance in zero-shot multi-modal tasks. Traditional fine-tuning approaches often improve compositional reasoning at the cost of degrading multi-modal capabilities, primarily due to the use of global hard negative (HN) loss, which contrasts global representations of images and texts. This global HN loss pushes HN texts that are highly similar to the original ones, damaging the model's multi-modal representations. To overcome this limitation, we propose Fine-grained Selective Calibrated CLIP (FSC-CLIP), which integrates local hard negative loss and selective calibrated regularization. These innovations provide fine-grained negative supervision while preserving the model's representational integrity. Our extensive evaluations across diverse benchmarks for both compositionality and multi-modal tasks show that FSC-CLIP not only achieves compositionality on par with state-of-the-art models but also retains strong multi-modal capabilities. Code is available at: https://github.com/ytaek-oh/fsc-clip.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.0521

Country: Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Generative Bias for Robust Visual Question Answering

Cho, Jae Won, Kim, Dong-jin, Ryu, Hyeonggon, Kweon, In So

arXiv.org Artificial IntelligenceMar-22-2023

The task of Visual Question Answering (VQA) is known to be plagued by the issue of VQA models exploiting biases within the dataset to make its final prediction. Various previous ensemble based debiasing methods have been proposed where an additional model is purposefully trained to be biased in order to train a robust target model. However, these methods compute the bias for a model simply from the label statistics of the training data or from single modal branches. In this work, in order to better learn the bias a target VQA model suffers from, we propose a generative method to train the bias model directly from the target model, called GenB. In particular, GenB employs a generative network to learn the bias in the target model through a combination of the adversarial objective and knowledge distillation. We then debias our target model with GenB as a bias model, and show through extensive experiments the effects of our method on various VQA bias datasets including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE, and show state-of-the-art results with the LXMERT architecture on VQA-CP2.

machine learning, question answering, target model, (20 more...)

arXiv.org Artificial Intelligence

2208.0069

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Single-Modal Entropy based Active Learning for Visual Question Answering

Kim, Dong-Jin, Cho, Jae Won, Choi, Jinsoo, Jung, Yunjae, Kweon, In So

arXiv.org Artificial IntelligenceOct-21-2021

Constructing a large-scale labeled dataset in the real world, especially for high-level tasks (eg, Visual Question Answering), can be expensive and time-consuming. In addition, with the ever-growing amounts of data and architecture complexity, Active Learning has become an important aspect of computer vision research. In this work, we address Active Learning in the multi-modal setting of Visual Question Answering (VQA). In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition through the use of ad hoc single-modal branches for each input to leverage its information. Our mutual information based sample acquisition strategy Single-Modal Entropic Measure (SMEM) in addition to our self-distillation technique enables the sample acquisitor to exploit all present modalities and find the most informative samples. Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks. We confirm our findings on various VQA datasets through state-of-the-art performance by comparing to existing Active Learning baselines.

machine learning, question answering, teaching method, (18 more...)

arXiv.org Artificial Intelligence

2110.10906

Country:

Asia (0.28)
North America > United States > Wisconsin (0.14)

Genre: Research Report > Promising Solution (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback

Correlate-and-Excite: Real-Time Stereo Matching via Guided Cost Volume Excitation

Bangunharcana, Antyanta, Cho, Jae Won, Lee, Seokju, Kweon, In So, Kim, Kyung-Soo, Kim, Soohyun

arXiv.org Artificial IntelligenceAug-12-2021

Volumetric deep learning approach towards stereo matching aggregates a cost volume computed from input left and right images using 3D convolutions. Recent works showed that utilization of extracted image features and a spatially varying cost volume aggregation complements 3D convolutions. However, existing methods with spatially varying operations are complex, cost considerable computation time, and cause memory consumption to increase. In this work, we construct Guided Cost volume Excitation (GCE) and show that simple channel excitation of cost volume guided by image can improve performance considerably. Moreover, we propose a novel method of using top-k selection prior to soft-argmin disparity regression for computing the final disparity estimate. Combining our novel contributions, we present an end-to-end network that we call Correlate-and-Excite (CoEx). Extensive experiments of our model on the SceneFlow, KITTI 2012, and KITTI 2015 datasets demonstrate the effectiveness and efficiency of our model and show that our model outperforms other speed-based algorithms while also being competitive to other state-of-the-art algorithms. Codes will be made available at https://github.com/antabangun/coex.

deep learning, neural network, regression, (19 more...)

arXiv.org Artificial Intelligence

2108.05773

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MCDAL: Maximum Classifier Discrepancy for Active Learning

Cho, Jae Won, Kim, Dong-Jin, Jung, Yunjae, Kweon, In So

arXiv.org Artificial IntelligenceJul-23-2021

Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition; however, GAN is usually known to suffer from instability and sensitivity to hyper-parameters. In contrast to these methods, we propose in this paper a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL) which takes the prediction discrepancies between multiple classifiers. In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them. Intuitively, the discrepancies in the auxiliary classification layers' predictions indicate the uncertainty in the prediction. In this regard, we propose a novel method to leverage the classifier discrepancies for the acquisition function for active learning. We also provide an interpretation of our idea in relation to existing GAN based active learning methods and domain adaptation frameworks. Moreover, we empirically demonstrate the utility of our approach where the performance of our approach exceeds the state-of-the-art methods on several image classification and semantic segmentation datasets in active learning setups.

deep learning, discrepancy, neural network, (17 more...)

arXiv.org Artificial Intelligence

2107.11049

Country: Asia (0.14)

Genre: Research Report > Promising Solution (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback