AITopics | multi-label image classification

9bcd0bdb2777fe8c729b682f07e993f1-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | Feb-11-2026, 00:16:40 GMT

MIR contains 25 unique labels, and we removed the label "night" as it is not in the label set of any ML APIs. For each instance in those datasets, we have evaluated the prediction from the mainstream ML APIs from 2020 to 2022. HAPI was collected from 2020 to 2022. For classification tasks, the predictions/annotations of each API were collected in the spring of 2020, 2021, and 2022. The original IMDB dataset has been partitioned into training and testing splits, and thus we used its testing split, including 25,000 text paragraphs.

artificial intelligence, dataset, machine learning, (16 more...)

Industry: Information Technology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Dual-View Alignment Learning with Hierarchical-Prompt for Class-Imbalance Multi-Label Classification

Huang, Sheng, Yan, Jiexuan, Liu, Beiyan, Liu, Bo, Hong, Richang

arXiv.org Artificial Intelligence | Sep-23-2025

This is especially challenging in Class-Imbalanced Multi-Label Image Classification (CI-MLIC) tasks, where data imbalance and multi-object recognition present significant obstacles. To address these challenges, we propose a novel method termed Dual-View Alignment Learning with Hierarchical Prompt (HP-DVAL), which leverages multi-modal knowledge from vision-language pretrained (VLP) models to mitigate the class-imbalance problem in multi-label settings. Specifically, HP-DVAL employs dual-view alignment learning to transfer the powerful feature representation capabilities from VLP models by extracting complementary features for accurate image-text alignment. To better adapt VLP models for CI-MLIC tasks, we introduce a hierarchical prompt-tuning strategy that utilizes global and local prompts to learn task-specific and context-related prior knowledge. Additionally, we design a semantic consistency loss during prompt tuning to prevent learned prompts from deviating from general knowledge embedded in VLP models. The effectiveness of our approach is validated on two CI-MLIC benchmarks: MS-COCO and VOC2007. Extensive experimental results demonstrate the superiority of our method over SOTA approaches, achieving mAP improvements of 10.0% and 5.2% on the long-tailed multi-label image classification task, and 6.8% and 2.9% on the multi-label few-shot image classification task.
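
The mAP figures quoted in this abstract are means of per-class average precision. As a rough illustration only, the sketch below computes a non-interpolated mAP over a toy score matrix. The function names and the data are hypothetical, and individual benchmark protocols may use a different AP variant than the one shown here.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class: mean precision at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # Classes without any positive example have no defined AP.
    return sum(precisions) / len(precisions) if precisions else None


def mean_average_precision(score_rows, label_rows):
    """Average the per-class AP over all classes that have at least one positive."""
    num_classes = len(label_rows[0])
    aps = []
    for c in range(num_classes):
        ap = average_precision([row[c] for row in score_rows],
                               [row[c] for row in label_rows])
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps)


# Toy data (made up): 3 images, 2 classes.
scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
labels = [[1, 0], [0, 1], [1, 1]]
map_value = mean_average_precision(scores, labels)
print(round(map_value, 4))
```

An "mAP improvement" is then the difference between two such values, each computed for one method on the same test split.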

artificial intelligence, classification, machine learning, (17 more...)

arXiv: 2509.17747

Country:

North America > United States (0.46)
Asia > China > Anhui Province (0.14)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.48)

Industry: Education (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Diffusion-Based Cross-Modal Feature Extraction for Multi-Label Classification

Lan, Tian, Zheng, Yiming, Yin, Jianxin

arXiv.org Artificial Intelligence | Sep-22-2025

Multi-label classification has broad applications and depends on powerful representations capable of capturing multi-label interactions. We introduce Diff-Feat, a simple but powerful framework that extracts intermediate features from pre-trained diffusion-Transformer models for images and text, and fuses them for downstream tasks. We observe that for vision tasks, the most discriminative intermediate feature along the diffusion process occurs at the middle step and is located in the middle block in Transformer. In contrast, for language tasks, the best feature occurs at the noise-free step and is located in the deepest block. In particular, we observe a striking phenomenon across varying datasets: a mysterious "Layer 12" consistently yields the best performance on various downstream classification tasks for images (under DiT-XL/2-256×256). We devise a heuristic local-search algorithm that pinpoints the locally optimal "image-text" × "block-timestep" pair among a few candidates, avoiding an exhaustive grid search. A simple fusion (linear projection followed by addition) of the selected representations yields state-of-the-art performance: 98.6% mAP on MS-COCO-enhanced and 45.7% mAP on Visual Genome 500, surpassing strong CNN, graph, and Transformer baselines by a wide margin. t-SNE and clustering metrics further reveal that Diff-Feat forms tighter semantic clusters than unimodal counterparts. The code is available at https://github.com/lt-0123/Diff-Feat.
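
The fusion step described in this abstract, a linear projection of each modality followed by addition, can be sketched in a few lines. This is a minimal illustration and not the authors' code: the feature vectors, the weights, and the function names are all hypothetical, and in practice the projections would be learned and the inputs would come from the selected diffusion-Transformer blocks.

```python
def linear(x, weight, bias):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse(image_feat, text_feat, w_img, b_img, w_txt, b_txt):
    """Project both modalities into a shared space, then add element-wise."""
    projected_image = linear(image_feat, w_img, b_img)
    projected_text = linear(text_feat, w_txt, b_txt)
    return [a + b for a, b in zip(projected_image, projected_text)]


# Placeholder features of different sizes (3-dim image, 2-dim text),
# both projected to a shared 2-dim space. All numbers are made up.
image_feat = [1, 2, 3]
text_feat = [0.5, 1.0]
w_img, b_img = [[1, 0, 0], [0, 1, 1]], [0, 0]
w_txt, b_txt = [[2, 0], [0, 2]], [1, -1]

fused = fuse(image_feat, text_feat, w_img, b_img, w_txt, b_txt)
print(fused)
```

Projecting first lets features of different dimensionality be added; the fused vector would then feed a multi-label classification head.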

artificial intelligence, machine learning, representation, (16 more...)

arXiv: 2509.15553

Country: Europe > Italy (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification

Ouyang, Shuyi, Wang, Hongyi, Niu, Ziwei, Bai, Zhenjia, Xie, Shiao, Xu, Yingying, Tong, Ruofeng, Chen, Yen-Wei, Lin, Lanfen

arXiv.org Artificial Intelligence | Jul-23-2024

The task of multi-label image classification involves recognizing multiple objects within a single image. Considering both valuable semantic information contained in the labels and essential visual features presented in the image, tight visual-linguistic interactions play a vital role in improving classification performance. Moreover, given the potential variance in object size and appearance within a single image, attention to features of different scales can help to discover possible objects in the image. Recently, Transformer-based methods have achieved great success in multi-label image classification by leveraging the advantage of modeling long-range dependencies, but they have several limitations. Firstly, existing methods treat visual feature extraction and cross-modal fusion as separate steps, resulting in insufficient visual-linguistic alignment in the joint semantic space. Additionally, they only extract visual features and perform cross-modal fusion at a single scale, neglecting objects with different characteristics. To address these issues, we propose a Hierarchical Scale-Aware Vision-Language Transformer (HSVLT) with two appealing designs: (1) A hierarchical multi-scale architecture that involves a Cross-Scale Aggregation module, which leverages joint multi-modal features extracted from multiple scales to recognize objects of varying sizes and appearances in images. (2) Interactive Visual-Linguistic Attention, a novel attention mechanism module that tightly integrates cross-modal interaction, enabling the joint updating of visual, linguistic and multi-modal features. We have evaluated our method on three benchmark datasets. The experimental results demonstrate that HSVLT surpasses state-of-the-art methods with lower computational cost.

classification, hsvlt, proceedings, (13 more...)

doi: 10.1145/3581783.3612159

arXiv: 2407.16244

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Ontario > National Capital Region > Ottawa (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.05)
(4 more...)

Genre:

Research Report > Promising Solution (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Counterfactual Reasoning for Multi-Label Image Classification via Patching-Based Training

Xie, Ming-Kun, Xiao, Jia-Hao, Peng, Pei, Niu, Gang, Sugiyama, Masashi, Huang, Sheng-Jun

arXiv.org Artificial Intelligence | Jun-12-2024

The key to multi-label image classification (MLC) is to improve model performance by leveraging label correlations. Unfortunately, it has been shown that overemphasizing co-occurrence relationships can cause the overfitting issue of the model, ultimately leading to performance degradation. In this paper, we provide a causal inference framework to show that the correlative features caused by the target object and its co-occurring objects can be regarded as a mediator, which has both positive and negative impacts on model predictions. On the positive side, the mediator enhances the recognition performance of the model by capturing co-occurrence relationships; on the negative side, it has the harmful causal effect that causes the model to make an incorrect prediction for the target object, even when only co-occurring objects are present in an image. To address this problem, we propose a counterfactual reasoning method to measure the total direct effect, achieved by enhancing the direct effect caused only by the target object. Due to the unknown location of the target object, we propose patching-based training and inference to accomplish this goal, which divides an image into multiple patches and identifies the pivot patch that contains the target object. Experimental results on multiple benchmark datasets with diverse configurations validate that the proposed method can achieve state-of-the-art performance.
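
The patching idea in this abstract, dividing an image into patches and picking the one that contains the target object, can be illustrated with a toy grid split. The abstract does not say how the pivot patch is identified, so the scoring function below (mean intensity) is purely a placeholder assumption standing in for a per-patch classifier confidence; all names are hypothetical.

```python
def split_into_patches(image, rows, cols):
    """Split a 2D image (list of lists) into rows x cols equally sized patches."""
    height, width = len(image), len(image[0])
    patch_h, patch_w = height // rows, width // cols
    patches = []
    for r in range(rows):
        for c in range(cols):
            patch = [line[c * patch_w:(c + 1) * patch_w]
                     for line in image[r * patch_h:(r + 1) * patch_h]]
            patches.append(patch)
    return patches


def pivot_patch(patches, score_fn):
    """Return the index of the patch with the highest score for the target class."""
    scores = [score_fn(p) for p in patches]
    return max(range(len(patches)), key=scores.__getitem__)


def mean_intensity(patch):
    # Placeholder scorer (assumption): a real system would use model confidence.
    values = [v for line in patch for v in line]
    return sum(values) / len(values)


# Toy 4x4 "image" whose bright region sits in the bottom-right quadrant.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
patches = split_into_patches(image, rows=2, cols=2)
pivot = pivot_patch(patches, mean_intensity)
print(pivot)
```

Patches are indexed row by row, so index 3 is the bottom-right quadrant in a 2 x 2 split.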

multi-label image classification, prediction, tresnetl, (13 more...)

arXiv: 2404.06287

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Austria > Vienna (0.14)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

A Deep Model for Partial Multi-Label Image Classification with Curriculum Based Disambiguation

Sun, Feng, Xie, Ming-Kun, Huang, Sheng-Jun

arXiv.org Artificial Intelligence | May-6-2024

In this paper, we study the partial multi-label (PML) image classification problem, where each image is annotated with a candidate label set consisting of multiple relevant labels and other noisy labels. Existing PML methods typically design a disambiguation strategy to filter out noisy labels by utilizing prior knowledge with extra assumptions, which unfortunately is unavailable in many real tasks. Furthermore, because the objective function for disambiguation is usually elaborately designed on the whole training set, it can hardly be optimized in a deep model with SGD on mini-batches. In this paper, for the first time we propose a deep model for PML to enhance the representation and discrimination ability. On the one hand, we propose a novel curriculum-based disambiguation strategy to progressively identify ground-truth labels by incorporating the varied difficulties of different classes. On the other hand, a consistency regularization is introduced for model retraining to balance fitting identified easy labels and exploiting potential relevant labels. Extensive experimental results on the commonly used benchmark datasets show the proposed method significantly outperforms the SOTA methods.

candidate label, consistency regularization, disambiguation, (14 more...)

doi: 10.1007/s11633-023-1439-3

arXiv: 2207.0241

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

ProbMCL: Simple Probabilistic Contrastive Learning for Multi-label Visual Classification

Sajedi, Ahmad, Khaki, Samir, Lawryshyn, Yuri A., Plataniotis, Konstantinos N.

arXiv.org Artificial Intelligence | Jan-2-2024

Multi-label image classification presents a challenging task in many domains, including computer vision and medical imaging. Recent advancements have introduced graph-based and transformer-based methods to improve performance and capture label dependencies. However, these methods often include complex modules that entail heavy computation and lack interpretability. In this paper, we propose Probabilistic Multi-label Contrastive Learning (ProbMCL), a novel framework to address these challenges in multi-label image classification tasks. Our simple yet effective approach employs supervised contrastive learning, in which samples that share enough labels with an anchor image based on a decision threshold are introduced as a positive set. This structure captures label dependencies by pulling positive pair embeddings together and pushing away negative samples that fall below the threshold. We enhance representation learning by incorporating a mixture density network into contrastive learning and generating Gaussian mixture distributions to explore the epistemic uncertainty of the feature encoder. We validate the effectiveness of our framework through experimentation with datasets from the computer vision and medical imaging domains. Our method outperforms the existing state-of-the-art methods while achieving a low computational footprint on both datasets. Visualization analyses also demonstrate that ProbMCL-learned classifiers maintain a meaningful semantic topology.
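
The positive-set construction described in this abstract, where samples sharing enough labels with an anchor count as positives, might look roughly like the sketch below. The abstract does not name the overlap measure or the threshold value, so the Jaccard index and the value 0.5 used here are assumptions, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two binary label vectors (assumed measure)."""
    intersection = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return intersection / union if union else 0.0


def positive_set(anchor_idx, labels, threshold=0.5):
    """Indices of samples whose label overlap with the anchor meets the threshold."""
    anchor = labels[anchor_idx]
    return [i for i, row in enumerate(labels)
            if i != anchor_idx and jaccard(anchor, row) >= threshold]


# Toy batch of 4 samples over 4 classes (made-up labels).
labels = [
    [1, 1, 0, 0],  # anchor
    [1, 1, 1, 0],  # overlap 2/3
    [0, 0, 1, 1],  # overlap 0
    [1, 0, 0, 0],  # overlap 1/2
]
positives = positive_set(0, labels, threshold=0.5)
print(positives)
```

Samples below the threshold would act as negatives in the contrastive loss; raising the threshold to 0.6 in this toy batch leaves only sample 1 as a positive.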

classification, computer vision, proceedings, (13 more...)

arXiv: 2401.01448

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.55)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Language-Guided Transformer for Federated Multi-Label Classification

Liu, I-Jieh, Lin, Ci-Siang, Yang, Fu-En, Wang, Yu-Chiang Frank

arXiv.org Artificial Intelligence | Dec-12-2023

Federated Learning (FL) is an emerging paradigm that enables multiple users to collaboratively train a robust model in a privacy-preserving manner without sharing their private data. Most existing approaches of FL only consider traditional single-label image classification, ignoring the impact when transferring the task to multi-label image classification. Nevertheless, it is still challenging for FL to deal with user heterogeneity in their local data distribution in the real-world FL scenario, and this issue becomes even more severe in multi-label image classification. Inspired by the recent success of Transformers in centralized settings, we propose a novel FL framework for multi-label classification. Since partial label correlation may be observed by local clients during training, direct aggregation of locally updated models would not produce satisfactory performances. Thus, we propose a novel FL framework of Language-Guided Transformer (FedLGT) to tackle this challenging task, which aims to exploit and transfer knowledge across different clients for learning a robust global model. Through extensive experiments on various multi-label datasets (e.g., FLAIR, MS-COCO, etc.), we show that our FedLGT is able to achieve satisfactory performance and outperforms standard FL techniques under multi-label FL scenarios. Code is available at https://github.com/Jack24658735/FedLGT.

classification, flair, global model, (16 more...)

arXiv: 2312.07165

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Taiwan (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Understanding Label Bias in Single Positive Multi-Label Learning

Arroyo, Julio, Perona, Pietro, Cole, Elijah

arXiv.org Artificial Intelligence | May-24-2023

Annotating data for multi-label classification is prohibitively expensive because every category of interest must be confirmed to be present or absent. Recent work on single positive multi-label (SPML) learning shows that it is possible to train effective multi-label classifiers using only one positive label per image. However, the standard benchmarks for SPML are derived from traditional multi-label classification datasets by retaining one positive label for each training example (chosen uniformly at random) and discarding all other labels. In realistic settings it is not likely that positive labels are chosen uniformly at random. This work introduces protocols for studying label bias in SPML and provides new empirical results.
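
The standard benchmark construction mentioned in this abstract, keeping one positive label per training example chosen uniformly at random and discarding the rest, can be sketched as follows. The function name, the use of None to mark discarded (unobserved) labels, and the toy data are illustrative assumptions.

```python
import random


def to_single_positive(label_rows, seed=0):
    """Keep one uniformly chosen positive per row; every other entry becomes unknown."""
    rng = random.Random(seed)  # fixed seed so the derived benchmark is reproducible
    observed_rows = []
    for row in label_rows:
        positives = [j for j, y in enumerate(row) if y == 1]
        observed = [None] * len(row)  # None marks a discarded, unobserved label
        if positives:
            observed[rng.choice(positives)] = 1
        observed_rows.append(observed)
    return observed_rows


# Fully annotated toy labels: 3 images over 4 classes (made up).
full = [
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
spml = to_single_positive(full)
for row in spml:
    print(row)
```

Studying label bias amounts to replacing the uniform choice with a non-uniform one, for example a choice weighted toward some classes.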

artificial intelligence, classification, machine learning, (16 more...)

arXiv: 2305.15584

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Spain (0.05)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)

Matrix Completion for Multi-label Image Classification

Neural Information Processing Systems | Apr-6-2023, 12:56:51 GMT

Experimental validation on several datasets shows how our method outperforms state-of-the-art algorithms, while effectively capturing semantic concepts of classes.

image categorization, matrix completion, multi-label image classification, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.75)
Information Technology > Sensing and Signal Processing > Image Processing (0.55)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.40)
