Li, Zechao
Visual Position Prompt for MLLM based Visual Grounding
Tang, Wei, Sun, Yanpeng, Gu, Qinying, Li, Zechao
Although Multimodal Large Language Models (MLLMs) excel at various image-related tasks, they encounter challenges in precisely aligning coordinates with spatial information within images, particularly in position-aware tasks such as visual grounding. This limitation arises from two key factors. First, MLLMs lack explicit spatial references, making it difficult to associate textual descriptions with precise image locations. Second, their feature extraction processes prioritize global context over fine-grained spatial details, leading to weak localization capability. To address these issues, we introduce VPP-LLaVA, an MLLM equipped with a Visual Position Prompt (VPP) to improve its grounding capability. VPP-LLaVA integrates two complementary mechanisms. The global VPP overlays learnable, axis-like embeddings onto the input image to provide structured spatial cues, while the local VPP focuses on fine-grained localization by incorporating position-aware queries that suggest probable object locations. We also introduce VPP-SFT, a 0.6M-sample dataset that consolidates high-quality visual grounding data into a compact format for efficient model training. Training on this dataset with VPP enhances the model's performance, achieving state-of-the-art results on standard grounding benchmarks despite using far fewer training samples than other MLLMs such as MiniGPT-v2, which relies on a much larger dataset ($\sim$21M samples). The code and VPP-SFT dataset will be available at https://github.com/WayneTomas/VPP-LLaVA upon acceptance.
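To make the global VPP concrete, here is a minimal PyTorch sketch of overlaying learnable, axis-like embeddings onto an input image; the module name, tensor shapes, and blending weight are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GlobalVPP(nn.Module):
    """Hypothetical sketch of a global Visual Position Prompt: learnable
    axis-like embeddings blended onto the input image as spatial cues."""
    def __init__(self, image_size: int = 336, alpha: float = 0.1):
        super().__init__()
        # One learnable embedding per row and per column, like axis ticks.
        self.row_embed = nn.Parameter(torch.zeros(3, image_size, 1))
        self.col_embed = nn.Parameter(torch.zeros(3, 1, image_size))
        self.alpha = alpha  # assumed blending weight for the overlay

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, H, W); broadcast row + column cues over the image.
        overlay = self.row_embed + self.col_embed  # (3, H, W)
        return images + self.alpha * overlay       # prompted image

prompted = GlobalVPP()(torch.randn(2, 3, 336, 336))
print(prompted.shape)  # torch.Size([2, 3, 336, 336])
```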
AFANet: Adaptive Frequency-Aware Network for Weakly-Supervised Few-Shot Semantic Segmentation
Ma, Jiaqi, Xie, Guo-Sen, Zhao, Fang, Li, Zechao
Few-shot learning aims to recognize novel concepts from only a few samples by leveraging prior knowledge. However, for visually intensive tasks such as few-shot semantic segmentation, pixel-level annotations are time-consuming and costly. Therefore, in this paper, we exploit the more challenging image-level annotations and propose an adaptive frequency-aware network (AFANet) for weakly-supervised few-shot semantic segmentation (WFSS). Specifically, we first propose a cross-granularity frequency-aware module (CFM) that decouples RGB images into high-frequency and low-frequency distributions and further optimizes semantic structural information by realigning them. Unlike most existing WFSS methods, which use textual information from a multi-modal vision-language model such as CLIP in an offline manner, we further propose a CLIP-guided spatial-adapter module (CSM) that performs a spatially adaptive transformation of the textual information through online learning, thereby providing enriched cross-modal semantic information to CFM. Extensive experiments on the Pascal-5^i and COCO-20^i datasets demonstrate that AFANet achieves state-of-the-art performance. The code is available at https://github.com/jarch-ma/AFANet.
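As an illustration of the frequency decoupling that CFM builds on, the following sketch splits an image into low- and high-frequency parts with a circular mask in the Fourier domain; the mask shape and cutoff radius are assumptions, and the paper's module additionally realigns the two distributions with learned components not shown here.

```python
import torch

def frequency_decouple(img: torch.Tensor, radius: float = 0.1):
    """Split an image into low- and high-frequency components via a
    circular low-pass mask in the (shifted) Fourier domain."""
    B, C, H, W = img.shape
    f = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.linspace(-0.5, 0.5, H), torch.linspace(-0.5, 0.5, W),
        indexing="ij")
    low_mask = ((xx**2 + yy**2).sqrt() <= radius).to(img.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(f * low_mask, dim=(-2, -1))).real
    high = img - low  # complementary high-frequency residual
    return low, high

low, high = frequency_decouple(torch.randn(1, 3, 64, 64))
```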
Fast Disentangled Slim Tensor Learning for Multi-view Clustering
Xu, Deng, Zhang, Chao, Li, Zechao, Chen, Chunlin, Li, Huaxiong
Tensor-based multi-view clustering has recently received significant attention due to its exceptional ability to explore cross-view high-order correlations. However, most existing methods still face several limitations. (1) Most explore the correlations among different affinity matrices, making them unscalable to large-scale data. (2) Although some methods address this by introducing bipartite graphs, they may yield sub-optimal solutions due to an unstable anchor selection process. (3) They generally ignore the negative impact of latent semantic-unrelated information in each view. To tackle these issues, we propose a new approach termed fast Disentangled Slim Tensor Learning (DSTL) for multi-view clustering. Instead of focusing on multi-view graph structures, DSTL directly explores the high-order correlations among multi-view latent semantic representations based on matrix factorization. To alleviate the negative influence of feature redundancy, and inspired by robust PCA, DSTL disentangles the latent low-dimensional representation of each view into a semantic-unrelated part and a semantic-related part. Two slim tensors are then constructed with tensor-based regularization. To further enhance the quality of feature disentanglement, the semantic-related representations are aligned across views through a consensus alignment indicator. The proposed model is computationally efficient and can be solved effectively. Extensive experiments demonstrate the superiority and efficiency of DSTL over state-of-the-art approaches. The code of DSTL is available at https://github.com/dengxu-nju/DSTL.
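For intuition, a generic form of such a disentangled objective might look as follows; this is an illustration assembled from the description above, and DSTL's exact regularizers and constraints may differ.

```latex
% X^{(v)} is view v's data, U^{(v)} a basis, E^{(v)} / H^{(v)} the
% semantic-unrelated / semantic-related parts, \mathcal{E} and \mathcal{H}
% the slim tensors stacking them across views, \|\cdot\|_{\mathrm{TNN}} a
% tensor nuclear norm, and G the consensus alignment indicator.
\min_{\{U^{(v)},E^{(v)},H^{(v)}\},\,G}\;
  \sum_{v=1}^{V}\bigl\|X^{(v)}-U^{(v)}\bigl(E^{(v)}+H^{(v)}\bigr)\bigr\|_F^2
  +\lambda_1\|\mathcal{E}\|_{\mathrm{TNN}}
  +\lambda_2\|\mathcal{H}\|_{\mathrm{TNN}}
  +\beta\sum_{v=1}^{V}\bigl\|H^{(v)}-G\bigr\|_F^2
```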
Why pre-training is beneficial for downstream classification tasks?
Jiang, Xin, Cheng, Xu, Li, Zechao
Pre-training has exhibited notable benefits for downstream tasks by boosting accuracy and speeding up convergence, but the exact reasons for these benefits remain unclear. To this end, we propose to quantitatively and explicitly explain the effects of pre-training on downstream tasks from a novel game-theoretic view, which also sheds new light on the learning behavior of deep neural networks (DNNs). Specifically, we extract and quantify the knowledge encoded by the pre-trained model and track how this knowledge changes during fine-tuning. Interestingly, we discover that only a small amount of the pre-trained model's knowledge is preserved for inference on the downstream task. However, this preserved knowledge is very difficult for a model trained from scratch to learn. Thus, with the help of this exclusively learned and useful knowledge, a model fine-tuned from pre-training usually achieves better performance than one trained from scratch. Moreover, we discover that pre-training guides the fine-tuned model toward the target knowledge of the downstream task more directly and quickly, which accounts for the faster convergence of the fine-tuned model.
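The fine-tuned versus from-scratch comparison underlying these findings can be sketched as below, using torchvision's resnet18 as a stand-in backbone; this only illustrates the experimental pairing, not the paper's game-theoretic quantification of knowledge.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Two otherwise-identical networks: one starts from pre-trained weights,
# the other from random initialization.
pretrained = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
scratch = resnet18(weights=None)
for model in (pretrained, scratch):
    # Replace the classifier head for a hypothetical 10-way downstream task.
    model.fc = nn.Linear(model.fc.in_features, 10)

# Fine-tuning the pre-trained network typically converges faster and ends
# at higher accuracy than training `scratch` under the same schedule.
optimizer = torch.optim.SGD(pretrained.parameters(), lr=1e-3, momentum=0.9)
```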
Face Super-Resolution Guided by 3D Facial Priors
Hu, Xiaobin, Ren, Wenqi, LaMaster, John, Cao, Xiaochun, Li, Xiaoming, Li, Zechao, Menze, Bjoern, Liu, Wei
State-of-the-art face super-resolution methods employ deep convolutional neural networks to learn a mapping between low- and high-resolution facial patterns by exploiting local appearance knowledge. However, most of these methods do not fully exploit facial structure and identity information, and they struggle with facial images that exhibit large pose variations. In this paper, we propose a novel face super-resolution method that explicitly incorporates 3D facial priors capturing sharp facial structures. Our work is the first to explore 3D morphable knowledge based on the fusion of parametric descriptions of face attributes (e.g., identity, facial expression, texture, illumination, and face pose). Moreover, these priors can easily be incorporated into any network and are highly effective at improving performance and accelerating convergence. First, a 3D face rendering branch is set up to obtain 3D priors of salient facial structures and identity knowledge. Second, a Spatial Attention Module is used to better exploit this hierarchical information (i.e., intensity similarity, 3D facial structure, and identity content) for the super-resolution problem. Extensive experiments demonstrate that the proposed 3D priors achieve superior face super-resolution results over the state of the art.
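A minimal sketch of prior-guided spatial attention is given below, in which rendered 3D prior maps gate low-resolution features; the channel sizes and the attention form are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SpatialAttentionFusion(nn.Module):
    """Hedged sketch: fuse rendered 3D facial priors with LR features
    through a per-pixel attention gate."""
    def __init__(self, feat_ch: int = 64, prior_ch: int = 3):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(feat_ch + prior_ch, feat_ch, kernel_size=3, padding=1),
            nn.Sigmoid(),  # per-pixel gate derived from the 3D prior
        )

    def forward(self, feat: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        gate = self.attn(torch.cat([feat, prior], dim=1))
        return feat * gate + feat  # prior-guided features plus a residual path

out = SpatialAttentionFusion()(torch.randn(1, 64, 32, 32),
                               torch.randn(1, 3, 32, 32))
```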
Show, Reward and Tell: Automatic Generation of Narrative Paragraph From Photo Stream by Adversarial Training
Wang, Jing (Nanjing University of Science and Technology) | Fu, Jianlong (Microsoft Research) | Tang, Jinhui (Nanjing University of Science and Technology) | Li, Zechao (Nanjing University of Science and Technology) | Mei, Tao (Microsoft Research)
Impressive image captioning results (i.e., an objective description of an image) have been achieved with ample training pairs. In this paper, we take one step further and investigate the generation of a narrative paragraph for a photo stream. This task is even more challenging due to the difficulty of modeling an ordered photo sequence and of generating a relevant paragraph with an expressive language style suitable for storytelling. The difficulty is exacerbated by the limited training data, so that existing approaches mostly resort to search-based solutions. To deal with these challenges, we propose a sequence-to-sequence modeling approach with reinforcement learning and adversarial training. First, to model the ordered photo stream, we propose a hierarchical recurrent neural network as the story generator, which is optimized by reinforcement learning with rewards. Second, to generate relevant and story-style paragraphs, we design the rewards with two critic networks: a multi-modal discriminator and a language-style discriminator. Third, we treat the story generator and the reward critics as adversaries: the generator aims to create paragraphs indistinguishable from human-level stories, whereas the critics aim to distinguish them, further improving the generator via policy gradient. Experiments on three widely used datasets show the effectiveness of our approach against state-of-the-art methods, with a relative METEOR improvement of 20.2%. We also show the subjective preference for the proposed approach over the baselines through a user study with 30 human subjects.
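The policy-gradient step can be sketched as a REINFORCE-style loss in which the critics' scores act as rewards; the tensor shapes and the mean baseline below are illustrative assumptions.

```python
import torch

def policy_gradient_loss(log_probs: torch.Tensor,
                         rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style loss for the story generator. `rewards` would be
    scores from the two critics (multi-modal and language-style
    discriminators)."""
    # log_probs: (B, T) log-probabilities of the sampled paragraph tokens
    # rewards:   (B,)   scalar reward per sampled paragraph
    advantage = rewards - rewards.mean()  # simple variance-reducing baseline
    return -(log_probs.sum(dim=1) * advantage.detach()).mean()

loss = policy_gradient_loss(torch.randn(4, 20), torch.rand(4))
```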
Weakly-Supervised Deep Nonnegative Low-Rank Model for Social Image Tag Refinement and Assignment
Li, Zechao (Nanjing University of Science and Technology) | Tang, Jinhui (Nanjing University of Science and Technology)
It is well known that the user-provided tags of social images are imperfect, i.e., they can be noisy, irrelevant, or incomplete, which heavily degrades the performance of many multimedia tasks. To alleviate this problem, we propose a Weakly-supervised Deep Nonnegative Low-rank model (WDNL) that improves tag quality by integrating a low-rank model with deep feature learning. A nonnegative low-rank model is introduced to uncover the intrinsic relationships between images and tags by simultaneously removing noisy or irrelevant tags and complementing missing tags. A deep architecture is leveraged to seamlessly connect visual content with semantic tags, so the proposed model scales well by assigning tags to new images. Extensive experiments conducted on two real-world datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.
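For intuition, the low-rank component can be written in a generic robust-PCA-style form; WDNL's actual objective couples this with deep feature learning and may use different terms.

```latex
% D is the observed (noisy, incomplete) image-tag matrix, A the refined
% nonnegative low-rank tag matrix, and E absorbs noisy or irrelevant tags.
\min_{A \ge 0,\,E}\; \operatorname{rank}(A) + \lambda \|E\|_1
\quad \text{s.t.} \quad D = A + E
```

In practice, $\operatorname{rank}(A)$ is relaxed to the nuclear norm $\|A\|_*$ to obtain a tractable convex surrogate.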
Learning Low-Rank Representations with Classwise Block-Diagonal Structure for Robust Face Recognition
Li, Yong (Chinese Academy of Sciences) | Liu, Jing (Chinese Academy of Sciences) | Li, Zechao (Nanjing University of Science and Technology) | Zhang, Yangmuzi (University of Maryland, College Park) | Lu, Hanqing (Chinese Academy of Sciences) | Ma, Songde (Chinese Academy of Sciences)
Face recognition has been widely studied due to its importance in various applications. However, the case in which both training and testing images are corrupted is not well addressed. Motivated by the success of low-rank matrix recovery, we propose a novel semi-supervised low-rank matrix recovery algorithm for robust face recognition. The proposed method learns robust discriminative representations for training and testing images simultaneously by exploiting their classwise block-diagonal structure. Specifically, low-rank matrix approximation handles the possible contamination of data, while the classwise block-diagonal structure promotes the discrimination of representations for robust recognition. These objectives are formulated into a unified objective function, and we design an efficient optimization procedure based on the augmented Lagrange multiplier (ALM) method to solve it. Extensive experiments on three public databases validate the effectiveness of our approach and verify the strong identification capability of representations with block-diagonal structure.
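An illustrative form of low-rank representation with a classwise block-diagonal target is shown below; the paper's precise constraints and semi-supervised terms may differ.

```latex
% X stacks (possibly corrupted) training and testing samples, Z is the
% representation, E the sparse error, Q the binary block-diagonal pattern
% induced by class labels, and \odot the element-wise product, so the
% last term penalizes off-block entries of Z.
\min_{Z,E}\; \|Z\|_* + \lambda\|E\|_1 + \gamma\,\|Z - Z \odot Q\|_F^2
\quad \text{s.t.} \quad X = XZ + E
```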
Unsupervised Feature Selection Using Nonnegative Spectral Analysis
Li, Zechao (Chinese Academy of Sciences) | Yang, Yi (Carnegie Mellon University) | Liu, Jing (Chinese Academy of Sciences) | Zhou, Xiaofang (The University of Queensland) | Lu, Hanqing (Chinese Academy of Sciences)
In this paper, a new unsupervised learning algorithm, namely Nonnegative Discriminative Feature Selection (NDFS), is proposed. To exploit discriminative information in unsupervised scenarios, we perform spectral clustering to learn the cluster labels of the input samples, during which feature selection is performed simultaneously. The joint learning of the cluster labels and the feature selection matrix enables NDFS to select the most discriminative features. To learn more accurate cluster labels, a nonnegative constraint is explicitly imposed on the class indicators. To reduce redundant or even noisy features, an $\ell_{2,1}$-norm minimization constraint is added to the objective function, which guarantees that the feature selection matrix is sparse in rows. Our algorithm exploits discriminative information and feature correlation simultaneously to select a better feature subset. A simple yet efficient iterative algorithm is designed to optimize the proposed objective function. Experimental results on different real-world datasets demonstrate the encouraging performance of our algorithm over the state of the art.
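Up to notation, the joint objective described above can be written as follows; this is a reconstruction from the abstract, not necessarily the paper's exact formulation.

```latex
% L is the graph Laplacian from spectral clustering, F the nonnegative
% orthogonal cluster indicator matrix, W the feature selection matrix,
% and the \ell_{2,1} norm enforces row sparsity of W.
\min_{F,W}\; \operatorname{Tr}\!\left(F^{\top} L F\right)
  + \beta\left(\|X^{\top} W - F\|_F^2 + \alpha\|W\|_{2,1}\right)
\quad \text{s.t.} \quad F^{\top} F = I,\; F \ge 0
```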