Collaborating Authors

 Cheng, Zhongwei


Data Efficient Training with Imbalanced Label Sample Distribution for Fashion Detection

arXiv.org Artificial Intelligence

Multi-label classification models have a wide range of applications in E-commerce, including vision-based label prediction and language-based sentiment classification. A major challenge in achieving satisfactory performance on these tasks in the real world is the notable imbalance in data distribution. For instance, in fashion attribute detection, there may be only six 'puff sleeve' garments among 1000 products in most E-commerce fashion catalogs. To address this issue, we explore more data-efficient model training techniques rather than acquiring a huge number of annotations to collect sufficient samples, an approach that is neither economical nor scalable. In this paper, we propose a state-of-the-art weighted objective function to boost the performance of deep neural networks (DNNs) for multi-label classification with long-tailed data distribution. Our experiments involve image-based attribute classification of fashion apparel, and the results demonstrate favorable performance for the new weighting method compared to non-weighted and inverse-frequency-based weighting mechanisms. We further evaluate the robustness of the new weighting mechanism using two popular fashion attribute types in today's fashion industry: sleevetype and archetype.
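As a point of reference for the inverse-frequency-based weighting that the abstract names as a baseline, the following is a minimal sketch of a positively weighted binary cross-entropy for multi-label data. It is not the paper's proposed objective, which the abstract does not specify. The function names, the weight formula (number of samples divided by number of positives per label), and the toy data are illustrative assumptions.

```python
import math

def inverse_frequency_weights(labels):
    """Assumed baseline: weight for label c = n_samples / positives_c.

    `labels` is a list of rows, each a list of 0/1 indicators per label.
    Labels with no positives fall back to a count of 1 to avoid division by zero.
    """
    n_samples = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        positives = sum(row[c] for row in labels)
        weights.append(n_samples / max(positives, 1))
    return weights

def weighted_bce(probs, labels, weights, eps=1e-7):
    """Mean binary cross-entropy where positive terms are scaled per label."""
    total = 0.0
    count = 0
    for p_row, y_row in zip(probs, labels):
        for p, y, w in zip(p_row, y_row, weights):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
            total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

# Toy catalog: label 0 is common, label 1 (a rare attribute) appears once.
labels = [[1, 0], [1, 0], [1, 1], [1, 0]]
probs = [[0.5, 0.5]] * 4
weights = inverse_frequency_weights(labels)
loss = weighted_bce(probs, labels, weights)
```

In this sketch, a missed positive for the rare label costs four times as much as one for the common label, which is the intuition behind frequency-based reweighting for long-tailed attributes.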


HalluAudio: Hallucinating Frequency as Concepts for Few-Shot Audio Classification

arXiv.org Artificial Intelligence

Few-shot audio classification is an emerging topic that is attracting increasing attention from the research community. Most existing work ignores the specific form of the audio spectrogram and focuses largely on the embedding space borrowed from image tasks. In this work, we aim to take advantage of this special audio format and propose a new method that hallucinates high-frequency and low-frequency parts as structured concepts. Extensive experiments on ESC-50 and our curated balanced Kaggle18 dataset show that the proposed method outperforms the baseline by a notable margin. The way that our method hallucinates high-frequency and low-frequency parts also enables its interpretability and opens up new potential for few-shot audio classification.