AITopics | Sugiyama, Masashi


Thompson Exploration with Best Challenger Rule in Best Arm Identification

Lee, Jongyeong, Honda, Junya, Sugiyama, Masashi

arXiv.org Machine Learning, Sep-30-2023

This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework for canonical single-parameter exponential models. Many policies have been proposed for this problem, but most of them require solving an optimization problem at every round and/or are forced to explore an arm at least a certain number of times, except those restricted to the Gaussian model. To address these limitations, we propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule. While Thompson sampling was originally considered for maximizing the cumulative reward, we demonstrate that it can be used to naturally explore arms in BAI without forcing it. We show that our policy is asymptotically optimal for any two-armed bandit problem and achieves near optimality for general $K$-armed bandit problems with $K\geq 3$. Nevertheless, in numerical experiments, our policy shows competitive performance compared to asymptotically optimal policies in terms of sample complexity while requiring lower computational cost. In addition, we highlight the advantages of our policy by comparing it to the concept of $\beta$-optimality, a relaxed notion of asymptotic optimality commonly considered in the analysis of a class of policies including the proposed one.
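As a rough illustration of the Thompson sampling ingredient named in the abstract above, the sketch below draws one posterior sample per arm and returns the arm whose sample is largest. It is not the paper's policy: it assumes Bernoulli rewards with independent Beta(1, 1) priors, it leaves out the best challenger rule and any stopping rule (neither is specified in the abstract), and the function name and counts are hypothetical.

```python
import random


def thompson_draw(successes, failures, rng):
    """Draw one posterior sample per Bernoulli arm and pick the largest.

    Assumes independent Beta(1, 1) priors, so arm i has posterior
    Beta(1 + successes[i], 1 + failures[i]). Returns (arm_index, samples).
    """
    samples = [
        rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)
    ]
    best = max(range(len(samples)), key=samples.__getitem__)
    return best, samples


# Hypothetical counts: arm 0 has looked far better than arm 1 so far.
rng = random.Random(0)
arm, samples = thompson_draw([1000, 0], [0, 1000], rng)
```

Arms with few observations have wide posteriors and therefore occasionally produce the largest sample, which is how the sampling step explores without a forced-exploration schedule.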

artificial intelligence, data mining, machine learning, (17 more...)

arXiv:2310.00539

Country: Asia > Japan > Honshū (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.86)


Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

Chen, Hao, Wang, Jindong, Shah, Ankit, Tao, Ran, Wei, Hongxin, Xie, Xing, Sugiyama, Masashi, Raj, Bhiksha

arXiv.org Artificial Intelligence, Sep-29-2023

Pre-training on large-scale datasets and then fine-tuning on downstream tasks has become a standard practice in deep learning. However, pre-training data often contain label noise that may adversely affect the generalization of the model. This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks. More specifically, through extensive experiments with supervised pre-training models on synthetic noisy ImageNet-1K and YFCC15M datasets, we demonstrate that while slight noise in pre-training can benefit in-domain (ID) transfer performance, where the training and testing data share the same distribution, it always deteriorates out-of-domain (OOD) performance, where the training and testing data distributions differ. We empirically verify that the reason behind this is that noise in pre-training shapes the feature space differently. We then propose a lightweight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization on both ID and OOD tasks, considering that one may not be able to fully fine-tune or even access the pre-trained models. We conduct practical experiments on popular vision and language models that are pre-trained on noisy data to evaluate our approach. Our analysis and results show the importance of this interesting and novel research direction, which we term Noisy Model Learning.

large language model, machine learning, sample percentage, (19 more...)

arXiv:2309.17002

Country:

Europe (0.92)
North America > United States (0.27)

Genre: Research Report > New Finding (0.48)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations

Chen, Hao, Shah, Ankit, Wang, Jindong, Tao, Ran, Wang, Yidong, Xie, Xing, Sugiyama, Masashi, Singh, Rita, Raj, Bhiksha

arXiv.org Artificial Intelligence, Sep-29-2023

Learning with reduced labeling standards, such as noisy labels, partial labels, and multiple label candidates, which we generically refer to as imprecise labels, is a commonplace challenge in machine learning tasks. Previous methods tend to propose specific designs for every emerging imprecise label configuration, which is usually unsustainable when multiple configurations of imprecision coexist. In this paper, we introduce imprecise label learning (ILL), a framework for the unification of learning with various imprecise label configurations. ILL leverages expectation-maximization (EM) for modeling the imprecise label information, treating the precise labels as latent variables. Instead of approximating the correct labels for training, it considers the entire distribution of all possible labelings entailed by the imprecise information. We demonstrate that ILL can seamlessly adapt to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, a mixture of these settings. Notably, ILL surpasses the existing specified techniques for handling imprecise labels, marking the first unified framework with robust and effective performance across various challenging settings. We hope our work will inspire further research on this topic, unleashing the full potential of ILL in wider scenarios where precise labels are expensive and complicated to obtain.

artificial intelligence, inductive learning, machine learning, (14 more...)

arXiv:2305.12715

Country:

Europe > Portugal (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)


Unified Risk Analysis for Weakly Supervised Learning

Chiang, Chao-Kai, Sugiyama, Masashi

arXiv.org Artificial Intelligence, Sep-15-2023

Among the flourishing research of weakly supervised learning (WSL), we recognize the lack of a unified interpretation of the mechanism behind the weakly supervised scenarios, let alone a systematic treatment of the risk rewrite problem, a crucial step in the empirical risk minimization approach. In this paper, we introduce a framework providing a comprehensive understanding and a unified methodology for WSL. The formulation component of the framework, leveraging a contamination perspective, provides a unified interpretation of how weak supervision is formed and subsumes fifteen existing WSL settings. The induced reduction graphs offer comprehensive connections over WSLs. The analysis component of the framework, viewed as a decontamination process, provides a systematic method of conducting risk rewrite. In addition to the conventional inverse matrix approach, we devise a novel strategy called marginal chain aiming to decontaminate distributions. We justify the feasibility of the proposed framework by recovering existing rewrites reported in the literature.
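The abstract above does not spell out the inverse matrix approach, so the following is only a textbook instance of the idea under an assumed setting: binary classification with class-conditional label noise and known flip rates. Applying the inverse of the noise transition matrix to the pair of losses gives corrected losses whose average over the noise equals the clean loss. The function name, flip rates, and loss values are hypothetical and are not taken from the paper.

```python
def backward_corrected_losses(loss0, loss1, rho0, rho1):
    """Apply the inverse of the binary noise transition matrix to the losses.

    rho0 = P(noisy label = 1 | clean label = 0)
    rho1 = P(noisy label = 0 | clean label = 1)
    loss0, loss1 are the losses of one prediction against labels 0 and 1.
    Returns the corrected losses to use with the noisy label 0 or 1.
    """
    det = 1.0 - rho0 - rho1  # determinant of [[1-rho0, rho0], [rho1, 1-rho1]]
    if abs(det) < 1e-12:
        raise ValueError("transition matrix is singular (rho0 + rho1 = 1)")
    corrected0 = ((1.0 - rho1) * loss0 - rho0 * loss1) / det
    corrected1 = ((1.0 - rho0) * loss1 - rho1 * loss0) / det
    return corrected0, corrected1


# Hypothetical numbers for a single prediction.
loss0, loss1, rho0, rho1 = 0.2, 1.5, 0.1, 0.3
c0, c1 = backward_corrected_losses(loss0, loss1, rho0, rho1)
# Averaging the corrected losses over the noise recovers the clean losses.
expected_if_clean0 = (1.0 - rho0) * c0 + rho0 * c1
expected_if_clean1 = rho1 * c0 + (1.0 - rho1) * c1
```

The corrected losses can be negative for individual examples; only their expectation under the noise matches the clean loss.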

artificial intelligence, inductive learning, machine learning, (15 more...)

arXiv:2309.08216

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.85)


Multi-Label Knowledge Distillation

Yang, Penghui, Xie, Ming-Kun, Zong, Chen-Chen, Feng, Lei, Niu, Gang, Sugiyama, Masashi, Huang, Sheng-Jun

arXiv.org Artificial Intelligence, Aug-11-2023

Existing knowledge distillation methods typically work by imparting the knowledge of output logits or intermediate feature maps from the teacher network to the student network, which is very successful in multi-class single-label learning. However, these methods can hardly be extended to the multi-label learning scenario, where each instance is associated with multiple semantic labels, because the prediction probabilities do not sum to one and feature maps of the whole example may ignore minor classes in such a scenario. In this paper, we propose a novel multi-label knowledge distillation method. On one hand, it exploits the informative semantic knowledge from the logits by dividing the multi-label learning problem into a set of binary classification problems; on the other hand, it enhances the distinctiveness of the learned feature representations by leveraging the structural information of label-wise embeddings. Experimental results on multiple benchmark datasets validate that the proposed method can avoid knowledge counteraction among labels, thus achieving superior performance against diverse comparing methods. Our code is available at: https://github.com/penghui-yang/L2D
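One way to picture the decomposition into binary problems mentioned in the abstract above: treat each label's logit as its own Bernoulli prediction and compare teacher and student label by label. The per-label binary KL divergence used here is an assumption made for illustration, since the abstract does not state the exact loss, and the sketch omits the label-wise embedding component entirely. All names are hypothetical; the authors' actual implementation is at the repository linked in the abstract.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def one_vs_all_distillation_loss(teacher_logits, student_logits):
    """Sum of per-label binary KL terms, one binary problem per label."""
    return sum(
        binary_kl(sigmoid(t), sigmoid(s))
        for t, s in zip(teacher_logits, student_logits)
    )


# Hypothetical logits for one instance with three labels.
teacher = [2.0, -1.0, 0.5]
matched = one_vs_all_distillation_loss(teacher, teacher)
mismatched = one_vs_all_distillation_loss(teacher, [0.0, 0.0, 0.0])
```

Because each label is scored on its own, the per-label probabilities are not required to sum to one, which is the obstacle the abstract raises for softmax-style distillation.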

distillation, machine learning, natural language, (21 more...)

arXiv:2308.06453

Country: Asia > China > Jiangsu Province (0.14)

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(3 more...)


Distribution Shift Matters for Knowledge Distillation with Webly Collected Images

Tang, Jialiang, Chen, Shuo, Niu, Gang, Sugiyama, Masashi, Gong, Chen

arXiv.org Artificial Intelligence, Jul-21-2023

Knowledge distillation aims to learn a lightweight student network from a pre-trained teacher network. In practice, existing knowledge distillation methods are usually infeasible when the original training data is unavailable due to privacy issues and data management considerations. Therefore, data-free knowledge distillation approaches have been proposed that collect training instances from the Internet. However, most of them have ignored the common distribution shift between the instances from the original training data and the webly collected data, affecting the reliability of the trained student network. To solve this problem, we propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^{3}$), which consists of three components. Specifically, we first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher network and the student network. Subsequently, we align both the weighted features and classifier parameters of the two networks for knowledge memorization. Meanwhile, we also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment, so that the student network can further learn a distribution-invariant representation. Extensive experiments on various benchmark datasets demonstrate that our proposed KD$^{3}$ can outperform the state-of-the-art data-free knowledge distillation approaches.

artificial intelligence, machine learning, student network, (18 more...)

arXiv:2307.11469

Country:

Asia > China (0.14)
Asia > Japan (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation

Dong, Ruijiang, Liu, Feng, Chi, Haoang, Liu, Tongliang, Gong, Mingming, Niu, Gang, Sugiyama, Masashi, Han, Bo

arXiv.org Artificial Intelligence, Jul-12-2023

Generating unlabeled data has recently been shown to help address the few-shot hypothesis adaptation (FHA) problem, where we aim to train a classifier for the target domain with a few labeled target-domain data and a well-trained source-domain classifier (i.e., a source hypothesis), thanks to the additional information carried by the highly compatible unlabeled data. However, the data generated by existing methods are extremely similar or even identical. The strong dependency among the generated data causes the learning to fail. In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC). Specifically, DEG-Net generates data by minimizing the HSIC value (i.e., maximizing the independence) among the semantic features of the generated data. With DEG-Net, the generated unlabeled data are more diverse and more effective for addressing the FHA problem. Experimental results show that DEG-Net outperforms existing FHA baselines and further verify that generating diverse data plays a vital role in addressing the FHA problem.
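For readers unfamiliar with HSIC, here is a minimal sketch of the standard biased empirical estimator, trace(K H L H) / (n - 1)^2, where K and L are Gram matrices and H is the centering matrix. It assumes scalar inputs and a Gaussian kernel with a fixed bandwidth; how DEG-Net applies HSIC to semantic features is not described in the abstract above, so this shows only the quantity itself, with hypothetical function names and data.

```python
import math


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar inputs."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
        for a in values
    ]


def center(gram):
    """Return H @ gram @ H, where H = I - (1/n) * ones is the centering matrix."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # trace(K H L H) equals trace((H K H) L) by cyclicity of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Hypothetical data: ys identical to xs versus ys carrying no information.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
dependent = hsic(xs, xs)
constant = hsic(xs, [3.0] * len(xs))
```

A constant second variable yields an HSIC of zero up to rounding, while a copy of the first variable yields a strictly positive value, which matches the abstract's reading of a lower HSIC as more independence.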

artificial intelligence, machine learning, unlabeled data, (16 more...)

arXiv:2307.05948

Country: North America > United States > Hawaii (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.97)


Making Binary Classification from Multiple Unlabeled Datasets Almost Free of Supervision

Wu, Yuhao, Xia, Xiaobo, Yu, Jun, Han, Bo, Niu, Gang, Sugiyama, Masashi, Liu, Tongliang

arXiv.org Artificial Intelligence, Jun-12-2023

Training a classifier on a huge amount of supervised data is expensive or even prohibitive in situations where the labeling cost is high. A remarkable advance in working with weaker forms of supervision is binary classification from multiple unlabeled datasets, which requires knowledge of the exact class priors for all unlabeled datasets. However, the availability of class priors is restrictive in many real-world scenarios. To address this issue, we propose to solve a new problem setting, i.e., binary classification from multiple unlabeled datasets with only one pairwise numerical relationship of class priors (MU-OPPO), in which the relative order (which unlabeled dataset has a higher proportion of positive examples) of two class-prior probabilities is known for two datasets among the multiple unlabeled datasets. In MU-OPPO, we do not need the class priors for all unlabeled datasets; we only require that there exists a pair of unlabeled datasets for which we know which one has the larger class prior. Clearly, this form of supervision is easier to obtain, which can make labeling costs almost free. We propose a novel framework to handle the MU-OPPO problem, which consists of four sequential modules: (i) pseudo label assignment; (ii) confident example collection; (iii) class prior estimation; (iv) classifier training with estimated class priors. Theoretically, we analyze the gap between estimated class priors and true class priors under the proposed framework. Empirically, we confirm the superiority of our framework with comprehensive experiments. Experimental results demonstrate that our framework yields smaller estimation errors of class priors and better performance in binary classification.

artificial intelligence, machine learning, unlabeled data, (20 more...)

arXiv:2306.07036

Country:

Asia (0.28)
North America > United States > Louisiana (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area (0.46)


A Category-theoretical Meta-analysis of Definitions of Disentanglement

Zhang, Yivan, Sugiyama, Masashi

arXiv.org Artificial Intelligence, May-29-2023

Disentangling the factors of variation in data is a fundamental concept in machine learning and has been studied in various ways by different researchers, leading to a multitude of definitions. Despite the numerous empirical studies, more theoretical research is needed to fully understand the defining properties of disentanglement and how different definitions relate to each other. This paper presents a meta-analysis of existing definitions of disentanglement, using category theory as a unifying and rigorous framework. We propose that the concepts of the cartesian and monoidal products should serve as the core of disentanglement. With these core concepts, we show the similarities and crucial differences in dealing with (i) functions, (ii) equivariant maps, (iii) relations, and (iv) stochastic maps. Overall, our meta-analysis deepens our understanding of disentanglement and its various formulations and can help researchers navigate different definitions and choose the most appropriate one for their specific context.

artificial intelligence, category, machine learning, (15 more...)

arXiv:2305.06886

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


BadLabel: A Robust Perspective on Evaluating and Enhancing Label-noise Learning

Zhang, Jingfeng, Song, Bo, Wang, Haohan, Han, Bo, Liu, Tongliang, Liu, Lei, Sugiyama, Masashi

arXiv.org Artificial Intelligence, May-28-2023

Label-noise learning (LNL) aims to increase the model's generalization given training data with noisy labels. To facilitate practical LNL algorithms, researchers have proposed different label noise types, ranging from class-conditional to instance-dependent noise. In this paper, we introduce a novel label noise type called BadLabel, which can significantly degrade the performance of existing LNL algorithms. BadLabel is crafted based on the label-flipping attack against standard classification, where specific samples are selected and their labels are flipped to other labels so that the loss values of clean and noisy labels become indistinguishable. To address the challenge posed by BadLabel, we further propose a robust LNL method that perturbs the labels in an adversarial manner at each epoch to make the loss values of clean and noisy labels distinguishable again. Once we select a small set of (mostly) clean labeled data, we can apply the techniques of semi-supervised learning to train the model accurately. Empirically, our experimental results demonstrate that existing LNL algorithms are vulnerable to the newly introduced BadLabel noise type, while our proposed robust LNL method can effectively improve the generalization performance of the model under various types of label noise. The new dataset of noisy labels and the source code of the robust LNL algorithms are available at https://github.com/zjfheart/BadLabels.

artificial intelligence, machine learning, noise, (12 more...)

arXiv:2305.18377

Country:

Asia > Japan (0.28)
North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
