
Collaborating Authors

 Chiang, Chao-Kai


Domain Adaptation and Entanglement: an Optimal Transport Perspective

arXiv.org Artificial Intelligence

Current machine learning systems are brittle in the face of distribution shifts (DS), where the target distribution that the system is tested on differs from the source distribution used to train the system. This problem of robustness to DS has been studied extensively in the field of domain adaptation. For deep neural networks, a popular framework for unsupervised domain adaptation (UDA) is domain matching, in which algorithms try to align the marginal distributions in the feature or output space. The current theoretical understanding of these methods, however, is limited, and existing theoretical results are not precise enough to characterize their performance in practice. In this paper, we derive new bounds based on optimal transport that analyze the UDA problem. Our new bounds include a term which we dub "entanglement", consisting of an expectation of the Wasserstein distance between conditionals with respect to changing data distributions. Analysis of the entanglement term provides a novel perspective on the unoptimizable aspects of UDA. In various experiments with multiple models across several DS scenarios, we show that this term can be used to explain the varying performance of UDA algorithms.
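
To make the flavor of this term concrete, the sketch below estimates a label-weighted average of one-dimensional Wasserstein distances between source and target class-conditional features. This is only an illustrative diagnostic in the spirit of the description above; the function name, the one-dimensional feature projection, and the weighting by source class frequencies are assumptions, not the paper's definition of the entanglement term.

    import numpy as np
    from scipy.stats import wasserstein_distance

    def conditional_wasserstein_gap(feat_s, y_s, feat_t, y_t):
        """Label-weighted average of 1-D Wasserstein distances between source
        and target class-conditional feature distributions (illustrative
        diagnostic only, not the paper's bound). feat_s, feat_t are 1-D
        feature projections; y_s, y_t are the corresponding labels.
        Note: target labels are needed to evaluate this, which is one reason
        such a term cannot be directly optimized in UDA."""
        classes, counts = np.unique(y_s, return_counts=True)
        weights = counts / counts.sum()
        gaps = []
        for c in classes:
            fs = feat_s[y_s == c]
            ft = feat_t[y_t == c]
            if len(ft) == 0:  # class absent in the target sample
                gaps.append(0.0)
                continue
            gaps.append(wasserstein_distance(fs, ft))
        return float(np.dot(weights, gaps))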


The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models

arXiv.org Machine Learning

Thompson sampling (TS) is known for its outstanding empirical performance, supported by theoretical guarantees across various reward models in classical stochastic multi-armed bandit problems. In one-parameter models, it is commonly observed that TS is fairly insensitive to the choice of the prior as far as asymptotic regret bounds are concerned. When the model contains multiple parameters, however, the optimality of TS depends heavily on the choice of prior, which casts doubt on the generalizability of findings from simpler models. To address this gap, this study explores the impact of selecting noninformative priors, offering insights into the performance of TS when dealing with new models that lack theoretical understanding. We first extend the regret analysis of TS to the model of uniform distributions with unknown supports, arguably the simplest non-regular model. Our findings reveal that changing the noninformative prior can significantly affect the expected regret, in line with previously known results for other multiparameter bandit models. Although the uniform prior is shown to be optimal, we highlight an inherent limitation of this optimality: it holds only for specific parameterizations, which emphasizes the significance of the invariance property of priors. In light of this limitation, we propose a slightly modified TS-based policy, called TS with Truncation (TS-T), which achieves asymptotic optimality for the Gaussian models and the uniform models when used with the reference prior and the Jeffreys prior, both of which are invariant under one-to-one reparameterizations. This policy provides an alternative route to optimality through fine-tuned truncation, which is often much easier in practice than hunting for optimal priors.
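
As a point of reference for how the prior enters a TS policy, the sketch below implements plain Thompson sampling for Gaussian arms with unknown mean and variance, using the common noninformative prior pi(mu, sigma^2) proportional to 1/sigma^2, under which the marginal posterior of the mean is a Student-t distribution. This is a generic textbook construction for illustration; it is not the TS-T policy from the abstract (there is no truncation step), and it makes no claim about which prior is optimal.

    import numpy as np

    rng = np.random.default_rng(0)

    def thompson_sampling_gaussian(arms, horizon):
        """Plain TS for Gaussian arms with unknown mean and variance under the
        prior pi(mu, sigma^2) ~ 1/sigma^2, for which the marginal posterior of
        mu is Student-t with n-1 degrees of freedom, location x_bar, and scale
        s/sqrt(n).  Illustrative only; not the TS-T policy (no truncation)."""
        K = len(arms)
        rewards = [[] for _ in range(K)]
        # pull each arm twice so the posterior (df = n - 1 >= 1) is proper
        for k in range(K):
            for _ in range(2):
                rewards[k].append(arms[k]())
        for _ in range(horizon - 2 * K):
            samples = []
            for k in range(K):
                x = np.asarray(rewards[k])
                n, mean, sd = len(x), x.mean(), x.std(ddof=1)
                # posterior sample of mu: x_bar + (s / sqrt(n)) * t_{n-1}
                samples.append(mean + sd / np.sqrt(n) * rng.standard_t(n - 1))
            k_star = int(np.argmax(samples))
            rewards[k_star].append(arms[k_star]())
        return rewards

    # usage: two Gaussian arms with different means and variances
    arms = [lambda: rng.normal(0.0, 1.0), lambda: rng.normal(0.5, 2.0)]
    _ = thompson_sampling_gaussian(arms, horizon=1000)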


Unified Risk Analysis for Weakly Supervised Learning

arXiv.org Artificial Intelligence

Amid the flourishing research on weakly supervised learning (WSL), we recognize the lack of a unified interpretation of the mechanism behind weakly supervised scenarios, let alone a systematic treatment of the risk rewrite problem, a crucial step in the empirical risk minimization approach. In this paper, we introduce a framework that provides a comprehensive understanding and a unified methodology for WSL. The formulation component of the framework, leveraging a contamination perspective, provides a unified interpretation of how weak supervision is formed and subsumes fifteen existing WSL settings. The induced reduction graphs offer comprehensive connections among WSL settings. The analysis component of the framework, viewed as a decontamination process, provides a systematic method for conducting the risk rewrite. In addition to the conventional inverse-matrix approach, we devise a novel strategy called the marginal chain, which aims to decontaminate distributions. We justify the feasibility of the proposed framework by recovering existing rewrites reported in the literature.
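
The conventional inverse-matrix approach mentioned above has a well-known special case in class-conditional label contamination: when the observed labels arise from the clean labels through a known, invertible transition matrix T, applying T^{-1} to the vector of per-class losses yields a corrected loss whose expectation under the contamination equals the clean loss (backward correction). The sketch below shows only that special case; it is not the paper's general framework or its marginal-chain strategy, and the names are placeholders.

    import numpy as np

    def backward_corrected_loss(per_class_loss, T):
        """Inverse-matrix ("backward") correction for class-conditional
        label contamination.

        per_class_loss[i] : loss of the current prediction if the clean label were i
        T[i, j]           : probability that clean class i is observed as class j

        Returns a vector indexed by the *observed* label; its expectation over
        the contamination process recovers the clean per-class loss, so
        minimizing it on contaminated data estimates the clean risk.
        Requires T to be invertible."""
        return np.linalg.inv(T) @ per_class_loss

    # usage with symmetric 20% label noise over 3 classes (illustrative numbers)
    T = np.full((3, 3), 0.1) + 0.7 * np.eye(3)
    per_class_loss = np.array([0.2, 1.5, 2.3])
    corrected = backward_corrected_loss(per_class_loss, T)
    observed_label = 1
    loss_to_backprop = corrected[observed_label]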


Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits

arXiv.org Artificial Intelligence

In the stochastic multi-armed bandit problem, a randomized probability matching policy called Thompson sampling (TS) has shown excellent performance across various reward models. Beyond its empirical performance, TS has been shown to achieve asymptotic problem-dependent lower bounds in several models. However, its optimality has mainly been addressed under light-tailed or one-parameter models that belong to exponential families. In this paper, we consider the optimality of TS for the Pareto model, which has a heavy tail and is parameterized by two unknown parameters. Specifically, we discuss the optimality of TS with probability matching priors, which include the Jeffreys prior and the reference priors. We first prove that TS with certain probability matching priors can achieve the optimal regret bound. Then, we show the suboptimality of TS with other priors, including the Jeffreys and the reference priors. Nevertheless, we find that TS with the Jeffreys and reference priors can achieve the asymptotic lower bound if one uses a truncation procedure. These results suggest that noninformative priors must be chosen carefully to avoid suboptimality, and they demonstrate the effectiveness of truncation procedures in TS-based policies.


Federated Multi-Task Learning

Neural Information Processing Systems

Federated learning poses new statistical and systems challenges in training machine learning models over distributed networks of devices. In this work, we show that multi-task learning is naturally suited to handle the statistical challenges of this setting, and propose a novel systems-aware optimization method, MOCHA, that is robust to practical systems issues. Our method and theory for the first time consider issues of high communication cost, stragglers, and fault tolerance for distributed multi-task learning. The resulting method achieves significant speedups compared to alternatives in the federated setting, as we demonstrate through simulations on real-world federated datasets.
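
For context, multi-task relationship learning of the kind MOCHA-style federated solvers target can be written as a joint objective over per-device models and a task-relationship matrix; the generic form below is illustrative, with notation assumed here rather than copied from the paper:

    \min_{W,\,\Omega} \;\; \sum_{t=1}^{m} \sum_{i=1}^{n_t} \ell_t\!\left(w_t^{\top} x_t^{i},\, y_t^{i}\right) \;+\; \mathcal{R}(W, \Omega)

where device t holds its own n_t local examples (x_t^i, y_t^i), W = [w_1, ..., w_m] stacks the per-device models, and the regularizer R(W, Omega), e.g. a coupling penalty such as lambda * tr(W Omega W^T), ties the devices together through a learned task-relationship matrix Omega. Solving such a problem over a network of devices is where the communication, straggler, and fault-tolerance issues mentioned above arise.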


Hyper-parameter Tuning under a Budget Constraint

arXiv.org Machine Learning

Hyper-parameter tuning is of crucial importance to designing and deploying machine learning systems. Broadly, hyper-parameters include the architecture of the learning models, regularization parameters, optimization methods and their parameters, and other "knobs" to be tuned. It is challenging to explore the vast space of hyper-parameters efficiently to identify the optimal configuration. Quite a few approaches have been proposed and investigated: random search, Bayesian Optimization (BO) [30, 29], bandit-based Hyperband [17, 24], and meta-learning [5, 1, 10]. Many of these prior studies have focused on reducing, as much as possible, the computational cost of obtaining the optimal configuration. In this work, we take a different but important perspective on hyper-parameter optimization: under a fixed time/computation budget, how can we improve performance as much as possible?
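
As a concrete point of reference for the budgeted setting described above, the sketch below implements successive halving, the fixed-budget allocation scheme underlying the bandit-based Hyperband line of work cited in the abstract. It only illustrates how a fixed budget can be spread across configurations; it is not the method proposed in this paper, and the function and parameter names are placeholders.

    import numpy as np

    def successive_halving(configs, train_and_score, total_budget, eta=3):
        """Spend a fixed total budget by repeatedly training all surviving
        configurations for a small per-config budget and keeping the top 1/eta.
        `train_and_score(config, budget)` is assumed to train with the given
        budget (e.g. epochs) and return a validation score (higher is better)."""
        n = len(configs)
        rounds = max(1, int(np.ceil(np.log(n) / np.log(eta))))
        budget_per_round = total_budget / rounds
        survivors = list(configs)
        for _ in range(rounds):
            per_config = budget_per_round / len(survivors)
            scores = [train_and_score(c, per_config) for c in survivors]
            order = np.argsort(scores)[::-1]          # best scores first
            keep = max(1, len(survivors) // eta)
            survivors = [survivors[i] for i in order[:keep]]
        return survivors[0]

    # usage: tune a single "lr" knob with a toy scoring function; in practice
    # train_and_score would train a model for `budget` epochs and validate it
    candidates = [{"lr": lr} for lr in np.logspace(-4, -1, 27)]
    toy_score = lambda cfg, budget: -abs(np.log10(cfg["lr"]) + 2.5) + 0.01 * budget
    best = successive_halving(candidates, toy_score, total_budget=100.0)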


Federated Multi-Task Learning

arXiv.org Machine Learning

Federated learning poses new statistical and systems challenges in training machine learning models over distributed networks of devices. In this work, we show that multi-task learning is naturally suited to handle the statistical challenges of this setting, and propose a novel systems-aware optimization method, MOCHA, that is robust to practical systems issues. Our method and theory for the first time consider issues of high communication cost, stragglers, and fault tolerance for distributed multi-task learning. The resulting method achieves significant speedups compared to alternatives in the federated setting, as we demonstrate through simulations on real-world federated datasets.

