AITopics

2410.1372

Country: Asia (0.45)

Genre:

Research Report > New Finding (1.00)
Overview (0.92)

Industry:

Media > Music (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(6 more...)

arXiv.org Artificial Intelligence, Jun-22-2023

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

Gatmiry, Khashayar, Li, Zhiyuan, Chuang, Ching-Yao, Reddi, Sashank, Ma, Tengyu, Jegelka, Stefanie

Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions. More explicit forms of flatness regularization also empirically improve the generalization performance. However, it remains unclear why and when flatness regularization leads to better generalization. This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in an important setting: learning deep linear networks from linear measurements, also known as deep matrix factorization. We show that for all depths greater than one, with the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters (i.e., the product of all layer matrices), which in turn leads to better generalization. We empirically verify our theoretical findings on synthetic datasets.
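To make the quantities in this abstract concrete, the following minimal sketch computes the end-to-end matrix of a small deep linear network and its Schatten 1-norm (the sum of its singular values, also called the nuclear norm). The layer shapes, the random initialization, and the helper names `end_to_end` and `schatten_1_norm` are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from functools import reduce


def end_to_end(layers):
    """Return W_L @ ... @ W_1 for layers given in order [W_1, ..., W_L]."""
    return reduce(lambda acc, w: w @ acc, layers)


def schatten_1_norm(m):
    """Schatten 1-norm: the sum of the singular values of m."""
    return float(np.linalg.svd(m, compute_uv=False).sum())


# Hypothetical depth-3 linear network with 4x4 layers (illustration only).
rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 4)) for _ in range(3)]
M = end_to_end(layers)
print("Schatten 1-norm of the end-to-end matrix:", schatten_1_norm(M))
```

The sketch only evaluates the norm; it does not reproduce the trace-of-Hessian computation or the RIP measurement setup analyzed in the paper.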

artificial intelligence, hessian, machine learning, (16 more...)

2306.13239

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

arXiv.org Artificial Intelligence, May-29-2023

InfoOT: Information Maximizing Optimal Transport

Chuang, Ching-Yao, Jegelka, Stefanie, Alvarez-Melis, David

Optimal transport aligns samples across distributions by minimizing the transportation cost between them, e.g., the geometric distances. Yet, it ignores coherence structure in the data such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual information between domains while minimizing geometric distances. The resulting objective can still be formulated as a (generalized) optimal transport problem, and can be efficiently solved by projected gradient descent. This formulation yields a new projection method that is robust to outliers and generalizes to unseen samples. Empirically, InfoOT improves the quality of alignments across benchmarks in domain adaptation, cross-domain retrieval, and single-cell alignment.

artificial intelligence, infoot, machine learning, (16 more...)

2210.03164

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence, May-15-2023

Debiasing Vision-Language Models via Biased Prompts

Chuang, Ching-Yao, Jampani, Varun, Li, Yuanzhen, Torralba, Antonio, Jegelka, Stefanie

Machine learning models have been shown to inherit biases from their training datasets. This can be particularly problematic for vision-language foundation models trained on uncurated datasets scraped from the internet. The biases can be amplified and propagated to downstream applications like zero-shot classifiers and text-to-image generative models. In this study, we propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. In particular, we show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models. The proposed closed-form solution enables easy integration into large-scale pipelines, and empirical results demonstrate that our approach effectively reduces social bias and spurious correlation in both discriminative and generative vision-language models without the need for additional data or training.
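The idea of "projecting out biased directions" can be illustrated with a plain orthogonal projection. This is a minimal sketch: the abstract describes a calibrated projection matrix, which is not reproduced here, and the function name `projection_removing` and the toy 3-dimensional embedding are assumptions for illustration.

```python
import numpy as np


def projection_removing(directions):
    """Orthogonal projector onto the complement of span(directions).

    directions: array of shape (k, d); each row is one bias direction.
    """
    a = np.atleast_2d(np.asarray(directions, dtype=float)).T  # shape (d, k)
    # a @ pinv(a) projects onto the column space of a; subtract it from I.
    return np.eye(a.shape[0]) - a @ np.linalg.pinv(a)


# Toy example: treat the first coordinate axis as the biased direction.
v = np.array([[1.0, 0.0, 0.0]])
P = projection_removing(v)

z = np.array([2.0, 3.0, -1.0])  # stand-in for a text embedding
z_debiased = P @ z              # component along v is removed
print(z_debiased)
```

Because `P` is computed in closed form, applying it to an embedding is a single matrix-vector product and needs no additional training, which matches the spirit of the abstract's claim.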

arxiv preprint arxiv, machine learning, natural language, (18 more...)

2302.0007

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.68)
Law Enforcement & Public Safety (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

arXiv.org Machine Learning, Jun-6-2021

Measuring Generalization with Optimal Transport

Chuang, Ching-Yao, Mroueh, Youssef, Greenewald, Kristjan, Torralba, Antonio, Jegelka, Stefanie

Understanding the generalization of deep neural networks is one of the most important tasks in deep learning. Although much progress has been made, theoretical error bounds still often behave disparately from empirical observations. In this work, we develop margin-based generalization bounds, where the margins are normalized with optimal transport costs between independent random subsets sampled from the training distribution. In particular, the optimal transport cost can be interpreted as a generalization of variance which captures the structural properties of the learned feature space. Our bounds robustly predict the generalization error, given training data and network parameters, on large-scale datasets. Theoretically, we demonstrate that the concentration and separation of features play crucial roles in generalization, supporting empirical results in the literature. The code is available at https://github.com/chingyaoc/kV-Margin.

deep learning, generalization, neural network, (14 more...)

2106.03314

Country:

North America > United States > Massachusetts (0.14)
Asia > Middle East (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

arXiv.org Machine Learning, Mar-11-2021

Fair Mixup: Fairness via Interpolation

Chuang, Ching-Yao, Mroueh, Youssef

Training classifiers under fairness constraints, such as group fairness, regularizes the disparities of predictions between the groups. Nevertheless, even though the constraints are satisfied during training, they might not generalize at evaluation time. To improve the generalizability of fair classifiers, we propose fair mixup, a new data augmentation strategy for imposing the fairness constraint. In particular, we show that fairness can be achieved by regularizing the models on paths of interpolated samples between the groups. We use mixup, a powerful data augmentation strategy, to generate these interpolates. We analyze fair mixup and empirically show that it ensures better generalization for both accuracy and fairness measurements on tabular, vision, and language benchmarks.
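A rough sketch of regularizing along paths of interpolated samples between two groups is given below. It is a discrete proxy under stated assumptions, not the paper's exact regularizer: the logistic `model`, its weights `w`, the paired samples, and the helper names are all hypothetical.

```python
import numpy as np


def interpolate(x_a, x_b, t):
    """Mixup-style point on the straight path from x_b (t=0) to x_a (t=1)."""
    return t * x_a + (1.0 - t) * x_b


def mean_prediction_path(model, x_a, x_b, ts):
    """Average model output on the interpolated samples at each t in ts."""
    return np.array([model(interpolate(x_a, x_b, t)).mean() for t in ts])


def path_penalty(path):
    """Total absolute change of the mean prediction along the path."""
    return float(np.abs(np.diff(path)).sum())


# Hypothetical data: 8 paired samples per group, 3 features each.
rng = np.random.default_rng(0)
x_a = rng.standard_normal((8, 3)) + 1.0
x_b = rng.standard_normal((8, 3)) - 1.0
w = np.array([0.5, -0.2, 0.1])                 # hypothetical model weights
model = lambda x: 1.0 / (1.0 + np.exp(-(x @ w)))

ts = np.linspace(0.0, 1.0, 5)
path = mean_prediction_path(model, x_a, x_b, ts)
print("penalty:", path_penalty(path))
print("gap between group means:", abs(path[-1] - path[0]))
```

By the triangle inequality, the penalty is never smaller than the gap between the two groups' mean predictions, so driving the penalty down also shrinks that gap.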

artificial intelligence, mixup, neural network, (18 more...)

2103.06503

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Machine Learning, Oct-21-2020

Debiased Contrastive Learning

Chuang, Ching-Yao, Robinson, Joshua, Yen-Chen, Lin, Torralba, Antonio, Jegelka, Stefanie

A prominent technique for self-supervised representation learning has been to contrast semantically similar and dissimilar pairs of samples. Without access to labels, dissimilar (negative) points are typically taken to be randomly sampled datapoints, implicitly accepting that these points may in fact have the same label. Perhaps unsurprisingly, we observe that sampling negative examples from truly different labels improves performance, in a synthetic setting where labels are available. Motivated by this observation, we develop a debiased contrastive objective that corrects for the sampling of same-label datapoints, even without knowledge of the true labels. Empirically, the proposed objective consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks. Theoretically, we establish generalization bounds for the downstream classification task.

deep learning, neural network, objective, (19 more...)

2007.00224

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine Learning, Oct-9-2020

Contrastive Learning with Hard Negative Samples

Robinson, Joshua, Chuang, Ching-Yao, Sra, Suvrit, Jegelka, Stefanie

We consider the question: how can you sample good negative examples for contrastive learning? We argue that, as with metric learning, learning contrastive representations benefits from hard negative samples (i.e., points that are difficult to distinguish from an anchor point). The key challenge toward using hard negatives is that contrastive methods must remain unsupervised, making it infeasible to adopt existing negative sampling strategies that use label information. In response, we develop a new class of unsupervised methods for selecting hard negative samples where the user can control the amount of hardness. A limiting case of this sampling results in a representation that tightly clusters each class, and pushes different classes as far apart as possible. The proposed method improves downstream performance across multiple modalities, requires only a few additional lines of code to implement, and introduces no computational overhead.
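One way to picture label-free hard negative selection with a tunable hardness level is to reweight candidate negatives by their similarity to the anchor. The sketch below assumes an exponential weighting controlled by a parameter named `beta`; that parameter name, the function name, and the toy embeddings are assumptions for illustration and are not taken from the abstract.

```python
import numpy as np


def hard_negative_weights(anchor, negatives, beta):
    """Weights over negatives proportional to exp(beta * similarity).

    anchor: shape (d,); negatives: shape (n, d).
    beta = 0 gives uniform weights; larger beta favors harder negatives,
    i.e. those whose embeddings are most similar to the anchor.
    """
    sims = negatives @ anchor
    logits = beta * sims
    logits = logits - logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Toy unit-norm embeddings (illustration only).
anchor = np.array([1.0, 0.0])
negatives = np.array([[0.9, 0.436], [0.0, 1.0], [-1.0, 0.0]])

print(hard_negative_weights(anchor, negatives, beta=0.0))
print(hard_negative_weights(anchor, negatives, beta=5.0))
```

No labels are used anywhere in the weighting, which is the constraint the abstract emphasizes; only embedding similarities enter the computation.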

artificial intelligence, inductive learning, representation, (13 more...)

2010.04592

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.35)

arXiv.org Machine Learning, Jul-6-2020

Estimating Generalization under Distribution Shifts via Domain-Invariant Representations

Chuang, Ching-Yao, Torralba, Antonio, Jegelka, Stefanie

When machine learning models are deployed on a test distribution different from the training distribution, they can perform poorly, but overestimate their performance. In this work, we aim to better estimate a model's performance under distribution shift, without supervision. To do so, we use a set of domain-invariant predictors as a proxy for the unknown, true target labels. Since the error of the resulting risk estimate depends on the target risk of the proxy model, we study generalization of domain-invariant representations and show that the complexity of the latent representation has a significant influence on the target risk. Empirically, our approach (1) enables self-tuning of domain adaptation models, and (2) accurately estimates the target error of given models under distribution shift. Other applications include model selection, early stopping, and error detection.

artificial intelligence, evolutionary algorithm, target risk, (16 more...)

2007.03511

Country:

North America > United States > Massachusetts (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

arXiv.org Machine Learning, Oct-13-2019

The Role of Embedding Complexity in Domain-invariant Representations

Chuang, Ching-Yao, Torralba, Antonio, Jegelka, Stefanie

Unsupervised domain adaptation aims to generalize the hypothesis trained in a source domain to an unlabeled target domain. One popular approach to this problem is to learn domain-invariant embeddings for both domains. In this work, we study, theoretically and empirically, the effect of the embedding complexity on generalization to the target domain. In particular, this complexity affects an upper bound on the target risk; this is reflected in experiments, too. Next, we specialize our theoretical framework to multilayer neural networks. As a result, we develop a strategy that mitigates sensitivity to the embedding complexity, and empirically achieves performance on par with or better than the best layer-dependent complexity tradeoff.

artificial intelligence, neural network, representation, (18 more...)

1910.05804

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)