Luo, Yucen
Learning Counterfactually Invariant Predictors
Quinzan, Francesco, Casolo, Cecilia, Muandet, Krikamol, Luo, Yucen, Kilbertus, Niki
Invariance, or equivariance to certain data transformations, has proven essential in numerous applications of machine learning (ML), since it can lead to better generalization capabilities [Arjovsky et al., 2019, Bloem-Reddy and Teh, 2020, Chen et al., 2020]. For instance, in image recognition, predictions ought to remain unchanged under scaling, translation, or rotation of the input image. Data augmentation, an early heuristic to promote such invariances, has become indispensable for successfully training deep neural networks (DNNs) [Shorten and Khoshgoftaar, 2019, Xie et al., 2020]. Well-known examples of "invariance by design" include convolutional neural networks (CNNs) for translation invariance [Krizhevsky et al., 2012], group equivariant NNs for general group transformations [Cohen and Welling, 2016], recurrent neural networks (RNNs) and transformers for sequential data [Vaswani et al., 2017], DeepSet [Zaheer et al., 2017] for sets, and graph neural networks (GNNs) for different types of geometric structures [Battaglia et al., 2018]. Many applications in modern ML, however, call for arguably stronger notions of invariance based on causality. This case has been made for image classification, algorithmic fairness [Hardt et al., 2016, Mitchell et al., 2021], robustness [Bühlmann, 2020], and out-of-distribution generalization [Lu et al., 2021]. The goal is invariance with respect to hypothetical manipulations of the data generating process (DGP). Various works develop methods that assume observational distributions (across environments or between training and test) to be governed by shared causal mechanisms, but to differ due to various types of distribution shifts encoded by the causal model [Arjovsky et al., 2019, Bühlmann, 2020, Heinze-Deml et al., 2018, Makar et al., 2022].
Iterative Teaching by Data Hallucination
Qiu, Zeju, Liu, Weiyang, Xiao, Tim Z., Liu, Zhen, Bhatt, Umang, Luo, Yucen, Weller, Adrian, Schölkopf, Bernhard
We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner. Existing work assumes a discrete input space (i.e., a pool of finitely many samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space, where the input example (e.g., an image) can either be generated by solving an optimization problem or be drawn directly from a continuous distribution. Specifically, we propose data hallucination teaching (DHT), in which the teacher generates input data intelligently based on the labels, the learner's status, and the target concept. We study a number of challenging teaching setups (e.g., linear/neural learners in omniscient and black-box settings). Extensive empirical results verify the effectiveness of DHT.
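As a rough illustration of the idea (not the paper's algorithm), the following Python sketch shows an omniscient teacher "hallucinating" an input for a fixed label by optimizing it so that a logistic-regression learner's one-step SGD update lands as close as possible to the target concept. The learner model, learning rates, and label schedule are all assumptions made for the example.

# Illustrative sketch only: a gradient-based teacher that synthesizes an input x
# so that the learner's single SGD step moves toward the target parameters w_star.
import torch

def hallucinate_example(w, w_star, y, lr_learner=0.1, steps=200, lr_teacher=0.05, dim=10):
    """Optimize a synthetic input x for a fixed label y in {-1, +1}."""
    x = torch.randn(dim, requires_grad=True)          # hallucinated input, optimized below
    opt = torch.optim.Adam([x], lr=lr_teacher)
    for _ in range(steps):
        opt.zero_grad()
        # Learner's logistic-loss gradient at (x, y): -y * sigmoid(-y w^T x) * x
        margin = y * (w @ x)
        grad_w = -y * torch.sigmoid(-margin) * x
        w_next = w - lr_learner * grad_w               # learner's one-step SGD update
        teacher_loss = ((w_next - w_star) ** 2).sum()  # distance to target concept
        teacher_loss.backward()
        opt.step()
    return x.detach()

# Usage: iteratively teach until the learner approaches w_star.
torch.manual_seed(0)
dim = 10
w_star = torch.randn(dim)
w = torch.zeros(dim)
for t in range(50):
    y = 1.0 if t % 2 == 0 else -1.0                    # alternate labels (a simple choice)
    x = hallucinate_example(w, w_star, y, dim=dim)
    margin = y * (w @ x)
    w = w - 0.1 * (-y * torch.sigmoid(-margin) * x)    # learner performs its SGD step
print(float(((w - w_star) ** 2).sum()))                # squared distance to the target concept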
Spectral Representation Learning for Conditional Moment Models
Wang, Ziyu, Luo, Yucen, Li, Yueru, Zhu, Jun, Schölkopf, Bernhard
Many problems in causal inference and economics can be formulated in the framework of conditional moment models, which characterize the target function through a collection of conditional moment restrictions. For nonparametric conditional moment models, efficient estimation often relies on preimposed conditions on various measures of ill-posedness of the hypothesis space, which are hard to validate when flexible models are used. In this work, we address this issue by proposing a procedure that automatically learns representations with controlled measures of ill-posedness. Our method approximates a linear representation defined by the spectral decomposition of a conditional expectation operator, which can be used for kernelized estimators and is known to facilitate minimax optimal estimation in certain settings. We show this representation can be efficiently estimated from data, and establish L2 consistency for the resulting estimator. We evaluate the proposed method on proximal causal inference tasks, exhibiting promising performance on high-dimensional, semi-synthetic data.
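The sketch below gives one crude finite-dimensional approximation of the spectral decomposition of a conditional expectation operator: canonical correlation analysis between fixed random-feature maps of X and Z, whose singular directions serve as a learned linear representation. The feature maps, ridge parameter, and synthetic data are assumptions for illustration; this is not the estimator proposed in the paper.

# Illustrative sketch only: approximate the spectrum of (T f)(z) = E[f(X) | Z = z]
# via CCA on fixed random Fourier features of X and Z.
import numpy as np

def rff(a, n_feat=128, bandwidth=1.0, seed=0):
    """Random Fourier features approximating an RBF kernel (one choice of feature map)."""
    rng = np.random.default_rng(seed)
    a = np.atleast_2d(a).T if a.ndim == 1 else a
    W = rng.normal(scale=1.0 / bandwidth, size=(a.shape[1], n_feat))
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(a @ W + b)

def spectral_representation(X, Z, k=5, ridge=1e-3):
    """Top-k singular directions of the whitened cross-covariance of phi(X), psi(Z)."""
    Phi, Psi = rff(X, seed=0), rff(Z, seed=1)
    Phi = Phi - Phi.mean(0)
    Psi = Psi - Psi.mean(0)
    n = len(Phi)
    Cxx = Phi.T @ Phi / n + ridge * np.eye(Phi.shape[1])
    Czz = Psi.T @ Psi / n + ridge * np.eye(Psi.shape[1])
    Cxz = Phi.T @ Psi / n
    # Whiten with Cholesky factors; the SVD of the whitened cross-covariance
    # approximates the operator's singular functions and values.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wz = np.linalg.inv(np.linalg.cholesky(Czz))
    U, s, Vt = np.linalg.svd(Wx @ Cxz @ Wz.T)
    # Evaluations of the estimated singular functions at the data points.
    return Phi @ Wx.T @ U[:, :k], Psi @ Wz.T @ Vt[:k].T, s[:k]

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 1))
X = Z + 0.5 * rng.normal(size=(2000, 1))
fX, gZ, sing_vals = spectral_representation(X, Z)
print(sing_vals)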
Conditional Generative Moment-Matching Networks
Ren, Yong, Zhu, Jun, Li, Jialian, Luo, Yucen
Maximum mean discrepancy (MMD) has been successfully applied to learn deep generative models for characterizing a joint distribution of variables via kernel mean embedding. In this paper, we present conditional generative moment-matching networks (CGMMN), which learn a conditional distribution given some input variables based on a conditional maximum mean discrepancy (CMMD) criterion. The learning is performed by stochastic gradient descent with the gradient calculated by back-propagation. We evaluate CGMMN on a wide range of tasks, including predictive modeling, contextual generation, and Bayesian dark knowledge, which distills knowledge from a Bayesian model by learning a relatively small CGMMN student network. Our results demonstrate competitive performance in all the tasks.
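The sketch below computes one common empirical form of the squared CMMD based on kernel conditional embedding operators; the RBF kernels, bandwidth, and ridge parameter lam are illustrative assumptions rather than the paper's exact configuration. Because the estimate is differentiable, it could in principle serve as a training loss for a conditional generator.

# Illustrative sketch only: squared CMMD between the empirical conditional
# embeddings of Y|X estimated from data samples (X_s, Y_s) and generated
# samples (X_d, Y_d).
import torch

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    return torch.exp(-gamma * torch.cdist(a, b) ** 2)

def cmmd2(X_s, Y_s, X_d, Y_d, gamma=1.0, lam=1e-3):
    n_s, n_d = len(X_s), len(X_d)
    K_s, K_d = rbf(X_s, X_s, gamma), rbf(X_d, X_d, gamma)
    L_s, L_d = rbf(Y_s, Y_s, gamma), rbf(Y_d, Y_d, gamma)
    K_sd, L_sd = rbf(X_s, X_d, gamma), rbf(Y_s, Y_d, gamma)
    C_s = torch.linalg.inv(K_s + lam * torch.eye(n_s))   # regularized inverse Gram matrices
    C_d = torch.linalg.inv(K_d + lam * torch.eye(n_d))
    term_s = torch.trace(K_s @ C_s @ L_s @ C_s)
    term_d = torch.trace(K_d @ C_d @ L_d @ C_d)
    cross = torch.trace(K_sd.T @ C_s @ L_sd @ C_d)
    return term_s + term_d - 2.0 * cross

# Toy usage: two paired samples with similar conditionals give a small CMMD.
torch.manual_seed(0)
X_s = torch.randn(64, 2)
Y_s = X_s.sum(dim=1, keepdim=True) + 0.1 * torch.randn(64, 1)
X_d = torch.randn(64, 2)
Y_d = X_d.sum(dim=1, keepdim=True) + 0.1 * torch.randn(64, 1)
print(float(cmmd2(X_s, Y_s, X_d, Y_d)))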
Semi-crowdsourced Clustering with Deep Generative Models
Luo, Yucen, Tian, Tian, Shi, Jiaxin, Zhu, Jun, Zhang, Bo
We consider the semi-supervised clustering problem where crowdsourcing provides noisy information about the pairwise comparisons on a small subset of data, i.e., whether a sample pair is in the same cluster. We propose a new approach that includes a deep generative model (DGM) to characterize low-level features of the data, and a statistical relational model for noisy pairwise annotations on its subset. The two parts share the latent variables. To let the model automatically trade off between its complexity and fitting the data, we also develop its fully Bayesian variant. The challenge of inference is addressed by fast (natural-gradient) stochastic variational inference algorithms, where we effectively combine variational message passing for the relational part and amortized learning of the DGM under a unified framework. Empirical results on synthetic and real-world datasets show that our model outperforms previous crowdsourced clustering methods.
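As a toy illustration of how noisy pairwise annotations can enter a variational objective, the sketch below computes the expected log-likelihood of "same cluster" labels under soft cluster assignments with a simple two-parameter noise model. This mean-field term and its parameters are assumptions for the example, not the paper's relational model or inference algorithm.

# Illustrative sketch only: expected log-likelihood of binary pairwise
# annotations under a two-coin noise model and mean-field responsibilities q.
import math
import torch

def pairwise_annotation_ll(q, pairs, obs, alpha=0.9, beta=0.1):
    """
    q:     (n, K) cluster responsibilities (rows sum to 1)
    pairs: (m, 2) annotated index pairs
    obs:   (m,)   observed labels, 1 = "same cluster", 0 = "different"
    alpha: P(annotator says "same" | truly same), beta: P("same" | truly different)
    """
    qi, qj = q[pairs[:, 0]], q[pairs[:, 1]]
    p_same = (qi * qj).sum(dim=1)                  # probability the pair shares a cluster under q
    la, l1a = math.log(alpha), math.log(1 - alpha)
    lb, l1b = math.log(beta), math.log(1 - beta)
    log_same = torch.where(obs == 1, torch.full_like(p_same, la), torch.full_like(p_same, l1a))
    log_diff = torch.where(obs == 1, torch.full_like(p_same, lb), torch.full_like(p_same, l1b))
    return (p_same * log_same + (1 - p_same) * log_diff).sum()

# Usage: this term would be added to the DGM's ELBO for the annotated subset.
torch.manual_seed(0)
q = torch.softmax(torch.randn(10, 3), dim=1)
pairs = torch.tensor([[0, 1], [2, 3], [4, 5]])
obs = torch.tensor([1, 0, 1])
print(float(pairwise_annotation_ll(q, pairs, obs)))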
A Simple yet Effective Baseline for Robust Deep Learning with Noisy Labels
Luo, Yucen, Zhu, Jun, Pfister, Tomas
Deep neural networks have recently been shown to memorize training data, even when labels are noisy, which hurts generalization performance. To mitigate this issue, we propose a simple but effective baseline that is robust to noisy labels, even under severe noise. Our objective involves a variance regularization term that implicitly penalizes the Jacobian norm of the neural network on the whole training set (including the noisily labeled data), which encourages generalization and prevents overfitting to the corrupted labels. Experiments on both synthetically generated incorrect labels and realistic large-scale noisy datasets demonstrate that our approach achieves state-of-the-art performance with a high tolerance to severe noise.
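A minimal sketch in the spirit of the objective described above: cross-entropy on the (possibly noisy) labels plus a penalty on the discrepancy between two stochastic forward passes of the same inputs, which discourages large prediction variance. The architecture, the exact form of the regularizer, and the weight lam are assumptions for illustration, not the paper's precise objective.

# Illustrative sketch only: noisy-label training with a prediction-variance penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                      nn.Dropout(0.5), nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def training_step(x, y_noisy, lam=10.0):
    model.train()                          # keep dropout stochastic for both passes
    logits1 = model(x)                     # first stochastic pass
    logits2 = model(x)                     # second stochastic pass (different dropout mask)
    p1, p2 = F.softmax(logits1, dim=1), F.softmax(logits2, dim=1)
    ce = F.cross_entropy(logits1, y_noisy)
    var_reg = ((p1 - p2) ** 2).sum(dim=1).mean()   # penalize prediction variance
    loss = ce + lam * var_reg
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random data standing in for a noisy-label dataset.
x = torch.randn(32, 1, 28, 28)
y_noisy = torch.randint(0, 10, (32,))
print(training_step(x, y_noisy))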
Smooth Neighbors on Teacher Graphs for Semi-supervised Learning
Luo, Yucen, Zhu, Jun, Li, Mengxi, Ren, Yong, Zhang, Bo
The recently proposed self-ensembling methods have achieved promising results in deep semi-supervised learning; they penalize inconsistent predictions on unlabeled data under different perturbations. However, they only consider adding perturbations to each single data point, while ignoring the connections between data samples. In this paper, we propose a novel method called Smooth Neighbors on Teacher Graphs (SNTG). In SNTG, a graph is constructed based on the predictions of the teacher model, i.e., the implicit self-ensemble of models. The graph then serves as a similarity measure with respect to which the representations of "similar" neighboring points are learned to be smooth on the low-dimensional manifold. We achieve state-of-the-art results on semi-supervised learning benchmarks: the error rates are 9.89% for CIFAR-10 with 4,000 labels and 3.99% for SVHN with 500 labels. The improvements are particularly significant when fewer labels are available; for non-augmented MNIST with only 20 labels, the error rate is reduced from the previous 4.81% to 1.36%. Our method also shows robustness to noisy labels.
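A minimal sketch of an SNTG-style graph regularizer, assuming a contrastive form: the graph is built from the teacher's predictions within a mini-batch (an edge when two points receive the same predicted class), and student features are pulled together along edges and pushed apart up to a margin otherwise. The margin, feature choice, and batch-wise pairing are illustrative assumptions.

# Illustrative sketch only: batch-wise graph regularizer driven by teacher predictions.
import torch

def sntg_loss(features, teacher_logits, margin=1.0):
    """features: (B, d) student features; teacher_logits: (B, C) teacher outputs."""
    labels = teacher_logits.argmax(dim=1)                        # teacher's hard pseudo-labels
    same = (labels[:, None] == labels[None, :]).float()          # graph W_ij in {0, 1}
    dist = torch.cdist(features, features)                       # pairwise feature distances
    pull = same * dist ** 2                                      # neighbors: stay close
    push = (1 - same) * torch.clamp(margin - dist, min=0) ** 2   # non-neighbors: keep a margin
    off_diag = 1 - torch.eye(len(features))                      # ignore self-pairs
    return ((pull + push) * off_diag).mean()

# Toy usage within a mini-batch.
torch.manual_seed(0)
features = torch.randn(8, 16)
teacher_logits = torch.randn(8, 10)
print(float(sntg_loss(features, teacher_logits)))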