Understanding Programmatic Weak Supervision via Source-aware Influence Function

Neural Information Processing Systems

To achieve this, we build on Influence Function (IF) and propose source-aware IF, which leverages the generation process of the probabilistic labels to decompose the end model's training objective and then calculate the influence associated with each (data, source, class) tuple.
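For context, the classical influence function that source-aware IF builds on measures how upweighting a training point z changes the loss at a test point; a standard formulation (notation follows the common presentation, not necessarily the paper's exact symbols) is:

```latex
\mathcal{I}(z, z_{\text{test}})
  = -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top}
     H_{\hat\theta}^{-1}\,
     \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat\theta)
```

The paper's contribution is to decompose this quantity further along the label-generation process, attributing influence to each (data, source, class) tuple rather than to whole training points.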


Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization

Zhang, Yaoyu, Zhang, Leyang, Zhang, Zhongwang, Bai, Zhiwei

arXiv.org Machine Learning

Determining whether deep neural network (DNN) models can reliably recover target functions at overparameterization is a critical yet complex issue in the theory of deep learning. To advance understanding in this area, we introduce a concept we term "local linear recovery" (LLR), a weaker form of target function recovery that renders the problem more amenable to theoretical analysis. In the sense of LLR, we prove that functions expressible by narrower DNNs are guaranteed to be recoverable from fewer samples than model parameters. Specifically, we establish upper limits on the optimistic sample sizes, defined as the smallest sample size necessary to guarantee LLR, for functions in the space of a given DNN. Furthermore, we prove that these upper bounds are achieved in the case of two-layer tanh neural networks. Our research lays a solid groundwork for future investigations into the recovery capabilities of DNNs in overparameterized scenarios.
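The claim that functions expressible by narrower DNNs are recoverable from fewer samples than model parameters can be made concrete with a parameter count. A minimal sketch (the function name and dimensions are illustrative, not from the paper) for a two-layer tanh network of the form f(x) = Σ_k a_k · tanh(w_k · x + b_k):

```python
def param_count(d_in: int, width: int) -> int:
    """Parameters of a two-layer tanh network f(x) = sum_k a_k * tanh(w_k @ x + b_k):
    each of the `width` hidden neurons contributes a weight vector w_k (d_in entries),
    a bias b_k, and an output weight a_k."""
    return width * (d_in + 2)

# For inputs in R^10 and 1000 hidden neurons the model has 12000 parameters;
# LLR concerns recovery guarantees at sample sizes well below this count.
print(param_count(10, 1000))
```

The optimistic sample size for a given target function is then upper-bounded by quantities tied to the narrowest network expressing it, not by this full parameter count.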


HeMPPCAT: Mixtures of Probabilistic Principal Component Analysers for Data with Heteroscedastic Noise

Xu, Alec S., Balzano, Laura, Fessler, Jeffrey A.

arXiv.org Machine Learning

Mixtures of probabilistic principal component analysis (MPPCA) is a well-known mixture model extension of principal component analysis (PCA). Similar to PCA, MPPCA assumes the data samples in each mixture contain homoscedastic noise. However, datasets with heterogeneous noise across samples are becoming increasingly common, as larger datasets are generated by collecting samples from several sources with varying noise profiles. The performance of MPPCA is suboptimal for data with heteroscedastic noise across samples. This paper proposes a heteroscedastic mixtures of probabilistic PCA technique (HeMPPCAT) that uses a generalized expectation-maximization (GEM) algorithm to jointly estimate the unknown underlying factors, means, and noise variances under a heteroscedastic noise setting. Simulation results illustrate the improved factor estimates and clustering accuracies of HeMPPCAT compared to MPPCA.
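A minimal sketch of the heteroscedastic generative model that HeMPPCAT addresses, shown for a single mixture component for brevity (variable names and dimensions are illustrative): each sample x_i = W z_i + mu + eps_i, where the noise variance sigma_i^2 varies per sample, violating MPPCA's homoscedastic assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 5, 2, 100                      # ambient dim, latent dim, sample count

W = rng.standard_normal((d, r))          # factor (loading) matrix
mu = np.zeros(d)                         # component mean
sigma2 = rng.uniform(0.1, 2.0, size=n)   # per-sample noise variances (heteroscedastic)

Z = rng.standard_normal((n, r))          # latent factors z_i ~ N(0, I_r)
noise = rng.standard_normal((n, d)) * np.sqrt(sigma2)[:, None]
X = Z @ W.T + mu + noise                 # observed samples, one row per x_i
```

Under MPPCA every row of `noise` would share one variance per component; HeMPPCAT's GEM algorithm instead jointly estimates W, mu, and the per-sample variances.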