Goto

Collaborating Authors

 dkl


Auxiliary Task Reweighting for Minimum-data Learning

Neural Information Processing Systems

Supervised learning requires a large amount of training data, limiting its application where labeled data is scarce. To compensate for data scarcity, one possible method is to utilize auxiliary tasks to provide additional supervision for the main task. Assigning and optimizing the importance weights for different auxiliary tasks remains a crucial and largely understudied research question. In this work, we propose a method to automatically reweight auxiliary tasks in order to reduce the data requirement on the main task. Specifically, we formulate the weighted likelihood function of auxiliary tasks as a surrogate prior for the main task. By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search.
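The abstract describes adjusting auxiliary-task weights so that the weighted auxiliary likelihood approximates the main task's true prior. Below is a minimal sketch of one plausible instantiation, assuming the divergence is approximated by a gradient-matching proxy between the weighted auxiliary objective and the main-task objective; the function names (`main_loss_fn`, `aux_loss_fns`) and the weight-update rule are illustrative, not the paper's exact algorithm.

```python
import torch

def reweighted_step(model, main_loss_fn, aux_loss_fns, weights, opt, lr_w=1e-2):
    """One training step: minimize main loss + weighted auxiliary losses,
    then nudge the task weights so the weighted auxiliary gradient better
    matches the main-task gradient (a simple proxy for prior divergence)."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the main-task loss.
    g_main = torch.autograd.grad(main_loss_fn(model), params)
    g_main = torch.cat([g.flatten() for g in g_main])

    # Gradient of each auxiliary loss (separate forward pass per task).
    g_aux = []
    for fn in aux_loss_fns:
        g = torch.autograd.grad(fn(model), params)
        g_aux.append(torch.cat([x.flatten() for x in g]))

    # Update weights to reduce || g_main - sum_i w_i g_aux_i ||^2.
    with torch.no_grad():
        combined = sum(w * g for w, g in zip(weights, g_aux))
        for i, g in enumerate(g_aux):
            weights[i] += lr_w * torch.dot(g_main - combined, g).item()
            weights[i] = max(weights[i], 0.0)  # keep weights non-negative

    # Standard optimizer step on the combined objective.
    opt.zero_grad()
    loss = main_loss_fn(model) + sum(w * fn(model)
                                     for w, fn in zip(weights, aux_loss_fns))
    loss.backward()
    opt.step()
    return weights
```

The key design point the abstract implies is that the weights are learned alongside the model rather than tuned by grid search; any proxy for the prior divergence that is differentiable (or, as here, admits a simple coordinate update) fits this template.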



Learning Disentangled Joint Continuous and Discrete Representations

Emilien Dupont

Neural Information Processing Systems

It comes with the advantages of VAEs, such as stable training, large sample diversity and a principled inference network, while having the flexibility to model a combination of continuous and discrete generative factors.
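The entry refers to a VAE whose latent space mixes continuous and discrete factors. A minimal sketch of such a joint latent layer, assuming Gaussian reparameterization for the continuous factors and a Gumbel-Softmax relaxation for the discrete factor (the head names, dimensions, and temperature are illustrative, not the paper's exact architecture):

```python
import torch
import torch.nn.functional as F

def sample_joint_latent(h, fc_mu, fc_logvar, fc_logits, temperature=0.67):
    """Sample a joint continuous + discrete latent code from encoder features h.

    fc_mu, fc_logvar: linear heads for the Gaussian (continuous) factors.
    fc_logits:        linear head for the categorical (discrete) factor.
    """
    # Continuous part: standard Gaussian reparameterization trick.
    mu, logvar = fc_mu(h), fc_logvar(h)
    z_cont = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    # Discrete part: differentiable Gumbel-Softmax relaxation of a one-hot sample.
    logits = fc_logits(h)
    z_disc = F.gumbel_softmax(logits, tau=temperature, hard=False)

    # Concatenate both codes as input to the decoder.
    return torch.cat([z_cont, z_disc], dim=-1), (mu, logvar, logits)
```

The training objective would then combine the usual Gaussian KL term with a relaxed-categorical KL against a uniform prior; how those two terms are balanced or capacity-controlled is a modeling choice beyond this sketch.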






Efficient constrained sampling via the mirror-Langevin algorithm

Neural Information Processing Systems

The sampling problem has attracted considerable attention recently within the machine learning and statistics communities. This renewed interest in sampling is spurred, on one hand, by a wide breadth of applications ranging from Bayesian inference [RC04, DM+19] and its use in inverse problems [DS17], to neural networks [GPAM+14, TR20].
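The title's mirror-Langevin algorithm runs Langevin dynamics in a dual space defined by a mirror map, which keeps iterates inside a constrained domain. A minimal sketch following the standard mirror-Langevin template (not necessarily the exact discretization analyzed in the paper), using the entropic mirror map on the positive orthant, where grad(phi) = log, grad(phi*) = exp, and Hessian(phi) = diag(1/x); the step size and example potential are placeholders:

```python
import numpy as np

def mirror_langevin(grad_V, x0, step=1e-3, n_steps=10_000, rng=None):
    """Mirror-Langevin sampling on the positive orthant with the entropic
    mirror map phi(x) = sum(x*log(x) - x).

    grad_V: gradient of the potential V (target density proportional to exp(-V)).
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Dual-space update: drift -grad V, noise scaled by Hessian(phi)^{1/2}.
        y = np.log(x) - step * grad_V(x) + np.sqrt(2 * step) * noise / np.sqrt(x)
        x = np.exp(y)  # map back via grad(phi*) = exp; iterates stay positive
        samples.append(x.copy())
    return np.array(samples)

# Example: sample from a Gamma(shape=3, rate=1) target, V(x) = x - 2*log(x).
draws = mirror_langevin(lambda x: 1.0 - 2.0 / x, x0=np.ones(1))
```

The constraint is enforced by construction: exp maps the unconstrained dual iterate back into the domain, so no projection step is needed.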




SM.1 Omitted proofs, SM.1.1 Proof of Proposition 1. Proposition 1: The function m_C(·) = 2^{C(M_ϵ(·))}: X → [1, c] satisfies all properties of a predictive multiplicity metric in Definition 1

Neural Information Processing Systems

For clarity, we assume |M_ϵ(x_i)| = m. By the information inequality [1, Theorem 2.6.3], the mutual information I(M;Y) between the random variables M and Y (defined in Section 3) is non-negative, i.e., I(M;Y) ≥ 0. On the other hand, denote by m_1, …, m_c the c models in R(H,ϵ) whose output scores are the "vertices" of ∆_c; then H(Y|M = m_k) = 0 for every k ∈ [c]. H(Y|M) is minimized to 0 by setting the weights p_M on those c models to 1/c and the rest to 0, in which case I(M;Y) = H(Y) = log2(c). Since this holds for the capacity-achieving P_M, which is in turn the maximum across input distributions, the converse result follows. The consequence of predictive multiplicity is that the same individual can be treated differently due to arbitrary and unjustified choices made during the training process (e.g., parameter initialization, random seed, dropout probability, etc.).
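To make the capacity-achieving argument concrete, here is a small sketch (function names illustrative) that computes 2^{I(M;Y)} for a weighting over models: with c deterministic models whose outputs are the c "vertices" of the simplex, uniform weights drive H(Y|M) to 0 and the metric to its maximum value c, matching the converse above.

```python
import numpy as np

def predictive_multiplicity(p_m, p_y_given_m):
    """Compute 2^{I(M;Y)} for model weights p_m and per-model output
    distributions p(y|m) (rows of p_y_given_m)."""
    p_y = p_m @ p_y_given_m  # marginal distribution over predictions

    def H(p):  # Shannon entropy in bits, with 0*log(0) treated as 0
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    # I(M;Y) = H(Y) - H(Y|M)
    h_y_given_m = sum(pm * H(row) for pm, row in zip(p_m, p_y_given_m))
    return 2 ** (H(p_y) - h_y_given_m)

# Three competing models, each deterministically outputting a distinct label
# (the "vertices" of the simplex). Uniform weights maximize I(M;Y).
p_y_given_m = np.eye(3)
print(predictive_multiplicity(np.ones(3) / 3, p_y_given_m))  # -> 3.0
```

Since H(Y|M) = 0 for deterministic models and H(Y) = log2(3) under uniform weights, the metric evaluates to exactly 3, i.e., every one of the c competing models is "counted" by the capacity-based measure.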