ddfa
Unsupervised Learning under Latent Label Shift
Roberts, Manley | Mani, Pranav | Garg, Saurabh | Lipton, Zachary C.
What sorts of structure might enable a learner to discover classes from unlabeled data? Traditional approaches rely on feature-space similarity and heroic assumptions on the data. In this paper, we introduce unsupervised learning under Latent Label Shift (LLS), where we have access to unlabeled data from multiple domains such that the label marginals $p_d(y)$ can shift across domains but the class conditionals $p(\mathbf{x}|y)$ do not. This work instantiates a new principle for identifying classes: elements that shift together group together. For finite input spaces, we establish an isomorphism between LLS and topic modeling: inputs correspond to words, domains to documents, and labels to topics. Addressing continuous data, we prove that when each label's support contains a separable region, analogous to an anchor word, oracle access to $p(d|\mathbf{x})$ suffices to identify $p_d(y)$ and $p_d(y|\mathbf{x})$ up to permutation. Thus motivated, we introduce a practical algorithm that leverages domain-discriminative models as follows: (i) push examples through domain discriminator $p(d|\mathbf{x})$; (ii) discretize the data by clustering examples in $p(d|\mathbf{x})$ space; (iii) perform non-negative matrix factorization on the discrete data; (iv) combine the recovered $p(y|d)$ with the discriminator outputs $p(d|\mathbf{x})$ to compute $p_d(y|\mathbf{x}) \; \forall d$. With semi-synthetic experiments, we show that our algorithm can leverage domain information to improve upon competitive unsupervised classification methods. We reveal a failure mode of standard unsupervised classification methods when feature-space similarity does not indicate true groupings, and show empirically that our method better handles this case. Our results establish a deep connection between distribution shift and topic modeling, opening promising lines for future work.
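The finite-input correspondence with topic modeling can be illustrated on a toy example. The sketch below is not the authors' implementation: the matrices, the helper name `nmf_multiplicative`, and the use of Lee-Seung multiplicative updates for the factorization step are all assumptions made for illustration. It builds the input-by-domain matrix $p_d(\mathbf{x}) = \sum_y p(\mathbf{x}|y)\, p_d(y)$ and checks that a rank-2 non-negative factorization reproduces it.

```python
import numpy as np

# Hypothetical toy instance: 4 discrete inputs, 2 labels, 3 domains.
# Columns are the class conditionals p(x|y), shared by every domain.
# Input 0 occurs only under label 0 and input 3 only under label 1,
# so they play the role of anchor words.
Q_x_given_y = np.array([[0.5, 0.0],
                        [0.3, 0.1],
                        [0.2, 0.2],
                        [0.0, 0.7]])
# Columns are the label marginals p_d(y), one column per domain.
Q_y_given_d = np.array([[0.9, 0.5, 0.2],
                        [0.1, 0.5, 0.8]])

# Latent label shift: p_d(x) = sum_y p(x|y) p_d(y). This is a
# word-by-document matrix that factors through the labels (topics).
Q_x_given_d = Q_x_given_y @ Q_y_given_d


def nmf_multiplicative(V, k, n_iter=5000, seed=0, eps=1e-12):
    """Factor V ~= W @ H with W, H >= 0 via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 0.1
    H = rng.random((k, V.shape[1])) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each column of W sums to 1 (a distribution over inputs);
    # the product W @ H is unchanged by this rescaling.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]


W_hat, H_hat = nmf_multiplicative(Q_x_given_d, k=2)
rel_err = (np.linalg.norm(Q_x_given_d - W_hat @ H_hat)
           / np.linalg.norm(Q_x_given_d))
print("relative reconstruction error:", rel_err)
```

A small reconstruction error only shows that the matrix factors through two latent labels. Generic multiplicative updates do not guarantee that `W_hat` and `H_hat` match the true $p(\mathbf{x}|y)$ and $p_d(y)$, even up to permutation; that identification relies on the anchor-type conditions stated in the abstract.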
Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation
Szerlip, Paul A. (University of Central Florida) | Morse, Gregory (University of Central Florida) | Pugh, Justin K. (University of Central Florida) | Stanley, Kenneth O. (University of Central Florida)
The increasing realization in recent years that artificial neural networks (ANNs) can learn many layers of features (Bengio et al. 2007; Hinton, Osindero, and Teh 2006; Marc'Aurelio, Boureau, and LeCun 2007; Cireşan et al. 2010) has reinvigorated the study of representation learning in ANNs (Bengio, Courville, and Vincent 2013). While the beginning of this renaissance focused on the sequential unsupervised training of individual layers one upon another (Bengio et al. 2007; Hinton, Osindero, and Teh 2006), the ...

In particular, there is an alternative kind of discriminative learning that is unsupervised rather than supervised. In this proposed alternative approach, called divergent discriminative feature accumulation (DDFA), instead of searching for features constrained by the objective of solving the discriminative classification problem, a learning algorithm can instead attempt to collect as many features that discriminate strongly among training examples as possible, without regard to any particular classification problem.
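One possible reading of "accumulating" discriminative features is sketched below. The excerpt does not specify the search procedure, so everything here is an assumption made for illustration: the random data, the single-sigmoid-unit candidates, the distance-based acceptance rule, and the `threshold` value. A candidate is kept only if its pattern of responses over the training examples differs enough from every feature already collected, and no class labels are used at any point.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 8))   # hypothetical unlabeled training examples


def feature_response(w, X):
    """Response of one candidate feature (a sigmoid unit) on each example."""
    return 1.0 / (1.0 + np.exp(-(X @ w)))


archive = []      # response vectors of the features accumulated so far
threshold = 1.0   # hypothetical minimum distance to every archived feature

for _ in range(500):
    w = rng.normal(size=X.shape[1])   # random candidate feature
    r = feature_response(w, X)
    if not archive:
        archive.append(r)
        continue
    # Distance from this candidate's response pattern to each archived one.
    dists = np.linalg.norm(np.array(archive) - r, axis=1)
    if dists.min() > threshold:
        archive.append(r)             # keep features that behave differently

print("features accumulated:", len(archive))
```

Random sampling stands in for whatever search the method actually uses. The point is only that the acceptance criterion depends on how a feature separates training examples, not on classification accuracy.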