Yang, Weiwei
Inducing a hierarchy for multi-class classification problems
Helm, Hayden S., Yang, Weiwei, Bharadwaj, Sujeeth, Lytvynets, Kate, Riva, Oriana, White, Christopher, Geisa, Ali, Priebe, Carey E.
In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not. Unfortunately, the majority of classification datasets do not come pre-equipped with a hierarchical structure, so classical "flat" classifiers must be employed. In this paper, we investigate a class of methods that induce a hierarchy that can similarly improve classification performance over flat classifiers. The class of methods follows the structure of first clustering the class-conditional distributions and subsequently using a hierarchical classifier with the induced hierarchy. We demonstrate the effectiveness of the class of methods both for discovering a latent hierarchy and for improving accuracy in principled simulation settings and three real data applications.

Machine learning practitioners are often challenged with the task of classifying an object as one of tens or hundreds of classes. To address these problems, algorithms originally designed for binary or small multi-class problems are naively applied and deployed. In some instances the large set of labels comes pre-equipped with a hierarchical structure; that is, some labels are known to be mutually semantically similar to various degrees.
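The cluster-then-classify recipe described above can be made concrete. Below is a minimal sketch, assuming each class-conditional distribution is summarized by its feature mean, the hierarchy is induced with agglomerative clustering over those means, and logistic regression serves as both the top-level and leaf classifiers; the function names (induce_hierarchy, fit_hierarchical, predict_hierarchical), the n_superclasses parameter, and the choice of base learners are all illustrative assumptions, not the paper's exact method.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.linear_model import LogisticRegression

def induce_hierarchy(X, y, n_superclasses=3):
    """Cluster per-class mean vectors to group semantically similar labels."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    super_of = AgglomerativeClustering(n_clusters=n_superclasses).fit_predict(means)
    return dict(zip(classes, super_of))

def fit_hierarchical(X, y, class_to_super):
    """Top level predicts the induced super-class; one leaf model per cluster."""
    y_super = np.array([class_to_super[c] for c in y])
    top = LogisticRegression(max_iter=1000).fit(X, y_super)
    leaves = {}
    for s in np.unique(y_super):
        mask = y_super == s
        labels = np.unique(y[mask])
        if labels.size == 1:  # degenerate cluster: a constant leaf suffices
            leaves[s] = labels[0]
        else:
            leaves[s] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
    return top, leaves

def predict_hierarchical(X, top, leaves):
    """Route each point through the predicted super-class to a leaf model."""
    y_super = top.predict(X)
    out = np.empty(len(X), dtype=object)
    for s, leaf in leaves.items():
        mask = y_super == s
        if not mask.any():
            continue
        out[mask] = leaf.predict(X[mask]) if hasattr(leaf, "predict") else leaf
    return out
```

A flat baseline would fit a single multi-class model on (X, y); the two-stage version above only differs in routing predictions through the induced super-classes.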
A partition-based similarity for classification distributions
Helm, Hayden S., Mehta, Ronak D., Duderstadt, Brandon, Yang, Weiwei, White, Christopher M., Geisa, Ali, Vogelstein, Joshua T., Priebe, Carey E.
Herein we define a measure of similarity between classification distributions that is both principled from the perspective of statistical pattern recognition and useful from the perspective of machine learning practitioners. In particular, we propose a novel similarity measure on classification distributions, dubbed task similarity, that quantifies how an optimally transformed optimal representation for a source distribution performs when applied to inference on a target distribution. The definition of task similarity allows for natural definitions of adversarial and orthogonal distributions. We highlight limiting properties of representations induced by (universally) consistent decision rules and demonstrate in simulation that an empirical estimate of task similarity is a function of the decision rule deployed for inference. We demonstrate that for a given target distribution, both transfer efficiency and semantic similarity of candidate source distributions correlate with empirical task similarity.
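One way to picture an empirical estimate in this spirit: learn a representation on the source distribution, freeze it, train only a new decision head for the target, and compare against a target-trained baseline. The sketch below does exactly that, assuming a random-forest leaf encoding stands in for the learned representation and a transfer-to-baseline accuracy ratio stands in for the formal similarity; empirical_task_similarity and every parameter choice here are hypothetical illustrations, not the paper's definition.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

def empirical_task_similarity(Xs, ys, Xt, yt, Xt_test, yt_test):
    # 1. Learn a representation on the source task (forest leaf indices here).
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xs, ys)
    enc = OneHotEncoder(handle_unknown="ignore").fit(forest.apply(Xs))
    # 2. Train only a new head for the target task on the frozen representation.
    head = LogisticRegression(max_iter=1000).fit(
        enc.transform(forest.apply(Xt)), yt)
    transfer_acc = head.score(enc.transform(forest.apply(Xt_test)), yt_test)
    # 3. Compare against a target-trained model of the same form.
    base = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xt, yt)
    base_acc = base.score(Xt_test, yt_test)
    # A ratio near 1 suggests similar tasks; a ratio well below chance-level
    # performance would be adversarial in spirit.
    return transfer_acc / base_acc
```

Note that, as the abstract indicates, the number obtained depends on the decision rule: swapping the forest for a different representation learner changes the empirical estimate.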
A general approach to progressive learning
Vogelstein, Joshua T., Helm, Hayden S., Mehta, Ronak D., Dey, Jayanta, LeVine, Will, Yang, Weiwei, Tower, Bryan, Larson, Jonathan, White, Chris, Priebe, Carey E.
In biological learning, data are used to improve performance simultaneously on the current task, as well as previously encountered and as yet unencountered tasks. In contrast, classical machine learning starts from a blank slate, or tabula rasa, using data only for the single task at hand. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called catastrophic forgetting). Many recent approaches have attempted to maintain performance on prior tasks when learning new ones. But striving to avoid forgetting sets the goal unnecessarily low: the goal of progressive learning, whether biological or artificial, is to improve performance on all tasks (including past and future) with any new data. We propose representation ensembling, as opposed to learner ensembling (e.g., bagging), to address progressive learning. We show that representation ensembling -- including representations learned by decision forests or deep networks -- uniquely demonstrates improved performance on both past and future tasks in a variety of simulated and real data scenarios, including vision, language, and adversarial tasks, with or without resource constraints. Beyond progressive learning, this work has immediate implications with regard to mitigating batch effects and federated learning applications. We expect a deeper understanding of the mechanisms underlying biological progressive learning to enable further improvements in machine progressive learning.
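The distinction between learner ensembling and representation ensembling can be sketched in a few lines: each task contributes a learned representation, and every task's decision head ("voter") is retrained over the union of all representations seen so far, so new tasks can transfer both forward and backward. The sketch below assumes forest leaf encodings as the per-task representations and linear voters, and it keeps the raw task data around to refit voters; these are simplifying assumptions for illustration, not the paper's exact construction.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

class RepresentationEnsemble:
    def __init__(self):
        self.transformers = []  # one (forest, encoder) per task seen so far
        self.tasks = []         # per-task training data, kept to refit voters
        self.voters = []        # one head per task, over all representations

    def _represent(self, X):
        """Concatenate every task's fixed representation of X."""
        return hstack([enc.transform(f.apply(X)) for f, enc in self.transformers])

    def add_task(self, X, y):
        # Learn a new representation from the new task's data alone.
        f = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
        enc = OneHotEncoder(handle_unknown="ignore").fit(f.apply(X))
        self.transformers.append((f, enc))
        self.tasks.append((X, y))
        # Refit every voter on the enlarged representation: old tasks can
        # improve (backward transfer), and the new task reuses old ones
        # (forward transfer).
        self.voters = [
            LogisticRegression(max_iter=1000).fit(self._represent(Xi), yi)
            for Xi, yi in self.tasks
        ]

    def predict(self, X, task_id):
        return self.voters[task_id].predict(self._represent(X))
```

Contrast this with learner ensembling such as bagging, where each ensemble member owns both its representation and its vote: there, adding a member trained on task B cannot change how task A's members represent their inputs, so no backward transfer is possible.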