Jojic, Nebojsa
Structural epitome: a way to summarize one’s visual experience
Jojic, Nebojsa, Perina, Alessandro, Murino, Vittorio
In order to study the properties of total visual input in humans, a single subject wore a camera for two weeks capturing, on average, an image every 20 seconds (www.research.microsoft.com/~jojic/aihs). The resulting new dataset contains a mix of indoor and outdoor scenes as well as numerous foreground objects. Our first analysis goal is to create a visual summary of the subject’s two weeks of life using unsupervised algorithms that would automatically discover recurrent scenes, familiar faces or common actions. Direct application of existing algorithms, such as panoramic stitching (e.g. Photosynth) or appearance-based clustering models (e.g. the epitome), is impractical due to either the large dataset size or the dramatic variation in the lighting conditions. As a remedy to these problems, we introduce a novel image representation, the “stel epitome,” and an associated efficient learning algorithm. In our model, each image or image patch is characterized by a hidden mapping T, which, as in previous epitome models, defines a mapping between the image-coordinates and the coordinates in the large all-I-have-seen" epitome matrix. The limited epitome real-estate forces the mappings of different images to overlap, with this overlap indicating image similarity. However, in our model the image similarity does not depend on direct pixel-to-pixel intensity/color/feature comparisons as in previous epitome models, but on spatial configuration of scene or object parts, as the model is based on the palette-invariant stel models. As a result, stel epitomes capture structure that is invariant to non-structural changes, such as illumination, that tend to uniformly affect pixels belonging to a single scene or object part."
Exact inference and learning for cumulative distribution functions on loopy graphs
Jojic, Nebojsa, Meek, Chris, Huang, Jim C.
Probabilistic graphical models use local factors to represent dependence among sets of variables. For many problem domains, for instance climatology and epidemiology, in addition to local dependencies, we may also wish to model heavy-tailed statistics, where extreme deviations should not be treated as outliers. Specifying such distributions using graphical models for probability density functions (PDFs) generally lead to intractable inference and learning. Cumulative distribution networks (CDNs) provide a means to tractably specify multivariate heavy-tailed models as a product of cumulative distribution functions (CDFs). Currently, algorithms for inference and learning, which correspond to computing mixed derivatives, are exact only for tree-structured graphs. For graphs of arbitrary topology, an efficient algorithm is needed that takes advantage of the sparse structure of the model, unlike symbolic differentiation programs such as Mathematica and D* that do not. We present an algorithm for recursively decomposing the computation of derivatives for CDNs of arbitrary topology, where the decomposition is naturally described using junction trees. We compare the performance of the resulting algorithm to Mathematica and D*, and we apply our method to learning models for rainfall and H1N1 data, where we show that CDNs with cycles are able to provide a significantly better fits to the data as compared to tree-structured and unstructured CDNs and other heavy-tailed multivariate distributions such as the multivariate copula and logistic models.
Free energy score space
Perina, Alessandro, Cristani, Marco, Castellani, Umberto, Murino, Vittorio, Jojic, Nebojsa
A score function induced by a generative model of the data can provide a feature vectorof a fixed dimension for each data sample. Data samples themselves may be of differing lengths (e.g., speech segments, or other sequence data), but as a score function is based on the properties of the data generation process, it produces a fixed-length vector in a highly informative space, typically referred to as a "score space". Discriminative classifiers have been shown to achieve higher performance in appropriately chosen score spaces than is achievable by either the corresponding generative likelihood-based classifiers, or the discriminative classifiers usingstandard feature extractors. In this paper, we present a novel score space that exploits the free energy associated with a generative model. The resulting freeenergy score space (FESS) takes into account latent structure of the data at various levels, and can be trivially shown to lead to classification performance that at least matches the performance of the free energy classifier based on the same generative model, and the same factorization of the posterior. We also show that in several typical vision and computational biology applications the classifiers optimized in FESS outperform the corresponding pure generative approaches, as well as a number of previous approaches to combining discriminating and generative models.
Using ``epitomes'' to model genetic diversity: Rational design of HIV vaccine cocktails
Jojic, Nebojsa, Jojic, Vladimir, Meek, Christopher, Heckerman, David, Frey, Brendan J.
We introduce a new model of genetic diversity which summarizes a large input dataset into an epitome, a short sequence or a small set of short sequences of probability distributions capturing many overlapping subsequences from the dataset. The epitome as a representation has already been used in modeling real-valued signals, such as images and audio. The discrete sequence model we introduce in this paper targets applications in genetics, from multiple alignment to recombination and mutation inference. In our experiments, we concentrate on modeling the diversity of HIV where the epitome emerges as a natural model for producing relatively small vaccines covering a large number of immune system targets known as epitopes. Our experiments show that the epitome includes more epitopes than other vaccine designs of similar length, including cocktails of consensus strains, phylogenetic tree centers, and observed strains. We also discuss epitome designs that take into account uncertainty about T-cell cross reactivity and epitope presentation. In our experiments, we find that vaccine optimization is fairly robust to these uncertainties.
Using ``epitomes'' to model genetic diversity: Rational design of HIV vaccine cocktails
Jojic, Nebojsa, Jojic, Vladimir, Meek, Christopher, Heckerman, David, Frey, Brendan J.
We introduce a new model of genetic diversity which summarizes a large input dataset into an epitome, a short sequence or a small set of short sequences of probability distributions capturing many overlapping subsequences fromthe dataset. The epitome as a representation has already been used in modeling real-valued signals, such as images and audio. The discrete sequence model we introduce in this paper targets applications in genetics, from multiple alignment to recombination and mutation inference. Inour experiments, we concentrate on modeling the diversity of HIV where the epitome emerges as a natural model for producing relatively smallvaccines covering a large number of immune system targets known as epitopes. Our experiments show that the epitome includes more epitopes than other vaccine designs of similar length, including cocktails of consensus strains, phylogenetic tree centers, and observed strains. We also discuss epitome designs that take into account uncertainty about T-cell cross reactivity and epitope presentation. In our experiments, we find that vaccine optimization is fairly robust to these uncertainties.
Q-Clustering
Narasimhan, Mukund, Jojic, Nebojsa, Bilmes, Jeff A.
We show that Queyranne's algorithm for minimizing symmetric submodular functionscan be used for clustering with a variety of different objective functions. Two specific criteria that we consider in this paper are the single linkage and the minimum description length criteria. The first criterion triesto maximize the minimum distance between elements of different clusters,and is inherently "discriminative". It is known that optimal clusterings into k clusters for any given k in polynomial time for this criterion can be computed. The second criterion seeks to minimize the description length of the clusters given a probabilistic generative model. We show that the optimal partitioning into 2 clusters, and approximate partitioning (guaranteed to be within a factor of 2 of the the optimal) for more clusters can be computed. To the best of our knowledge, this is the first time that a tractable algorithm for finding the optimal clustering with respect to the MDL criterion for 2 clusters has been given. Besides the optimality result for the MDL criterion, the chief contribution of this paper is to show that the same algorithm can be used to optimize a broad class of criteria, and hence can be used for many application specific criterion for which efficient algorithm are not known.
Fast Transformation-Invariant Factor Analysis
Kannan, Anitha, Jojic, Nebojsa, Frey, Brendan
Dimensionality reduction techniques such as principal component analysis and factor analysis are used to discover a linear mapping between high dimensional data samples and points in a lower dimensional subspace. In [6], Jojic and Frey introduced mixture of transformation-invariant component analyzers (MTCA) that can account for global transformations such as translations and rotations, perform clustering and learn local appearance deformations by dimensionality reduction.
Fast Transformation-Invariant Factor Analysis
Kannan, Anitha, Jojic, Nebojsa, Frey, Brendan
Dimensionality reduction techniques such as principal component analysis andfactor analysis are used to discover a linear mapping between high dimensional data samples and points in a lower dimensional subspace. In [6], Jojic and Frey introduced mixture of transformation-invariant component analyzers (MTCA) that can account for global transformations suchas translations and rotations, perform clustering and learn local appearance deformations by dimensionality reduction.
Fast, Large-Scale Transformation-Invariant Clustering
Frey, Brendan J., Jojic, Nebojsa
In previous work on "transformed mixtures of Gaussians" and "transformed hidden Markov models", we showed how the EM algorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only input to the algorithm is the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabilities over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we describe how a tremendous speedup is acheived through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities.
Fast, Large-Scale Transformation-Invariant Clustering
Frey, Brendan J., Jojic, Nebojsa
In previous work on "transformed mixtures of Gaussians" and "transformed hidden Markov models", we showed how the EM algorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only input to the algorithm is the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabilities over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we describe how a tremendous speedup is acheived through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities.