Collaborating Authors

 Tomkins, Andrew


Substance or Style: What Does Your Image Embedding Know?

arXiv.org Artificial Intelligence

Probes are small networks that predict properties of underlying data from embeddings, and they provide a targeted, effective way to illuminate the information contained in embeddings. While analysis through the use of probes has become standard in NLP, there has been much less exploration in vision. Image foundation models have primarily been evaluated for semantic content. Better understanding the non-semantic information in popular embeddings (e.g., MAE, SimCLR, or CLIP) will shed new light both on the training algorithms and on the uses for these foundation models. We design a systematic transformation prediction task and measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations. Surprisingly, six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations. We also consider a generalization task, where we group similar transformations and hold out several for testing. We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE). Overall, our results suggest that the choice of pre-training algorithm impacts the types of information in the embedding, and certain models are better than others for non-semantic downstream tasks.
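As a rough illustration of the probing idea, the sketch below trains a tiny logistic-regression probe on frozen embeddings to predict whether a transformation was applied. It is a simplification: the abstract describes probes as small networks, whereas this is a single linear layer. The toy data generator, the dimension that carries the "transformed" signal, and all function names are hypothetical and not taken from the paper.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_embeddings(n, rng, dim=4):
    """Hypothetical stand-in for frozen image embeddings.

    Assumption: a transformed image (label 1) shifts coordinate 2 of its
    embedding by +2.0; every coordinate also carries Gaussian noise.
    """
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 0.5) for _ in range(dim)]
        if label:
            x[2] += 2.0
        xs.append(x)
        ys.append(label)
    return xs, ys


def train_probe(embeddings, labels, epochs=100, lr=0.1):
    """Fit a linear probe with plain SGD; the embeddings are never updated."""
    w = [0.0] * len(embeddings[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def probe_accuracy(w, b, embeddings, labels):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = 0
    for x, y in zip(embeddings, labels):
        logit = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += int((logit >= 0.0) == bool(y))
    return correct / len(labels)


if __name__ == "__main__":
    rng = random.Random(0)
    train_x, train_y = make_toy_embeddings(200, rng)
    test_x, test_y = make_toy_embeddings(100, rng)
    w, b = train_probe(train_x, train_y)
    print("held-out probe accuracy:", probe_accuracy(w, b, test_x, test_y))
```

High held-out accuracy for such a probe would indicate that the embedding retains the information needed to detect the transformation; low accuracy would suggest the information was discarded during pre-training, at least in linearly decodable form.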


Approximating a RUM from Distributions on k-Slates

arXiv.org Artificial Intelligence

In this work we consider the problem of fitting Random Utility Models (RUMs) to user choices. Given the winner distributions of the subsets of size $k$ of a universe, we obtain a polynomial-time algorithm that finds the RUM that best approximates the given distribution on average. Our algorithm is based on a linear program that we solve using the ellipsoid method. Given that its corresponding separation oracle problem is NP-hard, we devise an approximate separation oracle that can be viewed as a generalization of the weighted feedback arc set problem to hypergraphs. Our theoretical result can also be made practical: we obtain a heuristic that is effective and scales to real-world datasets.
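To make the objects in this abstract concrete, the sketch below represents a RUM as a probability distribution over rankings of the universe and computes the winner distribution it induces on every slate of size $k$. It does not implement the paper's linear program, ellipsoid method, or separation oracle. The average L1 error used to compare a candidate RUM against target distributions is an assumed stand-in for the paper's notion of approximation "on average", and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def winner_distribution(rum, slate):
    """Winner distribution of a RUM on one slate.

    rum: list of (probability, ranking) pairs, each ranking a tuple of all
    items ordered best-first. The winner of a slate under a ranking is the
    slate item that appears earliest in that ranking.
    """
    dist = {item: Fraction(0) for item in slate}
    for prob, ranking in rum:
        winner = next(item for item in ranking if item in slate)
        dist[winner] += prob
    return dist


def k_slate_distributions(rum, universe, k):
    """Winner distributions for every size-k subset of the universe."""
    return {
        slate: winner_distribution(rum, slate)
        for slate in combinations(sorted(universe), k)
    }


def average_l1_error(rum, target, universe, k):
    """Mean L1 distance between the RUM's and the target's slate distributions.

    Assumption: L1 distance averaged uniformly over slates.
    """
    induced = k_slate_distributions(rum, universe, k)
    total = Fraction(0)
    for slate, dist in induced.items():
        total += sum(abs(dist[item] - target[slate][item]) for item in slate)
    return total / len(induced)


if __name__ == "__main__":
    half = Fraction(1, 2)
    rum = [(half, ("a", "b", "c")), (half, ("c", "a", "b"))]
    target = k_slate_distributions(rum, {"a", "b", "c"}, 2)
    for slate, dist in target.items():
        print(slate, dict(dist))
```

Exact rational arithmetic via `Fraction` keeps the toy distributions free of rounding error, which makes it easy to check by hand that each slate's probabilities sum to one.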


Graph Autoencoders with Deconvolutional Networks

arXiv.org Artificial Intelligence

Recent studies have indicated that Graph Convolutional Networks (GCNs) act as a low-pass filter in the spectral domain and encode smoothed node representations. In this paper, we consider their opposite, namely Graph Deconvolutional Networks (GDNs) that reconstruct graph signals from smoothed node representations. We motivate the design of Graph Deconvolutional Networks via a combination of inverse filters in the spectral domain and de-noising layers in the wavelet domain, as the inverse operation results in a high-pass filter and may amplify the noise. Based on the proposed GDN, we further propose a graph autoencoder framework that first encodes smoothed graph representations with GCN and then decodes accurate graph signals with GDN. We demonstrate the effectiveness of the proposed method on several tasks including unsupervised graph-level representation, social recommendation and graph generation. Autoencoders have demonstrated excellent performance on tasks such as unsupervised representation learning (Bengio, 2009) and de-noising (Vincent et al., 2010). Recently, several studies (Zeiler & Fergus, 2014; Long et al., 2015) have demonstrated that the performance of autoencoders can be further improved by encoding with Convolutional Networks and decoding with Deconvolutional Networks (Zeiler et al., 2010). Notably, Noh et al. (2015) present a novel symmetric architecture that provides a bottom-up mapping from input signals to latent hierarchical feature space with {convolution, pooling} operations and then maps the latent representation back to the input space with {deconvolution, unpooling} operations. While this architecture has been successful when processing features whose structure exists in Euclidean space (e.g., images), recently there has been a surging interest in applying such a framework to non-Euclidean data like graphs.
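The low-pass behaviour mentioned above can be seen in a small sketch. It applies one step of the commonly used GCN propagation rule, the symmetrically normalized adjacency with self-loops, to two signals on a cycle graph, leaving out the learned weights and the nonlinearity. It does not implement the paper's GDN. The cycle graph and the function names are illustrative assumptions.

```python
import math


def cycle_graph(n):
    """Adjacency lists of an n-node cycle (each node has two neighbors)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def gcn_propagate(neighbors, signal):
    """One propagation step x -> D^{-1/2} (A + I) D^{-1/2} x.

    D holds the degrees of A + I, so every node counts itself once.
    No weight matrix or activation is applied.
    """
    degree = [len(nbrs) + 1 for nbrs in neighbors]
    out = []
    for i, nbrs in enumerate(neighbors):
        total = 0.0
        for j in [i, *nbrs]:
            total += signal[j] / math.sqrt(degree[i] * degree[j])
        out.append(total)
    return out


if __name__ == "__main__":
    graph = cycle_graph(6)
    smooth = [1.0] * 6                        # lowest-frequency signal
    rough = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # highest-frequency signal
    print("smooth after one step:", gcn_propagate(graph, smooth))
    print("rough after one step: ", gcn_propagate(graph, rough))
```

On this graph the constant signal passes through unchanged, while the alternating signal shrinks to one third of its magnitude (and flips sign), which is the attenuation of high-frequency components that a deconvolutional decoder would need to undo.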


Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study

arXiv.org Machine Learning

Large generative language models such as GPT-2 are well known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning. Our work is twofold. First, we demonstrate via human evaluation that classifiers trained to discriminate between human and machine-generated text emerge as unsupervised predictors of "page quality", able to detect low-quality content without any training. This enables fast bootstrapping of quality indicators in a low-resource setting. Second, curious to understand the prevalence and nature of low-quality pages in the wild, we conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.


BusTr: Predicting Bus Travel Times from Real-Time Traffic

arXiv.org Machine Learning

... the world's public transit systems where no official real-time bus tracking is provided. We demonstrate that our neural sequence model improves over DeepTTE, the state-of-the-art baseline, both in performance (30% MAPE) and training stability. We also demonstrate significant generalization gains over simpler models, evaluated on longitudinal data to cope with a constantly evolving world.

Of these two modalities, real-time state is disproportionately important for the routine trips that dominate most people's transportation needs. Most transit users know by heart the routes connecting their home, work, and other frequent destinations, but they have a well-established need for information about real-time changes. Transit variability is a ...


Preventing Adversarial Use of Datasets through Fair Core-Set Construction

arXiv.org Artificial Intelligence

We propose improving the privacy properties of a dataset by publishing only a strategically chosen "core-set" of the data containing a subset of the instances. The core-set allows strong performance on primary tasks, but forces poor performance on unwanted tasks. We give methods for both linear models and neural networks and demonstrate their efficacy on data.


Graph-RISE: Graph-Regularized Image Semantic Embedding

arXiv.org Machine Learning

Learning image representations that capture fine-grained semantics has been a challenging and important task, enabling many applications such as image search and clustering. In this paper, we present Graph-Regularized Image Semantic Embedding (Graph-RISE), a large-scale neural graph learning framework that allows us to train embeddings to discriminate an unprecedented O(40M) ultra-fine-grained semantic labels. Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE effectively captures semantics and, compared to the state of the art, differentiates nuances at levels that are closer to human perception.


Linear Additive Markov Processes

arXiv.org Machine Learning

We introduce LAMP: the Linear Additive Markov Process. Transitions in LAMP may be influenced by states visited in the distant history of the process, but unlike higher-order Markov processes, LAMP retains an efficient parametrization. LAMP also allows the specific dependence on history to be learned efficiently from data. We characterize some theoretical properties of LAMP, including its steady-state and mixing time. We then give an algorithm based on alternating minimization to learn LAMP models from data. Finally, we perform a series of real-world experiments to show that LAMP is more powerful than first-order Markov processes, and even holds its own against deep sequential models (LSTMs) with a negligible increase in parameter complexity.
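The sketch below shows one way such a process can be simulated. It assumes the following reading of the model, which the abstract itself does not spell out: at each step a lag is drawn from a fixed distribution over lags, and the next state is then drawn from the row of a single stochastic matrix indexed by the state visited that many steps ago. Under this reading the model needs only one transition matrix plus one weight per lag. Clamping to the earliest state when the history is shorter than the lag is an additional assumption, and the function name is hypothetical.

```python
import random


def sample_lamp(transition, lag_weights, start, steps, rng):
    """Sample a trajectory from a LAMP-style process.

    transition:  row-stochastic matrix, transition[s][t] = P(next = t | s).
    lag_weights: lag_weights[i] is the probability of looking back i + 1 steps.
    Assumption: if the history is shorter than the drawn lag, the earliest
    visited state is used instead.
    """
    n_states = len(transition)
    lags = range(1, len(lag_weights) + 1)
    history = [start]
    for _ in range(steps):
        lag = rng.choices(lags, weights=lag_weights)[0]
        source = history[-lag] if lag <= len(history) else history[0]
        nxt = rng.choices(range(n_states), weights=transition[source])[0]
        history.append(nxt)
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 3-state chain that mostly moves 0 -> 1 -> 2 -> 0.
    transition = [
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8],
        [0.8, 0.1, 0.1],
    ]
    # Look back one step 70% of the time, two steps 30% of the time.
    print(sample_lamp(transition, [0.7, 0.3], start=0, steps=10, rng=rng))
```

With `lag_weights = [1.0]` the sampler reduces to an ordinary first-order Markov chain, which makes a convenient sanity check.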