Goto

Collaborating Authors

 Information Retrieval


Query Complexity of Clustering with Side Information

Neural Information Processing Systems

Suppose, we are given a set of $n$ elements to be clustered into $k$ (unknown) clusters, and an oracle/expert labeler that can interactively answer pair-wise queries of the form, do two elements $u$ and $v$ belong to the same cluster?''. The goal is to recover the optimum clustering by asking the minimum number of queries. In this paper, we provide a rigorous theoretical study of this basic problem of query complexity of interactive clustering, and give strong information theoretic lower bounds, as well as nearly matching upper bounds. Most clustering problems come with a similarity matrix, which is used by an automated process to cluster similar points together. To improve accuracy of clustering, a fruitful approach in recent years has been to ask a domain expert or crowd to obtain labeled data interactively.


Meta-Learning MCMC Proposals

Neural Information Processing Systems

Effective implementations of sampling-based probabilistic inference often require manually constructed, model-specific proposals. Inspired by recent progresses in meta-learning for training learning agents that can generalize to unseen environments, we propose a meta-learning approach to building effective and generalizable MCMC proposals. We parametrize the proposal as a neural network to provide fast approximations to block Gibbs conditionals. The learned neural proposals generalize to occurrences of common structural motifs across different models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler yields higher final F1 scores than classical single-site Gibbs sampling.


Query Complexity of Bayesian Private Learning

Neural Information Processing Systems

We study the query complexity of Bayesian Private Learning: a learner wishes to locate a random target within an interval by submitting queries, in the presence of an adversary who observes all of her queries but not the responses. How many queries are necessary and sufficient in order for the learner to accurately estimate the target, while simultaneously concealing the target from the adversary? Our main result is a query complexity lower bound that is tight up to the first order. We show that if the learner wants to estimate the target within an error of $\epsilon$, while ensuring that no adversary estimator can achieve a constant additive error with probability greater than $1/L$, then the query complexity is on the order of $L\log(1/\epsilon)$ as $\epsilon \to 0$. Our result demonstrates that increased privacy, as captured by $L$, comes at the expense of a \emph{multiplicative} increase in query complexity.


Few-Shot Learning Through an Information Retrieval Lens

Neural Information Processing Systems

Few-shot learning refers to understanding new concepts from only a few examples. We propose an information retrieval-inspired approach for this problem that is motivated by the increased importance of maximally leveraging all the available information in this low-data regime. We define a training objective that aims to extract as much information as possible from each training batch by effectively optimizing over all relative orderings of the batch points simultaneously. In particular, we view each batch point as a query' that ranks the remaining ones based on its predicted relevance to them and we define a model within the framework of structured prediction to optimize mean Average Precision over these rankings. Our method achieves impressive results on the standard few-shot classification benchmarks while is also capable of few-shot retrieval.


Efficient Optimization for Average Precision SVM

Neural Information Processing Systems

The accuracy of information retrieval systems is often measured using average precision (AP). Given a set of positive (relevant) and negative (non-relevant) samples, the parameters of a retrieval system can be estimated using the AP-SVM framework, which minimizes a regularized convex upper bound on the empirical AP loss. However, the high computational complexity of loss-augmented inference, which is required for learning an AP-SVM, prohibits its use with large training datasets. To alleviate this deficiency, we propose three complementary approaches. The second approach takes advantage of the fact that we do not require a full ranking during loss-augmented inference.


Flexible Models for Microclustering with Application to Entity Resolution

Neural Information Processing Systems

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman--Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set.


An ensemble diversity approach to supervised binary hashing

Neural Information Processing Systems

Binary hashing is a well-known approach for fast approximate nearest-neighbor search in information retrieval. Much work has focused on affinity-based objective functions involving the hash functions or binary codes. These objective functions encode neighborhood information between data points and are often inspired by manifold learning algorithms. They ensure that the hash functions differ from each other through constraints or penalty terms that encourage codes to be orthogonal or dissimilar across bits, but this couples the binary variables and complicates the already difficult optimization. We propose a much simpler approach: we train each hash function (or bit) independently from each other, but introduce diversity among them using techniques from classifier ensembles.


Eliminating Search Intent Bias in Learning to Rank

arXiv.org Machine Learning

Click-through data has proven to be a valuable resource for improving search-ranking quality. Search engines can easily collect click data, but biases introduced in the data can make it difficult to use the data effectively. In order to measure the effects of biases, many click models have been proposed in the literature. However, none of the models can explain the observation that users with different search intent (e.g., informational, navigational, etc.) have different click behaviors. In this paper, we study how differences in user search intent can influence click activities and determined that there exists a bias between user search intent and the relevance of the document relevance. Based on this observation, we propose a search intent bias hypothesis that can be applied to most existing click models to improve their ability to learn unbiased relevance. Experimental results demonstrate that after adopting the search intent hypothesis, click models can better interpret user clicks and substantially improve retrieval performance.


Online Preselection with Context Information under the Plackett-Luce Model

arXiv.org Machine Learning

In machine learning, the notion of multi-armed bandits (MAB) refers to a class of online learning problems, in which a learner is supposed to simultaneously explore and exploit a given set of choice alternatives (metaphorically referred to as "arms") in the course of a sequential decision process (Lattimore and Szepesvári, 2019). In this paper, we consider an extension of the basic setting, which is practically motivated by the problem of preselection as recently introduced by Saha and Gopalan (2018b) and Bengs and Hüllermeier (2019): Instead of selecting a single arm, the learner is only supposed to preselect a promising subset of arms. The final choice is then made by a selector, for example a human user or another algorithm. In information retrieval, for instance, the role of the learner is played by a search engine, and the selector is the user who seeks a certain information. Another application, which served as a concrete motivation of our setting and will also be used in our experimental study, is the problem of algorithm (pre-)selection (Kerschke et al., 2018).


Pre-training Tasks for Embedding-based Large-scale Retrieval

arXiv.org Machine Learning

We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus. This problem is often solved in two steps. The retrieval phase first reduces the solution space, returning a subset of candidate documents. The scoring phase then re-ranks the documents. Critically, the retrieval algorithm not only desires high recall but also requires to be highly efficient, returning candidates in time sublinear to the number of documents. Unlike the scoring phase witnessing significant advances recently due to the BERT-style pre-training tasks on cross-attention models, the retrieval phase remains less well studied. Most previous works rely on classic Information Retrieval (IR) methods such as BM-25 (token matching + TF-IDF weights). These models only accept sparse handcrafted features and can not be optimized for different downstream tasks of interest. In this paper, we conduct a comprehensive study on the embedding-based retrieval models. We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks. With adequately designed paragraph-level pre-training tasks, the Transformer models can remarkably improve over the widely-used BM-25 as well as embedding models without Transformers. The paragraph-level pre-training tasks we studied are Inverse Cloze Task (ICT), Body First Selection (BFS), Wiki Link Prediction (WLP), and the combination of all three.