Vucetic, Slobodan
X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification
Xu, Hanzi, Chen, Muhao, Huang, Lifu, Vucetic, Slobodan, Yin, Wenpeng
In recent years, few-shot and zero-shot learning, which learn to predict labels from limited annotated instances, have garnered significant attention. Traditional approaches often treat frequent-shot (freq-shot; labels with abundant instances), few-shot, and zero-shot learning as distinct challenges, optimizing systems for just one of these scenarios. Yet, in real-world settings, label occurrences vary greatly: some labels might appear thousands of times, while others appear only sporadically or not at all. For practical deployment, a system must be able to adapt to any label occurrence. We introduce a novel classification challenge, X-shot, reflecting a real-world context where freq-shot, few-shot, and zero-shot labels co-occur without predefined limits. Here, X can range from 0 to positive infinity. The crux of X-shot is open-domain generalization: devising a system versatile enough to handle all of these label scenarios. To solve X-shot, we propose BinBin (Binary INference Based on INstruction following), which leverages indirect supervision from a large collection of NLP tasks via instruction following, bolstered by weak supervision provided by large language models. BinBin surpasses previous state-of-the-art techniques on three benchmark datasets across multiple domains. To our knowledge, this is the first work addressing X-shot learning, where X remains variable.
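The general idea of binary inference can be illustrated in a few lines: instead of a fixed K-way output layer, each candidate label is posed as a separate yes/no question, so a label seen thousands of times and a label never seen are scored the same way. The sketch below assumes a hypothetical `yes_prob(text, label)` scorer standing in for an instruction-following model; the keyword-overlap `toy_yes_prob` is a placeholder for demonstration only, not BinBin's model, prompt, or training procedure.

```python
from typing import Callable, List

def binary_classify(text: str, labels: List[str],
                    yes_prob: Callable[[str, str], float]) -> str:
    """Ask one yes/no question per candidate label and return the label with
    the highest 'yes' score. Labels can be added at inference time, because
    this sketch has no label-specific output layer."""
    return max(labels, key=lambda label: yes_prob(text, label))

def toy_yes_prob(text: str, label: str) -> float:
    """Hypothetical stand-in for an instruction-following model: the fraction
    of the label's words that also occur in the text."""
    words = set(text.lower().split())
    label_words = label.lower().split()
    return sum(w in words for w in label_words) / len(label_words)

labels = ["sports match", "stock market", "weather"]
print(binary_classify("The team won the final match", labels, toy_yes_prob))
# prints: sports match
```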
OpenStance: Real-world Zero-shot Stance Detection
Xu, Hanzi, Vucetic, Slobodan, Yin, Wenpeng
Prior studies of zero-shot stance detection identify the attitude of texts towards unseen topics occurring in the same document corpus. This task formulation has three limitations: (i) single domain/dataset: a system is optimized on a particular dataset from a single domain, so the resulting system does not work well on other datasets; (ii) the model is evaluated on only a limited number of unseen topics; (iii) part of the topics are assumed to have rich annotations, which may be impossible to obtain in real-world applications. These drawbacks lead to impractical stance detection systems that fail to generalize to open domains and open-form topics. This work defines OpenStance, open-domain zero-shot stance detection, which aims to handle stance detection in an open world with neither domain constraints nor topic-specific annotations. The key challenge of OpenStance lies in open-domain generalization: learning a system from fully non-specific supervision that is nonetheless capable of generalizing to any dataset. To solve OpenStance, we propose to combine indirect supervision, from textual entailment datasets, and weak supervision, from data generated automatically by pre-trained language models. Our single system, without any topic-specific supervision, outperforms the supervised method on three popular datasets. To our knowledge, this is the first work that studies stance detection under the open-domain zero-shot setting. All data and code are publicly released.
Multi-Modal Trajectory Prediction of NBA Players
Hauri, Sandro, Djuric, Nemanja, Radosavljevic, Vladan, Vucetic, Slobodan
National Basketball Association (NBA) players are highly motivated and skilled experts who solve complex decision-making problems at every moment of a game. As a step towards understanding how players make their decisions, we focus on their movement trajectories during games. We propose a method that captures the multi-modal behavior of players, who may consider multiple trajectories and select the most advantageous one. The method is built on an LSTM-based architecture that predicts multiple trajectories and their probabilities, and it is trained with a multi-modal loss function that updates the best-matching trajectories. Experiments on large, fine-grained NBA tracking data show that the proposed method outperforms state-of-the-art approaches. In addition, the results indicate that the approach generates more realistic trajectories and can learn the individual playing styles of specific players.
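The abstract does not spell out the loss. A common way to train a multi-mode predictor so that each mode specializes is a winner-takes-all objective, and the sketch below assumes that form: the regression error of the predicted trajectory closest to the observed one, plus a cross-entropy term on that mode's probability. The paper's exact loss and error metric may differ; all names and the toy trajectories are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def displacement_error(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Average Euclidean distance between predicted and observed positions."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def multimodal_loss(modes: List[Sequence[Point]], logits: List[float],
                    truth: Sequence[Point]) -> Tuple[float, int]:
    """Winner-takes-all loss: only the mode closest to the observed
    trajectory receives a regression penalty, and a cross-entropy term
    pushes the predicted probability mass towards that mode."""
    errors = [displacement_error(m, truth) for m in modes]
    best = min(range(len(modes)), key=lambda k: errors[k])
    # Numerically stable log-softmax of the best mode's logit.
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    cross_entropy = log_norm - logits[best]
    return errors[best] + cross_entropy, best

truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
modes = [[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],   # drifts away: mean error 1.0
         [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]]   # close: mean error 1/6
loss, best = multimodal_loss(modes, [0.0, 0.0], truth)
print(best, round(loss, 4))  # prints: 1 0.8598
```

With equal logits, the cross-entropy term is log 2, so the loss is 1/6 + log 2; only mode 1 would receive a gradient on its coordinates.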
Non-Linear Label Ranking for Large-Scale Prediction of Long-Term User Interests
Djuric, Nemanja (Yahoo! Labs) | Grbovic, Mihajlo (Yahoo! Labs) | Radosavljevic, Vladan (Yahoo! Labs) | Bhamidipati, Narayan (Yahoo! Labs) | Vucetic, Slobodan (Temple University)
We consider the problem of personalization of online services from the viewpoint of ad targeting, where we seek to find the best ad categories to be shown to each user, resulting in improved user experience and increased advertisers' revenue. We propose to address this problem as a task of ranking the ad categories depending on a user's preference, and introduce a novel label ranking approach capable of efficiently learning non-linear, highly accurate models in large-scale settings. Experiments on a real-world advertising data set with more than 3.2 million users show that the proposed algorithm outperforms the existing solutions in terms of both rank loss and top-K retrieval performance, strongly suggesting the benefit of using the proposed model on large-scale ranking problems.
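The two evaluation criteria named above can be made concrete. The sketch below assumes rank loss is the fraction of category pairs ordered differently from the user's true preference (a normalized Kendall tau distance, as commonly used in label ranking) and measures top-K retrieval as the overlap between the true and predicted top-K categories. The paper's exact definitions may differ, and the category names are invented.

```python
from itertools import combinations
from typing import Dict, List

def rank_loss(true_rank: Dict[str, int], predicted_order: List[str]) -> float:
    """Fraction of category pairs whose relative order in the prediction
    disagrees with the true preference ranking (0 = perfect, 1 = reversed).
    Pairs tied in the true ranking are never counted as errors."""
    position = {cat: i for i, cat in enumerate(predicted_order)}
    pairs = list(combinations(predicted_order, 2))
    discordant = sum(
        1 for a, b in pairs
        if (true_rank[a] - true_rank[b]) * (position[a] - position[b]) < 0)
    return discordant / len(pairs)

def top_k_overlap(true_rank: Dict[str, int], predicted_order: List[str], k: int) -> int:
    """Number of the user's k most preferred categories (rank 0 = best)
    that also appear among the k highest-ranked predictions."""
    true_top = set(sorted(true_rank, key=true_rank.get)[:k])
    return len(true_top & set(predicted_order[:k]))

true_rank = {"travel": 0, "autos": 1, "finance": 2, "sports": 3}
predicted = ["autos", "travel", "finance", "sports"]
print(rank_loss(true_rank, predicted))           # one of six pairs is swapped
print(top_k_overlap(true_rank, predicted, k=2))  # prints: 2
```

Swapping the top two categories costs one discordant pair out of six, yet leaves the top-2 set intact, which is why the two metrics are usually reported together.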
Spatial Scan for Disease Mapping on a Mobile Population
Lan, Liang (Temple University) | Malbasa, Vuk (University of Novi Sad) | Vucetic, Slobodan (Temple University)
In disease mapping, the spatial scan statistic is used to detect spatial regions where the population is exposed to a significantly higher disease risk than expected. In this important application, the current residence is typically used to define the location of each individual in the population. Given human mobility at various temporal and spatial scales, the current residence alone may be an insufficiently informative proxy, because it ignores the many exposures that may occur away from home or that occurred at previous residences. In this paper, we propose a spatial scan statistic that is appropriate for disease mapping on mobile populations. We formulate a computationally efficient algorithm that uses the proposed statistic to find significant high-risk regions from the disease status data of a mobile population. The algorithm is applicable to large populations and dense spatial grids. The experimental results demonstrate that the proposed algorithm is computationally efficient and outperforms traditional disease clustering approaches at discovering high-risk regions in mobile populations.
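For reference, the traditional residence-based spatial scan that the abstract contrasts against can be sketched as follows. This assumes Kulldorff's Poisson likelihood-ratio statistic, circular candidate regions centred on grid cells, and made-up counts. It is not the paper's mobility-aware statistic, and a real analysis would also assess significance, typically by Monte Carlo replication.

```python
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]

def poisson_llr(cases_in: int, pop_in: int, total_cases: int, total_pop: int) -> float:
    """Kulldorff's log-likelihood ratio for a candidate region; zero unless
    the region has more cases than expected from its population share."""
    expected = total_cases * pop_in / total_pop
    if expected == 0 or cases_in <= expected:
        return 0.0
    llr = cases_in * math.log(cases_in / expected)
    cases_out = total_cases - cases_in
    if cases_out > 0:  # 0 * log(0) is taken as 0
        llr += cases_out * math.log(cases_out / (total_cases - expected))
    return llr

def scan(cases: Dict[Cell, int], pop: Dict[Cell, int],
         radii: List[float]) -> Tuple[float, Optional[Cell], Optional[float]]:
    """Score every circle (grid-cell centre, radius) and return the best one."""
    total_cases, total_pop = sum(cases.values()), sum(pop.values())
    best: Tuple[float, Optional[Cell], Optional[float]] = (0.0, None, None)
    for centre in pop:
        for r in radii:
            region = [c for c in pop if math.dist(c, centre) <= r]
            llr = poisson_llr(sum(cases.get(c, 0) for c in region),
                              sum(pop[c] for c in region), total_cases, total_pop)
            if llr > best[0]:
                best = (llr, centre, r)
    return best

# 3x3 grid, 100 residents per cell (by residence), one cell with a cluster of cases.
pop = {(i, j): 100 for i in range(3) for j in range(3)}
cases = {cell: 1 for cell in pop}
cases[(0, 0)] = 20
print(scan(cases, pop, radii=[0.0, 1.0]))
```

Here every individual is counted only in the cell of their residence; the limitation the abstract points out is that exposure occurring elsewhere never enters `cases` or `pop` for the cells where it happened.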
Continuous Conditional Random Fields for Efficient Regression in Large Fully Connected Graphs
Ristovski, Kosta (Temple University) | Radosavljevic, Vladan (Temple University) | Vucetic, Slobodan (Temple University) | Obradovic, Zoran (Temple University)
When used for structured regression, powerful Conditional Random Fields (CRFs) are typically restricted to modeling the effects of interactions among examples in local neighborhoods. A more expressive representation would result in dense graphs, making these methods impractical for large-scale applications. To address this issue, we propose an effective CRF model with linear scale-up properties for approximate learning and inference in structured regression on large, fully connected graphs. The proposed method is validated on real-world, large-scale problems of image denoising and remote sensing. The experiments demonstrate that dense connectivity improves prediction accuracy. Inference times of less than ten seconds on graphs with millions of nodes and trillions of edges make the proposed model an attractive tool for large-scale structured regression problems.
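To make the structured-regression setting concrete, the sketch below assumes the standard Gaussian (continuous) CRF energy alpha * sum_i (y_i - R_i)^2 + beta * sum_{i<j} S_ij (y_i - y_j)^2, where R_i is an unstructured per-node prediction and S_ij a non-negative similarity between nodes i and j. Its minimizer solves (alpha*I + beta*L) y = alpha*R with graph Laplacian L = D - S. The exact dense solve below costs O(n^2) per sweep and is for illustration only; the paper's linear-scaling approximation is not reproduced, and all names are hypothetical.

```python
import math
from typing import List, Sequence

def gaussian_similarity(points: Sequence[Sequence[float]], sigma: float) -> List[List[float]]:
    """Fully connected similarity S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    n = len(points)
    return [[0.0 if i == j else
             math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def gcrf_predict(R: Sequence[float], S: Sequence[Sequence[float]], alpha: float,
                 beta: float, tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Solve (alpha*I + beta*L) y = alpha*R by Jacobi iteration. With alpha > 0,
    beta >= 0 and non-negative S the system is strictly diagonally dominant,
    so the iteration converges."""
    n = len(R)
    degree = [sum(row) for row in S]
    y = list(R)
    for _ in range(max_iter):
        y_new = [(alpha * R[i] + beta * sum(S[i][j] * y[j] for j in range(n)))
                 / (alpha + beta * degree[i]) for i in range(n)]
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    return y

# Two similar nodes with conflicting unstructured predictions are pulled together.
print(gcrf_predict(R=[0.0, 1.0], S=[[0.0, 1.0], [1.0, 0.0]], alpha=1.0, beta=1.0))
```

For this two-node example the exact solution is y = (1/3, 2/3): each node moves a third of the way towards its neighbour, and a larger beta would pull them closer still.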
Sparse Principal Component Analysis with Constraints
Grbovic, Mihajlo (Temple University) | Dance, Christopher Roger (Xerox Research Centre Europe) | Vucetic, Slobodan (Temple University)
Sparse principal component analysis (PCA) is a variant of classical PCA that finds linear combinations of a small number of features that maximize variance across the data. In this paper, we propose a methodology for adding two general types of feature grouping constraints to the original sparse PCA optimization procedure. We derive convex relaxations of the considered constraints, ensuring the convexity of the resulting optimization problem. Empirical evaluation on three real-world problems, one in process-monitoring sensor networks and two in social networks, illustrates the usefulness of the proposed methodology.
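As a point of reference for what sparse PCA computes, the sketch below uses the truncated power method, a simple non-convex heuristic that keeps only the k largest-magnitude loadings at each power-iteration step. This is a swapped-in baseline for the unconstrained problem, not the paper's convex relaxation with feature grouping constraints; the covariance matrix is a made-up example.

```python
import math
from typing import List, Sequence

def truncated_power_method(cov: Sequence[Sequence[float]], k: int,
                           iters: int = 100) -> List[float]:
    """Approximate the leading k-sparse principal component of a covariance
    matrix: multiply by the matrix, zero all but the k largest-magnitude
    entries, and renormalize, until the vector stops changing."""
    n = len(cov)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
        keep = set(sorted(range(n), key=lambda i: abs(y[i]), reverse=True)[:k])
        y = [y[i] if i in keep else 0.0 for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < 1e-12:
            return y
        x = y
    return x

# Features 0 and 1 are strongly correlated; features 2 and 3 are weak, independent noise.
cov = [[4.0, 2.0, 0.0, 0.0],
       [2.0, 4.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
print(truncated_power_method(cov, k=2))
```

The result loads only on features 0 and 1, each with weight 1/sqrt(2); grouping constraints of the kind the abstract describes would additionally restrict which features may be selected together.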
Convex Kernelized Sorting
Djuric, Nemanja (Temple University) | Grbovic, Mihajlo (Temple University) | Vucetic, Slobodan (Temple University)
Kernelized sorting is a method for aligning objects across two domains by considering within-domain similarity, without the need to specify a cross-domain similarity measure. In this paper, we present the Convex Kernelized Sorting method in which, unlike in previous approaches, cross-domain object matching is formulated as a convex optimization problem, leading to simpler optimization and a globally optimal solution. Our method outputs soft alignments between objects, which can be used to rank the best matches for each object, or to visualize the object matching and verify that the kernel was chosen correctly. It also allows for computing hard one-to-one alignments by solving the resulting Linear Assignment Problem. Experiments on a number of cross-domain matching tasks show the strength of the proposed method, which consistently achieves higher accuracy than the existing methods.
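The two uses of a soft alignment mentioned above, ranking the best matches for each object and extracting a hard one-to-one alignment, can be sketched as follows. The matrix `P` is a made-up example of alignment scores (row i = object i in the first domain, column j = object j in the second), not output of the paper's method. The exhaustive search over permutations is only viable for a handful of objects; at realistic sizes one would use a polynomial-time assignment solver such as the Hungarian algorithm (for example, SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`).

```python
from itertools import permutations
from typing import List, Sequence

def top_matches(P: Sequence[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k second-domain objects with the highest alignment
    score for first-domain object i, best first."""
    return sorted(range(len(P[i])), key=lambda j: P[i][j], reverse=True)[:k]

def hard_alignment(P: Sequence[Sequence[float]]) -> List[int]:
    """One-to-one assignment maximizing the total alignment score, found by
    exhaustive search over permutations (only feasible for tiny n)."""
    n = len(P)
    best = max(permutations(range(n)),
               key=lambda perm: sum(P[i][perm[i]] for i in range(n)))
    return list(best)

P = [[0.60, 0.50, 0.00],
     [0.55, 0.10, 0.00],
     [0.00, 0.00, 1.00]]
print(top_matches(P, 0, k=2))  # [0, 1]
print(hard_alignment(P))       # [1, 0, 2]
```

Taking each row's best match greedily would give [0, 1, 2] with total score 1.7, whereas the assignment solution gives up object 0's top match to reach 2.05; this is why the hard alignment is posed as an assignment problem rather than read off row by row.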