Pal, Aditya
CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling
Wang, Xinze, Chen, Chen, Yang, Yinfei, Chen, Hong-You, Zhang, Bowen, Pal, Aditya, Zhu, Xiangxin, Du, Xianzhi
Mixture-of-Experts (MoE) models are crucial for scaling model capacity while controlling inference costs. While integrating MoE into multimodal models like CLIP improves performance, training these models is notoriously challenging and expensive. We propose CLIP-Upcycling (CLIP-UP), an efficient alternative training strategy that converts a pre-trained dense CLIP model into a sparse MoE architecture. Through extensive experimentation with various settings and auxiliary losses, we demonstrate that CLIP-UP significantly reduces training complexity and cost. Remarkably, our sparse CLIP B/16 model, trained with CLIP-UP, outperforms its dense counterpart by 7.2% and 6.6% on COCO and Flickr30k text-to-image Recall@1 benchmarks respectively. It even surpasses the larger CLIP L/14 model on this task while using only 30% of the inference FLOPs. We further demonstrate the generalizability of our training recipe across different scales, establishing sparse upcycling as a practical and scalable approach for building efficient, high-performance CLIP models.
Vartani Spellcheck -- Automatic Context-Sensitive Spelling Correction of OCR-generated Hindi Text Using BERT and Levenshtein Distance
Pal, Aditya, Mustafi, Abhijit
Traditional Optical Character Recognition (OCR) systems that generate text of highly inflectional Indic languages like Hindi tend to suffer from poor accuracy due to a wide alphabet set, compound characters and difficulty in segmenting characters in a word. Automatic spelling error detection and context-sensitive error correction can be used to improve accuracy by post-processing the text generated by these OCR systems. A majority of previously developed language models for error correction of Hindi spelling have been context-free. In this paper, we present Vartani Spellcheck - a context-sensitive approach for spelling correction of Hindi text using a state-of-the-art transformer - BERT in conjunction with the Levenshtein distance algorithm, popularly known as Edit Distance. We use a lookup dictionary and context-based named entity recognition (NER) for detection of possible spelling errors in the text. Our proposed technique has been tested on a large corpus of text generated by the widely used Tesseract OCR on the Hindi epic Ramayana. With an accuracy of 81%, the results show a significant improvement over some of the previously established context-sensitive error correction mechanisms for Hindi. We also explain how Vartani Spellcheck may be used for on-the-fly autocorrect suggestion during continuous typing in a text editor environment.
PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest
Pal, Aditya, Eksombatchai, Chantat, Zhou, Yitong, Zhao, Bo, Rosenberg, Charles, Leskovec, Jure
Latent user representations are widely adopted in the tech industry for powering personalized recommender systems. Most prior work infers a single high dimensional embedding to represent a user, which is a good starting point but falls short in delivering a full understanding of the user's interests. In this work, we introduce PinnerSage, an end-to-end recommender system that represents each user via multi-modal embeddings and leverages this rich representation of users to provides high quality personalized recommendations. PinnerSage achieves this by clustering users' actions into conceptually coherent clusters with the help of a hierarchical clustering method (Ward) and summarizes the clusters via representative pins (Medoids) for efficiency and interpretability. PinnerSage is deployed in production at Pinterest and we outline the several design decisions that makes it run seamlessly at a very large scale. We conduct several offline and online A/B experiments to show that our method significantly outperforms single embedding methods.
Detecting Emotions in Social Media: A Constrained Optimization Approach
Wang, Yichen (Georgia Institute of Technology) | Pal, Aditya (IBM Research)
Emotion detection can considerably enhance our understanding of users' emotional states. Understanding users' emotions especially in a real-time setting can be pivotal in improving user interactions and understanding their preferences. In this paper, we propose a constraint optimization framework to discover emotions from social media content of the users. Our framework employs several novel constraints such as emotion bindings, topic correlations, along with specialized features proposed by prior work and well-established emotion lexicons. We propose an efficient inference algorithm and report promising empirical results on three diverse datasets.
Discovering Hierarchical Structure for Sources and Entities
Pal, Aditya (IBM Research) | Dalvi, Nilesh (Facebook) | Bellare, Kedar (Facebook)
In this paper, we consider the problem of jointly learning hierarchies over a set of sources and entities based on their containment relationship. We model the concept of hierarchy using a set of latent binary features and propose a generative model that assigns those latent features to sources and entities in order to maximize the probability of the observed containment. To avoid fixing the number of features beforehand, we consider a non-parametric approach based on the Indian Buffet Process. The hierarchies produced by our algorithm can be used for completing missing associations and discovering structural bindings in the data. Using simulated and real datasets we provide empirical evidence of the effectiveness of the proposed approach in comparison to the existing hierarchy agnostic approaches.
Evolution of Experts in Question Answering Communities
Pal, Aditya (University of Minnesota) | Chang, Shuo (University of Minnesota) | Konstan, Joseph A. (University of Minnesota)
Community Question Answering (CQA) services thrive as a result of a small number of highly active users, typically called experts, who provide a large number of high quality useful answers. Understanding the temporal dynamics and interactions between experts can present key insights into how community members evolve over time. In this paper, we present a temporal study of experts in CQA and analyze the changes in their behavioral patterns over time. Further, using unsupervised machine learning methods, we show the interesting evolution patterns that can help us distinguish experts from one another. Using supervised classification methods, we show that the models based on evolutionary data of users can be more effective at expert identification than the models that ignore evolution. We run our experiments on two large online CQA to show the generality of our proposed approach.
Connecting Mutually Influencing Bloggers
Pal, Aditya (University of Minnesota) | Kawale, Jaya (University of Minnesota)
The blogosphere shows the characteristics of a power law distribution where a small set of the bloggers (influentials) get the majority of readership and the vast majority receives little traffic. Blogger recommendation algorithms aim at finding influentials for recommendation, putting bloggers with limited readership at further disadvantage. These bloggers could benefit from mutual endorsement of each other with the eventual goal of forming strong local communities with broader readership. In this paper, we propose a recommendation algorithm to connect blogger pairs with the intent that once connected the bloggers would share a mutually influencing relationship between them. In particular, we compute bloggers' influence profile based on how much she influences her blog friends and recommend bloggers with similar influence profiles. We characterize bloggers into four different groups: global leaders, connectors, local leaders, isolates. Our result shows marginal benefit for isolates and significant benefit for local leaders. Our approach can be instructive in building intelligent recommendation engine for bloggers with limited readership to build strong local communities.
What's in a @name? How Name Value Biases Judgment of Microblog Authors
Pal, Aditya (University of Minnesota) | Counts, Scott (Microsoft Research)
Bias can be defined as selective favoritism exhibited by human beings when posed with a task of decision making across multiple options. Online communities present plenty of decision making opportunities to their users. Users exhibit biases in their attachments, voting and ratings and other tasks of decision making. We study bias amongst microblog users due to the value of an author's name. We describe the relationship between name value bias and number of followers, and cluster authors and readers based on patterns of bias they receive and exhibit, respectively. For authors we show that content from known names (e.g., @CNN) is rated artificially high, while content from unknown names is rated artificially low. For readers, our results indicate that there are two types: slightly biased, heavily biased. A subsequent analysis of Twitter author names revealed attributes of names that underlie this bias, including effects for gender, type of name (individual versus organization), and degree of topical relevance. We discuss how our work can be instructive to content distributors and search engines in leveraging and presenting microblog content.