GIST: Cross-Domain Click-Through Rate Prediction via Guided Content-Behavior Distillation

Xu, Wei, Li, Haoran, Ou, Baoyuan, Xu, Lai, Qin, Yingjie, Su, Ruilong, Xu, Ruiwen

arXiv.org Artificial Intelligence

Cross-domain Click-Through Rate prediction aims to tackle the data sparsity and cold start problems in online advertising systems by transferring knowledge from source domains to a target domain. Most existing methods rely on overlapping users to facilitate this transfer, often focusing on joint training or pre-training with fine-tuning approaches to connect the source and target domains. However, in real-world industrial settings, joint training struggles to learn optimal representations across domains with different distributions, and pre-training with fine-tuning is not well suited to continuously integrating new data. To address these issues, we propose GIST, a cross-domain lifelong sequence model that decouples the training processes of the source and target domains. Unlike previous methods that search lifelong sequences in the source domains using only content or behavior signals, or simple combinations of the two, we introduce a Content-Behavior Joint Training Module (CBJT), which aligns content-behavior distributions and combines them with guided information to facilitate a more stable representation. Furthermore, we develop an Asymmetric Similarity Integration (ASI) strategy to augment knowledge transfer through similarity computation. Extensive experiments demonstrate the effectiveness of GIST, which surpasses SOTA methods in offline evaluations and an online A/B test. Deployed on the Xiaohongshu (RedNote) platform, GIST effectively enhances online ads system performance at scale, serving hundreds of millions of daily active users.
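The abstract's similarity-based transfer can be pictured with a minimal sketch: score source-domain behavior embeddings against a target-domain query in one direction only, and keep the most similar items as transferred context. The function name, shapes, and scoring are assumptions for illustration, not the paper's actual ASI design.

```python
import numpy as np

def asymmetric_retrieve(target_query, source_seq, top_k=2):
    """Hypothetical sketch of similarity-based transfer in the spirit of ASI:
    score each source-domain behavior embedding against a target-domain query
    (one-directional, target -> source) and keep the top_k as transferred
    context. Names and shapes are illustrative assumptions."""
    scores = source_seq @ target_query          # dot-product relevance
    idx = np.argsort(-scores)[:top_k]           # most similar source items
    return idx, scores[idx]

# Toy example: a 2-d target query against three source behavior embeddings.
q = np.array([1.0, 0.5])
seq = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]])
idx, s = asymmetric_retrieve(q, seq)
```

In a real system the retrieved items would feed the target model's sequence module; here they are simply returned with their scores.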


You can (probably) sing better than you think

Popular Science

The ability to identify or produce any musical note from memory without reference, aka true perfect pitch, is a rare gift. In fact, fewer than one in 10,000 people have it--but you don't need perfect pitch to spontaneously recall a melody with decent accuracy. If anything, you may not be as tone deaf as you think. Past research in lab settings shows that people asked to remember and sing a well-known song can do so accurately at least 15 percent of the time, more than can be chalked up to chance. Even so, psychologists' understanding of this recall process remains incomplete.


GIST: Greedy Independent Set Thresholding for Diverse Data Summarization

Fahrbach, Matthew, Ramalingam, Srikumar, Zadimoghaddam, Morteza, Ahmadian, Sara, Citovsky, Gui, DeSalvo, Giulia

arXiv.org Artificial Intelligence

Subset selection is a challenging optimization problem with a wide variety of applications in machine learning, including feature selection, recommender systems, news aggregation, drug discovery, data summarization, and designing pretraining sets for large language models (Anil et al., 2023). Data sampling in particular is a salient problem due to unprecedented and continuous data collection. For example, the LiDAR and imaging devices in a single self-driving vehicle can easily capture ~80 terabytes of data per day (Kazhamiaka et al., 2021). In most subset selection tasks, we rely on the weight (or utility) of the objects to rank them against one another, and also to avoid selecting duplicate or near-duplicate objects. If we select a small subset, we also want to ensure that it is a good representation of the original set. These utility, diversity, and coverage criteria can be expressed through objective functions, and the interesting research lies in developing efficient algorithms with strong approximation guarantees. The machinery underlying constrained subset selection algorithms shares many similarities with techniques from other areas of combinatorial optimization such as submodular maximization, k-center clustering, and convex hull approximations. In this work, we study the problem of selecting a set of points in a metric space that maximizes an objective combining their utility and a minimum pairwise-distance diversity measure.
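The utility-plus-diversity objective described above can be sketched with a simple greedy-with-threshold heuristic: take points in descending weight order and skip any point that lands too close to one already selected. This is a generic illustration of the problem setup, not the paper's actual algorithm or its approximation guarantee; all names are hypothetical.

```python
import math

def greedy_threshold_select(points, weights, k, min_dist):
    """Hypothetical sketch: pick up to k points in descending weight order,
    skipping any point within min_dist of an already-selected one, so the
    result balances utility (weight) against pairwise-distance diversity."""
    order = sorted(range(len(points)), key=lambda i: -weights[i])
    selected = []
    for i in order:
        if len(selected) == k:
            break
        if all(math.dist(points[i], points[j]) >= min_dist for j in selected):
            selected.append(i)
    return selected

# Toy example: point 1 is nearly a duplicate of point 0 and gets skipped.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 0.0)]
w = [3.0, 2.5, 2.0, 1.0]
chosen = greedy_threshold_select(pts, w, k=3, min_dist=1.0)
```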


Break down YouTube videos to the gist with this tool -- now $145 off

PCWorld

YouTube can be a great resource to stay up to date on industry developments or learn new skills. But it can also be incredibly time-consuming to watch videos all day. That's where a TubeOnAi Premium Lite Plan can help. This clever tool downloads transcripts of videos from the channels you follow and summarizes the content by leveraging GPT-4 AI. Right now, you can get a lifetime subscription to this time-saving tool for just $78.99.


Leveraging Prompt-Based Large Language Models: Predicting Pandemic Health Decisions and Outcomes Through Social Media Language

Ding, Xiaohan, Carik, Buse, Gunturi, Uma Sushmitha, Reyna, Valerie, Rho, Eugenia H.

arXiv.org Artificial Intelligence

We introduce a multi-step reasoning framework using prompt-based LLMs to examine the relationship between social media language patterns and trends in national health outcomes. Grounded in fuzzy-trace theory, which emphasizes the importance of gists of causal coherence in effective health communication, we introduce Role-Based Incremental Coaching (RBIC), a prompt-based LLM framework, to identify gists at scale. Using RBIC, we systematically extract gists from subreddit discussions opposing COVID-19 health measures (Study 1). We then track how these gists evolve across key events (Study 2) and assess their influence on online engagement (Study 3). Finally, we investigate how the volume of gists is associated with national health trends like vaccine uptake and hospitalizations (Study 4). Our work is the first to empirically link social media linguistic patterns to real-world public health trends, highlighting the potential of prompt-based LLMs in identifying critical online discussion patterns that can form the basis of public health communication strategies.
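A role-based, incremental prompt chain of the kind the abstract describes can be sketched as follows. The prompt wording, function names, and the stub `llm` callable are all illustrative assumptions, not the actual RBIC prompts from the paper.

```python
def rbic_extract(post, llm):
    """Hypothetical sketch of Role-Based Incremental Coaching: a chain of
    role-framed prompts, each building on the previous step's output, that
    ends with a causal 'gist' of the post. `llm` is a stand-in callable
    mapping a prompt string to a text response."""
    role = "You are a health-communication analyst."
    step1 = llm(f"{role}\nSummarize the main claim in this post:\n{post}")
    step2 = llm(f"{role}\nGiven the claim '{step1}', state its implied cause-effect relationship.")
    step3 = llm(f"{role}\nCondense '{step2}' into a one-sentence gist.")
    return step3

# Toy stub LLM that just echoes the last line of each prompt, so the chain
# can run end-to-end without a real model.
gist = rbic_extract("Masks fog my glasses so I stopped wearing them.",
                    llm=lambda p: p.splitlines()[-1])
```

In practice each `llm` call would go to a hosted model, and the intermediate steps give the model a scaffold for extracting causally coherent gists rather than generic summaries.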


GIST: Generated Inputs Sets Transferability in Deep Learning

Tambon, Florian, Khomh, Foutse, Antoniol, Giuliano

arXiv.org Artificial Intelligence

As the demand for verifiability and testability of neural networks continues to rise, an increasing number of methods for generating test sets are being developed. However, each of these techniques tends to emphasize specific testing aspects and can be quite time-consuming. A straightforward way to mitigate this issue is to transfer test sets between benchmarked models and a new model under test, based on a desirable property one wishes to transfer. This paper introduces GIST (Generated Inputs Sets Transferability), a novel approach for the efficient transfer of test sets among Deep Learning models. Given a property of interest that a user wishes to transfer (e.g., a coverage criterion), GIST selects, from the test sets available in a benchmark, those that are good from the point of view of this property. We empirically evaluate GIST on a fault-type coverage property with two modalities and different test set generation procedures to demonstrate the approach's feasibility. Experimental results show that GIST can select an effective test set for the given property and transfer it to the model under test. Our results suggest that GIST could be applied to transfer other properties and could generalize to different test-set generation procedures and modalities.
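The core selection step described above, picking the benchmark test set that scores best under a property of interest, can be sketched minimally. The function names and the coverage-as-set-size scoring are assumptions for illustration, not GIST's actual procedure.

```python
def select_test_set(candidate_sets, property_score):
    """Hypothetical sketch of the selection step: given candidate test sets
    from a benchmark and a scoring function for the property of interest
    (e.g., a coverage criterion), return the highest-scoring set."""
    return max(candidate_sets, key=property_score)

# Toy example: each candidate set covers some fault types; the score is
# simply the number of fault types covered.
benchmark = [
    {"name": "set_a", "faults_covered": {"A", "B"}},
    {"name": "set_b", "faults_covered": {"A", "B", "C"}},
    {"name": "set_c", "faults_covered": {"C"}},
]
best = select_test_set(benchmark, lambda s: len(s["faults_covered"]))
```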


Get the gist? Using large language models for few-shot decontextualization

Kane, Benjamin, Schubert, Lenhart

arXiv.org Artificial Intelligence

In many NLP applications that involve interpreting sentences within a rich context -- for instance, information retrieval systems or dialogue systems -- it is desirable to be able to preserve the sentence in a form that can be readily understood without context, for later reuse -- a process known as "decontextualization". While previous work demonstrated that generative Seq2Seq models could effectively perform decontextualization after being fine-tuned on a specific dataset, this approach requires expensive human annotations and may not transfer to other domains. We propose a few-shot method of decontextualization using a large language model, and present preliminary results showing that this method achieves viable performance on multiple domains using only a small set of examples.
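A few-shot prompt for decontextualization can be assembled mechanically: a task instruction, a handful of (context, sentence, rewrite) demonstrations, then the new case. The wording and example below are illustrative assumptions, not the paper's actual prompt.

```python
def build_fewshot_prompt(examples, context, sentence):
    """Hypothetical sketch: assemble a few-shot prompt asking an LLM to
    rewrite a sentence so it can be understood without its context.
    `examples` is a list of (context, sentence, decontextualized) triples."""
    parts = ["Rewrite each sentence so it can be understood without its context.\n"]
    for ctx, sent, rewritten in examples:
        parts.append(f"Context: {ctx}\nSentence: {sent}\nDecontextualized: {rewritten}\n")
    parts.append(f"Context: {context}\nSentence: {sentence}\nDecontextualized:")
    return "\n".join(parts)

# Toy demonstration with a single in-context example.
prompt = build_fewshot_prompt(
    [("The 2008 crisis hit banks hard.", "It caused a recession.",
      "The 2008 financial crisis caused a recession.")],
    "Marie Curie won two Nobel Prizes.",
    "She remains the only person to do so in two sciences.")
```

The resulting string ends at the `Decontextualized:` cue, so a completion model naturally continues with the rewritten sentence.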


GIST: Generating Image-Specific Text for Fine-grained Object Classification

Lewis, Kathleen M., Mu, Emily, Dalca, Adrian V., Guttag, John

arXiv.org Artificial Intelligence

Recent vision-language models outperform vision-only models on many image classification tasks. However, because of the absence of paired text/image descriptions, it remains difficult to fine-tune these models for fine-grained image classification. In this work, we propose a method, GIST, for generating image-specific fine-grained text descriptions from image-only datasets, and show that these text descriptions can be used to improve classification. Key parts of our method include (1) prompting a pretrained large language model with domain-specific prompts to generate diverse fine-grained text descriptions for each class and (2) using a pretrained vision-language model to match each image to label-preserving text descriptions that capture relevant visual features in the image. We demonstrate the utility of GIST by fine-tuning vision-language models on the image-and-generated-text pairs to learn an aligned vision-language representation space for improved classification. We evaluate our learned representation space in full-shot and few-shot scenarios across four diverse fine-grained classification datasets, each from a different domain. Our method achieves an average improvement of 4.1% in accuracy over CLIP linear probes and an average of 1.1% improvement in accuracy over the previous state-of-the-art image-text classification method on the full-shot datasets. Our method achieves similar improvements across few-shot regimes. Code is available at https://github.com/emu1729/GIST.
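Step (2) above, matching each image to its best text descriptions, reduces to a cosine-similarity ranking once both sides are embedded. The sketch below uses toy vectors in place of a real CLIP-style encoder; the function name and shapes are assumptions for illustration.

```python
import numpy as np

def match_descriptions(image_emb, text_embs, top_k=2):
    """Hypothetical sketch of the matching step: score each candidate text
    description against an image embedding by cosine similarity and keep
    the indices of the top_k matches. Toy vectors stand in for the
    embeddings a pretrained vision-language model would produce."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                      # cosine similarity per description
    return np.argsort(-sims)[:top_k]      # indices of best matches

# Toy example: three candidate description embeddings for one image.
img = np.array([1.0, 0.0, 0.0])
texts = np.array([[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
top = match_descriptions(img, texts, top_k=2)
```

In the full pipeline the retained image-description pairs would then be used to fine-tune the vision-language representation space.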