
Collaborating Authors: Liu, Linfeng


Preference Discerning with LLM-Enhanced Generative Retrieval

arXiv.org Machine Learning

Sequential recommendation systems aim to provide personalized recommendations for users based on their interaction history. To achieve this, they often incorporate auxiliary information, such as textual descriptions of items, and auxiliary tasks, such as predicting user preferences and intent. Despite numerous efforts to enhance these models, they still suffer from limited personalization. To address this issue, we propose a new paradigm, which we term preference discerning. In preference discerning, we explicitly condition a generative sequential recommendation system on user preferences within its context. To this end, we generate user preferences using Large Language Models (LLMs) based on user reviews and item-specific data. To evaluate the preference discerning capabilities of sequential recommendation systems, we introduce a novel benchmark that provides a holistic evaluation across various scenarios, including preference steering and sentiment following. We assess current state-of-the-art methods using our benchmark and show that they struggle to accurately discern user preferences. Therefore, we propose a new method named Mender ($\textbf{M}$ultimodal Prefer$\textbf{en}$ce $\textbf{d}$iscern$\textbf{er}$), which improves upon existing methods and achieves state-of-the-art performance on our benchmark. Our results show that Mender can be effectively guided by human preferences even though they have not been observed during training, paving the way toward more personalized sequential recommendation systems. We will open-source the code and benchmarks upon publication.
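
As a rough illustration of the paradigm (a minimal sketch with hypothetical names, not the paper's actual architecture), preference discerning amounts to serializing LLM-generated preference strings into the recommender's context alongside the interaction history, so that editing the preferences steers the generated recommendation:

    # Hypothetical sketch: conditioning a generative recommender on
    # LLM-extracted user preferences placed in its context.
    # `preferences` would come from an LLM summarizing the user's reviews.

    def build_context(preferences: list[str], history: list[str],
                      max_items: int = 20) -> str:
        """Serialize preferences and interaction history into one prompt.

        A generative retrieval model is then asked to decode the identifier
        of the next item, so steering reduces to editing `preferences`.
        """
        pref_block = "\n".join(f"- {p}" for p in preferences)
        hist_block = " -> ".join(history[-max_items:])
        return (
            "User preferences:\n"
            f"{pref_block}\n"
            f"Interaction history: {hist_block}\n"
            "Next item:"
        )

    if __name__ == "__main__":
        ctx = build_context(
            preferences=["prefers lightweight trail-running shoes",
                         "avoids leather"],
            history=["item_102", "item_877", "item_341"],
        )
        print(ctx)  # feed to a generative model, e.g. model.generate(ctx)

Because the preferences live in the context rather than in the weights, a new preference can be injected at inference time even if it was never observed during training.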


Empower Nested Boolean Logic via Self-Supervised Curriculum Learning

arXiv.org Artificial Intelligence

Beyond the great cognitive powers showcased by language models, it is crucial to scrutinize whether their reasoning capabilities stem from strong generalization or merely from exposure to relevant data. As opposed to constructing increasingly complex logic, this paper probes boolean logic, the root capability of a logical reasoner. We find that pre-trained language models, even including large language models, behave like random selectors in the face of multi-nested boolean logic, a task that humans can handle with ease. To empower language models with this fundamental capability, this paper proposes a new self-supervised learning method, \textit{Curriculum Logical Reasoning} (\textsc{Clr}), where we augment the training data with nested boolean logic chains step by step, and program the training to progress gradually from simpler logical patterns to harder ones. This new training paradigm allows language models to effectively generalize to much harder and longer-hop logic, which can hardly be learned through naive training. Furthermore, we show that boolean logic is a great foundation for improving subsequent general logical tasks.
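
A minimal sketch of the step-by-step augmentation idea (an assumed reading of \textsc{Clr}; the paper's actual templates and scheduling may differ): generate boolean expressions whose nesting depth grows over the course of training, keeping the ground-truth label consistent as each and/or/not layer is added:

    import random

    def nest(expr: str, value: bool, depth: int,
             rng: random.Random) -> tuple[str, bool]:
        """Wrap a boolean expression `depth` more times with and/or/not,
        updating the ground-truth label alongside the expression."""
        for _ in range(depth):
            op = rng.choice(["and", "or", "not"])
            if op == "not":
                expr, value = f"not ({expr})", not value
            else:
                other = rng.choice([True, False])
                expr = f"({expr}) {op} {other}"
                value = (value and other) if op == "and" else (value or other)
        return expr, value

    def curriculum(max_depth: int, per_depth: int, seed: int = 0):
        """Yield (expression, label) pairs from shallow to deep nesting."""
        rng = random.Random(seed)
        for depth in range(1, max_depth + 1):  # program training easy -> hard
            for _ in range(per_depth):
                base = rng.choice([True, False])
                yield nest(str(base), base, depth, rng)

    for expr, label in curriculum(max_depth=3, per_depth=2):
        assert eval(expr) is label  # labels stay consistent with nesting
        print(label, expr)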


Chinese Spelling Correction as Rephrasing Language Model

arXiv.org Artificial Intelligence

This paper studies Chinese Spelling Correction (CSC), which aims to detect and correct potential spelling errors in a given sentence. Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs. However, we note a critical flaw in the process of tagging one character to another: the correction is excessively conditioned on the error. This is the opposite of the human mindset, where individuals rephrase the complete sentence based on its semantics, rather than relying solely on previously memorized error patterns. Such a counter-intuitive learning process creates a bottleneck in the generalizability and transferability of machine spelling correction. To address this, we propose the Rephrasing Language Model (ReLM), in which the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging. This novel training paradigm achieves new state-of-the-art results across fine-tuned and zero-shot CSC benchmarks, outperforming previous counterparts by a large margin. Our method also learns transferable language representations when CSC is jointly trained with other tasks.
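
The contrast between the two training paradigms can be sketched as data construction (illustrative only; ReLM's exact slot and infilling template is not reproduced here): tagging pairs each source character with a target character, while rephrasing asks the model to regenerate the whole sentence:

    MASK = "[MASK]"

    def tagging_pairs(src: str, tgt: str) -> list[tuple[str, str]]:
        """Sequence tagging view: character-to-character pairs, so the
        correction is conditioned directly on the (possibly wrong) input."""
        return list(zip(src, tgt))

    def rephrasing_pair(src: str, tgt: str) -> tuple[str, str]:
        """Rephrasing view: the model reads the source plus empty slots and
        regenerates the whole sentence, forcing semantic-level correction."""
        return src + MASK * len(tgt), tgt

    src = "他的知识很渊搏"  # 搏 is a misspelling of 博
    tgt = "他的知识很渊博"
    print(tagging_pairs(src, tgt))
    print(rephrasing_pair(src, tgt))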


TriFormer: A Multi-modal Transformer Framework For Mild Cognitive Impairment Conversion Prediction

arXiv.org Artificial Intelligence

The prediction of mild cognitive impairment (MCI) conversion to Alzheimer's disease (AD) is important for early treatment to prevent or slow the progression of AD. To accurately predict the MCI conversion to stable MCI or progressive MCI, we propose TriFormer, a novel transformer-based framework with three specialized transformers to incorporate multi-modal data. TriFormer uses I) an image transformer to extract multi-view image features from medical scans, II) a clinical transformer to embed and correlate multi-modal clinical data, and III) a modality fusion transformer that produces an accurate prediction based on fusing the outputs from the image and clinical transformers. TriFormer is evaluated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) 1 and ADNI2 datasets and outperforms the previous state of the art.

Magnetic resonance imaging (MRI) and positron emission tomography (PET) could help more accurately predict MCI conversion [2]. Convolutional neural networks (CNNs) have been widely applied to AD classification and prediction from imaging data. Valliani et al. [3] fine-tuned a pretrained ResNet-50 to classify AD and CN based on 2D axial slices. Wen et al. [4] leveraged 3D spatial information by using a 3D CNN and outperformed previous 2D-based methods in AD classification and MCI conversion prediction. However, both 2D and 3D CNNs have a strong inductive bias towards local receptive fields, which could limit performance on high-dimensional data [5]. Recently, transformers have been shown to be effective in capturing global long-range dependencies within imaging [6] and sequential data [7]. They also have no such inductive bias compared with CNNs.
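
A hedged PyTorch sketch of the three-transformer layout described above; the dimensions, depths, concatenation-based fusion, and mean pooling are illustrative assumptions rather than the paper's configuration:

    import torch
    import torch.nn as nn

    class TriFormerSketch(nn.Module):
        def __init__(self, d=128, n_classes=2):
            super().__init__()
            layer = lambda: nn.TransformerEncoderLayer(d, nhead=4,
                                                       batch_first=True)
            self.image_tf = nn.TransformerEncoder(layer(), num_layers=2)   # I) multi-view image features
            self.clin_embed = nn.Linear(1, d)                              # embed each clinical variable
            self.clin_tf = nn.TransformerEncoder(layer(), num_layers=2)    # II) correlate clinical data
            self.fusion_tf = nn.TransformerEncoder(layer(), num_layers=2)  # III) fuse both modalities
            self.head = nn.Linear(d, n_classes)                            # stable vs progressive MCI

        def forward(self, view_feats, clinical):
            # view_feats: (B, n_views, d) per-view features from a backbone
            # clinical:   (B, n_clinical) scalar clinical measurements
            img = self.image_tf(view_feats)
            clin = self.clin_tf(self.clin_embed(clinical.unsqueeze(-1)))
            fused = self.fusion_tf(torch.cat([img, clin], dim=1))
            return self.head(fused.mean(dim=1))

    model = TriFormerSketch()
    logits = model(torch.randn(4, 3, 128), torch.randn(4, 8))
    print(logits.shape)  # torch.Size([4, 2])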


Kriging Convolutional Networks

arXiv.org Artificial Intelligence

Spatial interpolation is a class of estimation problems where locations with known values are used to estimate values at other locations, with an emphasis on harnessing spatial locality and trends. Traditional Kriging methods have strong Gaussian assumptions, and as a result, often fail to capture complexities within the data. Inspired by the recent progress of graph neural networks, we introduce Kriging Convolutional Networks (KCN), a method combining the advantages of Graph Convolutional Networks (GCN) and Kriging. Compared to standard GCNs, KCNs make direct use of neighboring observations when generating predictions. KCNs also contain the Kriging method as a specific configuration. We further improve the model's performance by adding attention. Empirically, we show that this model outperforms GCNs and Kriging in several applications. The implementation of KCN using PyTorch is publicly available at the GitHub repository: https://github.com/tufts-ml/kcn-torch.
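
A small sketch of the core idea (not the released kcn-torch implementation): for each target location, build a local graph over its k nearest observed neighbors and, unlike a standard GCN, feed the neighbors' observed values in as features. The aggregation below is a plain mean for brevity:

    import torch
    import torch.nn as nn

    def knn_graph_features(coords, values, target, k=5):
        """Features per node: [x, y, observed value, is_target flag]."""
        dist = torch.cdist(target.unsqueeze(0), coords).squeeze(0)
        idx = dist.topk(k, largest=False).indices
        neigh = torch.cat([coords[idx], values[idx, None],
                           torch.zeros(k, 1)], dim=1)
        tgt = torch.cat([target, torch.zeros(1),       # value unknown -> 0
                         torch.ones(1)]).unsqueeze(0)  # flag marks the target
        return torch.cat([neigh, tgt], dim=0)          # (k+1, 4)

    class KCNSketch(nn.Module):
        def __init__(self, d_in=4, d_hid=32):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                     nn.Linear(d_hid, d_hid))
            self.out = nn.Linear(d_hid, 1)

        def forward(self, x):
            h = self.mlp(x).mean(dim=0)  # mean aggregation over local graph
            return self.out(h)

    coords = torch.rand(100, 2)            # observed locations
    values = torch.sin(coords.sum(dim=1))  # observed values
    model = KCNSketch()
    pred = model(knn_graph_features(coords, values, target=torch.rand(2)))
    print(pred)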


Stochastic Iterative Graph Matching

arXiv.org Machine Learning

Recent works leveraging Graph Neural Networks for graph matching tasks have shown promising results, and recent progress in learning discrete distributions poses new opportunities for learning graph matching models. In this work, we propose a new model, Stochastic Iterative Graph MAtching (SIGMA), to address the graph matching problem. Our model defines a distribution of matchings for a graph pair so that the model can explore a wide range of possible matchings. We further introduce a novel multi-step matching procedure, which learns how to refine a graph pair's matching results incrementally. The model also includes dummy nodes so that it does not have to find matchings for nodes without correspondence. We fit this model to data via scalable stochastic optimization. We conduct extensive experiments across synthetic graph datasets as well as biochemistry and computer vision applications. Across all tasks, our results show that SIGMA produces significantly improved graph matching results compared to state-of-the-art models. Ablation studies verify that each of our components (stochastic training, iterative matching, and dummy nodes) offers a noticeable improvement.
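
One generic way to realize a distribution over matchings, shown here as an assumed illustration rather than SIGMA's exact parameterization: perturb a learned score matrix with Gumbel noise and project it toward a doubly-stochastic matrix with Sinkhorn iterations, reserving an extra row and column for dummy nodes:

    import torch

    def sinkhorn(log_alpha, n_iters=20):
        """Alternate row/column normalization in log space to project a
        score matrix toward a doubly-stochastic matrix."""
        for _ in range(n_iters):
            log_alpha = log_alpha - log_alpha.logsumexp(dim=1, keepdim=True)
            log_alpha = log_alpha - log_alpha.logsumexp(dim=0, keepdim=True)
        return log_alpha.exp()

    def sample_matching(scores, tau=0.5):
        """Draw one stochastic soft matching: Gumbel noise makes this a
        sample from a distribution over matchings, not just an argmax."""
        gumbel = -torch.log(-torch.log(torch.rand_like(scores)))
        return sinkhorn((scores + gumbel) / tau)

    n = 4
    scores = torch.randn(n + 1, n + 1)  # extra row/column of dummy nodes
    match = sample_matching(scores)
    print(match.sum(dim=0))  # columns sum to ~1: a soft assignment

Iterative refinement would then re-score the pair conditioned on the current soft matching and repeat, which is where the multi-step procedure above comes in.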


Modeling Graph Node Correlations with Neighbor Mixture Models

arXiv.org Machine Learning

We propose a new model, the Neighbor Mixture Model (NMM), for modeling node labels in a graph. This model aims to capture correlations between the labels of nodes in a local neighborhood. We carefully design the model so that it can serve as an alternative to a Markov Random Field while offering more affordable computations. In particular, drawing samples and evaluating marginal probabilities of single labels can be done in linear time. To scale computations to large graphs, we devise a variational approximation without introducing extra parameters. We further use graph neural networks (GNNs) to parameterize the NMM, which reduces the number of learnable parameters while allowing expressive representation learning. The proposed model can either be fit directly to large observed graphs or used to enable scalable inference that preserves correlations for other distributions, such as deep generative graph models. Across a diverse set of node classification, image denoising, and link prediction tasks, we show that our proposed NMM advances the state of the art in modeling real-world labeled graphs.
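
The linear-time claim can be illustrated with a toy version of the mixture (uniform weights and random component distributions stand in for the GNN-parameterized quantities): a node's label distribution is a convex mixture over its neighborhood, so both exact single-label marginals and sampling touch only the neighbors:

    import numpy as np

    rng = np.random.default_rng(0)
    n_nodes, n_labels = 6, 3
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

    # q[j] is node j's component distribution over labels (in the full
    # model, a softmax over GNN outputs).
    q = rng.dirichlet(np.ones(n_labels), size=n_nodes)

    def marginal(i):
        """Exact marginal of node i's label: a convex mixture of neighbors."""
        members = [i] + adj[i]
        w = np.ones(len(members)) / len(members)  # uniform mixture weights
        return w @ q[members]

    def sample(i):
        """Linear-time ancestral sampling: pick a component, then a label."""
        members = [i] + adj[i]
        j = members[rng.integers(len(members))]
        return rng.choice(n_labels, p=q[j])

    print(marginal(2), sample(2))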


Universal Representation for Code

arXiv.org Artificial Intelligence

Learning from source code usually requires a large amount of labeled data. Beyond the scarcity of labeled data, the trained model is often highly task-specific and lacks transferability to different tasks. In this work, we present effective pre-training strategies on top of a novel graph-based code representation to produce universal representations for code. Specifically, our graph-based representation captures important semantics between code elements (e.g., control flow and data flow). We pre-train graph neural networks on the representation to extract universal code properties. The pre-trained model then enables fine-tuning to support various downstream applications. We evaluate our model on two real-world datasets, spanning over 30M Java methods and 770K Python methods. Through visualization, we reveal discriminative properties in our universal code representation. Comparing against multiple benchmarks, we demonstrate that the proposed framework achieves state-of-the-art results on method name prediction and code graph link prediction.
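
A toy sketch of one plausible graph construction (the paper's representation over Java/Python is richer; the edge types here are simplified assumptions): AST edges plus def-use edges as a crude stand-in for data flow. A GNN would then be pre-trained on such graphs, for example by predicting held-out edges:

    import ast

    def code_graph(src: str):
        """Build (node types, typed edges) from Python source: one edge per
        AST parent-child link, plus def-use edges between variable names."""
        tree = ast.parse(src)
        nodes, edges, last_def = [], [], {}
        for node in ast.walk(tree):
            nodes.append(type(node).__name__)
            for child in ast.iter_child_nodes(node):
                edges.append((id(node), id(child), "ast"))
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    last_def[node.id] = id(node)
                elif node.id in last_def:  # use after definition
                    edges.append((last_def[node.id], id(node), "dataflow"))
        return nodes, edges

    nodes, edges = code_graph("x = 1\ny = x + 2\n")
    kinds = [e[2] for e in edges]
    print(len(nodes), kinds.count("ast"), kinds.count("dataflow"))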


Non-Parametric Variational Inference with Graph Convolutional Networks for Gaussian Processes

arXiv.org Machine Learning

Inference for GP models with non-Gaussian noise is computationally expensive when dealing with large datasets. Many recent inference methods approximate the posterior distribution with a simpler distribution defined on a small number of inducing points. The inference is accurate only when data points have a strong correlation with these inducing points. In this paper, we consider the inference problem from a different direction: GP function values in the posterior are mostly correlated over short distances. We construct a variational distribution such that the inference for a data point considers only its neighborhood. With this construction, the variational lower bound is highly decomposable, hence we can run stochastic optimization with very small batches. We then train Graph Convolutional Networks as a reusable model to identify the variational parameters for each data point. Model reuse greatly reduces the number of parameters and the number of iterations needed in optimization. The proposed method significantly speeds up inference and often obtains more accurate results than previous methods.
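
A sketch of the amortization idea under stated assumptions (placeholder Gaussian likelihood, a single mean-aggregation step instead of a full GCN, and the KL/prior term omitted): one shared network maps each point and its neighbors to that point's variational parameters, and because the bound decomposes over points, optimization runs on very small batches:

    import torch
    import torch.nn as nn

    class AmortizedVarParams(nn.Module):
        def __init__(self, d_in, d_hid=32):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                     nn.Linear(d_hid, 2))  # (mu, log var)

        def forward(self, x, neigh_x):
            # aggregate a point with its neighbors' mean (one "conv" step)
            h = torch.cat([x, neigh_x.mean(dim=1)], dim=-1)
            mu, log_var = self.net(h).unbind(-1)
            return mu, log_var

    x = torch.randn(256, 4)                # data points
    y = torch.randn(256)                   # observations
    idx = torch.randint(0, 256, (256, 8))  # k=8 neighbor indices (stand-in)
    model = AmortizedVarParams(d_in=8)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for step in range(3):                  # very small batches are fine:
        b = torch.randint(0, 256, (16,))   # the bound sums over points
        mu, log_var = model(x[b], x[idx[b]])
        # expected log-likelihood term of the (local) lower bound only
        loss = (0.5 * ((y[b] - mu) ** 2 + log_var.exp())).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(loss.item())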