
Collaborating Authors: Liu, Linfeng


Preference Discerning with LLM-Enhanced Generative Retrieval

arXiv.org Machine Learning

Sequential recommendation systems aim to provide personalized recommendations for users based on their interaction history. To achieve this, they often incorporate auxiliary information, such as textual descriptions of items, and auxiliary tasks, such as predicting user preferences and intent. Despite numerous efforts to enhance these models, they still suffer from limited personalization. To address this issue, we propose a new paradigm, which we term preference discerning. In preference discerning, we explicitly condition a generative sequential recommendation system on user preferences within its context. To this end, we generate user preferences using Large Language Models (LLMs) based on user reviews and item-specific data. To evaluate the preference discerning capabilities of sequential recommendation systems, we introduce a novel benchmark that provides a holistic evaluation across various scenarios, including preference steering and sentiment following. We assess current state-of-the-art methods using our benchmark and show that they struggle to accurately discern user preferences. Therefore, we propose a new method named Mender ($\textbf{M}$ultimodal Prefer$\textbf{en}$ce $\textbf{d}$iscern$\textbf{er}$), which improves upon existing methods and achieves state-of-the-art performance on our benchmark. Our results show that Mender can be effectively guided by human preferences even though they have not been observed during training, paving the way toward more personalized sequential recommendation systems. We will open-source the code and benchmarks upon publication.
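
As a rough illustration of the paradigm (a minimal sketch with hypothetical names, not the paper's actual architecture), preference discerning amounts to serializing LLM-generated preference strings into the recommender's context alongside the interaction history, so that editing the preferences steers the generated recommendation:

    # Hypothetical sketch: conditioning a generative recommender on
    # LLM-extracted user preferences placed in its context.
    # `preferences` would come from an LLM summarizing the user's reviews.

    def build_context(preferences: list[str], history: list[str],
                      max_items: int = 20) -> str:
        """Serialize preferences and interaction history into one prompt.

        A generative retrieval model is then asked to decode the identifier
        of the next item, so steering reduces to editing `preferences`.
        """
        pref_block = "\n".join(f"- {p}" for p in preferences)
        hist_block = " -> ".join(history[-max_items:])
        return (
            "User preferences:\n"
            f"{pref_block}\n"
            f"Interaction history: {hist_block}\n"
            "Next item:"
        )

    if __name__ == "__main__":
        ctx = build_context(
            preferences=["prefers lightweight trail-running shoes",
                         "avoids leather"],
            history=["item_102", "item_877", "item_341"],
        )
        print(ctx)  # feed to a generative model, e.g. model.generate(ctx)

Because the preferences live in the context rather than in the weights, a new preference can be injected at inference time even if it was never observed during training.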


Empower Nested Boolean Logic via Self-Supervised Curriculum Learning

arXiv.org Artificial Intelligence

Beyond the great cognitive powers showcased by language models, it is crucial to scrutinize whether their reasoning capabilities stem from strong generalization or merely from exposure to relevant data. As opposed to constructing increasingly complex logic, this paper probes boolean logic, the root capability of a logical reasoner. We find that pre-trained language models, even including large language models, behave like random selectors in the face of multi-nested boolean logic, a task that humans can handle with ease. To empower language models with this fundamental capability, this paper proposes a new self-supervised learning method, \textit{Curriculum Logical Reasoning} (\textsc{Clr}), where we augment the training data with nested boolean logic chains step by step, and program the training to progress gradually from simpler logical patterns to harder ones. This new training paradigm allows language models to effectively generalize to much harder and longer-hop logic, which can hardly be learned through naive training. Furthermore, we show that boolean logic is a great foundation for improving subsequent general logical tasks.
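
A minimal sketch of the step-by-step augmentation idea (an assumed reading of \textsc{Clr}; the paper's actual templates and scheduling may differ): generate boolean expressions whose nesting depth grows over the course of training, keeping the ground-truth label consistent as each and/or/not layer is added:

    import random

    def nest(expr: str, value: bool, depth: int,
             rng: random.Random) -> tuple[str, bool]:
        """Wrap a boolean expression `depth` more times with and/or/not,
        updating the ground-truth label alongside the expression."""
        for _ in range(depth):
            op = rng.choice(["and", "or", "not"])
            if op == "not":
                expr, value = f"not ({expr})", not value
            else:
                other = rng.choice([True, False])
                expr = f"({expr}) {op} {other}"
                value = (value and other) if op == "and" else (value or other)
        return expr, value

    def curriculum(max_depth: int, per_depth: int, seed: int = 0):
        """Yield (expression, label) pairs from shallow to deep nesting."""
        rng = random.Random(seed)
        for depth in range(1, max_depth + 1):  # program training easy -> hard
            for _ in range(per_depth):
                base = rng.choice([True, False])
                yield nest(str(base), base, depth, rng)

    for expr, label in curriculum(max_depth=3, per_depth=2):
        assert eval(expr) is label  # labels stay consistent with nesting
        print(label, expr)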


Chinese Spelling Correction as Rephrasing Language Model

arXiv.org Artificial Intelligence

This paper studies Chinese Spelling Correction (CSC), which aims to detect and correct potential spelling errors in a given sentence. Current state-of-the-art methods regard CSC as a sequence tagging task and fine-tune BERT-based models on sentence pairs. However, we note a critical flaw in the process of tagging one character to another: the correction is excessively conditioned on the error. This is the opposite of the human mindset, where individuals rephrase the complete sentence based on its semantics, rather than relying solely on previously memorized error patterns. Such a counter-intuitive learning process creates a bottleneck in the generalizability and transferability of machine spelling correction. To address this, we propose the Rephrasing Language Model (ReLM), in which the model is trained to rephrase the entire sentence by infilling additional slots, instead of character-to-character tagging. This novel training paradigm achieves new state-of-the-art results across fine-tuned and zero-shot CSC benchmarks, outperforming previous counterparts by a large margin. Our method also learns transferable language representations when CSC is jointly trained with other tasks.
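
The contrast between the two training paradigms can be sketched as data construction (illustrative only; ReLM's exact slot and infilling template is not reproduced here): tagging pairs each source character with a target character, while rephrasing asks the model to regenerate the whole sentence:

    MASK = "[MASK]"

    def tagging_pairs(src: str, tgt: str) -> list[tuple[str, str]]:
        """Sequence tagging view: character-to-character pairs, so the
        correction is conditioned directly on the (possibly wrong) input."""
        return list(zip(src, tgt))

    def rephrasing_pair(src: str, tgt: str) -> tuple[str, str]:
        """Rephrasing view: the model reads the source plus empty slots and
        regenerates the whole sentence, forcing semantic-level correction."""
        return src + MASK * len(tgt), tgt

    src = "他的知识很渊搏"  # 搏 is a misspelling of 博
    tgt = "他的知识很渊博"
    print(tagging_pairs(src, tgt))
    print(rephrasing_pair(src, tgt))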


TriFormer: A Multi-modal Transformer Framework For Mild Cognitive Impairment Conversion Prediction

arXiv.org Artificial Intelligence

The prediction of mild cognitive impairment (MCI) conversion to Alzheimer's disease (AD) is important for early treatment to prevent or slow the progression of AD. To accurately predict the MCI conversion to stable MCI or progressive MCI, we propose TriFormer, a novel transformer-based framework with three specialized transformers to incorporate multi-modal data. TriFormer uses I) an image transformer to extract multi-view image features from medical scans, II) a clinical transformer to embed and correlate multi-modal clinical data, and III) a modality fusion transformer that produces an accurate prediction based on fusing the outputs from the image and clinical transformers. TriFormer is evaluated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) 1 and ADNI2 datasets and outperforms the previous state of the art.

Magnetic resonance imaging (MRI) and positron emission tomography (PET) could help more accurately predict MCI conversion [2]. Convolutional neural networks (CNNs) have been widely applied to AD classification and prediction from imaging data. Valliani et al. [3] fine-tuned a pretrained ResNet-50 to classify AD and CN based on 2D axial slices. Wen et al. [4] leveraged 3D spatial information by using a 3D CNN and outperformed previous 2D-based methods in AD classification and MCI conversion prediction. However, both 2D and 3D CNNs have a strong inductive bias towards local receptive fields, which could limit performance on high-dimensional data [5]. Recently, transformers have been shown to be effective in capturing global long-range dependencies within imaging [6] and sequential data [7]. They also have no such inductive bias compared with CNNs.
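
A hedged PyTorch sketch of the three-transformer layout described above; the dimensions, depths, concatenation-based fusion, and mean pooling are illustrative assumptions rather than the paper's configuration:

    import torch
    import torch.nn as nn

    class TriFormerSketch(nn.Module):
        def __init__(self, d=128, n_classes=2):
            super().__init__()
            layer = lambda: nn.TransformerEncoderLayer(d, nhead=4,
                                                       batch_first=True)
            self.image_tf = nn.TransformerEncoder(layer(), num_layers=2)   # I) multi-view image features
            self.clin_embed = nn.Linear(1, d)                              # embed each clinical variable
            self.clin_tf = nn.TransformerEncoder(layer(), num_layers=2)    # II) correlate clinical data
            self.fusion_tf = nn.TransformerEncoder(layer(), num_layers=2)  # III) fuse both modalities
            self.head = nn.Linear(d, n_classes)                            # stable vs progressive MCI

        def forward(self, view_feats, clinical):
            # view_feats: (B, n_views, d) per-view features from a backbone
            # clinical:   (B, n_clinical) scalar clinical measurements
            img = self.image_tf(view_feats)
            clin = self.clin_tf(self.clin_embed(clinical.unsqueeze(-1)))
            fused = self.fusion_tf(torch.cat([img, clin], dim=1))
            return self.head(fused.mean(dim=1))

    model = TriFormerSketch()
    logits = model(torch.randn(4, 3, 128), torch.randn(4, 8))
    print(logits.shape)  # torch.Size([4, 2])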


Kriging Convolutional Networks

arXiv.org Artificial Intelligence

Spatial interpolation is a class of estimation problems where locations with known values are used to estimate values at other locations, with an emphasis on harnessing spatial locality and trends. Traditional Kriging methods have strong Gaussian assumptions, and as a result, often fail to capture complexities within the data. Inspired by the recent progress of graph neural networks, we introduce Kriging Convolutional Networks (KCN), a method combining the advantages of Graph Convolutional Networks (GCN) and Kriging. Compared to standard GCNs, KCNs make direct use of neighboring observations when generating predictions. KCNs also contain the Kriging method as a specific configuration. We further improve the model's performance by adding attention. Empirically, we show that this model outperforms GCNs and Kriging in several applications. The implementation of KCN using PyTorch is publicly available at the GitHub repository: https://github.com/tufts-ml/kcn-torch.
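
A small sketch of the core idea (not the released kcn-torch implementation): for each target location, build a local graph over its k nearest observed neighbors and, unlike a standard GCN, feed the neighbors' observed values in as features. The aggregation below is a plain mean for brevity:

    import torch
    import torch.nn as nn

    def knn_graph_features(coords, values, target, k=5):
        """Features per node: [x, y, observed value, is_target flag]."""
        dist = torch.cdist(target.unsqueeze(0), coords).squeeze(0)
        idx = dist.topk(k, largest=False).indices
        neigh = torch.cat([coords[idx], values[idx, None],
                           torch.zeros(k, 1)], dim=1)
        tgt = torch.cat([target, torch.zeros(1),       # value unknown -> 0
                         torch.ones(1)]).unsqueeze(0)  # flag marks the target
        return torch.cat([neigh, tgt], dim=0)          # (k+1, 4)

    class KCNSketch(nn.Module):
        def __init__(self, d_in=4, d_hid=32):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                     nn.Linear(d_hid, d_hid))
            self.out = nn.Linear(d_hid, 1)

        def forward(self, x):
            h = self.mlp(x).mean(dim=0)  # mean aggregation over local graph
            return self.out(h)

    coords = torch.rand(100, 2)            # observed locations
    values = torch.sin(coords.sum(dim=1))  # observed values
    model = KCNSketch()
    pred = model(knn_graph_features(coords, values, target=torch.rand(2)))
    print(pred)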


Stochastic Iterative Graph Matching

arXiv.org Machine Learning

Recent works leveraging Graph Neural Networks for graph matching tasks have shown promising results, and recent progress in learning discrete distributions poses new opportunities for learning graph matching models. In this work, we propose a new model, Stochastic Iterative Graph MAtching (SIGMA), to address the graph matching problem. Our model defines a distribution of matchings for a graph pair so that the model can explore a wide range of possible matchings. We further introduce a novel multi-step matching procedure, which learns how to refine a graph pair's matching results incrementally. The model also includes dummy nodes so that it does not have to find matchings for nodes without correspondence. We fit this model to data via scalable stochastic optimization. We conduct extensive experiments across synthetic graph datasets as well as biochemistry and computer vision applications. Across all tasks, our results show that SIGMA produces significantly improved graph matching results compared to state-of-the-art models. Ablation studies verify that each of our components (stochastic training, iterative matching, and dummy nodes) offers a noticeable improvement.
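
One generic way to realize a distribution over matchings, shown here as an assumed illustration rather than SIGMA's exact parameterization: perturb a learned score matrix with Gumbel noise and project it toward a doubly-stochastic matrix with Sinkhorn iterations, reserving an extra row and column for dummy nodes:

    import torch

    def sinkhorn(log_alpha, n_iters=20):
        """Alternate row/column normalization in log space to project a
        score matrix toward a doubly-stochastic matrix."""
        for _ in range(n_iters):
            log_alpha = log_alpha - log_alpha.logsumexp(dim=1, keepdim=True)
            log_alpha = log_alpha - log_alpha.logsumexp(dim=0, keepdim=True)
        return log_alpha.exp()

    def sample_matching(scores, tau=0.5):
        """Draw one stochastic soft matching: Gumbel noise makes this a
        sample from a distribution over matchings, not just an argmax."""
        gumbel = -torch.log(-torch.log(torch.rand_like(scores)))
        return sinkhorn((scores + gumbel) / tau)

    n = 4
    scores = torch.randn(n + 1, n + 1)  # extra row/column of dummy nodes
    match = sample_matching(scores)
    print(match.sum(dim=0))  # columns sum to ~1: a soft assignment

Iterative refinement would then re-score the pair conditioned on the current soft matching and repeat, which is where the multi-step procedure above comes in.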


Modeling Graph Node Correlations with Neighbor Mixture Models

arXiv.org Machine Learning

We propose a new model, the Neighbor Mixture Model (NMM), for modeling node labels in a graph. This model aims to capture correlations between the labels of nodes in a local neighborhood. We carefully design the model so that it can serve as an alternative to a Markov Random Field while offering more affordable computations. In particular, drawing samples and evaluating marginal probabilities of single labels can be done in linear time. To scale computations to large graphs, we devise a variational approximation without introducing extra parameters. We further use graph neural networks (GNNs) to parameterize the NMM, which reduces the number of learnable parameters while allowing expressive representation learning. The proposed model can either be fit directly to large observed graphs or used to enable scalable inference that preserves correlations for other distributions, such as deep generative graph models. Across a diverse set of node classification, image denoising, and link prediction tasks, we show that our proposed NMM advances the state of the art in modeling real-world labeled graphs.
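
The linear-time claim can be illustrated with a toy version of the mixture (uniform weights and random component distributions stand in for the GNN-parameterized quantities): a node's label distribution is a convex mixture over its neighborhood, so both exact single-label marginals and sampling touch only the neighbors:

    import numpy as np

    rng = np.random.default_rng(0)
    n_nodes, n_labels = 6, 3
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

    # q[j] is node j's component distribution over labels (in the full
    # model, a softmax over GNN outputs).
    q = rng.dirichlet(np.ones(n_labels), size=n_nodes)

    def marginal(i):
        """Exact marginal of node i's label: a convex mixture of neighbors."""
        members = [i] + adj[i]
        w = np.ones(len(members)) / len(members)  # uniform mixture weights
        return w @ q[members]

    def sample(i):
        """Linear-time ancestral sampling: pick a component, then a label."""
        members = [i] + adj[i]
        j = members[rng.integers(len(members))]
        return rng.choice(n_labels, p=q[j])

    print(marginal(2), sample(2))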


Universal Representation for Code

arXiv.org Artificial Intelligence

Learning from source code usually requires a large amount of labeled data. Beyond the scarcity of labeled data, the trained model is often highly task-specific and lacks transferability to different tasks. In this work, we present effective pre-training strategies on top of a novel graph-based code representation to produce universal representations for code. Specifically, our graph-based representation captures important semantics between code elements (e.g., control flow and data flow). We pre-train graph neural networks on the representation to extract universal code properties. The pre-trained model then enables fine-tuning to support various downstream applications. We evaluate our model on two real-world datasets, spanning over 30M Java methods and 770K Python methods. Through visualization, we reveal discriminative properties in our universal code representation. Comparing against multiple benchmarks, we demonstrate that the proposed framework achieves state-of-the-art results on method name prediction and code graph link prediction.
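
A toy sketch of one plausible graph construction (the paper's representation over Java/Python is richer; the edge types here are simplified assumptions): AST edges plus def-use edges as a crude stand-in for data flow. A GNN would then be pre-trained on such graphs, for example by predicting held-out edges:

    import ast

    def code_graph(src: str):
        """Build (node types, typed edges) from Python source: one edge per
        AST parent-child link, plus def-use edges between variable names."""
        tree = ast.parse(src)
        nodes, edges, last_def = [], [], {}
        for node in ast.walk(tree):
            nodes.append(type(node).__name__)
            for child in ast.iter_child_nodes(node):
                edges.append((id(node), id(child), "ast"))
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    last_def[node.id] = id(node)
                elif node.id in last_def:  # use after definition
                    edges.append((last_def[node.id], id(node), "dataflow"))
        return nodes, edges

    nodes, edges = code_graph("x = 1\ny = x + 2\n")
    kinds = [e[2] for e in edges]
    print(len(nodes), kinds.count("ast"), kinds.count("dataflow"))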


Non-Parametric Variational Inference with Graph Convolutional Networks for Gaussian Processes

arXiv.org Machine Learning

Inference for GP models with non-Gaussian noise is computationally expensive when dealing with large datasets. Many recent inference methods approximate the posterior distribution with a simpler distribution defined on a small number of inducing points. The inference is accurate only when data points have a strong correlation with these inducing points. In this paper, we consider the inference problem from a different direction: GP function values in the posterior are mostly correlated over short distances. We construct a variational distribution such that the inference for a data point considers only its neighborhood. With this construction, the variational lower bound is highly decomposable, hence we can run stochastic optimization with very small batches. We then train Graph Convolutional Networks as a reusable model to identify the variational parameters for each data point. Model reuse greatly reduces the number of parameters and the number of iterations needed in optimization. The proposed method significantly speeds up inference and often obtains more accurate results than previous methods.
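
A sketch of the amortization idea under stated assumptions (placeholder Gaussian likelihood, a single mean-aggregation step instead of a full GCN, and the KL/prior term omitted): one shared network maps each point and its neighbors to that point's variational parameters, and because the bound decomposes over points, optimization runs on very small batches:

    import torch
    import torch.nn as nn

    class AmortizedVarParams(nn.Module):
        def __init__(self, d_in, d_hid=32):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                     nn.Linear(d_hid, 2))  # (mu, log var)

        def forward(self, x, neigh_x):
            # aggregate a point with its neighbors' mean (one "conv" step)
            h = torch.cat([x, neigh_x.mean(dim=1)], dim=-1)
            mu, log_var = self.net(h).unbind(-1)
            return mu, log_var

    x = torch.randn(256, 4)                # data points
    y = torch.randn(256)                   # observations
    idx = torch.randint(0, 256, (256, 8))  # k=8 neighbor indices (stand-in)
    model = AmortizedVarParams(d_in=8)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for step in range(3):                  # very small batches are fine:
        b = torch.randint(0, 256, (16,))   # the bound sums over points
        mu, log_var = model(x[b], x[idx[b]])
        # expected log-likelihood term of the (local) lower bound only
        loss = (0.5 * ((y[b] - mu) ** 2 + log_var.exp())).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        print(loss.item())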