AITopics | Teng, Zhiyang

Plotting

Teng, Zhiyang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Logic Agent: Enhancing Validity with Logic Rule Invocation

Liu, Hanmeng, Teng, Zhiyang, Zhang, Chaoli, Zhang, Yue

arXiv.org Artificial IntelligenceApr-28-2024

Chain-of-Thought (CoT) prompting has emerged as a pivotal technique for augmenting the inferential capabilities of language models during reasoning tasks. Despite its advancements, CoT often grapples with challenges in validating reasoning validity and ensuring informativeness. Addressing these limitations, this paper introduces the Logic Agent (LA), an agent-based framework aimed at enhancing the validity of reasoning processes in Large Language Models (LLMs) through strategic logic rule invocation. Unlike conventional approaches, LA transforms LLMs into logic agents that dynamically apply propositional logic rules, initiating the reasoning process by converting natural language inputs into structured logic forms. The logic agent leverages a comprehensive set of predefined functions to systematically navigate the reasoning process. This methodology not only promotes the structured and coherent generation of reasoning constructs but also significantly improves their interpretability and logical coherence. Through extensive experimentation, we demonstrate LA's capacity to scale effectively across various model sizes, markedly improving the precision of complex reasoning across diverse tasks.

large language model, logic & formal reasoning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2404.1813

Genre:

Research Report > New Finding (1.00)
Overview (0.93)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Refining Latent Homophilic Structures over Heterophilic Graphs for Robust Graph Convolution Networks

Qiu, Chenyang, Nan, Guoshun, Xiong, Tianyu, Deng, Wendi, Wang, Di, Teng, Zhiyang, Sun, Lijuan, Cui, Qimei, Tao, Xiaofeng

arXiv.org Artificial IntelligenceDec-27-2023

Graph convolution networks (GCNs) are extensively utilized in various graph tasks to mine knowledge from spatial data. Our study marks the pioneering attempt to quantitatively investigate the GCN robustness over omnipresent heterophilic graphs for node classification. We uncover that the predominant vulnerability is caused by the structural out-of-distribution (OOD) issue. This finding motivates us to present a novel method that aims to harden GCNs by automatically learning Latent Homophilic Structures over heterophilic graphs. We term such a methodology as LHS. To elaborate, our initial step involves learning a latent structure by employing a novel self-expressive technique based on multi-node interactions. Subsequently, the structure is refined using a pairwisely constrained dual-view contrastive learning approach. We iteratively perform the above procedure, enabling a GCN model to aggregate information in a homophilic way on heterophilic graphs. Armed with such an adaptable structure, we can properly mitigate the structural OOD threats over heterophilic graphs. Experiments on various benchmarks show the effectiveness of the proposed LHS approach for robust GCNs.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2312.16418

Country:

North America > United States (0.15)
Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Non-Autoregressive Document-Level Machine Translation

Bao, Guangsheng, Teng, Zhiyang, Zhou, Hao, Yan, Jianhao, Zhang, Yue

arXiv.org Artificial IntelligenceDec-9-2023

Non-autoregressive translation (NAT) models achieve comparable performance and superior speed compared to auto-regressive translation (AT) models in the context of sentence-level machine translation (MT). However, their abilities are unexplored in document-level MT, hindering their usage in real scenarios. In this paper, we conduct a comprehensive examination of typical NAT models in the context of document-level MT and further propose a simple but effective design of sentence alignment between source and target. Experiments show that NAT models achieve high acceleration on documents, and sentence alignment significantly enhances their performance. However, current NAT models still have a significant performance gap compared to their AT counterparts. Further investigation reveals that NAT models suffer more from the multi-modality and misalignment issues in the context of document-level MT, and current NAT models struggle with exploiting document context and handling discourse phenomena. We delve into these challenges and provide our code at \url{https://github.com/baoguangsheng/nat-on-doc}.

artificial intelligence, nat model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.12878

Genre: Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

LogiCoT: Logical Chain-of-Thought Instruction-Tuning

Liu, Hanmeng, Teng, Zhiyang, Cui, Leyang, Zhang, Chaoli, Zhou, Qiji, Zhang, Yue

arXiv.org Artificial IntelligenceOct-28-2023

Generative Pre-trained Transformer 4 (GPT-4) demonstrates impressive chain-of-thought reasoning ability. Recent work on self-instruction tuning, such as Alpaca, has focused on enhancing the general proficiency of models. These instructions enable the model to achieve performance comparable to GPT-3.5 on general tasks like open-domain text generation and paraphrasing. However, they fall short of helping the model handle complex reasoning tasks. To bridge the gap, this paper presents LogiCoT, a new instruction-tuning dataset for Logical Chain-of-Thought reasoning with GPT-4. We elaborate on the process of harvesting instructions for prompting GPT-4 to generate chain-of-thought rationales. LogiCoT serves as an instruction set for teaching models of logical reasoning and elicits general reasoning skills.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.12147

Country:

North America > Canada (0.14)
Europe > Italy (0.14)

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

YATO: Yet Another deep learning based Text analysis Open toolkit

Wang, Zeqiang, Wang, Yile, Wu, Jiageng, Teng, Zhiyang, Yang, Jie

arXiv.org Artificial IntelligenceOct-18-2023

We introduce YATO, an open-source, easy-to-use toolkit for text analysis with deep learning. Different from existing heavily engineered toolkits and platforms, YATO is lightweight and user-friendly for researchers from cross-disciplinary areas. Designed in a hierarchical structure, YATO supports free combinations of three types of widely used features including 1) traditional neural networks (CNN, RNN, etc.); 2) pre-trained language models (BERT, RoBERTa, ELECTRA, etc.); and 3) user-customized neural features via a simple configurable file. Benefiting from the advantages of flexibility and ease of use, YATO can facilitate fast reproduction and refinement of state-of-the-art NLP models, and promote the cross-disciplinary applications of NLP techniques. The code, examples, and documentation are publicly available at https://github.com/jiesutd/YATO. A demo video is also available at https://www.youtube.com/playlist?list=PLJ0mhzMcRuDUlTkzBfAftOqiJRxYTTjXH.

artificial intelligence, machine learning, natural language, (3 more...)

arXiv.org Artificial Intelligence

2209.13877

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GLoRE: Evaluating Logical Reasoning of Large Language Models

liu, Hanmeng, Teng, Zhiyang, Ning, Ruoxi, Liu, Jian, Zhou, Qiji, Zhang, Yue

arXiv.org Artificial IntelligenceOct-13-2023

Recently, large language models (LLMs), including notable models such as GPT-4 and burgeoning community models, have showcased significant general language understanding abilities. However, there has been a scarcity of attempts to assess the logical reasoning capacities of these LLMs, an essential facet of natural language understanding. To encourage further investigation in this area, we introduce GLoRE, a meticulously assembled General Logical Reasoning Evaluation benchmark comprised of 12 datasets that span three different types of tasks. Our experimental results show that compared to the performance of human and supervised fine-tuning, the logical reasoning capabilities of open LLM models necessitate additional improvement; ChatGPT and GPT-4 show a strong capability of logical reasoning, with GPT-4 surpassing ChatGPT by a large margin. We propose a self-consistency probing method to enhance the accuracy of ChatGPT and a fine-tuned method to boost the performance of an open LLM. We release the datasets and evaluation programs to facilitate future research.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2310.09107

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.69)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature

Bao, Guangsheng, Zhao, Yanbin, Teng, Zhiyang, Yang, Linyi, Zhang, Yue

arXiv.org Artificial IntelligenceOct-8-2023

Table 4: Details of the source models that is used to produce machine-generated text. We assess the performance of our methodologies using text generations sourced from various models, as outlined in Table 4. These models are arranged in order of their parameter count, with those having fewer than 20 billion parameters being run locally on a Tesla A100 GPU (80G). For models with over 6 billion parameters, we employ half-precision (float16), otherwise, we use full-precision (float32). In the case of larger models like GPT-3, ChatGPT, and GPT-4, we utilize the OpenAI API for the evaluations. Additionally, we provide information about the training corpus associated with each model, which we believe is pertinent for understanding the detection accuracy of different sampling and scoring models when applied to text generations originating from diverse source models, domains, and languages.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2310.0513

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (0.46)
Energy > Oil & Gas > Upstream (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.42)

Add feedback

Token-Level Fitting Issues of Seq2seq Models

Bao, Guangsheng, Teng, Zhiyang, Zhang, Yue

arXiv.org Artificial IntelligenceJun-22-2023

Sequence-to-sequence (seq2seq) models have been widely used for natural language processing, computer vision, and other deep learning tasks. We find that seq2seq models trained with early-stopping suffer from issues at the token level. In particular, while some tokens in the vocabulary demonstrate overfitting, others underfit when training is stopped. Experiments show that the phenomena are pervasive in different models, even in fine-tuned large pretrained-models. We identify three major factors that influence token-level fitting, which include token frequency, parts-of-speech, and prediction discrepancy. Further, we find that external factors such as language, model size, domain, data scale, and pretraining can also influence the fitting of tokens.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.04493

Country: Asia > China (0.14)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Target-Side Augmentation for Document-Level Machine Translation

Bao, Guangsheng, Teng, Zhiyang, Zhang, Yue

arXiv.org Artificial IntelligenceJun-4-2023

Document-level machine translation faces the challenge of data sparsity due to its long input length and a small amount of training data, increasing the risk of learning spurious patterns. To address this challenge, we propose a target-side augmentation method, introducing a data augmentation (DA) model to generate many potential translations for each source document. Learning on these wider range translations, an MT model can learn a smoothed distribution, thereby reducing the risk of data sparsity. We demonstrate that the DA model, which estimates the posterior distribution, largely improves the MT performance, outperforming the previous best system by 2.30 s-BLEU on News and achieving new state-of-the-art on News and Europarl benchmarks. Our code is available at https://github.com/baoguangsheng/target-side-augmentation.

machine learning, natural language, translation, (19 more...)

arXiv.org Artificial Intelligence

2305.04505

Country:

Europe (0.93)
Asia > Russia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis

Hu, Xuming, Guo, Zhijiang, Teng, Zhiyang, King, Irwin, Yu, Philip S.

arXiv.org Artificial IntelligenceMay-25-2023

Multimodal relation extraction (MRE) is the task of identifying the semantic relationships between two entities based on the context of the sentence image pair. Existing retrieval-augmented approaches mainly focused on modeling the retrieved textual knowledge, but this may not be able to accurately identify complex relations. To improve the prediction, this research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image. We further develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities. Extensive experiments and analyses show that the proposed method is able to effectively select and compare evidence across modalities and significantly outperforms state-of-the-art models.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.16166

Country:

Europe (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

Add feedback