
Collaborating Authors

Song, Linqi


Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

arXiv.org Artificial Intelligence

Recent progress in large language models (LLMs) like GPT-4 and PaLM-2 has brought significant advancements in addressing math reasoning problems. In particular, OpenAI's latest version of GPT-4, known as GPT-4 Code Interpreter, shows remarkable performance on challenging math datasets. In this paper, we explore the effect of code on enhancing LLMs' reasoning capability by introducing different constraints on the Code Usage Frequency of GPT-4 Code Interpreter. We find that its success can be largely attributed to its powerful skills in generating and executing code, evaluating the output of code execution, and rectifying its solution when receiving unreasonable outputs. Based on this insight, we propose a novel and effective prompting method, explicit code-based self-verification (CSV), to further boost the mathematical reasoning potential of GPT-4 Code Interpreter. This method employs a zero-shot prompt on GPT-4 Code Interpreter to encourage it to use code to self-verify its answers. In instances where the verification state registers as "False", the model automatically amends its solution, analogous to rectifying errors during a mathematics examination. Furthermore, we recognize that the state of the verification result indicates the confidence of a solution, which can improve the effectiveness of majority voting. With GPT-4 Code Interpreter and CSV, we achieve an impressive zero-shot accuracy on the MATH dataset (53.9% → 84.3%). Large language models (LLMs) (Brown et al., 2020; OpenAI, 2023; Anil et al., 2023) have shown impressive success in various tasks, such as common sense understanding and code generation. However, they still fall short in mathematical reasoning, often producing nonsensical or inaccurate content and struggling with complex calculations. Previous attempts to tackle these challenges include the Chain-of-Thought (CoT) (Wei et al., 2022) framework, which enhances LLMs' logical reasoning abilities by generating intermediate steps in their reasoning process.
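The verify-then-amend loop lends itself to a compact sketch. The following is a minimal illustration of the CSV idea as described in the abstract, not the authors' implementation; `query_llm` is a hypothetical stand-in for a call to GPT-4 Code Interpreter, the prompt wording is an assumption, and the response keys are illustrative.

```python
# Illustrative sketch of the CSV loop; `query_llm` is a hypothetical function
# returning a dict with "answer" and "verification" keys (an assumption).
CSV_PROMPT = (
    "Solve the problem, then write and execute code to verify your answer. "
    "Report the verification state as True or False; if False, revise your solution."
)

def solve_with_csv(problem: str, query_llm, max_rounds: int = 3) -> dict:
    """Generate a solution, self-verify it with code, and amend on failure."""
    solution = query_llm(f"{CSV_PROMPT}\n\nProblem: {problem}")
    for _ in range(max_rounds):
        if solution["verification"] == "True":
            return solution  # verified answer; treat as higher-confidence
        # Verification registered as False: ask the model to rectify.
        solution = query_llm(
            f"{CSV_PROMPT}\n\nProblem: {problem}\n"
            f"Previous (failed) solution: {solution['answer']}\nPlease amend it."
        )
    return solution
```

The verification state can then weight each candidate in majority voting, as the abstract suggests, by counting verified answers more heavily than unverified ones.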


VCSUM: A Versatile Chinese Meeting Summarization Dataset

arXiv.org Artificial Intelligence

Compared to news and chat summarization, the development of meeting summarization has been greatly hindered by limited data. To this end, we introduce a versatile Chinese meeting summarization dataset, dubbed VCSum, consisting of 239 real-life meetings with a total duration of over 230 hours. We claim our dataset is versatile because we provide annotations of topic segmentation, headlines, segmentation summaries, overall meeting summaries, and salient sentences for each meeting transcript. As such, the dataset can be adapted to various summarization tasks and methods, including segmentation-based summarization, multi-granularity summarization, and retrieval-then-generate summarization. Our analysis confirms the effectiveness and robustness of VCSum. We also provide a set of benchmark models for different downstream summarization tasks on VCSum to facilitate further research. The dataset and code will be released at https://github.com/hahahawu/VCSum.
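For intuition, one meeting record carrying the annotation layers listed above might look like the following; the field names and structure are purely illustrative, not the released schema.

```python
# Hypothetical shape of one VCSum meeting record, reflecting the annotation
# layers described in the abstract; all field names are illustrative.
meeting = {
    "transcript": ["sentence 1", "sentence 2", ...],   # full meeting transcript
    "topic_segments": [(0, 57), (58, 140)],            # sentence-index spans
    "headlines": ["headline for segment 1", ...],      # one per segment
    "segment_summaries": ["summary of segment 1", ...],
    "overall_summary": "summary of the whole meeting",
    "salient_sentences": [3, 19, 88],                  # indices of key sentences
}
```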


Byzantine-robust Federated Learning through Collaborative Malicious Gradient Filtering

arXiv.org Artificial Intelligence

Gradient-based training in federated learning is known to be vulnerable to faulty/malicious clients, which are often modeled as Byzantine clients. To address this, previous work either makes use of auxiliary data at the parameter server to verify the received gradients (e.g., by computing the validation error rate) or leverages statistics-based methods (e.g., median and Krum) to identify and remove malicious gradients from Byzantine clients. In this paper, we remark that auxiliary data may not always be available in practice and focus on the statistics-based approach. However, recent work on model poisoning attacks has shown that well-crafted attacks can circumvent most median- and distance-based statistical defense methods, making malicious gradients indistinguishable from honest ones. To tackle this challenge, we show that the element-wise signs of the gradient vector can provide valuable insight into detecting model poisoning attacks. Based on our theoretical analysis of the "Little is Enough" attack, we propose a novel approach called SignGuard to enable Byzantine-robust federated learning through collaborative malicious gradient filtering. More precisely, the received gradients are first processed to generate relevant magnitude, sign, and similarity statistics, which are then collaboratively utilized by multiple filters to eliminate malicious gradients before final aggregation. Finally, extensive experiments on image and text classification tasks are conducted under recently proposed attacks and defense strategies. The numerical results demonstrate the effectiveness and superiority of our proposed approach. The code is available at https://github.com/JianXu95/SignGuard.
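The filtering pipeline can be sketched as follows. This is a simplified illustration of the described approach, not the released implementation: the real SignGuard combines several collaborative filters (including clustering on sign statistics), whereas this sketch uses simple median-based thresholds, and all threshold values are assumptions.

```python
import numpy as np

def signguard_aggregate(grads: np.ndarray) -> np.ndarray:
    """Toy sketch of magnitude/sign filtering before aggregation.
    `grads` has shape (num_clients, dim); thresholds are illustrative."""
    # 1) Magnitude filter: drop gradients whose norm deviates far from the median.
    norms = np.linalg.norm(grads, axis=1)
    med_norm = np.median(norms)
    mag_ok = (norms > 0.1 * med_norm) & (norms < 3.0 * med_norm)
    # 2) Sign statistics: fraction of positive entries per gradient; flag
    #    gradients whose sign profile deviates from the population median.
    pos_frac = (grads > 0).mean(axis=1)
    sign_ok = np.abs(pos_frac - np.median(pos_frac)) < 0.1
    # 3) Average only the gradients that pass all filters.
    keep = mag_ok & sign_ok
    return grads[keep].mean(axis=0)
```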


Killing Two Birds with One Stone: Quantization Achieves Privacy in Distributed Learning

arXiv.org Artificial Intelligence

Communication efficiency and privacy protection are two critical issues in distributed machine learning. Existing methods tackle these two issues separately and may have a high implementation complexity that constrains their application in a resource-limited environment. We propose a comprehensive quantization-based solution that can simultaneously achieve communication efficiency and privacy protection, providing new insights into the correlated nature of communication and privacy. Specifically, we demonstrate the effectiveness of our proposed solution in the distributed stochastic gradient descent (SGD) framework by adding binomial noise to the uniformly quantized gradients to reach the desired differential privacy level, with only a minor sacrifice in communication efficiency. We theoretically capture the new trade-offs between communication, privacy, and learning performance.
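A minimal sketch of the described mechanism, assuming uniform quantization followed by centered binomial noise; the parameter names, noise scale, and privacy calibration here are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize_with_binomial_noise(grad, num_levels=16, noise_trials=32, rng=None):
    """Uniformly quantize a gradient and add centered binomial noise.
    Illustrative only: noise_trials would need to be calibrated to the
    desired differential privacy level, which this sketch does not do."""
    rng = rng or np.random.default_rng()
    lo, hi = grad.min(), grad.max()
    step = (hi - lo) / (num_levels - 1) if hi > lo else 1.0
    q = np.round((grad - lo) / step)          # integer quantization levels
    # Centered binomial noise: sum of fair coin flips, variance noise_trials/4.
    noise = rng.binomial(noise_trials, 0.5, size=grad.shape) - noise_trials / 2
    # The transmitted values remain small integers, so communication stays cheap;
    # the receiver dequantizes with the shared (lo, step) parameters.
    return lo + step * (q + noise)
```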


Learning Locality and Isotropy in Dialogue Modeling

arXiv.org Artificial Intelligence

Existing dialogue modeling methods have achieved promising performance on various dialogue tasks with the aid of Transformers and large-scale pre-trained language models. However, some recent studies revealed that the context representations produced by these methods suffer from the problem of anisotropy. In this paper, we find that the generated representations are also not conversational, losing the conversation structure information during the context modeling stage. To this end, we identify two properties in dialogue modeling, i.e., locality and isotropy, and present a simple method for dialogue representation calibration, namely SimDRC, to build isotropic and conversational feature spaces. Experimental results show that our approach significantly outperforms current state-of-the-art models on three open-domain dialogue tasks with eight benchmarks across both automatic and human evaluation metrics. More in-depth analyses further confirm the effectiveness of our proposed approach. Dialogue modeling (Serban et al., 2016; Mehri et al., 2019; Liu et al., 2021) aims to encode the raw text of the input dialogue into contextual representations. Although Transformer-based dialogue modeling methods (Hosseini-Asl et al., 2020; Liu et al., 2021) have achieved great success on various dialogue tasks, some impediments to these methods remain underexplored. Specifically, recent studies (Ethayarajh, 2019; Su et al., 2022) have revealed that on dialogue generation tasks, the representations produced by existing dialogue modeling methods are anisotropic, i.e., features occupy a narrow cone in the vector space, leading to the problem of degeneration. To alleviate this problem, previous solutions (e.g., SimCTG) (Su et al., 2021; 2022) encourage the model to learn isotropic token embeddings by pushing away the representations of distinct tokens. While building a more discriminative and isotropic feature space, these methods still ignore dialogue-specific features, such as inter-speaker correlations and conversational structure information, in the dialogue modeling stage.
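To make the two properties concrete, here is an illustrative contrastive-style loss in their spirit: pull together tokens within the same utterance (locality) while keeping tokens from different utterances decorrelated (isotropy). It is a sketch under assumed inputs, not the SimDRC objective itself, and the margin value is an assumption.

```python
import torch
import torch.nn.functional as F

def locality_isotropy_loss(token_reps, utterance_ids, margin=0.3):
    """Illustrative loss encouraging locality and isotropy.
    token_reps: (T, d) token representations; utterance_ids: (T,) long tensor.
    Assumes each utterance contributes at least two tokens."""
    reps = F.normalize(token_reps, dim=-1)            # unit vectors
    sim = reps @ reps.T                               # pairwise cosine similarity
    same = utterance_ids.unsqueeze(0) == utterance_ids.unsqueeze(1)
    eye = torch.eye(len(reps), dtype=torch.bool, device=reps.device)
    locality = (1 - sim[same & ~eye]).mean()          # same utterance: pull together
    isotropy = F.relu(sim[~same] - margin).mean()     # across utterances: push apart
    return locality + isotropy
```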


CSAGN: Conversational Structure Aware Graph Network for Conversational Semantic Role Labeling

arXiv.org Artificial Intelligence

Conversational semantic role labeling (CSRL) is believed to be a crucial step towards dialogue understanding. However, it remains a major challenge for existing CSRL parsers to handle conversational structural information. In this paper, we present a simple and effective architecture for CSRL which aims to address this problem. Our model is based on a conversational structure-aware graph network that explicitly encodes speaker-dependent information. We also propose a multi-task learning method to further improve the model. Experimental results on benchmark datasets show that our model with our proposed training objectives significantly outperforms previous baselines.
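One simple way to encode speaker-dependent conversational structure as a graph is sketched below; this illustrates the general idea of such graphs, not the CSAGN architecture itself.

```python
import numpy as np

def speaker_aware_adjacency(speakers):
    """Build a toy utterance-level adjacency matrix: each utterance connects
    to its immediate predecessor and to earlier turns by the same speaker.
    `speakers` is a list of speaker ids, one per utterance (illustrative)."""
    n = len(speakers)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        if i > 0:
            adj[i, i - 1] = 1          # sequential dialogue-flow edge
        for j in range(i):
            if speakers[j] == speakers[i]:
                adj[i, j] = 1          # same-speaker dependency edge
    return adj

# Example: three turns by speakers A, B, A.
print(speaker_aware_adjacency(["A", "B", "A"]))
```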


DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning

arXiv.org Artificial Intelligence

Gradient quantization is an emerging technique for reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach to dynamically quantizing gradients. This paper addresses this issue by proposing a novel dynamically quantized SGD (DQ-SGD) framework, enabling us to dynamically adjust the quantization scheme for each gradient descent step by exploring the trade-off between communication cost and convergence error. We derive an upper bound, tight in some cases, on the convergence error for a restricted family of quantization schemes and loss functions. We design our DQ-SGD algorithm by minimizing the communication cost under convergence error constraints. Finally, through extensive experiments on large-scale natural language processing and computer vision tasks on the AG-News, CIFAR-10, and CIFAR-100 datasets, we demonstrate that our quantization scheme achieves better trade-offs between communication cost and learning performance than other state-of-the-art gradient quantization methods.
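The per-step trade-off can be illustrated with a toy bit-width selection rule: spend as few bits as possible while keeping the quantization error within a per-step budget. The error estimate below is a generic worst-case bound for uniform quantization, not the paper's derived bound, and all names are assumptions.

```python
import numpy as np

def choose_bits(grad_norm, error_budget, dim, bit_options=(2, 4, 8, 16)):
    """Pick the smallest bit-width whose worst-case uniform-quantization
    error fits the per-step error budget (illustrative rule only)."""
    for bits in bit_options:                  # fewer bits = cheaper communication
        levels = 2 ** bits - 1
        # Worst-case quantization error shrinks as the number of levels grows.
        quant_error = grad_norm * np.sqrt(dim) / levels
        if quant_error <= error_budget:
            return bits
    return bit_options[-1]                    # fall back to the finest scheme

# Example: tighter budgets force more bits per coordinate.
print(choose_bits(grad_norm=1.0, error_budget=0.05, dim=10_000))
```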


Domain-Adaptive Pretraining Methods for Dialogue Understanding

arXiv.org Artificial Intelligence

Language models like BERT and SpanBERT pretrained on open-domain data have obtained impressive gains on various NLP tasks. In this paper, we probe the effectiveness of domain-adaptive pretraining objectives on downstream tasks. In particular, three objectives, including a novel objective focusing on modeling predicate-argument relations, are evaluated on two challenging dialogue understanding tasks. Experimental results demonstrate that domain-adaptive pretraining with proper objectives can significantly improve the performance of a strong baseline on these tasks, achieving new state-of-the-art performance.


Conversational Semantic Role Labeling

arXiv.org Artificial Intelligence

Semantic role labeling (SRL) aims to extract the arguments for each predicate in an input sentence. Traditional SRL can fail to analyze dialogues because it operates on each sentence in isolation, while ellipsis and anaphora occur frequently in dialogues. To address this problem, we propose the conversational SRL task, where an argument can be a dialogue participant, a phrase in the dialogue history, or a phrase in the current sentence. As the existing SRL datasets are at the sentence level, we manually annotate semantic roles for 3,000 chit-chat dialogues (27,198 sentences) to boost research in this direction. Experiments show that while traditional SRL systems (even with the help of coreference resolution or rewriting) perform poorly at analyzing dialogues, modeling dialogue histories and participants greatly helps performance, indicating that adapting SRL to conversations is very promising for universal dialogue understanding. Our initial study applying CSRL to two mainstream conversational tasks, dialogue response generation and dialogue context rewriting, also confirms the usefulness of CSRL.
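For concreteness, a hypothetical CSRL annotation might look like the following, showing how an argument can be a dialogue participant or a phrase recovered from the history; the schema and label names are illustrative, not the released annotation format.

```python
# Hypothetical CSRL frame for a two-turn dialogue (illustrative schema).
dialogue = ["A: I watched Inception yesterday.", "B: Did you like it?"]
csrl_frame = {
    "sentence": 1,                          # predicate occurs in B's turn
    "predicate": "like",
    "arguments": {
        "ARG0": ("speaker", "A"),           # argument is a dialogue participant
        "ARG1": ("history", "Inception"),   # argument recovered from turn 0
    },
}
```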


Distributed Thompson Sampling

arXiv.org Artificial Intelligence

We study a cooperative multi-agent multi-armed bandit problem with M agents and K arms. The goal of the agents is to minimize the cumulative regret. We adapt the traditional Thompson Sampling algorithm to the distributed setting. Given the agents' ability to communicate, we note that communication may further reduce the upper bound on the regret of a distributed Thompson Sampling approach. To further improve its performance, we propose a distributed elimination-based Thompson Sampling algorithm that allows the agents to learn collaboratively. We analyze the algorithm under Bernoulli rewards and derive a problem-dependent upper bound on the cumulative regret.
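A minimal sketch of cooperative Thompson Sampling under Bernoulli rewards, in the spirit of the setting described above; the elimination-based variant is not reproduced here, and `bandit_pull` is a hypothetical environment callback. All agents update a shared Beta posterior, modeling full communication of observations.

```python
import numpy as np

def distributed_thompson_sampling(bandit_pull, n_agents, n_arms, horizon, rng=None):
    """Cooperative Thompson Sampling with Bernoulli rewards (illustrative).
    `bandit_pull(agent, arm)` is assumed to return a reward in {0, 1}."""
    rng = rng or np.random.default_rng()
    alpha = np.ones(n_arms)    # shared posterior: successes + 1
    beta = np.ones(n_arms)     # shared posterior: failures + 1
    for _ in range(horizon):
        for agent in range(n_agents):
            theta = rng.beta(alpha, beta)     # sample a mean for each arm
            arm = int(np.argmax(theta))       # play the arm with the best sample
            reward = bandit_pull(agent, arm)
            alpha[arm] += reward              # communicated observations update
            beta[arm] += 1 - reward           # the shared posterior
    return alpha / (alpha + beta)             # posterior mean reward estimates
```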