
Collaborating Authors

Song, Linqi


Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

arXiv.org Artificial Intelligence

Recent progress in large language models (LLMs) like GPT-4 and PaLM-2 has brought significant advancements in addressing math reasoning problems. In particular, OpenAI's latest version of GPT-4, known as GPT-4 Code Interpreter, shows remarkable performance on challenging math datasets. In this paper, we explore the effect of code on enhancing LLMs' reasoning capability by introducing different constraints on the Code Usage Frequency of GPT-4 Code Interpreter. We find that its success can be largely attributed to its powerful skills in generating and executing code, evaluating the output of code execution, and rectifying its solution when receiving unreasonable outputs. Based on this insight, we propose a novel and effective prompting method, explicit code-based self-verification (CSV), to further boost the mathematical reasoning potential of GPT-4 Code Interpreter. This method employs a zero-shot prompt on GPT-4 Code Interpreter to encourage it to use code to self-verify its answers. In instances where the verification state registers as "False", the model automatically amends its solution, analogous to rectifying errors during a mathematics examination. Furthermore, we recognize that the state of the verification result indicates the confidence of a solution, which can improve the effectiveness of majority voting. With GPT-4 Code Interpreter and CSV, we achieve an impressive zero-shot accuracy on the MATH dataset (53.9% → 84.3%). Large language models (LLMs) (Brown et al., 2020; OpenAI, 2023; Anil et al., 2023) have shown impressive success in various tasks, such as common sense understanding and code generation. However, they still fall short in mathematical reasoning, often producing nonsensical or inaccurate content and struggling with complex calculations. Previous attempts to tackle these challenges include the Chain-of-Thought (CoT) (Wei et al., 2022) framework, which enhances LLMs' logical reasoning abilities by generating intermediate steps in their reasoning process.
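The verify-then-amend loop lends itself to a compact sketch. The following is a minimal illustration of the CSV idea as described in the abstract, not the authors' implementation; `query_llm` is a hypothetical stand-in for a call to GPT-4 Code Interpreter, the prompt wording is an assumption, and the response keys are illustrative.

```python
# Illustrative sketch of the CSV loop; `query_llm` is a hypothetical function
# returning a dict with "answer" and "verification" keys (an assumption).
CSV_PROMPT = (
    "Solve the problem, then write and execute code to verify your answer. "
    "Report the verification state as True or False; if False, revise your solution."
)

def solve_with_csv(problem: str, query_llm, max_rounds: int = 3) -> dict:
    """Generate a solution, self-verify it with code, and amend on failure."""
    solution = query_llm(f"{CSV_PROMPT}\n\nProblem: {problem}")
    for _ in range(max_rounds):
        if solution["verification"] == "True":
            return solution  # verified answer; treat as higher-confidence
        # Verification registered as False: ask the model to rectify.
        solution = query_llm(
            f"{CSV_PROMPT}\n\nProblem: {problem}\n"
            f"Previous (failed) solution: {solution['answer']}\nPlease amend it."
        )
    return solution
```

The verification state can then weight each candidate in majority voting, as the abstract suggests, by counting verified answers more heavily than unverified ones.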


VCSUM: A Versatile Chinese Meeting Summarization Dataset

arXiv.org Artificial Intelligence

Compared to news and chat summarization, the development of meeting summarization has been greatly hindered by limited data. To this end, we introduce a versatile Chinese meeting summarization dataset, dubbed VCSum, consisting of 239 real-life meetings with a total duration of over 230 hours. We claim our dataset is versatile because we provide annotations of topic segmentation, headlines, segmentation summaries, overall meeting summaries, and salient sentences for each meeting transcript. As such, the dataset can be adapted to various summarization tasks and methods, including segmentation-based summarization, multi-granularity summarization, and retrieval-then-generate summarization. Our analysis confirms the effectiveness and robustness of VCSum. We also provide a set of benchmark models for different downstream summarization tasks on VCSum to facilitate further research. The dataset and code will be released at https://github.com/hahahawu/VCSum.
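For intuition, one meeting record carrying the annotation layers listed above might look like the following; the field names and structure are purely illustrative, not the released schema.

```python
# Hypothetical shape of one VCSum meeting record, reflecting the annotation
# layers described in the abstract; all field names are illustrative.
meeting = {
    "transcript": ["sentence 1", "sentence 2", ...],   # full meeting transcript
    "topic_segments": [(0, 57), (58, 140)],            # sentence-index spans
    "headlines": ["headline for segment 1", ...],      # one per segment
    "segment_summaries": ["summary of segment 1", ...],
    "overall_summary": "summary of the whole meeting",
    "salient_sentences": [3, 19, 88],                  # indices of key sentences
}
```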


Byzantine-robust Federated Learning through Collaborative Malicious Gradient Filtering

arXiv.org Artificial Intelligence

Gradient-based training in federated learning is known to be vulnerable to faulty/malicious clients, which are often modeled as Byzantine clients. To address this, previous work either makes use of auxiliary data at the parameter server to verify the received gradients (e.g., by computing the validation error rate) or leverages statistics-based methods (e.g., median and Krum) to identify and remove malicious gradients from Byzantine clients. In this paper, we remark that auxiliary data may not always be available in practice and focus on the statistics-based approach. However, recent work on model poisoning attacks has shown that well-crafted attacks can circumvent most median- and distance-based statistical defense methods, making malicious gradients indistinguishable from honest ones. To tackle this challenge, we show that the element-wise signs of the gradient vector can provide valuable insight into detecting model poisoning attacks. Based on our theoretical analysis of the "Little is Enough" attack, we propose a novel approach called SignGuard to enable Byzantine-robust federated learning through collaborative malicious gradient filtering. More precisely, the received gradients are first processed to generate relevant magnitude, sign, and similarity statistics, which are then collaboratively utilized by multiple filters to eliminate malicious gradients before final aggregation. Finally, extensive experiments on image and text classification tasks are conducted under recently proposed attacks and defense strategies. The numerical results demonstrate the effectiveness and superiority of our proposed approach. The code is available at https://github.com/JianXu95/SignGuard.
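The filtering pipeline can be sketched as follows. This is a simplified illustration of the described approach, not the released implementation: the real SignGuard combines several collaborative filters (including clustering on sign statistics), whereas this sketch uses simple median-based thresholds, and all threshold values are assumptions.

```python
import numpy as np

def signguard_aggregate(grads: np.ndarray) -> np.ndarray:
    """Toy sketch of magnitude/sign filtering before aggregation.
    `grads` has shape (num_clients, dim); thresholds are illustrative."""
    # 1) Magnitude filter: drop gradients whose norm deviates far from the median.
    norms = np.linalg.norm(grads, axis=1)
    med_norm = np.median(norms)
    mag_ok = (norms > 0.1 * med_norm) & (norms < 3.0 * med_norm)
    # 2) Sign statistics: fraction of positive entries per gradient; flag
    #    gradients whose sign profile deviates from the population median.
    pos_frac = (grads > 0).mean(axis=1)
    sign_ok = np.abs(pos_frac - np.median(pos_frac)) < 0.1
    # 3) Average only the gradients that pass all filters.
    keep = mag_ok & sign_ok
    return grads[keep].mean(axis=0)
```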


Killing Two Birds with One Stone: Quantization Achieves Privacy in Distributed Learning

arXiv.org Artificial Intelligence

Communication efficiency and privacy protection are two critical issues in distributed machine learning. Existing methods tackle these two issues separately and may have a high implementation complexity that constrains their application in a resource-limited environment. We propose a comprehensive quantization-based solution that can simultaneously achieve communication efficiency and privacy protection, providing new insights into the correlated nature of communication and privacy. Specifically, we demonstrate the effectiveness of our proposed solution in the distributed stochastic gradient descent (SGD) framework by adding binomial noise to the uniformly quantized gradients to reach the desired differential privacy level, with only a minor sacrifice in communication efficiency. We theoretically capture the new trade-offs between communication, privacy, and learning performance.
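A minimal sketch of the described mechanism, assuming uniform quantization followed by centered binomial noise; the parameter names, noise scale, and privacy calibration here are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize_with_binomial_noise(grad, num_levels=16, noise_trials=32, rng=None):
    """Uniformly quantize a gradient and add centered binomial noise.
    Illustrative only: noise_trials would need to be calibrated to the
    desired differential privacy level, which this sketch does not do."""
    rng = rng or np.random.default_rng()
    lo, hi = grad.min(), grad.max()
    step = (hi - lo) / (num_levels - 1) if hi > lo else 1.0
    q = np.round((grad - lo) / step)          # integer quantization levels
    # Centered binomial noise: sum of fair coin flips, variance noise_trials/4.
    noise = rng.binomial(noise_trials, 0.5, size=grad.shape) - noise_trials / 2
    # The transmitted values remain small integers, so communication stays cheap;
    # the receiver dequantizes with the shared (lo, step) parameters.
    return lo + step * (q + noise)
```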


Learning Locality and Isotropy in Dialogue Modeling

arXiv.org Artificial Intelligence

Existing dialogue modeling methods have achieved promising performance on various dialogue tasks with the aid of Transformers and large-scale pre-trained language models. However, some recent studies revealed that the context representations produced by these methods suffer from the problem of anisotropy. In this paper, we find that the generated representations are also not conversational, losing the conversation structure information during the context modeling stage. To this end, we identify two properties in dialogue modeling, i.e., locality and isotropy, and present a simple method for dialogue representation calibration, namely SimDRC, to build isotropic and conversational feature spaces. Experimental results show that our approach significantly outperforms current state-of-the-art models on three open-domain dialogue tasks with eight benchmarks across both automatic and human evaluation metrics. More in-depth analyses further confirm the effectiveness of our proposed approach. Dialogue modeling (Serban et al., 2016; Mehri et al., 2019; Liu et al., 2021) aims to encode the raw text of the input dialogue into contextual representations. Although Transformer-based dialogue modeling methods (Hosseini-Asl et al., 2020; Liu et al., 2021) have achieved great success on various dialogue tasks, some impediments to these methods remain underexplored. Specifically, recent studies (Ethayarajh, 2019; Su et al., 2022) have revealed that on dialogue generation tasks, the representations produced by existing dialogue modeling methods are anisotropic, i.e., features occupy a narrow cone in the vector space, leading to the problem of degeneration. To alleviate this problem, previous solutions (e.g., SimCTG) (Su et al., 2021; 2022) encourage the model to learn isotropic token embeddings by pushing away the representations of distinct tokens. While building a more discriminative and isotropic feature space, these methods still ignore dialogue-specific features, such as inter-speaker correlations and conversational structure information, in the dialogue modeling stage.
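To make the two properties concrete, here is an illustrative contrastive-style loss in their spirit: pull together tokens within the same utterance (locality) while keeping tokens from different utterances decorrelated (isotropy). It is a sketch under assumed inputs, not the SimDRC objective itself, and the margin value is an assumption.

```python
import torch
import torch.nn.functional as F

def locality_isotropy_loss(token_reps, utterance_ids, margin=0.3):
    """Illustrative loss encouraging locality and isotropy.
    token_reps: (T, d) token representations; utterance_ids: (T,) long tensor.
    Assumes each utterance contributes at least two tokens."""
    reps = F.normalize(token_reps, dim=-1)            # unit vectors
    sim = reps @ reps.T                               # pairwise cosine similarity
    same = utterance_ids.unsqueeze(0) == utterance_ids.unsqueeze(1)
    eye = torch.eye(len(reps), dtype=torch.bool, device=reps.device)
    locality = (1 - sim[same & ~eye]).mean()          # same utterance: pull together
    isotropy = F.relu(sim[~same] - margin).mean()     # across utterances: push apart
    return locality + isotropy
```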


CSAGN: Conversational Structure Aware Graph Network for Conversational Semantic Role Labeling

arXiv.org Artificial Intelligence

Conversational semantic role labeling (CSRL) is believed to be a crucial step towards dialogue understanding. However, it remains a major challenge for existing CSRL parsers to handle conversational structural information. In this paper, we present a simple and effective architecture for CSRL which aims to address this problem. Our model is based on a conversational structure-aware graph network that explicitly encodes speaker-dependent information. We also propose a multi-task learning method to further improve the model. Experimental results on benchmark datasets show that our model with our proposed training objectives significantly outperforms previous baselines.
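One simple way to encode speaker-dependent conversational structure as a graph is sketched below; this illustrates the general idea of such graphs, not the CSAGN architecture itself.

```python
import numpy as np

def speaker_aware_adjacency(speakers):
    """Build a toy utterance-level adjacency matrix: each utterance connects
    to its immediate predecessor and to earlier turns by the same speaker.
    `speakers` is a list of speaker ids, one per utterance (illustrative)."""
    n = len(speakers)
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        if i > 0:
            adj[i, i - 1] = 1          # sequential dialogue-flow edge
        for j in range(i):
            if speakers[j] == speakers[i]:
                adj[i, j] = 1          # same-speaker dependency edge
    return adj

# Example: three turns by speakers A, B, A.
print(speaker_aware_adjacency(["A", "B", "A"]))
```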


DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning

arXiv.org Artificial Intelligence

Gradient quantization is an emerging technique for reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach to dynamically quantizing gradients. This paper addresses this issue by proposing a novel dynamically quantized SGD (DQ-SGD) framework, enabling us to dynamically adjust the quantization scheme for each gradient descent step by exploring the trade-off between communication cost and convergence error. We derive an upper bound, tight in some cases, on the convergence error for a restricted family of quantization schemes and loss functions. We design our DQ-SGD algorithm by minimizing the communication cost under convergence error constraints. Finally, through extensive experiments on large-scale natural language processing and computer vision tasks on the AG-News, CIFAR-10, and CIFAR-100 datasets, we demonstrate that our quantization scheme achieves better trade-offs between communication cost and learning performance than other state-of-the-art gradient quantization methods.
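The per-step trade-off can be illustrated with a toy bit-width selection rule: spend as few bits as possible while keeping the quantization error within a per-step budget. The error estimate below is a generic worst-case bound for uniform quantization, not the paper's derived bound, and all names are assumptions.

```python
import numpy as np

def choose_bits(grad_norm, error_budget, dim, bit_options=(2, 4, 8, 16)):
    """Pick the smallest bit-width whose worst-case uniform-quantization
    error fits the per-step error budget (illustrative rule only)."""
    for bits in bit_options:                  # fewer bits = cheaper communication
        levels = 2 ** bits - 1
        # Worst-case quantization error shrinks as the number of levels grows.
        quant_error = grad_norm * np.sqrt(dim) / levels
        if quant_error <= error_budget:
            return bits
    return bit_options[-1]                    # fall back to the finest scheme

# Example: tighter budgets force more bits per coordinate.
print(choose_bits(grad_norm=1.0, error_budget=0.05, dim=10_000))
```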


Domain-Adaptive Pretraining Methods for Dialogue Understanding

arXiv.org Artificial Intelligence

Language models like BERT and SpanBERT pretrained on open-domain data have obtained impressive gains on various NLP tasks. In this paper, we probe the effectiveness of domain-adaptive pretraining objectives on downstream tasks. In particular, three objectives, including a novel objective focusing on modeling predicate-argument relations, are evaluated on two challenging dialogue understanding tasks. Experimental results demonstrate that domain-adaptive pretraining with proper objectives can significantly improve the performance of a strong baseline on these tasks, achieving new state-of-the-art performance.


Conversational Semantic Role Labeling

arXiv.org Artificial Intelligence

Semantic role labeling (SRL) aims to extract the arguments for each predicate in an input sentence. Traditional SRL can fail to analyze dialogues because it operates on each sentence in isolation, while ellipsis and anaphora occur frequently in dialogues. To address this problem, we propose the conversational SRL task, where an argument can be a dialogue participant, a phrase in the dialogue history, or a phrase in the current sentence. As the existing SRL datasets are at the sentence level, we manually annotate semantic roles for 3,000 chit-chat dialogues (27,198 sentences) to boost research in this direction. Experiments show that while traditional SRL systems (even with the help of coreference resolution or rewriting) perform poorly at analyzing dialogues, modeling dialogue histories and participants greatly helps performance, indicating that adapting SRL to conversations is very promising for universal dialogue understanding. Our initial study applying CSRL to two mainstream conversational tasks, dialogue response generation and dialogue context rewriting, also confirms the usefulness of CSRL.
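For concreteness, a hypothetical CSRL annotation might look like the following, showing how an argument can be a dialogue participant or a phrase recovered from the history; the schema and label names are illustrative, not the released annotation format.

```python
# Hypothetical CSRL frame for a two-turn dialogue (illustrative schema).
dialogue = ["A: I watched Inception yesterday.", "B: Did you like it?"]
csrl_frame = {
    "sentence": 1,                          # predicate occurs in B's turn
    "predicate": "like",
    "arguments": {
        "ARG0": ("speaker", "A"),           # argument is a dialogue participant
        "ARG1": ("history", "Inception"),   # argument recovered from turn 0
    },
}
```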


Distributed Thompson Sampling

arXiv.org Artificial Intelligence

We study a cooperative multi-agent multi-armed bandit problem with M agents and K arms. The goal of the agents is to minimize the cumulative regret. We adapt the traditional Thompson Sampling algorithm to the distributed setting. Given the agents' ability to communicate, we note that communication may further reduce the upper bound on the regret of a distributed Thompson Sampling approach. To further improve its performance, we propose a distributed elimination-based Thompson Sampling algorithm that allows the agents to learn collaboratively. We analyze the algorithm under Bernoulli rewards and derive a problem-dependent upper bound on the cumulative regret.
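A minimal sketch of cooperative Thompson Sampling under Bernoulli rewards, in the spirit of the setting described above; the elimination-based variant is not reproduced here, and `bandit_pull` is a hypothetical environment callback. All agents update a shared Beta posterior, modeling full communication of observations.

```python
import numpy as np

def distributed_thompson_sampling(bandit_pull, n_agents, n_arms, horizon, rng=None):
    """Cooperative Thompson Sampling with Bernoulli rewards (illustrative).
    `bandit_pull(agent, arm)` is assumed to return a reward in {0, 1}."""
    rng = rng or np.random.default_rng()
    alpha = np.ones(n_arms)    # shared posterior: successes + 1
    beta = np.ones(n_arms)     # shared posterior: failures + 1
    for _ in range(horizon):
        for agent in range(n_agents):
            theta = rng.beta(alpha, beta)     # sample a mean for each arm
            arm = int(np.argmax(theta))       # play the arm with the best sample
            reward = bandit_pull(agent, arm)
            alpha[arm] += reward              # communicated observations update
            beta[arm] += 1 - reward           # the shared posterior
    return alpha / (alpha + beta)             # posterior mean reward estimates
```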