Wang, Ke
Disentangled Representation for Causal Mediation Analysis
Xu, Ziqi, Cheng, Debo, Li, Jiuyong, Liu, Jixue, Liu, Lin, Wang, Ke
Estimating direct and indirect causal effects from observational data is crucial to understanding the causal mechanisms and predicting the behaviour under different interventions. Causal mediation analysis is a method that is often used to reveal direct and indirect effects. Deep learning shows promise in mediation analysis, but the current methods only assume latent confounders that affect treatment, mediator and outcome simultaneously, and fail to identify different types of latent confounders (e.g., confounders that only affect the mediator or outcome). Furthermore, current methods are based on the sequential ignorability assumption, which is not feasible for dealing with multiple types of latent confounders. This work aims to circumvent the sequential ignorability assumption and applies the piecemeal deconfounding assumption as an alternative. We propose the Disentangled Mediation Analysis Variational AutoEncoder (DMAVAE), which disentangles the representations of latent confounders into three types to accurately estimate the natural direct effect, natural indirect effect and total effect. Experimental results show that the proposed method outperforms existing methods and has strong generalisation ability. We further apply the method to a real-world dataset to show its potential application.
Improving Neural Machine Translation by Multi-Knowledge Integration with Prompting
Wang, Ke, Xie, Jun, Zhang, Yuqi, Zhao, Yu
Improving neural machine translation (NMT) systems with prompting has achieved significant progress in recent years. In this work, we focus on how to integrate multi-knowledge, multiple types of knowledge, into NMT models to enhance the performance with prompting. We propose a unified framework, which can integrate effectively multiple types of knowledge including sentences, terminologies/phrases and translation templates into NMT models. We utilize multiple types of knowledge as prefix-prompts of input for the encoder and decoder of NMT models to guide the translation process. The approach requires no changes to the model architecture and effectively adapts to domain-specific translation without retraining. The experiments on English-Chinese and English-German translation demonstrate that our approach significantly outperform strong baselines, achieving high translation quality and terminology match accuracy.
i-Razor: A Differentiable Neural Input Razor for Feature Selection and Dimension Search in DNN-Based Recommender Systems
Yao, Yao, Liu, Bin, He, Haoxun, Sheng, Dakui, Wang, Ke, Xiao, Li, Cao, Huanhuan
Input features play a crucial role in DNN-based recommender systems with thousands of categorical and continuous fields from users, items, contexts, and interactions. Noisy features and inappropriate embedding dimension assignments can deteriorate the performance of recommender systems and introduce unnecessary complexity in model training and online serving. Optimizing the input configuration of DNN models, including feature selection and embedding dimension assignment, has become one of the essential topics in feature engineering. However, in existing industrial practices, feature selection and dimension search are optimized sequentially, i.e., feature selection is performed first, followed by dimension search to determine the optimal dimension size for each selected feature. Such a sequential optimization mechanism increases training costs and risks generating suboptimal input configurations. To address this problem, we propose a differentiable neural input razor (i-Razor) that enables joint optimization of feature selection and dimension search. Concretely, we introduce an end-to-end differentiable model to learn the relative importance of different embedding regions of each feature. Furthermore, a flexible pruning algorithm is proposed to achieve feature filtering and dimension derivation simultaneously. Extensive experiments on two large-scale public datasets in the Click-Through-Rate (CTR) prediction task demonstrate the efficacy and superiority of i-Razor in balancing model complexity and performance.
Pi-DUAL: Using Privileged Information to Distinguish Clean from Noisy Labels
Wang, Ke, Ortiz-Jimenez, Guillermo, Jenatton, Rodolphe, Collier, Mark, Kokiopoulou, Efi, Frossard, Pascal
Label noise is a pervasive problem in deep learning that often compromises the generalization performance of trained models. Recently, leveraging privileged information (PI) -- information available only during training but not at test time -- has emerged as an effective approach to mitigate this issue. Yet, existing PI-based methods have failed to consistently outperform their no-PI counterparts in terms of preventing overfitting to label noise. To address this deficiency, we introduce Pi-DUAL, an architecture designed to harness PI to distinguish clean from wrong labels. Pi-DUAL decomposes the output logits into a prediction term, based on conventional input features, and a noise-fitting term influenced solely by PI. A gating mechanism steered by PI adaptively shifts focus between these terms, allowing the model to implicitly separate the learning paths of clean and wrong labels. Empirically, Pi-DUAL achieves significant performance improvements on key PI benchmarks (e.g., +6.8% on ImageNet-PI), establishing a new state-of-the-art test set accuracy. Additionally, Pi-DUAL is a potent method for identifying noisy samples post-training, outperforming other strong methods at this task. Overall, Pi-DUAL is a simple, scalable and practical approach for mitigating the effects of label noise in a variety of real-world scenarios with PI.
Synslator: An Interactive Machine Translation Tool with Online Learning
Wang, Jiayi, Wang, Ke, Zhou, Fengming, Wang, Chengyu, Fu, Zhiyong, Feng, Zeyu, Zhao, Yu, Zhang, Yuqi
Interactive machine translation (IMT) has emerged as a progression of the computer-aided translation paradigm, where the machine translation system and the human translator collaborate to produce high-quality translations. This paper introduces Synslator, a user-friendly computer-aided translation (CAT) tool that not only supports IMT, but is adept at online learning with real-time translation memories. To accommodate various deployment environments for CAT services, Synslator integrates two different neural translation models to handle translation memories for online learning. Additionally, the system employs a language model to enhance the fluency of translations in an interactive mode. In evaluation, we have confirmed the effectiveness of online learning through the translation models, and have observed a 13% increase in post-editing efficiency with the interactive functionalities of Synslator. A tutorial video is available at:https://youtu.be/K0vRsb2lTt8.
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Wang, Ke, Ren, Houxing, Zhou, Aojun, Lu, Zimu, Luo, Sichun, Shi, Weikang, Zhang, Renrui, Song, Linqi, Zhan, Mingjie, Li, Hongsheng
The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.
Artificial Intelligence Techniques for Next-Generation Mega Satellite Networks
Homssi, Bassel Al, Dakic, Kosta, Wang, Ke, Alpcan, Tansu, Allen, Ben, Boyce, Russell, Kandeepan, Sithamparanathan, Al-Hourani, Akram, Saad, Walid
Space communications, particularly massive satellite networks, re-emerged as an appealing candidate for next generation networks due to major advances in space launching, electronics, processing power, and miniaturization. However, massive satellite networks rely on numerous underlying and intertwined processes that cannot be truly captured using conventionally used models, due to their dynamic and unique features such as orbital speed, inter-satellite links, short pass time, and satellite footprint, among others. Hence, new approaches are needed to enable the network to proactively adjust to the rapidly varying conditions associated within the link. Artificial intelligence (AI) provides a pathway to capture these processes, analyze their behavior, and model their effect on the network. This article introduces the application of AI techniques for integrated terrestrial satellite networks, particularly massive satellite network communications. It details the unique features of massive satellite networks, and the overarching challenges concomitant with their integration into the current communication infrastructure. Moreover, this article provides insights into state-of-the-art AI techniques across various layers of the communication link. This entails applying AI for forecasting the highly dynamic radio channel, spectrum sensing and classification, signal detection and demodulation, inter-satellite and satellite access network optimization, and network security. Moreover, future paradigms and the mapping of these mechanisms onto practical networks are outlined.
Computing SHAP Efficiently Using Model Structure Information
Hu, Linwei, Wang, Ke
SHAP (SHapley Additive exPlanations) has become a popular method to attribute the prediction of a machine learning model on an input to its features. One main challenge of SHAP is the computation time. An exact computation of Shapley values requires exponential time complexity. Therefore, many approximation methods are proposed in the literature. In this paper, we propose methods that can compute SHAP exactly in polynomial time or even faster for SHAP definitions that satisfy our additivity and dummy assumptions (eg, kernal SHAP and baseline SHAP). We develop different strategies for models with different levels of model structure information: known functional decomposition, known order of model (defined as highest order of interaction in the model), or unknown order. For the first case, we demonstrate an additive property and a way to compute SHAP from the lower-order functional components. For the second case, we derive formulas that can compute SHAP in polynomial time. Both methods yield exact SHAP results. Finally, if even the order of model is unknown, we propose an iterative way to approximate Shapley values. The three methods we propose are computationally efficient when the order of model is not high which is typically the case in practice. We compare with sampling approach proposed in Castor & Gomez (2008) using simulation studies to demonstrate the efficacy of our proposed methods.
RestGPT: Connecting Large Language Models with Real-World RESTful APIs
Song, Yifan, Xiong, Weimin, Zhu, Dawei, Wu, Wenhao, Qian, Han, Song, Mingbo, Huang, Hailiang, Li, Cheng, Wang, Ke, Yao, Rong, Tian, Ye, Li, Sujian
Tool-augmented large language models (LLMs) have achieved remarkable progress in tackling a broad range of tasks. However, existing methods are mainly restricted to specifically designed tools and fail to fulfill complex instructions, having great limitations when confronted with real-world scenarios. In this paper, we explore a more realistic scenario by connecting LLMs with RESTful APIs, which adhere to the widely adopted REST software architectural style for web service development. To address the practical challenges of tackling complex instructions, we propose RestGPT, which exploits the power of LLMs and conducts a coarse-to-fine online planning mechanism to enhance the abilities of task decomposition and API selection. RestGPT also contains an API executor tailored for calling RESTful APIs, which can meticulously formulate parameters and parse API responses. To fully evaluate the performance of RestGPT, we propose RestBench, a high-quality benchmark which consists of two real-world scenarios and human-annotated instructions with gold solution paths. Experiments show that RestGPT is able to achieve impressive results in complex tasks and has strong robustness, which paves a new way towards AGI.
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
Zhou, Aojun, Wang, Ke, Lu, Zimu, Shi, Weikang, Luo, Sichun, Qin, Zipeng, Lu, Shaoqing, Jia, Anya, Song, Linqi, Zhan, Mingjie, Li, Hongsheng
Recent progress in large language models (LLMs) like GPT-4 and PaLM-2 has brought significant advancements in addressing math reasoning problems. In particular, OpenAI's latest version of GPT-4, known as GPT-4 Code Interpreter, shows remarkable performance on challenging math datasets. In this paper, we explore the effect of code on enhancing LLMs' reasoning capability by introducing different constraints on the Code Usage Frequency of GPT-4 Code Interpreter. We found that its success can be largely attributed to its powerful skills in generating and executing code, evaluating the output of code execution, and rectifying its solution when receiving unreasonable outputs. Based on this insight, we propose a novel and effective prompting method, explicit code-based self-verification (CSV), to further boost the mathematical reasoning potential of GPT-4 Code Interpreter. This method employs a zero-shot prompt on GPT-4 Code Interpreter to encourage it to use code to self-verify its answers. In instances where the verification state registers as "False", the model shall automatically amend its solution, analogous to our approach of rectifying errors during a mathematics examination. Furthermore, we recognize that the states of the verification result indicate the confidence of a solution, which can improve the effectiveness of majority voting. With GPT-4 Code Interpreter and CSV, we achieve an impressive zero-shot accuracy on MATH dataset (53.9% 84.3%). Large language models (LLMs) (Brown et al., 2020; OpenAI, 2023; Anil et al., 2023) have shown impressive success in various tasks, such as common sense understanding and code generation. However, they still fall short in mathematical reasoning, often producing nonsensical or inaccurate content and struggling with complex calculations. Previous attempts to tackle these challenges include the Chain-of-Thought (CoT) (Wei et al., 2022) framework, which enhances LLMs' logical reasoning abilities by generating intermediate steps in their reasoning process.