Collaborating Authors

Lv, Jiancheng


Personalized Federated Learning with Hidden Information on Personalized Prior

arXiv.org Artificial Intelligence

Federated learning (FL) is a distributed machine learning technique in which a global server and collaborating clients train a global model in a privacy-preserving way, without directly sharing data. However, data heterogeneity, one of FL's main challenges, makes it difficult for the global model to perform well on each client's local data. Personalized federated learning (PFL) therefore aims to improve the model's performance on local data as much as possible. Bayesian learning, in which model parameters are treated as random variables under a prior assumption, is a feasible solution to the data heterogeneity problem: the more local data the model uses, the more it focuses on that local data; otherwise, it falls back on the prior. When Bayesian learning is applied to PFL, the global model provides global knowledge as a prior for the local training process. In this paper, we model PFL with Bayesian learning by assuming a prior in the scaled exponential family, and propose pFedBreD, a framework that solves the resulting problem with Bregman divergence regularization. Empirically, under a spherical Gaussian prior assumption and a first-order mean-selection strategy, our proposal significantly outperforms other PFL algorithms on multiple public benchmarks.
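To make the regularization concrete: under a spherical Gaussian prior, the Bregman divergence between local and global parameters reduces to a scaled squared Euclidean distance, i.e. an L2 proximal term. The following is a minimal sketch of such a local objective; the function names and the weight lam are illustrative assumptions, not the authors' code.

import torch

def local_objective(model, global_params, batch, loss_fn, lam=0.1):
    """Local training loss with a Bregman-divergence regularizer.

    Under a spherical Gaussian prior the divergence to the global
    (prior) mean reduces to a scaled L2 proximal term.
    """
    x, y = batch
    data_loss = loss_fn(model(x), y)
    prox = sum(((p - g) ** 2).sum()
               for p, g in zip(model.parameters(), global_params))
    return data_loss + 0.5 * lam * prox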


Partial Differential Equations Meet Deep Neural Networks: A Survey

arXiv.org Artificial Intelligence

Many problems in science and engineering can be represented by sets of partial differential equations (PDEs) through mathematical modeling. Mechanism-based computation following PDEs has long been an essential paradigm for studying topics such as computational fluid dynamics, multiphysics simulation, molecular dynamics, and dynamical systems. It is a vibrant multi-disciplinary field of increasing importance and extraordinary potential. At the same time, solving PDEs efficiently has been a long-standing challenge. Generally, except for a few differential equations whose analytical solutions are directly available, most equations must be solved approximately by numerical approaches such as the finite difference, finite element, finite volume, and boundary element methods. These numerical methods usually divide a continuous problem domain into discrete points and then solve the system at each of those points. Despite the effectiveness of these traditional numerical methods, the vast number of iterative operations accompanying each step forward significantly reduces their efficiency. Recently, another equally important paradigm, data-driven computation represented by deep learning, has emerged as an effective means of solving PDEs. Surprisingly, a comprehensive review of this interesting subfield is still lacking. This survey aims to categorize and review current progress on Deep Neural Networks (DNNs) for PDEs. We discuss the literature published in this subfield over the past decades, present it in a common taxonomy, and follow with an overview and classification of applications of these methods in scientific research and engineering scenarios. The origin, development history, characteristics, and categories of this subfield, as well as the future trends in each potential direction, are also introduced.
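As a representative example of the data-driven paradigm the survey covers, physics-informed training turns the PDE residual itself into a loss, with derivatives obtained by automatic differentiation instead of grid discretization. This is a minimal sketch for the 1D heat equation u_t = alpha * u_xx; the network size and collocation sampling are illustrative assumptions, not drawn from the survey.

import torch

net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))

def pde_residual(xt, alpha=0.1):
    # Columns of xt are (x, t); derivatives come from autograd.
    xt = xt.requires_grad_(True)
    u = net(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, 0:1], grads[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    return u_t - alpha * u_xx

xt = torch.rand(256, 2)                 # random collocation points
loss = (pde_residual(xt) ** 2).mean()   # plus boundary/initial terms in practice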


Face Deblurring Based on Separable Normalization and Adaptive Denormalization

arXiv.org Artificial Intelligence

Face deblurring aims to restore a clear face image, with explicit structure and facial details, from a blurred input image. However, most conventional image and face deblurring methods focus on the whole generated image without attending to the textures of specific face parts, and generally produce insufficient detail. Considering that faces and backgrounds have different distributions, in this study we design an effective face deblurring network based on separable normalization and adaptive denormalization (SNADNet). First, we fine-tune a face parsing network to obtain an accurate face structure. Then, we divide the face parsing features into face foreground and background. Moreover, we construct a new feature adaptive denormalization that regularizes facial structure as an auxiliary condition, generating more harmonious and undistorted face structure. In addition, we propose a texture extractor and a multi-patch discriminator to enhance the generated facial texture. Experimental results on both the CelebA and CelebA-HQ datasets demonstrate that the proposed face deblurring network restores face structure with richer facial details and performs favorably against state-of-the-art methods in terms of the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR), Fréchet inception distance (FID), and L1 metrics, as well as in qualitative comparisons.
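For intuition, adaptive denormalization of this kind typically normalizes a feature map and then modulates it with scale and shift maps predicted from the parsing map, so facial structure conditions the generator. A minimal SPADE-style sketch follows; the layer sizes and class name are assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class AdaptiveDenorm(nn.Module):
    """Normalize features, then modulate them with per-pixel scale/shift
    maps predicted from a face-parsing map (illustrative sketch)."""
    def __init__(self, channels, parse_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.gamma = nn.Conv2d(parse_channels, channels, 3, padding=1)
        self.beta = nn.Conv2d(parse_channels, channels, 3, padding=1)

    def forward(self, feat, parse_map):
        # Resize the parsing map to the feature resolution, then modulate.
        parse_map = nn.functional.interpolate(parse_map, size=feat.shape[2:])
        return self.norm(feat) * (1 + self.gamma(parse_map)) + self.beta(parse_map)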


TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance

arXiv.org Artificial Intelligence

Hybrid data combining both tabular and textual content (e.g., financial reports) are quite pervasive in the real world. However, Question Answering (QA) over such hybrid data is largely neglected in existing research. In this work, we extract samples from real financial reports to build a new large-scale QA dataset containing both Tabular And Textual data, named TAT-QA, where numerical reasoning, such as addition, subtraction, multiplication, division, counting, comparison/sorting, and their compositions, is usually required to infer the answer. We further propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text. It adopts sequence tagging to extract relevant cells from the table along with relevant spans from the text to infer their semantics, and then applies symbolic reasoning over them with a set of aggregation operators to arrive at the final answer. TAGOP achieves 58.0% in F1, an 11.1% absolute increase over the previous best baseline model, according to our experiments on TAT-QA. This result still lags far behind expert human performance, i.e., 90.8% in F1, demonstrating that TAT-QA is very challenging and can serve as a benchmark for training and testing powerful QA models that address hybrid-form data.
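The symbolic-reasoning step can be pictured as applying a predicted operator to the numbers the tagger extracted from cells and spans. A hedged sketch, with an assumed operator set mirroring the reasoning types the abstract lists:

OPERATORS = {
    "sum": lambda xs: sum(xs),
    "diff": lambda xs: xs[0] - xs[1],
    "multiply": lambda xs: xs[0] * xs[1],
    "divide": lambda xs: xs[0] / xs[1],
    "count": lambda xs: len(xs),
    "average": lambda xs: sum(xs) / len(xs),
}

def answer(operator: str, extracted: list[float]) -> float:
    # Apply the predicted aggregation operator to the extracted numbers.
    return OPERATORS[operator](extracted)

print(answer("diff", [120.5, 98.2]))  # e.g. the change between two fiscal years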


Prediction, Selection, and Generation: Exploration of Knowledge-Driven Conversation System

arXiv.org Artificial Intelligence

In open-domain conversational systems, it is important but challenging to leverage background knowledge. Incorporating knowledge makes dialogue generation controllable and produces more diverse sentences that contain real knowledge. In this paper, we combine knowledge bases with a pre-trained model to propose a knowledge-driven conversation system. The system includes modules for dialogue topic prediction, knowledge matching, and dialogue generation. Based on this system, we study the factors that may affect the quality of knowledge-driven dialogue generation: the coarse topic-recall algorithm, the number of knowledge choices, the choice of generation model, etc., and finally bring the system to state-of-the-art performance. These experimental results provide guidance for future research on this task. To the best of our knowledge, this is the first work to study and analyze the effects of these factors.
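A hedged sketch of the three-stage pipeline the abstract describes: predict a topic, match knowledge, then condition generation on it. The component interfaces and the lexical-overlap matcher are illustrative assumptions standing in for the learned modules.

def respond(history, kb, topic_model, generator, k=3):
    topic = topic_model(history)                 # 1. dialogue topic prediction
    candidates = kb.get(topic, [])               # coarse recall keyed by topic
    facts = sorted(candidates, key=lambda f: overlap(f, history[-1]),
                   reverse=True)[:k]             # 2. knowledge matching (top-k)
    return generator(" ".join(facts), history)   # 3. knowledge-grounded generation

def overlap(fact, utterance):
    # Simple word-overlap score standing in for a learned matcher.
    return len(set(fact.split()) & set(utterance.split()))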


LANA: Towards Personalized Deep Knowledge Tracing Through Distinguishable Interactive Sequences

arXiv.org Artificial Intelligence

In educational applications, Knowledge Tracing (KT), the problem of accurately predicting students' responses to future questions by summarizing their knowledge states, has been widely studied for decades, as it is considered a fundamental task towards adaptive online learning. Among all the proposed KT methods, Deep Knowledge Tracing (DKT) and its variants are by far the most effective, owing to the high flexibility of neural networks. However, DKT often ignores inherent differences between students (e.g., memory skills, reasoning skills), averaging performance over all students; this lack of personalization makes it insufficient for adaptive learning. To alleviate this problem, we propose Leveled Attentive KNowledge TrAcing (LANA). First, a novel student-related features extractor (SRFE) distills students' unique inherent properties from their respective interactive sequences. Second, a pivot module dynamically reconstructs the decoder of the neural network based on attention over the extracted features, successfully distinguishing students' performance over time. Moreover, inspired by Item Response Theory (IRT), the interpretable Rasch model is used to cluster students by their ability levels, and leveled learning then assigns different encoders to different groups of students. With the pivot module reconstructing the decoder for individual students and leveled learning providing specialized encoders for groups, personalized DKT is achieved. Extensive experiments on two real-world large-scale datasets demonstrate that LANA improves the AUC score by at least 1.00% (i.e., 1.46% on EdNet and 1.00% on RAIEd2020), substantially surpassing other state-of-the-art KT methods.
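The Rasch (one-parameter IRT) model mentioned here scores a correct response as a logistic function of the gap between student ability and item difficulty, which is what makes ability-based clustering interpretable. A minimal sketch; the level boundaries are illustrative assumptions.

import numpy as np

def rasch_prob(ability, difficulty):
    """Rasch (1PL IRT) model: P(correct) = sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

def level_of(ability, boundaries=(-1.0, 0.0, 1.0)):
    # Assign a student to an ability level; under leveled learning each
    # level gets its own encoder. Boundaries here are assumed values.
    return int(np.searchsorted(boundaries, ability))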


HPSGD: Hierarchical Parallel SGD With Stale Gradients Featuring

arXiv.org Machine Learning

While distributed training significantly speeds up the training of deep neural networks (DNNs), cluster utilization is relatively low due to time-consuming data synchronization between workers. To alleviate this problem, a novel Hierarchical Parallel SGD (HPSGD) strategy is proposed, based on the observation that the data synchronization phase can be parallelized with the local training phase (i.e., feed-forward and back-propagation). Furthermore, an improved model updating method is utilized to remedy the introduced stale-gradients problem: updates are committed to a replica (i.e., a temporary model with the same parameters as the global model), and the averaged changes are then merged into the global model. Extensive experiments demonstrate that the proposed HPSGD approach substantially boosts distributed DNN training, reduces the disturbance of stale gradients, and achieves better accuracy within a given fixed wall-clock time.
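A hedged sketch of one such round, assuming the training step and the all-reduce helper are supplied by the surrounding framework (their names are illustrative, not the paper's API): train on the replica while communication proceeds, then merge the averaged parameter change into the global model.

import copy

def hpsgd_round(global_model, replica, batch, optimizer_step, all_reduce_mean):
    """One round (sketch): commit updates to the replica, then merge the
    worker-averaged change back into the global model."""
    before = copy.deepcopy(replica)           # snapshot replica parameters
    optimizer_step(replica, batch)            # local step, overlapped with sync
    delta = [p - q for p, q in zip(replica.parameters(), before.parameters())]
    mean_delta = all_reduce_mean(delta)       # average the changes across workers
    for p, d in zip(global_model.parameters(), mean_delta):
        p.data.add_(d)                        # merge into the global model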


DBS: Dynamic Batch Size For Distributed Deep Neural Network Training

arXiv.org Machine Learning

Synchronous strategies with data parallelism, such as Synchronous Stochastic Gradient Descent (S-SGD) and model averaging methods, are widely utilized in the distributed training of Deep Neural Networks (DNNs), largely owing to their easy implementation and promising performance. In particular, each worker in the cluster hosts a copy of the DNN and an evenly divided share of the dataset with a fixed mini-batch size, so that training remains convergent. Under these strategies, workers with different computational capabilities must wait for each other because of synchronization and network transmission delays, which inevitably causes the high-performance workers to waste computation; consequently, the utilization of the cluster is relatively low. To alleviate this issue, we propose the Dynamic Batch Size (DBS) strategy for the distributed training of DNNs. Specifically, the performance of each worker is first evaluated based on its record in the previous epoch, and the batch size and dataset partition are then dynamically adjusted according to the worker's current performance, thereby improving the utilization of the cluster. To verify the effectiveness of the proposed strategy, extensive experiments have been conducted, and the results indicate that the proposed strategy can fully utilize the performance of the cluster, reduce training time, and maintain good robustness under disturbance from irrelevant tasks. Furthermore, rigorous theoretical analysis is provided to prove the convergence of the proposed strategy.
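The adjustment rule can be sketched as sizing each worker's next batch in proportion to its measured throughput from the previous epoch, so faster workers wait less at the synchronization barrier. A minimal sketch under that assumption; the exact rule in the paper may differ.

def rebalance(batch_total, epoch_times, prev_batches):
    """Give each worker a share of the global batch proportional to its
    measured samples-per-second throughput in the previous epoch."""
    speeds = [b / t for b, t in zip(prev_batches, epoch_times)]
    total_speed = sum(speeds)
    return [max(1, round(batch_total * s / total_speed)) for s in speeds]

# Example: worker 1 processed its share twice as fast as worker 0.
print(rebalance(256, epoch_times=[2.0, 1.0], prev_batches=[128, 128]))
# -> roughly [85, 171]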


Unsupervised Multi-Manifold Clustering by Learning Deep Representation

AAAI Conferences

In this paper, we propose a novel deep manifold clustering (DMC) method for learning effective deep representations and partitioning a dataset into clusters, where each cluster contains data points from a single nonlinear manifold. Different from previous research efforts, we adopt deep neural networks to classify and parameterize unlabeled data that lie on multiple manifolds. First, motivated by the observation that nearby points lying on a local patch of a manifold should possess similar representations, a locality-preserving objective is defined to iteratively explore data relations and learn structure-preserving representations. Second, by finding the corresponding cluster centers from the representations, a clustering-oriented objective is proposed to guide the model to extract representations that are both discriminative and cluster-specific. Finally, by integrating the two objectives into a single model with a unified cost function and optimizing it with back-propagation, we obtain not only more powerful representations but also more precise data clusters. In addition, our model can be naturally extended to cluster out-of-sample data. Experimental results and comparisons with existing state-of-the-art methods show that the proposed method consistently achieves the best performance on various benchmark datasets.
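The unified cost function can be pictured as the sum of the two terms the abstract names: a locality-preserving term pulling each embedding toward its manifold neighbors, and a clustering term pulling it toward its assigned center. A hedged sketch; the neighbor aggregation and the weight alpha are illustrative assumptions.

import torch

def dmc_loss(z, neighbors_z, centers, assignments, alpha=1.0):
    """Unified DMC-style cost (sketch): locality preservation plus a
    clustering-oriented term over learned embeddings.

    z           : (N, d) embeddings of the batch
    neighbors_z : (N, d) aggregated embeddings of each point's neighbors
    centers     : (K, d) current cluster centers
    assignments : (N,)   cluster index of each point
    """
    locality = ((z - neighbors_z) ** 2).sum(dim=1).mean()
    clustering = ((z - centers[assignments]) ** 2).sum(dim=1).mean()
    return locality + alpha * clustering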