Zhao, Jin
MCTS-SQL: An Effective Framework for Text-to-SQL with Monte Carlo Tree Search
Yuan, Shuozhi, Chen, Liming, Yuan, Miaomiao, Zhao, Jin, Peng, Haoran, Guo, Wenming
Text-to-SQL is a fundamental and longstanding problem in NLP that aims to convert natural language queries into SQL, enabling non-expert users to operate databases. Recent advances in LLMs have greatly improved text-to-SQL performance. However, challenges persist, especially with complex user queries. Current approaches (e.g., CoT prompting and multi-agent frameworks) rely on the ability of models to plan and generate SQL autonomously, but their performance remains difficult to control. In addition, LLMs are still prone to hallucinations. To alleviate these challenges, we design MCTS-SQL, a novel framework that guides SQL generation iteratively. The approach generates SQL queries through Monte Carlo Tree Search (MCTS) and uses a heuristic self-refinement mechanism to enhance accuracy and reliability. Key components include a schema selector for extracting relevant information and an MCTS-based generator for iterative query refinement. Experimental results on the SPIDER and BIRD benchmarks show that MCTS-SQL achieves state-of-the-art performance. Specifically, on the BIRD development dataset, MCTS-SQL achieves an Execution (EX) accuracy of 69.40% using GPT-4o as the base model and improves significantly on challenging tasks, reaching an EX of 51.48%, which is 3.41% higher than the existing method.
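The abstract does not give the search policy, so the following is only a generic sketch of two MCTS steps that commonly drive such a loop: UCT (UCB1 applied to trees) child selection and reward backpropagation. The `Node` class, the exploration constant `c`, and treating each node as a candidate SQL query scored by a reward in [0, 1] (for example, whether it executes correctly) are assumptions for illustration, not details of MCTS-SQL.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality; nodes form a tree with back-links
class Node:
    """One search state, e.g. a candidate SQL query (hypothetical)."""
    label: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0

    def add_child(self, label: str) -> "Node":
        child = Node(label, parent=self)
        self.children.append(child)
        return child


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score: mean reward (exploitation) plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select(node: Node) -> Node:
    """Descend from the root, taking the max-UCT child until a leaf is reached."""
    while node.children:
        node = max(node.children, key=lambda ch: uct_score(ch, node.visits))
    return node


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Add the reward to every node on the path back to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```

Expansion and reward evaluation, where an LLM-based refinement step would plug in, are omitted; only the bookkeeping that balances exploring new candidates against refining good ones is shown.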
A novel Trunk Branch-net PINN for flow and heat transfer prediction in porous medium
Xing, Haoyun, Jin, Kaiyan, Yao, Guice, Zhao, Jin, Xu, Dichu, Wen, Dongsheng
A novel Trunk-Branch (TB)-net physics-informed neural network (PINN) architecture is developed: a PINN-based method that incorporates trunk and branch nets to capture both global and local features. The aim is to solve four main classes of problems within the porous medium: the forward flow problem, the forward heat transfer problem, the inverse heat transfer problem, and the transfer learning problem, which are notoriously complex and cannot be handled by the original PINN. In the proposed TB-net PINN architecture, a fully connected neural network (FNN) serves as the trunk net, followed by separate FNNs as branch nets, one per output; automatic differentiation computes the partial derivatives of the outputs with respect to the inputs, from which the various physics-based loss terms are formed. The effectiveness and flexibility of the TB-net PINN architecture are demonstrated on a collection of forward problems, and transfer learning validates the feasibility of resource reuse. Combined with its superiority over traditional numerical methods in solving inverse problems, the proposed TB-net PINN shows great potential for practical engineering applications.
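The abstract describes the TB-net only at the block level. As a rough sketch, the NumPy forward pass below assumes tanh activations, arbitrary layer widths, and four hypothetical output fields (`u`, `v`, `p`, `T`, e.g. velocity components, pressure, and temperature); none of these choices come from the paper. In an actual PINN, the physics-based loss terms would be built from derivatives of these outputs via automatic differentiation (for example in JAX or PyTorch), which this sketch omits.

```python
import numpy as np


def init_fnn(sizes, rng):
    """Random (weight, bias) pairs for a fully connected network."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def fnn_forward(params, x, final_activation=False):
    """tanh hidden layers; the last layer is linear unless final_activation."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1 or final_activation:
            x = np.tanh(x)
    return x


class TBNet:
    """Shared trunk FNN followed by one branch FNN per output field."""

    def __init__(self, n_in, trunk_sizes, branch_sizes, outputs, seed=0):
        rng = np.random.default_rng(seed)
        self.outputs = list(outputs)
        self.trunk = init_fnn([n_in, *trunk_sizes], rng)
        # Each output gets its own branch network on top of the trunk features.
        self.branches = {name: init_fnn([trunk_sizes[-1], *branch_sizes, 1], rng)
                         for name in self.outputs}

    def __call__(self, x):
        # Trunk: global features shared by all outputs.
        h = fnn_forward(self.trunk, x, final_activation=True)
        # Branches: output-specific features, one scalar field per output.
        return {name: fnn_forward(p, h)[:, 0] for name, p in self.branches.items()}


# Usage: evaluate the (untrained) network at five 2-D collocation points.
net = TBNet(n_in=2, trunk_sizes=[32, 32], branch_sizes=[16], outputs=["u", "v", "p", "T"])
pts = np.random.default_rng(1).uniform(size=(5, 2))
out = net(pts)
```

Because all branches read the same trunk features, the trunk is where information is shared across outputs, while each branch is free to specialize to its own field.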
LCFed: An Efficient Clustered Federated Learning Framework for Heterogeneous Data
Zhang, Yuxin, Chen, Haoyu, Lin, Zheng, Chen, Zhe, Zhao, Jin
Clustered federated learning (CFL) addresses the performance challenges posed by data heterogeneity in federated learning (FL) by organizing edge devices with similar data distributions into clusters, enabling collaborative model training tailored to each group. However, existing CFL approaches strictly limit knowledge sharing to within clusters, lacking the integration of global knowledge with intra-cluster training, which leads to suboptimal performance. Moreover, traditional clustering methods incur significant computational overhead, especially as the number of edge devices increases. In this paper, we propose LCFed, an efficient CFL framework to combat these challenges. By leveraging model partitioning and adopting distinct aggregation strategies for each sub-model, LCFed effectively incorporates global knowledge into intra-cluster co-training, achieving optimal training performance. Additionally, LCFed employs a customized, computationally efficient model similarity measure based on low-rank models, enabling real-time cluster updates with minimal computational overhead. Extensive experiments show that LCFed outperforms state-of-the-art benchmarks in both test accuracy and clustering computational efficiency.
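LCFed's exact partitioning is not given in the abstract. The sketch below assumes a hypothetical two-part split: a `shared` part aggregated over all clients (to carry global knowledge) and a `head` aggregated only within each client's cluster, with both averages weighted by local sample counts in FedAvg style. The split point and the weighting are illustrative assumptions.

```python
import numpy as np


def weighted_mean(arrays, weights):
    """Average arrays with non-negative weights normalized to sum to one."""
    w = np.asarray(weights, dtype=float)
    return sum(wi * a for wi, a in zip(w / w.sum(), arrays))


def partitioned_aggregate(models, n_samples, clusters):
    """Aggregate each sub-model with its own strategy.

    models:    list of dicts with 'shared' and 'head' arrays (hypothetical split)
    n_samples: local dataset size of each client
    clusters:  cluster id of each client
    Returns the globally averaged shared part and one head per cluster.
    """
    shared = weighted_mean([m["shared"] for m in models], n_samples)
    heads = {}
    for cid in sorted(set(clusters)):
        idx = [i for i, c in enumerate(clusters) if c == cid]
        heads[cid] = weighted_mean([models[i]["head"] for i in idx],
                                   [n_samples[i] for i in idx])
    return shared, heads
```

Each client in cluster `c` would then continue training from `shared` together with `heads[c]`.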
KACDP: A Highly Interpretable Credit Default Prediction Model
Liu, Kun, Zhao, Jin
In today's financial field, individual credit risk prediction has become a crucial part of risk management at financial institutions. Accurate default prediction can not only help financial institutions significantly reduce losses but also significantly improve the utilization rate of funds, thereby enhancing their competitiveness in the market [1] [2]. With the rapid development of financial technology, machine learning and deep learning techniques are increasingly being applied to credit risk assessment. However, existing methods show clear limitations when dealing with high-dimensional and nonlinear data, the most prominent being a lack of interpretability and transparency [3]. Traditional credit risk prediction methods fall into two main categories: statistical models and machine learning models. Statistical models, typified by logistic regression [4], have the advantage of being simple and easy to use. However, their relatively strict assumptions often prevent them from capturing nonlinear relationships in complex data. Machine learning models such as Random Forest (RF) [5], Support Vector Machine (SVM) [6], and Extreme Gradient Boosting (XGBoost) [7] perform relatively well on high-dimensional data, but their interpretability is relatively poor and they struggle to provide a clear, transparent decision-making process. Deep learning models, such as the Multi-Layer Perceptron (MLP) [8] and the Recurrent Neural Network (RNN) [9], have strong expressive power, but in practical financial applications their black-box nature leaves them severely lacking in transparency and interpretability, a major problem in the strictly regulated financial industry [10].
Neural Network-based High-index Saddle Dynamics Method for Searching Saddle Points and Solution Landscape
Liu, Yuankai, Zhang, Lei, Zhao, Jin
The high-index saddle dynamics (HiSD) method is a powerful approach for computing saddle points and solution landscapes. However, its practical applicability is constrained by the need for an explicit expression of the energy function. To overcome this challenge, we propose a neural network-based high-index saddle dynamics (NN-HiSD) method. It uses a neural network-based surrogate model to approximate the energy function, allowing the HiSD method to be applied in cases where the energy function is either unavailable or computationally expensive. We further enhance the efficiency of the NN-HiSD method by incorporating momentum acceleration techniques, specifically Nesterov's acceleration and the heavy-ball method. We also provide a rigorous convergence analysis of the NN-HiSD method. We conduct numerical experiments on systems with and without explicit energy functions, the latter including the alanine dipeptide model and bacterial ribosomal assembly intermediates, demonstrating the effectiveness and reliability of the proposed method.
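NN-HiSD replaces the true energy with a trained neural surrogate; training that surrogate is out of scope here. The sketch below shows only the underlying index-1 saddle dynamics with heavy-ball momentum, run on an assumed toy energy E(x, y) = (x^2 - 1)^2 + y^2 whose analytic gradient and Hessian stand in for the surrogate's (in NN-HiSD these would come from the network). This energy has minima at (±1, 0) and an index-1 saddle at the origin. The step sizes `beta` and `gamma_v` and the momentum coefficient are illustrative choices, not values from the paper.

```python
import numpy as np


def grad_E(z):
    """Gradient of the toy energy E(x, y) = (x^2 - 1)^2 + y^2 (surrogate stand-in)."""
    x, y = z
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])


def hess_E(z):
    """Hessian of the same toy energy."""
    x, _ = z
    return np.array([[12.0 * x**2 - 4.0, 0.0], [0.0, 2.0]])


def index1_saddle_dynamics(z0, v0, beta=0.1, gamma_v=0.1, momentum=0.3, steps=200):
    """Index-1 saddle dynamics with a heavy-ball term on the position update."""
    z_prev = z = np.array(z0, dtype=float)
    v = np.array(v0, dtype=float)
    v /= np.linalg.norm(v)
    for _ in range(steps):
        g = grad_E(z)
        # Reflect the gradient along v: ascend along v, descend in all other directions.
        direction = -(g - 2.0 * np.dot(v, g) * v)
        # Heavy-ball step: gradient-type move plus momentum from the previous step.
        z_next = z + beta * direction + momentum * (z - z_prev)
        # Rotate v toward the eigenvector of the smallest Hessian eigenvalue.
        hv = hess_E(z) @ v
        v = v - gamma_v * (hv - np.dot(v, hv) * v)
        v /= np.linalg.norm(v)
        z_prev, z = z, z_next
    return z, v


z, v = index1_saddle_dynamics([0.3, 0.5], [1.0, 0.2])
```

Setting `momentum=0.0` recovers the plain, unaccelerated dynamics; a Nesterov-style variant would instead evaluate the gradient at the extrapolated point `z + momentum * (z - z_prev)`.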
SatFed: A Resource-Efficient LEO Satellite-Assisted Heterogeneous Federated Learning Framework
Zhang, Yuxin, Lin, Zheng, Chen, Zhe, Fang, Zihan, Zhu, Wenjun, Chen, Xianhao, Zhao, Jin, Gao, Yue
Traditional federated learning (FL) frameworks rely heavily on terrestrial networks, where coverage limitations and increasing bandwidth congestion significantly hinder model convergence. Fortunately, the advancement of low-Earth orbit (LEO) satellite networks offers promising new communication avenues to augment traditional terrestrial FL. Despite this potential, the limited satellite-ground communication bandwidth and the heterogeneous operating environments of ground devices (including variations in data, bandwidth, and computing power) pose substantial challenges for effective and robust satellite-assisted FL. To address these challenges, we propose SatFed, a resource-efficient satellite-assisted heterogeneous FL framework. SatFed implements freshness-based model prioritization queues to optimize the use of highly constrained satellite-ground bandwidth, ensuring the transmission of the most critical models. Additionally, a multigraph is constructed to capture real-time heterogeneous relationships between devices, including data distribution, terrestrial bandwidth, and computing capability. This multigraph enables SatFed to aggregate satellite-transmitted models into peer guidance, enhancing local training in heterogeneous environments. Extensive experiments with real-world LEO satellite networks demonstrate that SatFed achieves superior performance and robustness compared to state-of-the-art benchmarks.
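The abstract does not define freshness or how the queue is drained. A minimal sketch, assuming freshness simply means a more recent `timestamp` and that each satellite pass has a fixed upload budget in megabytes (both hypothetical), is a greedy heap-based scheduler:

```python
import heapq
from dataclasses import dataclass


@dataclass
class PendingModel:
    device_id: str
    timestamp: float  # when the local model was produced (hypothetical field)
    size_mb: float


def schedule_uploads(queue, budget_mb):
    """Pick models newest-first until the bandwidth budget is spent.

    A model that does not fit is skipped; smaller, older models may still fit.
    """
    # Negate timestamps for a max-heap; the index breaks ties so that
    # PendingModel objects are never compared directly.
    heap = [(-m.timestamp, i, m) for i, m in enumerate(queue)]
    heapq.heapify(heap)
    chosen, used = [], 0.0
    while heap:
        _, _, m = heapq.heappop(heap)
        if used + m.size_mb <= budget_mb:
            chosen.append(m.device_id)
            used += m.size_mb
    return chosen
```

Greedy selection is not optimal in general (the underlying problem is knapsack-like), but it is cheap enough to rerun at every satellite contact.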
A Survey on Large Language Models from Concept to Implementation
Wang, Chen, Zhao, Jin, Gong, Jiaqi
Recent advancements in Large Language Models (LLMs), particularly those built on Transformer architectures, have significantly broadened the scope of natural language processing (NLP) applications beyond their initial use in chatbot technology. This paper investigates the multifaceted applications of these models, with an emphasis on the GPT series. It focuses on the transformative impact of artificial intelligence (AI)-driven tools on traditional tasks such as coding and problem-solving, while also highlighting the new paths they open in research and development across diverse industries. From code interpretation and image captioning to the construction of interactive systems and the advancement of computational domains, Transformer models exemplify a synergy of deep learning, data analysis, and neural network design. This survey provides an in-depth look at the latest research in Transformer models, highlighting their versatility and their potential to transform diverse application sectors, thereby offering readers a comprehensive understanding of the current and future landscape of Transformer-based LLMs in practical applications.
Exploring Lightweight Federated Learning for Distributed Load Forecasting
Duttagupta, Abhishek, Zhao, Jin, Shreejith, Shanker
Federated Learning (FL) is a distributed learning scheme that enables deep learning to be applied to sensitive data streams and applications in a privacy-preserving manner. This paper focuses on the use of FL for analyzing smart energy meter data, with the aim of achieving load forecasting accuracy comparable to state-of-the-art methods while ensuring the privacy of individual meter data. We show that, by utilising the FL framework, a lightweight fully connected deep neural network achieves forecasting accuracy comparable to existing schemes, both at each meter source and at the aggregator. The use of lightweight models further reduces the energy and resource consumption incurred by complex deep-learning models, making this approach well suited for deployment across resource-constrained smart meter systems. With our proposed lightweight model, we achieve an overall average load forecasting RMSE of 0.17, with a negligible energy overhead of 50 mWh when performing training and inference on an Arduino Uno platform.
FedAC: An Adaptive Clustered Federated Learning Framework for Heterogeneous Data
Zhang, Yuxin, Chen, Haoyu, Lin, Zheng, Chen, Zhe, Zhao, Jin
Clustered federated learning (CFL) is proposed to mitigate the performance deterioration stemming from data heterogeneity in federated learning (FL) by grouping similar clients for cluster-wise model training. However, current CFL methods struggle due to inadequate integration of global and intra-cluster knowledge and the absence of an efficient online model similarity metric, while treating the cluster count as a fixed hyperparameter limits flexibility and robustness. In this paper, we propose an adaptive CFL framework, named FedAC, which (1) efficiently integrates global knowledge into intra-cluster learning by decoupling neural networks and utilizing distinct aggregation methods for each submodule, significantly enhancing performance; (2) includes a cost-effective online model similarity metric based on dimensionality reduction; (3) incorporates a cluster number fine-tuning module for improved adaptability and scalability in complex, heterogeneous environments. Extensive experiments show that FedAC achieves superior empirical performance, increasing the test accuracy by around 1.82% and 12.67% on CIFAR-10 and CIFAR-100 datasets, respectively, under different non-IID settings compared to SOTA methods.
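FedAC's specific dimensionality reduction is not described in the abstract. As a stand-in, the sketch below uses a Gaussian random projection, a standard technique that approximately preserves inner products, and then computes cosine similarities between the projected client models; the projection dimension `k` and the use of flattened parameter vectors are assumptions.

```python
import numpy as np


def projected_similarity(flat_models, k=32, seed=0):
    """Cosine similarity between client models after a shared random projection.

    flat_models: (n_clients, d) array of flattened parameters or updates.
    A Gaussian random projection to k dimensions approximately preserves
    inner products (Johnson-Lindenstrauss), so the pairwise step costs
    O(n^2 k) instead of O(n^2 d) once the projection is done.
    """
    flat_models = np.asarray(flat_models, dtype=float)
    d = flat_models.shape[1]
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((d, k)) / np.sqrt(k)  # same matrix for every client
    z = flat_models @ proj
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # unit rows -> dot product = cosine
    return z @ z.T
```

The resulting matrix could feed any clustering routine; the projection must be shared across clients (here via a fixed `seed`) so that the projected vectors are comparable.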
Deep Learning-Assisted Simultaneous Targets Sensing and Super-Resolution Imaging
Zhao, Jin, Zhang, Huang Zhao, Chong, Ming-Zhe, Zhang, Yue-Yi, Zhang, Zi-Wen, Zhang, Zong-Kun, Du, Chao-Hai, Liu, Pu-Kun
Recently, metasurfaces have experienced revolutionary growth in the sensing and superresolution imaging field because they enable subwavelength manipulation of electromagnetic waves. However, adding metasurfaces multiplies the complexity of retrieving target information from the detected fields. Moreover, although deep learning offers a compelling platform for a range of electromagnetic problems, many studies concentrate on a single function, which limits their versatility. In this study, a multifunctional deep neural network is demonstrated to reconstruct target information in a metasurface-target interactive system. First, a preliminary verification experiment confirms that the interactive scenario tolerates system noise. Then, fed with the electric field distributions, the multitask deep neural network can not only sense the number and permittivity of targets but also generate superresolution images with high precision. The deep learning method provides another way to recover diverse target information in metasurface-based target detection, accelerating progress in target reconstruction. This methodology may also hold promise for inverse reconstruction or forward prediction problems in other electromagnetic scenarios.