Goto

Collaborating Authors

 Liu, Jinsong


A Deep Learning Framework Integrating CNN and BiLSTM for Financial Systemic Risk Analysis and Prediction

arXiv.org Artificial Intelligence

This study proposes a deep learning model that combines a convolutional neural network (CNN) with a bidirectional long short-term memory network (BiLSTM) for discriminant analysis of financial systemic risk. The model first uses the CNN to extract local patterns from multidimensional financial-market features, and then models bidirectional temporal dependencies with the BiLSTM, characterizing how systemic risk evolves in both its cross-sectional features and its temporal dynamics. Experiments on real financial data sets show that the model significantly outperforms single-architecture baselines (BiLSTM, CNN, Transformer, and TCN) in accuracy, recall, and F1 score, reaching an F1 score of 0.88 and demonstrating strong discriminative ability. This indicates that the joint CNN-BiLSTM strategy not only captures complex patterns in market data but also handles long-range dependencies in time series. The study also examines the model's robustness to data noise and its behavior on high-dimensional data, providing support for intelligent financial risk management. Future work will further optimize the model structure, introduce methods such as reinforcement learning and multimodal data analysis, and improve efficiency and generalization to cope with more complex financial environments.
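
The abstract does not include implementation details, so the following is a minimal PyTorch sketch of the general CNN-to-BiLSTM pattern it describes; all layer sizes, window lengths, and feature counts are hypothetical placeholders rather than the paper's actual configuration.

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Minimal sketch of a CNN + BiLSTM classifier for windowed market features.

    Hyperparameters are hypothetical; the paper does not specify its architecture.
    Input shape: (batch, seq_len, n_features).
    """
    def __init__(self, n_features=16, conv_channels=32, lstm_hidden=64, n_classes=2):
        super().__init__()
        # 1-D convolution over the time axis extracts local patterns per window.
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Bidirectional LSTM models forward and backward temporal dependencies.
        self.bilstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * lstm_hidden, n_classes)

    def forward(self, x):                        # x: (batch, seq_len, n_features)
        z = self.conv(x.transpose(1, 2))         # (batch, conv_channels, seq_len)
        out, _ = self.bilstm(z.transpose(1, 2))  # (batch, seq_len, 2*lstm_hidden)
        return self.head(out[:, -1])             # risk-class logits from the last step

model = CNNBiLSTM()
logits = model(torch.randn(8, 30, 16))           # e.g. 8 windows of 30 days, 16 features
```

The convolution scans each window for local feature patterns before the bidirectional LSTM aggregates both past and future context; the final hidden state feeds the risk classifier.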


Leveraging Convolutional Neural Network-Transformer Synergy for Predictive Modeling in Risk-Based Applications

arXiv.org Artificial Intelligence

As the financial industry has developed, credit default prediction, an important task in financial risk management, has received increasing attention. Traditional approaches mostly rely on machine learning models such as decision trees and random forests, which have limitations in processing complex data and capturing latent risk patterns. To address this, this paper proposes a deep learning model that combines convolutional neural networks (CNN) with a Transformer for predicting credit-user default. The model pairs the CNN's strength in local feature extraction with the Transformer's ability to model global dependencies, improving the accuracy and robustness of credit default prediction. Experiments on public credit default datasets show that the CNN+Transformer model outperforms traditional machine learning models such as random forests and XGBoost on multiple evaluation metrics, including accuracy, AUC, and the KS statistic, demonstrating its capability for complex financial data modeling. Further analysis shows that appropriate optimizer selection and learning-rate tuning play a vital role in model performance, and an ablation study verifies the advantage of combining CNN and Transformer and their complementarity in credit default prediction. This work offers a new approach to credit default prediction and supports risk assessment and intelligent decision-making in finance. Future research can further improve predictive performance and generalization by incorporating more unstructured data and refining the model architecture.
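
As with the previous paper, the exact architecture is not given in this summary. The sketch below shows one plausible way to chain a 1-D convolution with a Transformer encoder over tabular credit features; the per-feature tokenization, all dimensions, and the feature count are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class CNNTransformer(nn.Module):
    """Sketch of a CNN + Transformer-encoder default classifier.

    Hypothetical layout: each raw credit feature is embedded as a token, a 1-D
    convolution captures local feature interactions, and a Transformer encoder
    models global dependencies before a binary classification head.
    """
    def __init__(self, n_features=23, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                # one token per raw feature
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        encoder_layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                                   dim_feedforward=128,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, n_layers)
        self.head = nn.Linear(d_model, 1)                 # logit for default probability

    def forward(self, x):                                 # x: (batch, n_features)
        tok = self.embed(x.unsqueeze(-1))                 # (batch, n_features, d_model)
        tok = self.conv(tok.transpose(1, 2)).transpose(1, 2)
        enc = self.encoder(tok)                           # global attention across features
        return self.head(enc.mean(dim=1)).squeeze(-1)     # pooled default logit

model = CNNTransformer()
logit = model(torch.randn(4, 23))                         # 4 applicants, 23 assumed features
```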


Reward Learning From Preference With Ties

arXiv.org Artificial Intelligence

Reinforcement learning from human feedback (RLHF) (Christiano et al., 2017; Ziegler et al., 2019; Ouyang et al., 2022) has played a pivotal role in aligning large language models (LLMs) (Kenton et al., 2021), enhancing specific capabilities of LLMs in various fields, such as summarization (Stiennon et al., 2020), coding (Gao et al., 2023), and medical assistance (Moor et al., 2023). A crucial component of the RLHF process is the reward model, which serves as the primary mechanism for integrating human preferences and feedback into the learning process (Wang et al., 2024). The reward model guides the optimization procedure of RLHF towards objectives aligned with human preferences (Kaufmann et al., 2023). Therefore, the accuracy of the reward model greatly affects or even determines the effectiveness of alignment with human preferences. Moreover, the direct preference optimization (DPO) method (Rafailov et al., 2024) utilizes LLMs to implicitly represent the reward model through mathematical transformations, bypassing the complex RL optimization phase and focusing solely on the reward modeling phase. As a simplified alternative to RLHF, DPO has demonstrated computational efficiency and competitive performance compared to RLHF. To learn a reward model from human preferences, obtaining high-quality human preference data is crucial (Wang et al., 2024), typically achieved by having human labelers annotate previously collected data consisting of a prompt and a pair of responses (Ouyang et al., 2022; Bai et al., 2022).
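
The excerpt does not state which tie-aware preference likelihood the paper adopts. As an illustration of the general idea, the sketch below uses the Rao-Kupper extension of the Bradley-Terry model, which adds a non-negative tie margin to the pairwise logistic loss commonly used for reward models; the function and parameter names are hypothetical.

```python
import torch
import torch.nn.functional as F

def preference_loss_with_ties(r_a, r_b, label, nu_raw):
    """Negative log-likelihood for pairwise preferences that may be tied.

    Illustrative Rao-Kupper tie model (an assumption, not the paper's stated method):
        P(a > b) = sigmoid(r_a - r_b - nu)
        P(b > a) = sigmoid(r_b - r_a - nu)
        P(tie)   = 1 - P(a > b) - P(b > a),  with tie margin nu >= 0.

    r_a, r_b : reward-model scores for the two responses, shape (batch,)
    label    : 0 -> a preferred, 1 -> b preferred, 2 -> tie
    nu_raw   : unconstrained scalar mapped to a non-negative tie margin
    """
    nu = F.softplus(nu_raw)                     # keep the tie margin non-negative
    delta = r_a - r_b
    p_win  = torch.sigmoid(delta - nu)
    p_lose = torch.sigmoid(-delta - nu)
    p_tie  = (1.0 - p_win - p_lose).clamp_min(1e-8)
    log_probs = torch.log(torch.stack([p_win, p_lose, p_tie], dim=-1) + 1e-12)
    return F.nll_loss(log_probs, label)

# toy usage with hypothetical reward-model scores
scores_a = torch.tensor([1.2, 0.3, 0.0])
scores_b = torch.tensor([0.4, 0.5, 0.1])
labels   = torch.tensor([0, 2, 2])              # a wins, tie, tie
loss = preference_loss_with_ties(scores_a, scores_b, labels, torch.tensor(0.5))
```

When nu is zero the tie probability vanishes and the loss reduces to the standard Bradley-Terry objective, so tie handling can be seen as a one-parameter relaxation of the usual reward-modeling loss.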


Stochastic Dimension-reduced Second-order Methods for Policy Optimization

arXiv.org Artificial Intelligence

In this paper, we propose several new stochastic second-order algorithms for policy optimization that require only gradients and Hessian-vector products in each iteration, making their per-iteration cost comparable to that of policy gradient methods. Specifically, we propose a dimension-reduced second-order method (DR-SOPO) that repeatedly solves a projected two-dimensional trust-region subproblem. We show that DR-SOPO attains an $\mathcal{O}(\epsilon^{-3.5})$ complexity for reaching an approximate first-order stationary condition and a certain subspace second-order stationarity condition. In addition, we present an enhanced algorithm (DVR-SOPO) that further improves the complexity to $\mathcal{O}(\epsilon^{-3})$ using a variance-reduction technique. Preliminary experiments show that our proposed algorithms perform favorably compared with stochastic and variance-reduced policy gradient methods.
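
The abstract only outlines DR-SOPO at a high level. The NumPy sketch below illustrates the flavor of one projected two-dimensional trust-region step under several assumed simplifications that are not from the paper: finite-difference Hessian-vector products, a random second subspace direction, deterministic gradients, and a dense boundary sweep in place of an exact 2-D subproblem solver.

```python
import numpy as np

def hvp(f_grad, x, v, eps=1e-5):
    """Finite-difference Hessian-vector product: H(x) v ~ (g(x+eps v) - g(x-eps v)) / (2 eps)."""
    return (f_grad(x + eps * v) - f_grad(x - eps * v)) / (2 * eps)

def projected_tr_step(f_grad, x, v_extra, radius=0.1, n_angles=256):
    """One sketched iteration: minimize a quadratic model restricted to the 2-D
    subspace span{gradient, v_extra}, using only gradients and Hessian-vector products.
    """
    g = f_grad(x)
    V, _ = np.linalg.qr(np.stack([g, v_extra], axis=1))   # orthonormal 2-D basis
    g2 = V.T @ g                                          # projected gradient, shape (2,)
    H2 = np.stack([V.T @ hvp(f_grad, x, V[:, i]) for i in range(2)], axis=1)
    H2 = 0.5 * (H2 + H2.T)                                # symmetrized 2x2 reduced Hessian
    # Solve min_{||y|| <= radius} g2.y + 0.5 y^T H2 y by comparing the interior
    # Newton point (when H2 is positive definite) with a sweep of the boundary circle.
    candidates = []
    if np.linalg.det(H2) > 0 and np.trace(H2) > 0:
        y_newton = -np.linalg.solve(H2, g2)
        if np.linalg.norm(y_newton) <= radius:
            candidates.append(y_newton)
    phis = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    candidates.extend(radius * np.stack([np.cos(phis), np.sin(phis)], axis=1))
    model = lambda y: g2 @ y + 0.5 * y @ H2 @ y           # quadratic model value
    y_best = min(candidates, key=model)
    return x + V @ y_best                                 # lift the 2-D step back to full space

# toy usage on a quadratic objective f(x) = 0.5 x^T A x
A = np.diag([1.0, 10.0, 100.0])
grad = lambda x: A @ x
x = np.ones(3)
for _ in range(50):
    x = projected_tr_step(grad, x, np.random.randn(3))
```

The actual algorithms would use the paper's stochastic policy-gradient and Hessian-vector estimators and its subspace and subproblem constructions; the sketch only shows how a step can be computed from gradient and Hessian-vector information alone.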