prospectuse
Why Bonds Fail Differently? Explainable Multimodal Learning for Multi-Class Default Prediction
Lu, Yi, Ling, Aifan, Wang, Chaoqun, Xu, Yaxin
In recent years, China's bond market has seen a surge in defaults amid regulatory reforms and macroeconomic volatility. Traditional machine learning models struggle to capture the irregularity and temporal dependencies of financial data, while most deep learning models lack interpretability, which is critical for financial decision-making. To tackle these issues, we propose EMDLOT (Explainable Multimodal Deep Learning for Time-series), a novel framework for multi-class bond default prediction. EMDLOT integrates numerical time-series (financial/macroeconomic indicators) and unstructured textual data (bond prospectuses), uses Time-Aware LSTM to handle irregular sequences, and adopts soft clustering and multi-level attention to boost interpretability. Experiments on 1,994 Chinese firms (2015-2024) show EMDLOT outperforms traditional (e.g., XGBoost) and deep learning (e.g., LSTM) benchmarks in recall, F1-score, and mAP, especially in identifying default/extended firms. Ablation studies validate each component's value, and attention analyses reveal economically intuitive default drivers. This work provides a practical tool and a trustworthy framework for transparent financial risk modeling.
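The abstract above mentions Time-Aware LSTM for irregular sequences. As a rough illustration only, the sketch below shows the elapsed-time decay commonly used in Time-Aware LSTM formulations, where the short-term part of the previous cell state is discounted by a weight that shrinks as the gap between observations grows. The function names, the specific decay g(Δt) = 1 / log(e + Δt), and the use of plain lists are assumptions for illustration; the abstract does not state which variant EMDLOT uses.

```python
import math


def time_decay(delta_t: float) -> float:
    """Assumed decay weight: 1 / log(e + delta_t), for delta_t >= 0.

    Equals 1 when no time has elapsed and decreases as the gap grows.
    """
    return 1.0 / math.log(math.e + delta_t)


def adjust_memory(prev_cell, short_term, delta_t):
    """Discount only the short-term component of the previous cell state.

    prev_cell and short_term are equal-length lists of floats. In a full
    Time-Aware LSTM the short-term component would be learned from
    prev_cell; here it is passed in directly (an illustrative simplification).
    """
    g = time_decay(delta_t)
    # long-term part (c - s) is kept; short-term part s is scaled by g
    return [(c - s) + g * s for c, s in zip(prev_cell, short_term)]


# Hypothetical usage: a long gap between two filings discounts the
# short-term memory more strongly than a short gap.
recent = adjust_memory([1.0, 2.0], [0.5, 1.0], delta_t=1.0)
stale = adjust_memory([1.0, 2.0], [0.5, 1.0], delta_t=30.0)
```

The adjusted state would then replace the previous cell state in an otherwise standard LSTM update, so irregular gaps affect how much recent history is carried forward.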
Measuring Sustainability Intention of ESG Fund Disclosure using Few-Shot Learning
Singh, Mayank, Nafis, Nazia, Kumar, Abhijeet, Mishra, Mridul
The global sustainable fund universe encompasses open-end funds and exchange-traded funds (ETFs) that, by prospectus or other regulatory filings, claim to focus on Environment, Social and Governance (ESG). These claims can only be confirmed by examining the textual disclosures for evidence of intentionality and ESG focus in the fund's investment strategy. Currently, there is no regulation to enforce sustainability in the ESG product space. This paper proposes a unique method and system to classify and score the fund prospectuses in the sustainable universe regarding specificity and transparency of language. We aim to employ few-shot learners to identify specific, ambiguous, and generic sustainable investment-related language. Additionally, we construct a ratio metric to determine a language score and rating to rank products and quantify sustainability claims for the US sustainable universe. As a by-product, we publish a manually annotated, quality training dataset on Hugging Face (ESG-Prospectus-Clarity-Category under cc-by-nc-sa-4.0) of more than 1K ESG textual statements. The performance of the few-shot finetuning approach is compared with zero-shot models, e.g., Llama-13B and GPT 3.5 Turbo. We found that prompting large language models is not accurate for domain-specific tasks due to misalignment issues. The few-shot finetuning techniques outperform zero-shot models by a large margin, more than ~30% absolute, in precision, recall and F1 metrics on completely unseen ESG language (test set). Overall, the paper attempts to establish a systematic and scalable approach to measure and rate sustainability intention quantitatively for sustainable funds using texts in prospectuses. Regulatory bodies, investors, and advisors may utilize the findings of this research to reduce cognitive load when investigating or screening for ESG funds that accurately reflect ESG intention.
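The abstract above refers to a ratio metric over statements classified as specific, ambiguous, or generic, but does not define it here. The sketch below is one hypothetical form such a metric could take, namely the share of statements labeled specific; the function name `language_score` and the formula itself are assumptions for illustration, not the paper's definition.

```python
from collections import Counter

# The three label names come from the abstract; everything else is assumed.
CATEGORIES = ("specific", "ambiguous", "generic")


def language_score(labels):
    """Hypothetical score: fraction of classified statements that are 'specific'.

    labels is an iterable of per-statement category strings for one fund.
    Returns 0.0 when no statement carries a recognized label.
    """
    counts = Counter(label for label in labels if label in CATEGORIES)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts["specific"] / total


# Hypothetical usage: four classified statements from one prospectus.
score = language_score(["specific", "generic", "ambiguous", "specific"])
```

Funds could then be ranked by this score, with higher values indicating more specific sustainability language under the stated assumption.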
NLP-based Decision Support System for Examination of Eligibility Criteria from Securities Prospectuses at the German Central Bank
Hänig, Christian, Schlösser, Markus, Hamotskyi, Serhii, Zambaku, Gent, Blankenburg, Janek
As part of its digitization initiative, the German Central Bank (Deutsche Bundesbank) wants to examine the extent to which Natural Language Processing (NLP) can be used to make independent decisions on the eligibility criteria of securities prospectuses. Every month, the Directorate General Markets at the German Central Bank receives hundreds of scanned prospectuses in PDF format, which must be manually processed to decide on their eligibility. We found that this tedious and time-consuming process can be (semi-)automated by employing modern NLP model architectures, which learn the linguistic feature representation in text to identify the eligible and ineligible criteria present. The proposed Decision Support System provides document-level decisions on eligibility criteria, accompanied by human-understandable explanations of the decisions. The aim of this project is to model the described use case and to evaluate the extent to which current research results from the field of NLP can be applied to this problem. After creating a heterogeneous domain-specific dataset containing annotations of eligible and non-eligible mentions of relevant criteria, we were able to successfully build, train and deploy a semi-automatic decider model. This model is based on transformer-based language models and decision trees, which integrate the established rule-based parts of the decision processes. Results suggest that it is possible to efficiently model the problem and automate decision making to more than 90% for many of the considered eligibility criteria.
Modeling Financial Products and their Supply Chains
Bjarnadottir, Margret, Raschid, Louiqa
The objective of this paper is to explore how financial big data and machine learning methods can be applied to model and understand financial products. We focus on residential mortgage-backed securities, resMBS, which were at the heart of the 2008 US financial crisis. These securities are contained within a prospectus and have a complex waterfall payoff structure. Multiple financial institutions form a supply chain to create prospectuses. To model this supply chain, we use unsupervised probabilistic methods, particularly dynamic topic models (DTM), to extract a set of features (topics) reflecting community formation and temporal evolution along the chain. We then provide insight into the performance of the resMBS securities and the impact of the supply chain through a series of increasingly comprehensive models. First, models at the security level directly identify salient features of resMBS securities that impact their performance. We then extend the model to include prospectus-level features and demonstrate that the composition of the prospectus is significant. Our model also shows that communities along the supply chain that are associated with the generation of the prospectuses and securities have an impact on performance. We are the first to show that toxic communities that are closely linked to financial institutions that played a key role in the subprime crisis can increase the risk of failure of resMBS securities.