 borrower


How Data Quality Affects Machine Learning Models for Credit Risk Assessment

Maurino, Andrea

arXiv.org Artificial Intelligence

Machine Learning (ML) models are increasingly employed for credit risk evaluation, and their effectiveness hinges largely on the quality of the input data. In this paper we investigate the impact of several data quality issues, including missing values, noisy attributes, outliers, and label errors, on the predictive accuracy of machine learning models used in credit risk assessment. Using an open-source dataset, we introduce controlled data corruption with the Pucktrick library to assess the robustness of 10 frequently used models, such as Random Forest, SVM, and Logistic Regression. Our experiments show significant differences in model robustness depending on the nature and severity of the data degradation. Moreover, the proposed methodology and accompanying tools offer practical support for practitioners seeking to enhance data pipeline robustness, and provide researchers with a flexible framework for further experimentation in data-centric AI contexts.
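The controlled-corruption idea in this abstract can be illustrated with a minimal sketch. It does not use the Pucktrick API: the function names `inject_missing` and `flip_labels`, the record layout, and the toy data are all hypothetical, chosen only to show how a fixed fraction of values or labels might be degraded reproducibly before a model is retrained and re-scored.

```python
import random

def inject_missing(rows, column, rate, seed=0):
    """Return a copy of rows with a fraction `rate` of `column` values set to None."""
    rng = random.Random(seed)
    n_corrupt = round(rate * len(rows))
    targets = set(rng.sample(range(len(rows)), n_corrupt))
    return [
        {**row, column: None} if i in targets else dict(row)
        for i, row in enumerate(rows)
    ]

def flip_labels(labels, rate, seed=0):
    """Return a copy of binary labels (0/1) with a fraction `rate` of them flipped."""
    rng = random.Random(seed)
    n_corrupt = round(rate * len(labels))
    targets = set(rng.sample(range(len(labels)), n_corrupt))
    return [1 - y if i in targets else y for i, y in enumerate(labels)]

# Hypothetical toy data: 10 applicants with an income field and a default label.
rows = [{"income": 1000 * (i + 1), "default": i % 2} for i in range(10)]
clean_labels = [r["default"] for r in rows]

corrupted = inject_missing(rows, "income", rate=0.3, seed=42)  # missing values
noisy = flip_labels(clean_labels, rate=0.2, seed=42)           # label errors
```

Fixing the seed makes each corruption level repeatable, so differences in model accuracy can be attributed to the severity of the degradation and not to a different random draw.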


Fairness under Competition

Gradwohl, Ronen, Shapira, Eilam, Tennenholtz, Moshe

arXiv.org Artificial Intelligence

Algorithmic fairness has emerged as a central issue in ML, and it has become standard practice to adjust ML algorithms so that they will satisfy fairness requirements such as Equal Opportunity. In this paper we consider the effects of adopting such fair classifiers on the overall level of ecosystem fairness. Specifically, we introduce the study of fairness with competing firms, and demonstrate the failure of fair classifiers in yielding fair ecosystems. Our results quantify the loss of fairness in systems, under a variety of conditions, based on classifiers' correlation and the level of their data overlap. We show that even if competing classifiers are individually fair, the ecosystem's outcome may be unfair; and that adjusting biased algorithms to improve their individual fairness may lead to an overall decline in ecosystem fairness. In addition to these theoretical results, we also provide supporting experimental evidence. Together, our model and results provide a novel and essential call for action.
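For readers unfamiliar with Equal Opportunity, the sketch below computes the standard single-classifier criterion: the gap in true positive rates between two groups. It is not the ecosystem-level measure this paper studies, and the function names and toy data are assumptions made for illustration.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that the classifier predicts as positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        raise ValueError("no positive examples in this group")
    return sum(positives) / len(positives)

def equal_opportunity_gap(y_true, y_pred, group):
    """Absolute difference in true positive rate between the two groups."""
    names = sorted(set(group))
    if len(names) != 2:
        raise ValueError("expected exactly two groups")
    rates = []
    for name in names:
        idx = [i for i, g in enumerate(group) if g == name]
        rates.append(true_positive_rate([y_true[i] for i in idx],
                                        [y_pred[i] for i in idx]))
    return abs(rates[0] - rates[1])

# Hypothetical toy data: 1 = creditworthy (y_true) or approved (y_pred).
y_true = [1, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = equal_opportunity_gap(y_true, y_pred, group)
```

A gap of zero means creditworthy applicants are approved at the same rate in both groups. The abstract's point is that a small gap for each firm's classifier does not by itself guarantee a fair outcome across the whole market.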


Balancing Performance and Reject Inclusion: A Novel Confident Inlier Extrapolation Framework for Credit Scoring

Ribeiro, Athyrson Machado, Raimundo, Marcos Medeiros

arXiv.org Artificial Intelligence

Reject Inference (RI) methods aim to address sample bias by inferring missing repayment data for rejected credit applicants. Traditional approaches often assume that the behavior of rejected clients can be extrapolated from accepted clients, despite potential distributional differences between the two populations. To mitigate this blind extrapolation, we propose a novel Confident Inlier Extrapolation framework (CI-EX). CI-EX iteratively identifies the distribution of rejected client samples using an outlier detection model and assigns labels to rejected individuals closest to the distribution of the accepted population based on probabilities derived from a supervised classification model. The effectiveness of our proposed framework is validated through experiments on two large real-world credit datasets. Performance is evaluated using the Area Under the Curve (AUC) as well as RI-specific metrics such as Kickout and a novel metric introduced in this work, denoted as Area under the Kickout. Our findings reveal that RI methods, including the proposed framework, generally involve a trade-off between AUC and RI-specific metrics. However, the proposed CI-EX framework consistently outperforms existing RI models from the credit literature in terms of RI-specific metrics while maintaining competitive performance in AUC across most experiments.


Hey AI! Can ChatGPT help you to manage your money?

The Guardian

Artificial intelligence seems to have touched every part of our lives. But can it help us manage our money? We put some common personal finance questions to the free version of ChatGPT, one of the most well-known AI chatbots, and asked for its help. Then we gave the answers to some – human – experts and asked them what they thought. We asked: I am 35 years old and want to ensure I have a comfortable retirement. I earn about 35,000 a year and have a workplace pension, in which I have saved 20,000.


KACDP: A Highly Interpretable Credit Default Prediction Model

Liu, Kun, Zhao, Jin

arXiv.org Artificial Intelligence

In today's financial sector, individual credit risk prediction has become a crucial part of risk management at financial institutions. Accurate default prediction can help institutions both reduce losses and improve the utilization rate of funds, thereby enhancing their competitiveness in the market [1] [2]. With the rapid development of financial technology, machine learning and deep learning techniques are increasingly applied in credit risk assessment. However, existing methods show limitations when dealing with high-dimensional and nonlinear data, the most prominent being problems of interpretability and transparency [3]. Traditional credit risk prediction methods fall into two main categories: statistical models and machine learning models. Statistical models, typified by Logistic regression [4], are simple and easy to use, but their relatively strict assumptions often make it difficult to capture nonlinear relationships in complex data. Machine learning models, such as Random Forest (RF) [5], Support Vector Machine (SVM) [6], and Extreme Gradient Boosting Machine (XGBoost) [7], perform relatively well on high-dimensional data, but their interpretability is relatively poor and they struggle to provide a clear and transparent decision-making process. Deep learning models, such as Multi-Layer Perceptron (MLP) [8] and Recurrent Neural Network (RNN) [9], have strong expressive ability, but their black-box nature leaves them severely lacking in transparency and interpretability in practical financial applications, a major problem in the strictly regulated financial industry [10].


Simulate and Optimise: A two-layer mortgage simulator for designing novel mortgage assistance products

Ardon, Leo, Evans, Benjamin Patrick, Garg, Deepeka, Narayanan, Annapoorani Lakshmi, Henry-Nickie, Makada, Ganesh, Sumitra

arXiv.org Artificial Intelligence

We develop a novel two-layer approach for optimising mortgage relief products through a simulated multi-agent mortgage environment. While the approach is generic, here the environment is calibrated to the US mortgage market based on publicly available census data and regulatory guidelines. Through the simulation layer, we assess the resilience of households to exogenous income shocks, while the optimisation layer explores strategies to improve the robustness of households to these shocks by making novel mortgage assistance products available to households. Households in the simulation are adaptive, learning to make mortgage-related decisions (such as product enrolment or strategic foreclosures) that maximize their utility, balancing their available liquidity and equity. We show how this novel two-layer simulation approach can successfully design novel mortgage assistance products to improve household resilience to exogenous shocks, and balance the costs of providing such products through post-hoc analysis. Previously, such analysis could only be conducted through expensive pilot studies involving real participants, demonstrating the benefit of the approach for designing and evaluating financial products.


Debiasing Alternative Data for Credit Underwriting Using Causal Inference

Lam, Chris

arXiv.org Artificial Intelligence

Alternative data provides valuable insights for lenders to evaluate a borrower's creditworthiness, which could help expand credit access to underserved groups and lower costs for borrowers. But some forms of alternative data have historically been excluded from credit underwriting because they could act as an illegal proxy for a protected class like race or gender, causing redlining. We propose a method for applying causal inference to a supervised machine learning model to debias alternative data so that it might be used for credit underwriting. We demonstrate how our algorithm can be applied to a public credit dataset to improve model accuracy across different racial groups, while providing theoretically robust nondiscrimination guarantees.


Applying Hybrid Graph Neural Networks to Strengthen Credit Risk Analysis

Sun, Mengfang, Sun, Wenying, Sun, Ying, Liu, Shaobo, Jiang, Mohan, Xu, Zhen

arXiv.org Artificial Intelligence

This paper presents a novel approach to credit risk prediction by employing Graph Convolutional Neural Networks (GCNNs) to assess the creditworthiness of borrowers. Leveraging the power of big data and artificial intelligence, the proposed method addresses the challenges faced by traditional credit risk assessment models, particularly in handling imbalanced datasets and extracting meaningful features from complex relationships. The paper begins by transforming raw borrower data into graph-structured data, where borrowers and their relationships are represented as nodes and edges, respectively. A classic subgraph convolutional model is then applied to extract local features, followed by the introduction of a hybrid GCNN model that integrates both local and global convolutional operators to capture a comprehensive representation of node features. The hybrid model incorporates an attention mechanism to adaptively select features, mitigating issues of over-smoothing and insufficient feature consideration. The study demonstrates the potential of GCNNs in improving the accuracy of credit risk prediction, offering a robust solution for financial institutions seeking to enhance their lending decision-making processes.
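The local feature extraction step described above rests on neighbourhood aggregation over the borrower graph. The sketch below shows one mean-aggregation propagation step with self-loops, a common simplified form of graph convolution. It omits the learned weight matrix, the nonlinearity, the global operator, and the attention mechanism of the hybrid model, and the function name and toy graph are assumptions.

```python
def mean_aggregate(features, edges):
    """One propagation step: each node averages its own and its neighbours' features."""
    n = len(features)
    # Start every neighbourhood with the node itself (self-loop).
    neighbours = {i: {i} for i in range(n)}
    for u, v in edges:  # undirected edges between borrower nodes
        neighbours[u].add(v)
        neighbours[v].add(u)
    dim = len(features[0])
    out = []
    for i in range(n):
        nbrs = neighbours[i]
        out.append([sum(features[j][d] for j in nbrs) / len(nbrs)
                    for d in range(dim)])
    return out

# Hypothetical toy graph: three borrowers in a chain 0 - 1 - 2,
# each described by two numeric features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
smoothed = mean_aggregate(features, edges)
```

Applying this step repeatedly pulls all node features toward a common value, which is the over-smoothing problem the abstract says the attention mechanism is meant to mitigate.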


Credit Scores: Performance and Equity

Albanesi, Stefania, Vamossy, Domonkos F.

arXiv.org Artificial Intelligence

Credit scores are critical for allocating consumer debt in the United States, yet little evidence is available on their performance. We benchmark a widely used credit score against a machine learning model of consumer default and find significant misclassification of borrowers, especially those with low scores. Our model improves predictive accuracy for young, low-income, and minority groups due to its superior performance with low-quality data, resulting in a gain in standing for these populations. Our findings suggest that improving credit scoring performance could lead to more equitable access to credit.


Dynamic Pricing in Securities Lending Market: Application in Revenue Optimization for an Agent Lender Portfolio

Xu, Jing, Hsu, Yung Cheng, Biscarri, William

arXiv.org Artificial Intelligence

Securities lending is an important part of the financial market structure, where agent lenders help long-term institutional investors lend out their securities to short sellers in exchange for a lending fee. Agent lenders within the market seek to optimize revenue by lending out securities at the highest rate possible. Typically, this rate is set by hard-coded business rules or standard supervised machine learning models. These approaches are often difficult to scale and do not adapt to changing market conditions. Unlike a traditional stock exchange with a centralized limit order book, the securities lending market is organized similarly to an e-commerce marketplace, where agent lenders and borrowers can transact bilaterally at any agreed price. This similarity suggests that typical methods for addressing dynamic pricing problems in e-commerce could be effective in the securities lending market. We show that existing contextual bandit frameworks can be successfully utilized in the securities lending market. Using offline evaluation on real historical data, we show that the contextual bandit approach can consistently outperform typical approaches by at least 15% in terms of total revenue generated.
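To make the contextual bandit framing concrete, here is a minimal sketch of one simple variant, a tabular epsilon-greedy policy that keeps a running mean revenue per (context, rate) pair. The abstract does not say which bandit algorithms the authors use, so the class `EpsilonGreedyPricer`, the two contexts, the candidate rates, and the simulated environment are all hypothetical.

```python
import random
from collections import defaultdict

class EpsilonGreedyPricer:
    """Tabular epsilon-greedy bandit: one running mean reward per (context, rate)."""

    def __init__(self, rates, epsilon=0.1, seed=0):
        self.rates = list(rates)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.means = defaultdict(float)

    def best_rate(self, context):
        """Greedy choice: the rate with the highest estimated revenue."""
        return max(self.rates, key=lambda r: self.means[(context, r)])

    def choose(self, context):
        """Explore with probability epsilon, otherwise exploit the best estimate."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.rates)
        return self.best_rate(context)

    def update(self, context, rate, reward):
        """Incrementally update the mean reward for this (context, rate) pair."""
        key = (context, rate)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]

def simulated_revenue(context, rate):
    # Hypothetical environment: the loan is taken only if the quoted rate
    # does not exceed what borrowers in this context will pay.
    ceiling = {"hard_to_borrow": 3.0, "easy_to_borrow": 1.0}[context]
    return rate if rate <= ceiling else 0.0

pricer = EpsilonGreedyPricer(rates=[0.5, 1.0, 2.0, 3.0], epsilon=0.1, seed=7)
contexts = ["hard_to_borrow", "easy_to_borrow"]

# Warm-up: quote every rate once per context so each estimate is initialised.
for context in contexts:
    for rate in pricer.rates:
        pricer.update(context, rate, simulated_revenue(context, rate))

# Online loop: mostly exploit the current estimates, occasionally explore.
for step in range(200):
    context = contexts[step % 2]
    rate = pricer.choose(context)
    pricer.update(context, rate, simulated_revenue(context, rate))
```

Unlike a fixed business rule, the policy keeps updating its estimates from observed outcomes, so it can shift its quoted rate if borrower behaviour in a given context changes.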