 total asset


KoTaP: A Panel Dataset for Corporate Tax Avoidance, Performance, and Governance in Korea

arXiv.org Artificial Intelligence

Category / Variable / Definition

Tax Avoidance
  CETR: Cash Effective Tax Rate = Cash Taxes Paid / Pre-tax Income
  GETR: GAAP Effective Tax Rate = Total Tax Expense / Pre-tax Income
  CETR3: Three-year average CETR
  GETR3: Three-year average GETR
  CETR5: Five-year average CETR
  GETR5: Five-year average GETR
  A_CETR: Adjusted Cash Effective Tax Rate
  A_GETR: Adjusted GAAP Effective Tax Rate
  A_CETR3: Adjusted three-year average CETR
  A_GETR3: Adjusted three-year average GETR
  A_CETR5: Adjusted five-year average CETR
  A_GETR5: Adjusted five-year average GETR
  TSTA: Total Book-Tax Difference (accrual-based measure)
  TSDA: Discretionary Book-Tax Difference (discretionary accrual-based measure)

Profitability
  ROA: Return on Assets = Net Income / Lagged Total Assets
  ROE: Return on Equity = Net Income / Lagged Equity
  CFO: Operating Cash Flow scaled by total assets
  LOSS: Loss dummy (1 if prior-year net income < 0)

Stability
  LEV: Leverage = Total Liabilities / Total Assets
  CUR: Current Ratio = Current Assets / Current Liabilities
  SIZE: Natural logarithm of total assets
  PPE: Ratio of Property, Plant, and Equipment to total assets
  AGE: Natural logarithm of firm age (based on year of establishment)
  INVREC: Ratio of inventories and receivables to total assets

Growth
  GRW: Sales growth rate
  MB: Market-to-Book Ratio = Market Capitalization / Book Equity
  TQ: Tobin's Q = (Market Capitalization + Total Liabilities) / Total Assets

Market Valuation & Governance
  KOSPI: KOSPI listing status dummy
  BIG4: Big4 audit dummy
  FORN: Foreign ownership share (%)
  OWN: Largest shareholder ownership share (%)

Stability Measures

Stability measures reflect a firm's financial soundness and its ability to meet obligations. Leverage (LEV) is defined as total liabilities divided by total assets, indicating the firm's degree of financial leverage. The current ratio (CUR), calculated as current assets divided by current liabilities, captures short-term liquidity and payment capacity. Firm size (SIZE) is measured as the natural logarithm of total assets, providing a quantitative indicator of scale. The proportion of property, plant, and equipment (PPE), defined as tangible fixed assets divided by total assets, is used to assess the structural stability of the asset base.
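
The ratio definitions above can be illustrated with a minimal Python sketch. The `FirmYear` record, its field names, and the choice to return `None` when a denominator is not positive are assumptions made for illustration; they are not taken from the dataset itself, and the figures are made up.

```python
from dataclasses import dataclass
from math import log


@dataclass
class FirmYear:
    # Hypothetical field names; one record per firm-year, all in the same currency unit.
    cash_taxes_paid: float
    total_tax_expense: float
    pretax_income: float
    total_liabilities: float
    total_assets: float
    current_assets: float
    current_liabilities: float
    ppe: float


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is not positive (an assumption)."""
    return numerator / denominator if denominator > 0 else None


def indicators(fy):
    """Compute a few of the listed variables directly from their stated definitions."""
    return {
        "CETR": safe_ratio(fy.cash_taxes_paid, fy.pretax_income),
        "GETR": safe_ratio(fy.total_tax_expense, fy.pretax_income),
        "LEV": safe_ratio(fy.total_liabilities, fy.total_assets),
        "CUR": safe_ratio(fy.current_assets, fy.current_liabilities),
        "SIZE": log(fy.total_assets) if fy.total_assets > 0 else None,
        "PPE": safe_ratio(fy.ppe, fy.total_assets),
    }


# Made-up example figures, not drawn from any real firm.
example = FirmYear(
    cash_taxes_paid=18.0,
    total_tax_expense=22.0,
    pretax_income=100.0,
    total_liabilities=400.0,
    total_assets=1000.0,
    current_assets=300.0,
    current_liabilities=150.0,
    ppe=250.0,
)
result = indicators(example)
print(result)
```

How firm-years with zero or negative pre-tax income are treated is a design choice the definitions leave open; this sketch simply marks them as missing.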


Credit Network Modeling and Analysis via Large Language Models

arXiv.org Artificial Intelligence

We investigate the application of large language models (LLMs) to construct credit networks from firms' textual financial statements and to analyze the resulting network structures. We start by using LLMs to translate each firm's financial statement into a credit network that pertains solely to that firm. These networks are then aggregated to form a comprehensive credit network representing the whole financial system. During this process, inconsistencies in the financial statements are automatically detected and human intervention is brought in. We demonstrate that this translation process is effective across financial statements corresponding to credit networks with diverse topological structures. We further investigate the reasoning capabilities of LLMs in analyzing credit networks and determining optimal strategies for executing financial operations to maximize network performance, measured by the total assets of firms, which is an inherently combinatorial optimization challenge. To demonstrate this capability, we focus on two financial operations: portfolio compression and debt removal, applying them to both synthetic and real-world datasets. Our findings show that LLMs can generate coherent reasoning and recommend effective executions of these operations to enhance overall network performance.


A Regression-Based Share Market Prediction Model for Bangladesh

arXiv.org Artificial Intelligence

The share market is one of the most important sectors for the economic development of a country. Every day, almost all companies issue their shares, and investors buy and sell shares of these companies. Generally, investors want to buy shares of the companies whose market liquidity is comparatively greater. Market liquidity depends on the average price of a share. In this paper, a thorough linear regression analysis has been performed on the stock market data of the Dhaka Stock Exchange. The linear model is then compared with a random forest model on several metrics, with the random forest showing better results. Nevertheless, the individual significance of different factors for the variability of stock price has been identified and explained. This paper also shows that the time series data is not capable of generating a predictive linear model for analysis.


Investigating Numerical Translation with Large Language Models

arXiv.org Artificial Intelligence

The inaccurate translation of numbers can lead to significant security issues, ranging from financial setbacks to medical inaccuracies. While large language models (LLMs) have made significant advancements in machine translation, their capacity for translating numbers has not been thoroughly explored. This study focuses on evaluating the reliability of LLM-based machine translation systems when handling numerical data. To systematically test the numerical translation capabilities of current open-source LLMs, we have constructed a numerical translation dataset between Chinese and English based on real business data, encompassing ten types of numerical translation. Experiments on the dataset indicate that errors in numerical translation are a common issue, with most open-source LLMs faltering when faced with our test scenarios. In particular, for numerical types involving large units such as "million", "billion", and "yi", even the latest llama3.1 8b model can have error rates as high as 20%. Finally, we introduce three potential strategies to mitigate the numerical mistranslations for large units.
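
To show why large units are error-prone, the following sketch checks whether a source amount and its translation denote the same quantity. It relies on the fact that the Chinese unit "yi" equals 10^8 (and "wan" equals 10^4, a unit added here for illustration and not mentioned in the abstract). The helper names are hypothetical, and this is not one of the paper's three mitigation strategies.

```python
from decimal import Decimal

# Scale of each unit as a power of ten; "wan" is included only for illustration.
UNIT_SCALE = {
    "thousand": 10**3,
    "wan": 10**4,
    "million": 10**6,
    "yi": 10**8,
    "billion": 10**9,
}


def to_absolute(quantity, unit):
    """Convert a (quantity string, unit) pair into an exact absolute amount."""
    return Decimal(quantity) * UNIT_SCALE[unit]


def same_amount(source, translation):
    """Return True when both (quantity, unit) pairs denote the same number."""
    return to_absolute(*source) == to_absolute(*translation)


# "3.5 yi" is 350 million; copying the digits and swapping in "billion" is off by a factor of ten.
print(same_amount(("3.5", "yi"), ("350", "million")))
print(same_amount(("3.5", "yi"), ("3.5", "billion")))
```

Using `Decimal` rather than floating point keeps the comparison exact for amounts such as "3.5".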


Are Logistic Models Really Interpretable?

arXiv.org Artificial Intelligence

The demand for open and trustworthy AI models points towards widespread publishing of model weights. Consumers of these model weights must be able to act in accordance with the information provided. That said, one of the simplest AI classification models, Logistic Regression (LR), has an unwieldy interpretation of its model weights, with greater difficulties when extending LR to generalised additive models. In this work, we show via a User Study that skilled participants are unable to reliably reproduce the action of small LR models given the trained parameters. As an antidote to this, we define Linearised Additive Models (LAMs), an optimal piecewise linear approximation that augments any trained additive model equipped with a sigmoid link function, requiring no retraining. We argue that LAMs are more interpretable than logistic models -- survey participants are shown to solve model reasoning tasks with LAMs much more accurately than with LR given the same information. Furthermore, we show that LAMs do not suffer from large performance penalties in terms of ROC-AUC and calibration with respect to their logistic counterparts on a broad suite of public financial modelling data.


Optimizing Trading Strategies in Quantitative Markets using Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Quantitative markets are characterized by swift dynamics and abundant uncertainties, making the pursuit of profit-driven stock trading actions inherently challenging. Within this context, reinforcement learning (RL), which operates on a reward-centric mechanism for optimal control, has surfaced as a potentially effective solution to the intricate financial decision-making conundrums presented. This paper delves into the fusion of two established financial trading strategies, namely the constant proportion portfolio insurance (CPPI) and the time-invariant portfolio protection (TIPP), with the multi-agent deep deterministic policy gradient (MADDPG) framework. As a result, we introduce two novel multi-agent RL (MARL) methods, CPPI-MADDPG and TIPP-MADDPG, tailored for probing strategic trading within quantitative markets. To validate these innovations, we implemented them on a diverse selection of 100 real-market shares. Our empirical findings reveal that the CPPI-MADDPG and TIPP-MADDPG strategies consistently outpace their traditional counterparts, affirming their efficacy in the realm of quantitative trading.
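
For readers unfamiliar with the two baseline strategies, here is a minimal sketch of the textbook CPPI and TIPP allocation rules, not of the CPPI-MADDPG or TIPP-MADDPG methods. It assumes a zero return on the safe asset, no leverage, and illustrative parameter values (`multiplier`, `floor_ratio`); the function name is hypothetical.

```python
def simulate(risky_returns, initial=100.0, multiplier=3.0, floor_ratio=0.8,
             safe_return=0.0, ratchet=False):
    """Simulate CPPI (ratchet=False) or TIPP (ratchet=True) over a return series.

    Each period, the risky exposure is multiplier * (value - floor), capped at
    the portfolio value; the remainder sits in the safe asset. Under TIPP the
    floor is raised whenever floor_ratio * value exceeds it, and never lowered.
    """
    value = initial
    floor = floor_ratio * initial
    history = []
    for r in risky_returns:
        if ratchet:
            floor = max(floor, floor_ratio * value)
        cushion = max(value - floor, 0.0)
        exposure = min(multiplier * cushion, value)  # no leverage (assumption)
        safe = value - exposure
        value = exposure * (1.0 + r) + safe * (1.0 + safe_return)
        history.append((value, floor, exposure))
    return history


returns = [0.10, -0.10]  # made-up risky-asset returns
cppi = simulate(returns)
tipp = simulate(returns, ratchet=True)
print("CPPI final value:", round(cppi[-1][0], 2))
print("TIPP final value:", round(tipp[-1][0], 2))
```

After the first-period gain, TIPP lifts its floor from 80 to 84.8, so it holds less of the risky asset going into the loss and ends slightly higher than CPPI in this toy run.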


Rethinking Log Odds: Linear Probability Modelling and Expert Advice in Interpretable Machine Learning

arXiv.org Artificial Intelligence

We introduce a family of interpretable machine learning models, with two broad additions: Linearised Additive Models (LAMs), which replace the ubiquitous logistic link function in Generalised Additive Models (GAMs); and SubscaleHedge, an expert advice algorithm for combining base models trained on subsets of features called subscales. LAMs can augment any additive binary classification model equipped with a sigmoid link function. Moreover, they afford direct global and local attributions of additive components to the model output in probability space. We argue that LAMs and SubscaleHedge improve the interpretability of their base algorithms. Using rigorous null-hypothesis significance testing on a broad suite of financial modelling data, we show that our algorithms do not suffer from large performance penalties in terms of ROC-AUC and calibration.
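
The expert-advice idea can be illustrated with the classic Hedge (exponential weights) update, sketched below. This is the generic algorithm, not SubscaleHedge as defined in the paper; the learning rate `eta` and the loss values are made up.

```python
import math


def hedge(loss_rounds, eta=0.5):
    """Run the Hedge update over a sequence of per-expert loss vectors.

    Each expert's weight is multiplied by exp(-eta * loss) and the weights are
    renormalised, so experts with lower cumulative loss gain influence.
    """
    n_experts = len(loss_rounds[0])
    weights = [1.0 / n_experts] * n_experts
    for losses in loss_rounds:
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return weights


# Two hypothetical base models; the first incurs no loss, the second loses 1 each round.
final_weights = hedge([[0.0, 1.0], [0.0, 1.0]])
print(final_weights)
```

In a setting like the one the abstract describes, each expert would be a base model trained on one subscale, and the final weights would determine how their predictions are combined.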


Deep Partial Least Squares for Empirical Asset Pricing

arXiv.org Machine Learning

We use deep partial least squares (DPLS) to estimate an asset pricing model for individual stock returns that exploits conditioning information in a flexible and dynamic way while attributing excess returns to a small set of statistical risk factors. The novel contribution is to resolve the non-linear factor structure, thus advancing the current paradigm of deep learning in empirical asset pricing, which uses linear stochastic discount factors under an assumption of Gaussian asset returns and factors. This non-linear factor structure is extracted by using projected least squares to jointly project firm characteristics and asset returns onto a subspace of latent factors and using deep learning to learn the non-linear map from the factor loadings to the asset returns. The result of capturing this non-linear risk factor structure is to characterize anomalies in asset returns by both linear risk factor exposure and interaction effects. Thus, the well-known ability of deep learning to capture outliers sheds light on the role of convexity and higher-order terms in the latent factor structure on the factor risk premia. On the empirical side, we implement our DPLS factor models and exhibit superior performance to LASSO and plain vanilla deep learning models. Furthermore, our network training times are significantly reduced due to the more parsimonious architecture of DPLS. Specifically, using 3290 assets in the Russell 1000 index over a period of December 1989 to January 2018, we assess our DPLS factor model and generate information ratios that are approximately 1.2x greater than deep learning. DPLS explains variation and pricing errors and identifies the most prominent latent factors and firm characteristics.


Fighting Accounting Fraud Through Forensic Data Analytics

arXiv.org Machine Learning

Accounting fraud is a global concern and a significant threat to the stability of the financial system, because it diminishes market confidence and the trust of regulatory authorities. Several tricks can be used to commit accounting fraud, hence the need for non-static regulatory interventions that take into account different fraudulent patterns. Accordingly, this study aims to improve the detection of accounting fraud via the implementation of several machine learning methods to better differentiate between fraud and non-fraud companies, and to further assist the task of examination within the riskier firms by evaluating relevant financial indicators. Out-of-sample results suggest there is great potential in detecting falsified financial statements through statistical modelling and analysis of publicly available accounting information. The proposed methodology can be of assistance to public auditors and regulatory agencies, as it facilitates auditing processes and supports more targeted and effective examinations of accounting reports.


A hybrid model for bankruptcy prediction using genetic algorithm, fuzzy c-means and mars

arXiv.org Artificial Intelligence

Bankruptcy prediction is very important for all organizations, since bankruptcy affects the economy and raises many social problems with high costs. A large number of techniques have been developed to predict bankruptcy, which help decision makers such as investors and financial analysts. One of the bankruptcy prediction models is the hybrid model using Fuzzy C-means clustering and MARS, which uses static ratios taken from bank financial statements for prediction and has its own theoretical advantages. The performance of the existing bankruptcy model can be improved by dynamically selecting the best features depending on the nature of the firm. This dynamic selection can be accomplished by a Genetic Algorithm, and it improves the performance of the prediction model.