Mehta, Dhagash
Supervised Similarity for High-Yield Corporate Bonds with Quantum Cognition Machine Learning
Rosaler, Joshua, Candelori, Luca, Kirakosyan, Vahagn, Musaelian, Kharen, Samson, Ryan, Wells, Martin T., Mehta, Dhagash, Pasquali, Stefano
We investigate the application of quantum cognition machine learning (QCML), a novel paradigm for both supervised and unsupervised learning tasks rooted in the mathematical formalism of quantum theory, to distance metric learning in corporate bond markets. Compared to equities, corporate bonds are relatively illiquid, and both trade and quote data for these securities are sparse. A measure of distance/similarity among corporate bonds is therefore particularly useful for a variety of practical applications in the trading of illiquid bonds, including identifying similar tradable alternatives, pricing securities with relatively few recent quotes or trades, and explaining the predictions and performance of ML models based on their training data. Previous research has explored supervised similarity learning based on classical tree-based models in this context; here, we explore the application of the QCML paradigm to supervised distance metric learning in the same context, showing that it outperforms classical tree-based models in high-yield (HY) markets, while giving comparable or better performance (depending on the evaluation metric) in investment-grade (IG) markets.
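As an illustration of the kind of tree-based supervised similarity this work compares against (the QCML model itself is not sketched here), the following minimal Python example builds random-forest proximities and uses them to retrieve similar bonds; the features, target, and data are hypothetical stand-ins rather than the paper's setup.

```python
# Sketch of the classical tree-based baseline: supervised similarity for bonds
# via random-forest proximities. Data and feature names are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                              # stand-in bond features (duration, OAS, ...)
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=500)    # stand-in target, e.g. spread

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
leaves = rf.apply(X)                                       # (n_bonds, n_trees) leaf indices

def most_similar(i, leaves, k=5):
    """Supervised similarity: fraction of trees in which two bonds share a leaf."""
    prox = np.mean(leaves == leaves[i], axis=1)            # proximity of every bond to bond i
    prox[i] = -np.inf                                      # exclude the query bond itself
    return np.argsort(-prox)[:k]

print(most_similar(0, leaves))                             # indices of the 5 most similar bonds
```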
A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation
Sarmah, Bhaskarjit, Dutta, Kriti, Grigoryan, Anna, Tiwari, Sachin, Pasquali, Stefano, Mehta, Dhagash
We argue that Declarative Self-improving Python (DSPy) optimizers offer a way to align large language model (LLM) prompts and their evaluations with human annotations. We present a comparative analysis of five teleprompter algorithms, namely Cooperative Prompt Optimization (COPRO), Multi-Stage Instruction Prompt Optimization (MIPRO), BootstrapFewShot, BootstrapFewShot with Optuna, and K-Nearest Neighbor Few Shot, within the DSPy framework with respect to their ability to align with human evaluations. As a concrete example, we focus on optimizing the prompt to align hallucination detection (using an LLM as a judge) with human-annotated ground-truth labels for a publicly available benchmark dataset. Our experiments demonstrate that optimized prompts can outperform various benchmark methods for detecting hallucination, and that certain teleprompters outperform the others, at least in these experiments.
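A hedged sketch of the kind of setup described above, using DSPy's BootstrapFewShot teleprompter to align an LLM-as-judge hallucination label with human annotations; the model name, dataset fields, and example rows are placeholders, and exact class or argument names may differ across DSPy versions.

```python
# Sketch only: aligning an LLM-as-judge hallucination detector to human labels
# with a DSPy teleprompter. API details may vary across DSPy versions.
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure the underlying LM (model name is a placeholder; an API key is assumed).
dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# LLM-as-judge module: given a context and an answer, label whether the answer hallucinates.
judge = dspy.ChainOfThought("context, answer -> hallucinated")

# Hypothetical human-annotated ground truth (stand-in for a benchmark dataset).
human_annotated_rows = [
    ("The report states revenue grew 5%.", "Revenue grew 15%.", "yes"),
    ("The report states revenue grew 5%.", "Revenue grew 5%.", "no"),
]
trainset = [
    dspy.Example(context=c, answer=a, hallucinated=label).with_inputs("context", "answer")
    for c, a, label in human_annotated_rows
]

# Alignment metric: agreement between the judge's label and the human label.
def agreement(example, prediction, trace=None):
    return example.hallucinated.strip().lower() == prediction.hallucinated.strip().lower()

teleprompter = BootstrapFewShot(metric=agreement, max_bootstrapped_demos=4)
compiled_judge = teleprompter.compile(judge, trainset=trainset)
```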
How to Choose a Threshold for an Evaluation Metric for Large Language Models
Sarmah, Bhaskarjit, Li, Mingshu, Lyu, Jingrao, Frank, Sebastian, Castellanos, Nathalia, Pasquali, Stefano, Mehta, Dhagash
To ensure that large language models (LLMs) behave reliably and to monitor them, various evaluation metrics have been proposed in the literature. However, there is little research prescribing a methodology for identifying a robust threshold on these metrics, even though an incorrect choice of threshold has serious implications when LLMs are deployed. Translating traditional model risk management (MRM) guidelines from regulated industries such as the financial industry, we propose a step-by-step recipe for picking a threshold for a given LLM evaluation metric. We emphasize that such a methodology should start by identifying the risks of the LLM application under consideration and the risk tolerance of its stakeholders. We then propose concrete and statistically rigorous procedures to determine a threshold for the given LLM evaluation metric using available ground-truth data. As a concrete example to demonstrate the proposed methodology at work, we apply it to the Faithfulness metric, as implemented in various publicly available libraries, using the publicly available HaluBench dataset. We also lay a foundation for creating systematic approaches to selecting thresholds, not only for LLMs but for any GenAI application.
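One illustrative, simplified step such a recipe might include is choosing the smallest metric threshold that meets a stakeholder-specified risk tolerance on held-out ground-truth data; the tolerance, labels, and scores below are assumptions for a sketch, not the paper's exact procedure.

```python
# Sketch: pick a threshold for an LLM evaluation metric (e.g., faithfulness scores)
# subject to a risk tolerance on false passes. Data and tolerance are hypothetical.
import numpy as np

def choose_threshold(scores, labels, max_false_pass_rate=0.05):
    """Return the smallest threshold whose false-pass rate (share of bad responses
    that still score at or above the threshold) meets the stated risk tolerance.
    scores: metric values in [0, 1]; labels: 1 = acceptable response, 0 = hallucinated."""
    bad_scores = scores[labels == 0]
    for t in np.sort(np.unique(scores)):
        if np.mean(bad_scores >= t) <= max_false_pass_rate:
            return t
    return 1.0  # no candidate meets the tolerance; fall back to the strictest cut

# Hypothetical ground-truth data standing in for a labeled benchmark.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=1000)
scores = np.clip(0.55 * labels + rng.normal(0.3, 0.15, size=1000), 0.0, 1.0)

print(choose_threshold(scores, labels, max_false_pass_rate=0.05))
```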
AI versus AI in Financial Crimes and Detection: GenAI Crime Waves to Co-Evolutionary AI
Kurshan, Eren, Mehta, Dhagash, Bruss, Bayan, Balch, Tucker
Adoption of AI by criminal entities across traditional and emerging financial crime paradigms has been a disturbing recent trend. Particularly concerning is the proliferation of generative AI, which has empowered criminal activities ranging from sophisticated phishing schemes to the creation of hard-to-detect deepfakes and advanced spoofing attacks on biometric authentication systems. The exploitation of AI for criminal purposes continues to escalate, presenting an unprecedented challenge. AI adoption is creating an increasingly complex landscape of fraud typologies intertwined with cybersecurity vulnerabilities. Overall, GenAI has a transformative effect on financial crimes and fraud; according to some estimates, GenAI will quadruple fraud losses by 2027, with a staggering annual growth rate of over 30% [27]. As crime patterns become more intricate, personalized, and elusive, deploying effective defensive AI strategies becomes indispensable. However, several challenges hinder the necessary progress of AI-based fincrime detection systems. This paper examines the latest trends in AI/ML-driven financial crimes and detection systems. It underscores the urgent need to develop agile AI defenses that can effectively counteract rapidly emerging threats, and it highlights the need for cooperation across the financial services industry to tackle GenAI-induced crime waves.
Towards Enhanced Local Explainability of Random Forests: a Proximity-Based Approach
Rosaler, Joshua, Desai, Dhruv, Sarmah, Bhaskarjit, Vamvourellis, Dimitrios, Onay, Deran, Mehta, Dhagash, Pasquali, Stefano
We introduce a novel approach to explaining the out-of-sample performance of random forest (RF) models by exploiting the fact that any RF can be formulated as an adaptive weighted k-nearest-neighbors model. Specifically, we use the proximity between points in the feature space learned by the RF to rewrite random forest predictions exactly as a weighted average of the target labels of training data points. This linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set, and thereby complements established methods like SHAP, which instead generate attributions for a model prediction across dimensions of the feature space. We demonstrate this approach in the context of a bond pricing model trained on US corporate bond trades, and compare our approach to various existing approaches to model explainability.
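A minimal sketch of the identity described above, under the simplifying assumption of trees grown without bootstrapping so that the proximity-weighted average of training labels reproduces the forest prediction exactly; the data are synthetic and the variable names illustrative.

```python
# Sketch: rewrite a random-forest prediction as a proximity-weighted average of
# training labels. Assumes bootstrap=False so the identity holds exactly.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=300)

rf = RandomForestRegressor(n_estimators=100, bootstrap=False, random_state=0).fit(X, y)

train_leaves = rf.apply(X)            # (n_train, n_trees) leaf indices
x_test = rng.normal(size=(1, 5))
test_leaves = rf.apply(x_test)        # (1, n_trees)

# Per-tree weights: 1 / (leaf size) for training points sharing the test point's leaf.
same_leaf = train_leaves == test_leaves            # (n_train, n_trees) boolean
leaf_sizes = same_leaf.sum(axis=0)                 # training points per shared leaf, per tree
weights = (same_leaf / leaf_sizes).mean(axis=1)    # average over trees; sums to 1

proximity_pred = weights @ y
print(proximity_pred, rf.predict(x_test)[0])       # identical up to floating-point error
```

The weights above give the per-training-point attribution for this single prediction: they say which training observations the forest effectively averaged over and how heavily.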
Towards reducing hallucination in extracting information from financial reports using Large Language Models
Sarmah, Bhaskarjit, Zhu, Tianjie, Mehta, Dhagash, Pasquali, Stefano
For a financial analyst, the question-and-answer (Q&A) segment of a company's financial report is a crucial piece of information for various analysis and investment decisions. However, extracting valuable insights from the Q&A section has posed considerable challenges: conventional methods such as detailed reading and note-taking lack scalability and are susceptible to human error, while Optical Character Recognition (OCR) and similar techniques encounter difficulties in accurately processing unstructured transcript text, often missing subtle linguistic nuances that drive investor decisions. Here, we demonstrate the use of Large Language Models (LLMs) to efficiently and rapidly extract information from earnings-report transcripts with high accuracy, transforming the extraction process while reducing hallucination by combining a retrieval-augmented generation technique with metadata. We evaluate the outcomes of various LLMs with and without our proposed approach using various objective metrics for evaluating Q&A systems, and empirically demonstrate the superiority of our method.
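A hedged sketch of retrieval-augmented generation with a metadata filter over earnings-call chunks, in the spirit of the approach described above; the embedding model, chunking, and metadata fields are illustrative assumptions rather than the paper's implementation.

```python
# Sketch: RAG over earnings-call Q&A chunks with a metadata filter as a
# hallucination-reduction step. Model name and data are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

chunks = [
    {"text": "CFO: Gross margin expanded 120 bps driven by mix.", "quarter": "Q2-2023"},
    {"text": "CEO: We expect capex to moderate next year.", "quarter": "Q2-2023"},
    {"text": "CFO: FY2022 revenue grew 8% on a constant-currency basis.", "quarter": "Q4-2022"},
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode([c["text"] for c in chunks], normalize_embeddings=True)

def retrieve(question, quarter, k=2):
    """Return the top-k chunks for the question, restricted by quarter metadata."""
    idx = [i for i, c in enumerate(chunks) if c["quarter"] == quarter]
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = embeddings[idx] @ q
    ranked = [idx[i] for i in np.argsort(-scores)[:k]]
    return [chunks[i]["text"] for i in ranked]

context = retrieve("What did management say about margins?", quarter="Q2-2023")
prompt = "Answer only from the context below.\n" + "\n".join(context)
# `prompt` would then be passed to the LLM of choice.
```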
Quantifying Outlierness of Funds from their Categories using Supervised Similarity
Desai, Dhruv, Dhiman, Ashmita, Sharma, Tushar, Sharma, Deepika, Mehta, Dhagash, Pasquali, Stefano
Mutual fund categorization has become a standard tool for the investment management industry and is extensively used by allocators for portfolio construction and manager selection, as well as by fund managers for peer analysis and competitive positioning. As a result, an (unintended) miscategorization or lack of precision can significantly impact allocation decisions and investment fund managers. Here, we aim to quantify the effect of miscategorization of funds using a machine learning based approach. We formulate miscategorization of funds as a distance-based outlier detection problem, where outliers are data points that lie far from the rest of the data in the given feature space. We implement and employ a Random Forest (RF) based method of distance metric learning, and compute the so-called class-wise outlier measures for each data point to identify outliers in the data. We test our implementation on various publicly available datasets and then apply it to mutual fund data. We show that there is a strong relationship between the outlier measures of the funds and their future returns, and discuss the implications of our findings.
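A hedged sketch of a class-wise, distance-based outlier measure built from random-forest proximities, in the spirit of the approach described above; the data are synthetic and the normalization follows the classic randomForest-style outlier score rather than the paper's exact definition.

```python
# Sketch: class-wise outlier measure from random-forest proximities on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

leaves = rf.apply(X)                                            # (n_samples, n_trees)
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)  # proximity matrix

# Raw outlier score: class size / sum of squared proximities to same-class points.
raw = np.empty(len(X))
for i in range(len(X)):
    same = (y == y[i]) & (np.arange(len(X)) != i)
    raw[i] = same.sum() / (np.sum(prox[i, same] ** 2) + 1e-12)

# Normalize within each class (category) by median and MAD.
outlier = np.empty_like(raw)
for c in np.unique(y):
    m = y == c
    med = np.median(raw[m])
    mad = np.median(np.abs(raw[m] - med)) + 1e-12
    outlier[m] = (raw[m] - med) / mad

print(np.argsort(-outlier)[:5])   # indices of the most outlying points per this measure
```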
Learning Mutual Fund Categorization using Natural Language Processing
Vamvourellis, Dimitrios, Toth, Mate Attila, Desai, Dhruv, Mehta, Dhagash, Pasquali, Stefano
Categorization of mutual funds or Exchange-Traded Funds (ETFs) has long served financial analysts in performing peer analysis for various purposes, from competitor analysis to quantifying portfolio diversification. These categorization systems go deeper than the broader asset-class-based classification (equity, fixed income, etc.) and provide further granular categories based on the portfolio breakdown. They have been used to identify the top-performing as well as the worst-performing funds within their peer groups (peer analysis of funds); to identify a home-grown fund to recommend against a competitor's fund; to explain similarities and advantages of home-grown products compared to competitors' products for marketing purposes; and to quantify the portfolio diversification of a given fund. The categorization methodology usually relies on fund composition data in the structured format extracted from Form N-1A. Here, we initiate a study to learn the categorization system directly from the unstructured data as depicted in the forms using natural language processing (NLP).
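As a hedged illustration of learning categories from unstructured strategy text, the following toy example fits a simple TF-IDF classifier; the strategy snippets, category labels, and model choice are placeholders, not the paper's data or architecture.

```python
# Sketch: learning fund categories from unstructured strategy text.
# Data, labels, and model choice are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

strategies = [
    "The fund invests primarily in large-cap US equities with a growth tilt.",
    "The fund seeks income by investing in investment-grade corporate bonds.",
    "The fund tracks a broad index of emerging-market equities.",
    "The fund invests in short-duration government and agency securities.",
]
categories = ["US Equity Large-Cap Growth", "Corporate Bond",
              "Emerging Markets Equity", "Short Government Bond"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                    LogisticRegression(max_iter=1000))
clf.fit(strategies, categories)

print(clf.predict(["The fund holds high-quality corporate debt to generate income."]))
```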
Machine learning the real discriminant locus
Bernal, Edgar A., Hauenstein, Jonathan D., Mehta, Dhagash, Regan, Margaret H., Tang, Tingting
Parameterized systems of polynomial equations arise in many applications in science and engineering with the real solutions describing, for example, equilibria of a dynamical system, linkages satisfying design constraints, and scene reconstruction in computer vision. Since different parameter values can have a different number of real solutions, the parameter space is decomposed into regions whose boundary forms the real discriminant locus. This article views locating the real discriminant locus as a supervised classification problem in machine learning where the goal is to determine classification boundaries over the parameter space, with the classes being the number of real solutions. For multidimensional parameter spaces, this article presents a novel sampling method which carefully samples the parameter space. At each sample point, homotopy continuation is used to obtain the number of real solutions to the corresponding polynomial system. Machine learning techniques including nearest neighbor and deep learning are used to efficiently approximate the real discriminant locus. One application of having learned the real discriminant locus is to develop a real homotopy method that only tracks the real solution paths, unlike traditional methods which track all complex solution paths. Examples show that the proposed approach can efficiently approximate complicated solution boundaries such as those arising from the equilibria of the Kuramoto model.
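A hedged toy illustration of the classification setup described above: for the univariate cubic x^3 + p x + q, whose real discriminant locus is 4p^3 + 27q^2 = 0, we sample parameters, count real roots (standing in for homotopy-continuation root counts on larger systems), and fit a nearest-neighbor classifier; the sampling and model choices are illustrative.

```python
# Toy sketch: learn the real discriminant locus of x^3 + p*x + q = 0 as a
# classification problem (class = number of real roots).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
params = rng.uniform(-2, 2, size=(5000, 2))        # samples of (p, q)

def n_real_roots(p, q, tol=1e-8):
    """Count real roots of x^3 + p*x + q (numerical stand-in for homotopy continuation)."""
    roots = np.roots([1.0, 0.0, p, q])
    return int(np.sum(np.abs(roots.imag) < tol))

labels = np.array([n_real_roots(p, q) for p, q in params])

knn = KNeighborsClassifier(n_neighbors=7).fit(params, labels)

# Predicted class boundaries approximate the discriminant locus 4p^3 + 27q^2 = 0.
grid = np.array([[-1.5, 0.2], [1.0, 1.0], [-1.0, 0.0]])
print(knn.predict(grid))
print([n_real_roots(p, q) for p, q in grid])
```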
Machine Learning Fund Categorizations
Mehta, Dhagash, Desai, Dhruv, Pradeep, Jithin
Given the surge in popularity of mutual funds (including exchange-traded funds (ETFs)) as a diversified financial investment, a vast variety of mutual funds from various investment management firms and with various diversification strategies has become available in the market. Identifying similar mutual funds across such a wide landscape has become more important than ever because of many applications ranging from sales and marketing to portfolio replication, portfolio diversification, and tax-loss harvesting. The current best method is data-vendor-provided categorization, which usually relies on curation by human experts with the help of available data. In this work, we establish that an industry-wide, well-regarded categorization system is learnable using machine learning and is largely reproducible, in turn constructing a truly data-driven categorization. We discuss the intellectual challenges in learning this man-made system, our results, and their implications.