AI Application in Anti-Money Laundering for Sustainable and Transparent Financial Systems

Nie, Chuanhao, Liu, Yunbo, Wang, Chao

arXiv.org Artificial Intelligence

Money laundering and financial fraud remain major threats to global financial stability, costing trillions annually and challenging regulatory oversight. This paper reviews how artificial intelligence (AI) applications can modernize Anti-Money Laundering (AML) workflows by improving detection accuracy, lowering false-positive rates, and reducing the operational burden of manual investigations, thereby supporting more sustainable development. It further highlights future research directions including federated learning for privacy-preserving collaboration, fairness-aware and interpretable AI, reinforcement learning for adaptive defenses, and human-in-the-loop visualization systems to ensure that next-generation AML architectures remain transparent, accountable, and robust. In the final part, the paper proposes an AI-driven KYC application that integrates graph-based retrieval-augmented generation (RAG Graph) with generative models to enhance efficiency, transparency, and decision support in KYC processes related to money-laundering detection. Experimental results show that the RAG-Graph architecture delivers high faithfulness and strong answer relevancy across diverse evaluation settings, thereby enhancing the efficiency and transparency of KYC CDD/EDD workflows and contributing to more sustainable, resource-optimized compliance practices.
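The graph-retrieval step of such a RAG-Graph pipeline can be illustrated with a minimal sketch: expand the knowledge graph around the entity under review and turn the collected relationship triples into grounding context for the generative model. The entities, relations, and prompt wording below are illustrative assumptions, not the paper's actual schema.

```python
# Minimal sketch of the graph-retrieval step in a RAG-Graph KYC pipeline.
# Entity and relation names are invented for illustration.

from collections import deque

# Toy KYC knowledge graph: entity -> list of (relation, neighbour) edges.
GRAPH = {
    "Acme Ltd": [("director", "J. Doe"), ("account_at", "Bank A")],
    "J. Doe":   [("director", "Shell Co"), ("pep_status", "flagged")],
    "Shell Co": [("account_at", "Bank B")],
    "Bank A":   [],
    "Bank B":   [],
    "flagged":  [],
}

def retrieve_subgraph(entity, hops=2):
    """Breadth-first expansion around the queried entity; the returned
    triples become the grounding context passed to the generator."""
    triples, seen, frontier = [], {entity}, deque([(entity, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for rel, nbr in GRAPH.get(node, []):
            triples.append((node, rel, nbr))
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return triples

def build_prompt(entity, triples):
    facts = "\n".join(f"- {s} --{r}--> {o}" for s, r, o in triples)
    return (f"Using only the facts below, assess the KYC risk of {entity}.\n"
            f"Facts:\n{facts}")

context = retrieve_subgraph("Acme Ltd")
print(build_prompt("Acme Ltd", context))
```

Grounding the prompt in explicit triples is what makes the generator's output auditable: every claim in the answer can be traced back to an edge in the retrieved subgraph.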


The Massive Legal Embedding Benchmark (MLEB)

Butler, Umar, Butler, Abdur-Rahman, Malec, Adrian Lucas

arXiv.org Artificial Intelligence

We present the Massive Legal Embedding Benchmark (MLEB), the largest, most diverse, and most comprehensive open-source benchmark for legal information retrieval to date. MLEB consists of ten expert-annotated datasets spanning multiple jurisdictions (the US, UK, EU, Australia, Ireland, and Singapore), document types (cases, legislation, regulatory guidance, contracts, and literature), and task types (search, zero-shot classification, and question answering). Seven of the datasets in MLEB were newly constructed in order to fill domain and jurisdictional gaps in the open-source legal information retrieval landscape. We document our methodology in building MLEB and creating the new constituent datasets, and release our code, results, and data openly to assist with reproducible evaluations.
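The scoring behind a retrieval benchmark of this kind reduces to ranking documents by embedding similarity and checking whether the gold document appears near the top. The sketch below shows a generic recall@k computation over cosine similarities; the vectors and the specific metric are illustrative, not MLEB's actual evaluation code.

```python
# Hedged sketch of the kind of retrieval scoring a benchmark such as MLEB
# performs; the embeddings and recall@k metric here are illustrative.

import numpy as np

def recall_at_k(query_vecs, doc_vecs, relevant, k=2):
    """Fraction of queries whose relevant document appears in the top-k
    cosine-similarity results. `relevant[i]` is the gold doc index."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = q @ d.T
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = sum(rel in row for rel, row in zip(relevant, topk))
    return hits / len(relevant)

rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 8))                            # stand-in embeddings
queries = docs[[0, 3]] + 0.01 * rng.normal(size=(2, 8))   # near-duplicates
print(recall_at_k(queries, docs, relevant=[0, 3], k=1))
```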


Acquiescence Bias in Large Language Models

Braun, Daniel

arXiv.org Artificial Intelligence

Acquiescence bias, i.e., the tendency of humans to agree with statements in surveys independent of their actual beliefs, is well researched and documented. Since Large Language Models (LLMs) have been shown to be highly sensitive to relatively small changes in input and are trained on human-generated data, it is reasonable to assume that they could show a similar tendency. We present a study investigating the presence of acquiescence bias in LLMs across different models, tasks, and languages (English, German, and Polish). Our results indicate that, contrary to humans, LLMs display a bias towards answering no, regardless of whether it indicates agreement or disagreement.
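The core of such a study is a paired-wording protocol: ask the same yes/no item in agree-coded and disagree-coded form and compare the answer distributions. A minimal sketch, with `ask_model` as a stub standing in for a real LLM call and the survey item invented for illustration:

```python
# Illustrative protocol for probing acquiescence bias: pose each item in
# agree-coded and disagree-coded wording, then tally the yes/no answers.
# `ask_model` is a stub, not a real LLM; here it mimics the paper's
# headline finding of a bias toward "no".

def ask_model(question):
    return "no"  # stub: a no-leaning model

ITEMS = [
    ("Do you agree that contracts should be written in plain language?",
     "Do you disagree that contracts should be written in plain language?"),
]

def bias_counts(items, ask):
    counts = {"yes": 0, "no": 0}
    for agree_q, disagree_q in items:
        for q in (agree_q, disagree_q):
            counts[ask(q)] += 1
    return counts

print(bias_counts(ITEMS, ask_model))
```

An unbiased respondent would split its answers across the two wordings; answering "no" to both phrasings, as the stub does, is the signature of a no-leaning bias rather than a genuine belief.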


ContractEval: Benchmarking LLMs for Clause-Level Legal Risk Identification in Commercial Contracts

Liu, Shuang, Li, Zelong, Ma, Ruoyun, Zhao, Haiyan, Du, Mengnan

arXiv.org Artificial Intelligence

The potential of large language models (LLMs) in specialized domains such as legal risk analysis remains underexplored. In response to growing interest in locally deploying open-source LLMs for legal tasks while preserving data confidentiality, this paper introduces ContractEval, the first benchmark to thoroughly evaluate whether open-source LLMs could match proprietary LLMs in identifying clause-level legal risks in commercial contracts. Using the Contract Understanding Atticus Dataset (CUAD), we assess 4 proprietary and 15 open-source LLMs. Our results highlight five key findings: (1) Proprietary models outperform open-source models in both correctness and output effectiveness, though some open-source models are competitive in certain specific dimensions. (2) Larger open-source models generally perform better, though the improvement slows down as models get bigger. (3) Reasoning ("thinking") mode improves output effectiveness but reduces correctness, likely due to over-complicating simpler tasks. (4) Open-source models generate "no related clause" responses more frequently even when relevant clauses are present. This suggests "laziness" in thinking or low confidence in extracting relevant content. (5) Model quantization speeds up inference but at the cost of performance drop, showing the tradeoff between efficiency and accuracy. These findings suggest that while most LLMs perform at a level comparable to junior legal assistants, open-source models require targeted fine-tuning to ensure correctness and effectiveness in high-stakes legal settings. ContractEval offers a solid benchmark to guide future development of legal-domain LLMs.
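A clause-level scoring rule of the kind such a benchmark needs can be sketched in a few lines: credit an answer that reproduces the gold clause, credit a correct abstention when the contract has no relevant clause, and penalise the "laziness" failure mode of abstaining when a clause exists. The scoring function and example strings below are illustrative assumptions, not ContractEval's actual grader.

```python
# Sketch of clause-level scoring in the spirit of ContractEval: an answer
# scores if it reproduces the gold clause span, or correctly abstains with
# "no related clause" when the category is absent. Details are illustrative.

NO_CLAUSE = "no related clause"

def score_answer(model_answer, gold_clause):
    """`gold_clause` is None when the category is absent from the contract."""
    ans = model_answer.strip().lower()
    if gold_clause is None:
        return ans == NO_CLAUSE
    return gold_clause.lower() in ans

# Correct abstention when no clause exists:
assert score_answer("No related clause", None)
# Correct extraction of an existing clause:
assert score_answer(
    "Relevant text: either party may terminate on 30 days' notice.",
    "either party may terminate on 30 days' notice")
# The "laziness" failure mode the paper reports scores zero:
assert not score_answer("no related clause", "governing law: New York")
```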


Infrastructure for AI Agents

Chan, Alan, Wei, Kevin, Huang, Sihao, Rajkumar, Nitarshan, Perrier, Elija, Lazar, Seth, Hadfield, Gillian K., Anderljung, Markus

arXiv.org Artificial Intelligence

Increasingly many AI systems can plan and execute interactions in open-ended environments, such as making phone calls or buying online goods. As developers grow the space of tasks that such AI agents can accomplish, we will need tools both to unlock their benefits and manage their risks. Current tools are largely insufficient because they are not designed to shape how agents interact with existing institutions (e.g., legal and economic systems) or actors (e.g., digital service providers, humans, other AI agents). For example, alignment techniques by nature do not assure counterparties that some human will be held accountable when a user instructs an agent to perform an illegal action. To fill this gap, we propose the concept of agent infrastructure: technical systems and shared protocols external to agents that are designed to mediate and influence their interactions with and impacts on their environments. Agent infrastructure comprises both new tools and reconfigurations or extensions of existing tools. For example, to facilitate accountability, protocols that tie users to agents could build upon existing systems for user authentication, such as OpenID. Just as the Internet relies on infrastructure like HTTPS, we argue that agent infrastructure will be similarly indispensable to ecosystems of agents. We identify three functions for agent infrastructure: 1) attributing actions, properties, and other information to specific agents, their users, or other actors; 2) shaping agents' interactions; and 3) detecting and remedying harmful actions from agents. We propose infrastructure that could help achieve each function, explaining use cases, adoption, limitations, and open questions. Making progress on agent infrastructure can prepare society for the adoption of more advanced agents.
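The first function, attributing actions to agents and their users, can be illustrated with a toy attribution record: an action is logged with a keyed hash binding it to an authenticated user, so a counterparty can later verify who stands behind it. The key handling and record format below are invented for illustration; they are not a protocol the paper proposes.

```python
# Hedged sketch of action attribution: an agent's action is logged with an
# HMAC tag binding it to a user ID, verifiable by an attribution service.
# The secret-key handling and record format are illustrative only.

import hashlib
import hmac
import json

SECRET = b"registry-key"  # held by the attribution service, not the agent

def attribute(user_id, agent_id, action):
    record = {"user": user_id, "agent": agent_id, "action": action}
    payload = json.dumps(record, sort_keys=True).encode()
    record["tag"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return record

def verify(record):
    claimed = record.copy()
    tag = claimed.pop("tag")
    payload = json.dumps(claimed, sort_keys=True).encode()
    return hmac.compare_digest(
        tag, hmac.new(SECRET, payload, hashlib.sha256).hexdigest())

rec = attribute("user-42", "agent-7", "purchase:order-123")
print(verify(rec))  # a tampered record would verify as False
```

In practice, the paper suggests, such bindings would build on existing identity systems like OpenID rather than a bespoke key registry.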


Denoising ESG: quantifying data uncertainty from missing data with Machine Learning and prediction intervals

Caprioli, Sergio, Foschi, Jacopo, Crupi, Riccardo, Sabatino, Alessandro

arXiv.org Artificial Intelligence

Environmental, Social, and Governance (ESG) datasets are frequently plagued by significant data gaps, leading to inconsistencies in ESG ratings due to varying imputation methods. This study addresses the missing data issues in ESG datasets using machine learning techniques, comparing K-Nearest Neighbors, Gradient Boosting, Multiple Imputation by Chained Equations (MICE) and Neural Networks. We focus on quantifying the risk induced by data anomalies and provide tools to assess its impact on the variability of the scores. By introducing prediction uncertainty through methods such as Predictive Mean Matching and Local Residual Draw, which assign confidence measures to individual predictions, we provide a nuanced understanding of imputation uncertainty. Empirical analyses show that these methods improve imputation accuracy and quantify uncertainty, which is required for reliable ESG scoring in banking and finance.
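The idea of attaching a confidence measure to each imputed value can be sketched with a numpy-only k-nearest-neighbours imputation, where the spread of the donor values serves as a crude prediction interval. This is a stand-in for the PMM/LRD-style machinery in the paper, with made-up data.

```python
# Minimal sketch: impute a missing ESG field from the k nearest rows and
# attach a simple prediction interval from the spread of donor values
# (a crude stand-in for PMM/LRD-style per-prediction uncertainty).

import numpy as np

def knn_impute_with_interval(X, row, col, k=2):
    """Impute X[row, col] from the k rows closest on the observed columns;
    return (point_estimate, (low, high)) where the interval is the
    donors' min/max."""
    obs_cols = [c for c in range(X.shape[1]) if c != col]
    donors = np.array([r for r in range(X.shape[0])
                       if r != row and not np.isnan(X[r, col])])
    dists = np.linalg.norm(X[donors][:, obs_cols] - X[row, obs_cols], axis=1)
    nearest = donors[np.argsort(dists)[:k]]
    vals = X[nearest, col]
    return vals.mean(), (vals.min(), vals.max())

X = np.array([[1.00, 10.0],
              [1.10, 11.0],
              [5.00, 50.0],
              [1.05, np.nan]])      # illustrative ESG features, last value missing
est, (lo, hi) = knn_impute_with_interval(X, row=3, col=1)
print(est, lo, hi)
```

A wide donor interval flags an imputation that should be trusted less, which is exactly the kind of per-prediction confidence measure the paper argues ESG scoring needs.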


Legal Prompting: Teaching a Language Model to Think Like a Lawyer

Yu, Fangyi, Quartey, Lee, Schilder, Frank

arXiv.org Artificial Intelligence

Large language models that are capable of zero- or few-shot prompting have given rise to the new research area of prompt engineering. Recent advances have shown that, for example, Chain-of-Thought (CoT) prompts can significantly improve arithmetic or common-sense tasks. We explore how such approaches fare on legal reasoning tasks, taking the COLIEE entailment task, based on the Japanese Bar exam, to test zero-shot/few-shot and fine-tuning approaches. Our findings show that while CoT prompting and fine-tuning with explanations yield improvements, the best results are produced by prompts derived from specific legal reasoning techniques such as IRAC (Issue, Rule, Application, Conclusion). Based on our experiments, we improve the 2021 best result from 0.7037 accuracy to 0.8148 accuracy and beat the 2022 best system of 0.6789 accuracy with an accuracy of 0.7431.
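An IRAC-structured prompt of the kind the paper finds most effective can be sketched as a template that forces the model through each reasoning step before answering. The wording and the statute/hypothesis pair below are illustrative, not the authors' exact template or a COLIEE item.

```python
# Sketch of an IRAC-structured entailment prompt; wording is illustrative.

IRAC_TEMPLATE = """Premise (statute): {statute}
Hypothesis: {hypothesis}

Answer using IRAC:
Issue: identify the legal question the hypothesis raises.
Rule: state the applicable rule from the statute.
Application: apply the rule to the facts in the hypothesis.
Conclusion: answer 'entailed' or 'not entailed'."""

prompt = IRAC_TEMPLATE.format(
    statute="A contract concluded by a minor may be rescinded.",
    hypothesis="A minor's contract can never be undone.")
print(prompt)
```

Compared with generic CoT, the template encodes the lawyer's own decomposition of the problem, which is plausibly why it transfers better to bar-exam entailment.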


Rating Triggers for Collateral-Inclusive XVA via Machine Learning and SDEs on Lie Groups

Kamm, Kevin, Muniz, Michelle

arXiv.org Artificial Intelligence

Specifically, we focus on calibrating the model to both historical data (rating transition matrices) and market data (CDS quotes) and compare the most popular choices of changes of measure to switch from the historical probability to the risk-neutral one. For this, we show how the classical Girsanov theorem can be applied in the Lie group setting. Moreover, we overcome some of the imperfections of rating matrices published by rating agencies, which are computed with the cohort method, by using a novel Deep Learning approach. This leads to an improvement of the entire scheme and makes the model more robust for applications. We apply our model to compute bilateral credit and debit valuation adjustments of a netting set under a CSA with thresholds depending on ratings of the two parties.


Artificial Intelligence in Financial Analytics

#artificialintelligence

Successful businesses invest in thorough cash flow planning. Staying aligned with your payment terms for payables and receivables, or planning your cash needs for the next quarter or year, is no minor task. Many finance teams we talked to had the data to do it well, but some hiccup hampered the end result. What is a cash flow forecast? Cash flow forecasting has traditionally been done from experience and intuition, in an Excel environment.
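A first step beyond intuition-driven spreadsheets is a reproducible baseline, for example forecasting next quarter's net cash flow as a trailing average of recent quarters. The figures below are made up for illustration.

```python
# Minimal baseline sketch: forecast the next period's net cash flow as a
# trailing average of the most recent observations. Figures are invented.

def moving_average_forecast(history, window=4):
    """Average the last `window` observations as the next-period forecast."""
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    return sum(history[-window:]) / window

quarterly_net_cash = [120.0, 95.0, 130.0, 110.0, 125.0]  # in k$, illustrative
print(moving_average_forecast(quarterly_net_cash))
```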


Machine learning is not just for the buy side - Risk.net

#artificialintelligence

The most common application being researched for machine learning is optimal execution. When large trades are executed in the market, they can push prices in an unfavourable direction, so it makes sense that traders are keen to optimise this cost. So far, most of the interest in applying machine learning to reduce trading costs has come from the buy side. However, recent research by quants from Standard Chartered suggests this may be about to change. In this month's first technical, "Evolutionary algos for optimising MVA", Alexei Kondratyev, a managing director at Standard Chartered in London, and George Giorgidze, a senior quantitative developer in the strats team at the same bank, propose machine learning techniques to optimise initial margin costs through trade selection.
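The trade-selection idea can be illustrated with a toy (1+1) evolutionary search: encode a candidate portfolio addition as a bit vector, mutate it, and keep the child whenever it does not worsen a margin proxy. The sensitivities and the |net sensitivity| proxy below are illustrative stand-ins, not the authors' actual MVA objective or algorithm parameters.

```python
# Toy (1+1) evolutionary search for margin-aware trade selection: choose a
# subset of candidate trades to add to a netting set so that a margin proxy
# (here, |net sensitivity|) is minimised. All numbers are illustrative.

import random

random.seed(7)
existing_sensitivity = 4.0
candidate_sensitivities = [-3.0, 1.5, -2.0, 0.5, 2.5]

def margin_proxy(selection):
    """Stand-in for an initial-margin calculation on the netting set."""
    net = existing_sensitivity + sum(
        s for s, keep in zip(candidate_sensitivities, selection) if keep)
    return abs(net)

def evolve(steps=300, flip_prob=0.2):
    best = [False] * len(candidate_sensitivities)  # start: add no trades
    for _ in range(steps):
        child = [not b if random.random() < flip_prob else b for b in best]
        if margin_proxy(child) <= margin_proxy(best):  # never accept worse
            best = child
    return best

selection = evolve()
print(selection, margin_proxy(selection))
```

Because the search only accepts non-worsening mutations, the final selection's margin proxy can never exceed that of the empty starting selection; in a realistic setting the fitness evaluation would call a full initial-margin model rather than this one-line proxy.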