Tax
Japan's wealthy fail to declare ¥65.5 billion in income
Wealthy people in Japan failed to declare a total of ¥65.5 billion in taxable income in the year through June, down 33.2% from the year before, a National Tax Agency report showed Friday. During the year, the agency conducted 2,407 investigations targeting the wealthy, including those with significant holdings in securities and real estate, down 18.2%. It collected back taxes totaling ¥17 billion, down 7.1%. Undeclared income among all people subject to investigation, including the wealthy, rose 10.2% to a record ¥996.4 billion. Total back taxes grew 2.2% to ¥139.8 billion, also a record high.
The US Treasury is using AI (a vehicle for fraud) to detect fraud
AI has been used to defraud people through everything from calling voters to faking celebrity giveaways. Now, the US Treasury Department claims machine learning AI has played a critical part in its enhanced fraud detection processes over the past year -- if a broken clock can be right twice a day, maybe AI can do something good one time? In a new release, the Treasury states it prevented and recovered "fraud and improper payments" worth over $4 billion over the last fiscal year (October 2023 to September 2024). This number represents a tremendous increase from the previous year, which saw just $652.7 million. One-fourth of the $4 billion apparently comes from recovery by "expediting the identification of Treasury check fraud with machine learning AI." Again, does it feel a bit like making a deal with the devil?
AI helping US Treasury bust fraudsters, saving billions
The United States Treasury Department is turning more to artificial intelligence (AI) to fight fraud, using the technology to thwart $4bn in improper payments in the last year. The agency released the estimate in a press release Thursday announcing the success of its "technology and data-driven approach". In fiscal year 2024, which ran from October 2023 to September 2024, the Treasury used machine-learning AI to halt $1bn in cheque fraud, it said. At the same time, its AI processes helped weed out $3bn in other improper payments, including by pinpointing at-risk transactions and improving screening, it added. The $4bn in total annual fraud prevention was six times higher than the amount captured in the previous year, according to the agency.
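Neither article describes the Treasury's models, so the sketch below is only a rough illustration of what "pinpointing at-risk transactions" with machine learning can look like: an unsupervised anomaly detector scoring simulated payments. The features, thresholds, and data are assumptions for exposition, not the Treasury's system.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Assumed features per payment: amount, payee account age (days), checks per payee per month.
normal = rng.normal(loc=[500, 900, 2], scale=[200, 300, 1], size=(5000, 3))
suspect = rng.normal(loc=[9000, 10, 40], scale=[2000, 5, 10], size=(50, 3))
payments = np.vstack([normal, suspect])

# Unsupervised anomaly detector: needs only historical payments, no labeled fraud cases.
model = IsolationForest(contamination=0.01, random_state=0).fit(payments)
scores = model.decision_function(payments)   # lower score = more anomalous
flagged = np.argsort(scores)[:50]            # route the riskiest payments to human review
print(f"flagged {len(flagged)} of {len(payments)} payments for review")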
Realistic Synthetic Financial Transactions for Anti-Money Laundering Models
Altman, Erik, Egressy, Béni
With the widespread digitization of finance and the increasing popularity of cryptocurrencies, the sophistication of fraud schemes devised by cybercriminals is growing. Money laundering - the movement of illicit funds to conceal their origins - can cross bank and national boundaries, producing complex transaction patterns. The UN estimates that 2-5% of global GDP, or $0.8-2.0 trillion, is laundered globally each year. Unfortunately, real data to train machine learning models to detect laundering is generally not available, and previous synthetic data generators have had significant shortcomings. A realistic, standardized, publicly available benchmark is needed for comparing models and for advancing the area. To this end, this paper contributes a synthetic financial transaction dataset generator and a set of synthetically generated AML (Anti-Money Laundering) datasets. We have calibrated this agent-based generator to match real transactions as closely as possible and made the datasets public. We describe the generator in detail and demonstrate how the generated datasets can help compare different machine learning models in terms of their AML abilities. In a key way, using synthetic data in these comparisons can be even better than using real data: the ground truth labels are complete, whilst many laundering transactions in real data are never detected.
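As a toy illustration of the agent-based idea (not the authors' calibrated generator), the sketch below emits a stream of ordinary payments and injects a labeled scatter-gather laundering pattern. Account counts, amount distributions, and the pattern itself are assumptions chosen only to show how complete ground-truth labels arise in synthetic data.

import random
import pandas as pd

random.seed(7)
accounts = [f"acct_{i}" for i in range(200)]
rows = []

# Background traffic: ordinary payments between randomly paired accounts (label 0).
for t in range(5000):
    src, dst = random.sample(accounts, 2)
    rows.append((t, src, dst, round(random.lognormvariate(4, 1), 2), 0))

# Injected laundering pattern: one source scatters funds to mule accounts,
# which gather them into a collector account; these rows carry label 1.
source, collector = "acct_0", "acct_1"
mules = random.sample(accounts[2:], 8)
for i, mule in enumerate(mules):
    rows.append((5000 + i, source, mule, round(random.uniform(900, 990), 2), 1))
    rows.append((5100 + i, mule, collector, round(random.uniform(850, 980), 2), 1))

df = pd.DataFrame(rows, columns=["timestamp", "src", "dst", "amount", "is_laundering"])
print(df["is_laundering"].value_counts())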
An Evaluation of Explanation Methods for Black-Box Detectors of Machine-Generated Text
Schoenegger, Loris, Xia, Yuxi, Roth, Benjamin
The increasing difficulty of distinguishing language-model-generated from human-written text has led to the development of detectors of machine-generated text (MGT). However, in many contexts a black-box prediction is not sufficient; it is equally important to know on what grounds a detector made that prediction. Explanation methods that estimate feature importance promise to indicate which parts of an input a classifier relies on for its prediction. However, the quality of different explanation methods has not previously been assessed for detectors of MGT. This study conducts the first systematic evaluation of explanation quality for this task. The dimensions of faithfulness and stability are assessed with five automated experiments, and usefulness is evaluated in a user study. We use a dataset of ChatGPT-generated and human-written documents, and pair the predictions of three existing language-model-based detectors with the corresponding SHAP, LIME, and Anchor explanations. We find that SHAP performs best in terms of faithfulness, stability, and in helping users predict the detector's behavior. In contrast, LIME, perceived as most useful by users, scores worst in terms of users' actual ability to predict the detectors' behavior.
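For readers who want to reproduce this kind of analysis in miniature, the sketch below pairs a publicly available detector with SHAP token attributions via SHAP's standard pipeline wrapper. The checkpoint and example sentence are stand-ins, not the detectors or data used in the study.

import shap
from transformers import pipeline

# "roberta-base-openai-detector" is a stand-in MGT detector, not one of the paper's three.
detector = pipeline("text-classification",
                    model="roberta-base-openai-detector",
                    top_k=None)                       # return scores for both classes

explainer = shap.Explainer(detector)                  # SHAP wraps text pipelines automatically
shap_values = explainer(["This essay was written entirely by its human author."])

print(shap_values)                                    # per-token attributions for each class
# In a notebook, shap.plots.text(shap_values) renders the highlighted tokens.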
Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate
Zhang, Yiqun, Yang, Xiaocui, Feng, Shi, Wang, Daling, Zhang, Yifei, Song, Kaisong
Competitive debate is a complex task of computational argumentation. Large Language Models (LLMs) suffer from hallucinations and lack competitiveness in this field. To address these challenges, we introduce Agent for Debate (Agent4Debate), a dynamic multi-agent framework based on LLMs designed to enhance their capabilities in competitive debate. Drawing inspiration from human behavior in debate preparation and execution, Agent4Debate employs a collaborative architecture in which four specialized agents (Searcher, Analyzer, Writer, and Reviewer) dynamically interact and cooperate. These agents work throughout the debate process, covering multiple stages from initial research and argument formulation to rebuttal and summary. To comprehensively evaluate framework performance, we construct the Competitive Debate Arena, comprising 66 carefully selected Chinese debate motions. We recruit ten experienced human debaters and collect records of 200 debates involving Agent4Debate, baseline models, and humans. The evaluation employs the Debatrix automatic scoring system and professional human reviewers based on the established Debatrix-Elo and Human-Elo rankings. Experimental results indicate that the state-of-the-art Agent4Debate exhibits capabilities comparable to those of humans. Furthermore, ablation studies demonstrate the effectiveness of each component in the agent structure.
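A schematic of how such role-specialized agents can be chained is sketched below; the prompts, control flow, and the llm() stub are placeholders for illustration, not Agent4Debate's actual implementation.

from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Placeholder for a call to an LLM backend (assumed stub, not part of the paper)."""
    return f"[model output for: {prompt[:60]}...]"

@dataclass
class DebateState:
    motion: str
    notes: list = field(default_factory=list)
    draft: str = ""

def searcher(state: DebateState) -> None:
    state.notes.append(llm(f"Collect evidence and examples for the motion: {state.motion}"))

def analyzer(state: DebateState) -> None:
    state.notes.append(llm(f"Identify clash points and likely counterarguments in: {state.notes[-1]}"))

def writer(state: DebateState) -> None:
    state.draft = llm(f"Write a rebuttal speech from these notes: {' '.join(state.notes)}")

def reviewer(state: DebateState) -> str:
    return llm(f"Critique and revise this speech for logic and delivery: {state.draft}")

# One pass through the pipeline for a sample motion.
state = DebateState(motion="Social media does more harm than good")
for stage in (searcher, analyzer, writer):
    stage(state)
print(reviewer(state))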
Tax Credits and Household Behavior: The Roles of Myopic Decision-Making and Liquidity in a Simulated Economy
Dong, Jialin, Dwarakanath, Kshama, Vyetrenko, Svitlana
There has been growing interest in multi-agent simulators in the domain of economic modeling. However, contemporary research often develops reinforcement learning (RL) based models that focus on a single type of agent, such as households, firms, or the government. Such an approach overlooks the adaptation of interacting agents, thereby failing to capture the complexity of real-world economic systems. In this work, we consider a multi-agent simulator composed of several types of RL agents, including heterogeneous households, a firm, a central bank, and a government. In particular, we focus on the crucial role of the government in distributing tax credits to households. We conduct two broad categories of comprehensive experiments dealing with the impact of tax credits on 1) households with varying degrees of myopia (short-sightedness in spending and saving decisions), and 2) households with diverse liquidity profiles. The first category of experiments examines the impact of the frequency of tax credits (e.g., annual vs. quarterly) on the consumption patterns of myopic households. The second category focuses on the impact of varying tax credit distribution strategies on households with differing liquidity. We validate our simulation model by reproducing trends observed in real households upon receipt of unforeseen, uniform tax credits, as documented in a JPMorgan Chase report. Based on the results of the latter, we propose an innovative tax credit distribution strategy for the government to reduce inequality among households. We demonstrate the efficacy of this strategy in improving social welfare in our simulation results.
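A stripped-down way to see the myopia effect targeted by the first set of experiments: if each household spends a fixed fraction of its remaining credit every month, a more myopic household (larger fraction) exhausts the credit almost immediately, while a patient one smooths spending over the year. The consumption rule and parameters below are expository assumptions, not the paper's RL-based simulator.

def simulate_spending(credit: float, myopia: float, months: int = 12) -> list:
    """Each month the household consumes a fraction `myopia` of its remaining credit."""
    remaining, spent = credit, []
    for _ in range(months):
        consumption = myopia * remaining
        remaining -= consumption
        spent.append(round(consumption, 2))
    return spent

myopic_path = simulate_spending(credit=300.0, myopia=0.8)    # spends the credit almost at once
patient_path = simulate_spending(credit=300.0, myopia=0.2)   # smooths spending over the year
print("myopic: ", myopic_path[:4])
print("patient:", patient_path[:4])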