 Financial News


MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application

arXiv.org Artificial Intelligence

Real-world financial analysis involves information across multiple languages and modalities, from reports and news to scanned filings and meeting recordings. Yet most existing evaluations of LLMs in finance remain text-only, monolingual, and largely saturated by current models. To bridge these gaps, we present MultiFinBen, the first expert-annotated multilingual (five languages) and multimodal (text, vision, audio) benchmark for evaluating LLMs in realistic financial contexts. MultiFinBen introduces two new task families: multilingual financial reasoning, which tests cross-lingual evidence integration from filings and news, and financial OCR, which extracts structured text from scanned documents containing tables and charts. Rather than aggregating all available datasets, we apply a structured, difficulty-aware selection based on advanced model performance, ensuring balanced challenge and removing redundant tasks. Evaluating 21 leading LLMs shows that even frontier multimodal models like GPT-4o achieve only 46.01% overall, stronger on vision and audio but dropping sharply in multilingual settings. These findings expose persistent limitations in multilingual, multimodal, and expert-level financial reasoning. All datasets, evaluation scripts, and leaderboards are publicly released.



SubjECTive-QA (Datasets and Benchmarks Track)

Neural Information Processing Systems

Fact-checking is extensively studied in the context of misinformation and disinformation, addressing objective inaccuracies. However, a softer form of misinformation involves responses that are factually correct but lack certain features such as clarity and relevance. This challenge is prevalent in formal Question-Answer (QA) settings such as press conferences in finance, politics, sports, and other domains, where subjective answers can obscure transparency. Despite this, there is a lack of manually annotated datasets for subjective features across multiple dimensions. To address this gap, we introduce SubjECTive-QA, a human-annotated dataset on Earnings Call Transcripts' (ECTs) QA sessions, as the answers given by company representatives are often open to subjective interpretations and scrutiny. The dataset includes 49,446 annotations for long-form QA pairs across six features: Assertive, Cautious, Optimistic, Specific, Clear, and Relevant. These features are carefully selected to encompass the key attributes that reflect the tone of the answers provided during QA sessions across different domains. Our findings are that the best-performing Pre-trained Language Model (PLM), RoBERTa-base, has similar weighted F1 scores to Llama-3-70b-Chat on features with lower subjectivity, such as Relevant and Clear, with a mean difference of 2 .
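The abstract compares models by weighted F1. As a reading aid, here is a minimal sketch of how a support-weighted F1 score is commonly computed. The function name `weighted_f1` and the toy labels are illustrative assumptions, not the paper's evaluation code or data.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Illustrative helper (assumed name); classes are taken from y_true,
    so a class that never occurs in y_true carries zero weight.
    """
    total = len(y_true)
    score = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true examples of this class
        score += (support / total) * f1
    return score


# Toy labels for one feature (hypothetical values, not dataset annotations).
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]
example_score = weighted_f1(y_true, y_pred)
```

Weighting by support means frequent classes dominate the score, which matters when annotation labels are imbalanced.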




SAE-FiRE: Enhancing Earnings Surprise Predictions Through Sparse Autoencoder Feature Selection

arXiv.org Artificial Intelligence

Predicting earnings surprises from financial documents, such as earnings conference calls, regulatory filings, and financial news, has become increasingly important in financial economics. However, these financial documents present significant analytical challenges, typically containing over 5,000 words with substantial redundancy and industry-specific terminology that creates obstacles for language models. In this work, we propose the SAE-FiRE (Sparse Autoencoder for Financial Representation Enhancement) framework to address these limitations by extracting key information while eliminating redundancy. SAE-FiRE employs Sparse Autoencoders (SAEs) to decompose dense neural representations from large language models into interpretable sparse components, then applies statistical feature selection methods, including ANOVA F-tests and tree-based importance scoring, to identify the top-k most discriminative dimensions for classification. By systematically filtering out noise that might otherwise lead to overfitting, we enable more robust and generalizable predictions. Experimental results across three financial datasets demonstrate that SAE-FiRE significantly outperforms baseline approaches.
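The feature selection step described above can be illustrated with a small sketch: score each sparse feature dimension with a one-way ANOVA F-statistic against the class labels, then keep the top-k dimensions. This is a plain-Python illustration under stated assumptions, not the SAE-FiRE implementation; the function names `anova_f_score` and `select_top_k` and the toy activation matrix are hypothetical, and the sparse autoencoder itself is not shown.

```python
from collections import defaultdict


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic for one feature across label groups.

    Assumes at least two classes and more samples than classes.
    """
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-group sum of squares: distance of group means from the grand mean.
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    # Within-group sum of squares: spread of samples around their group mean.
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(features, labels, k):
    """Return (indices of the k highest-scoring columns, all scores)."""
    n_features = len(features[0])
    scores = [
        anova_f_score([row[j] for row in features], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Hypothetical sparse activations: 4 documents x 3 feature dimensions.
features = [
    [0.9, 0.10, 0.5],
    [1.1, 0.20, 0.4],
    [0.1, 0.15, 0.6],
    [0.0, 0.12, 0.5],
]
labels = [1, 1, 0, 0]  # toy binary classes, e.g. surprise vs. no surprise
top_dims, f_scores = select_top_k(features, labels, k=2)
```

In this toy data, dimension 0 separates the two classes most cleanly, so it ranks first. A classifier would then be trained only on the selected columns.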


OpenAI's Blockbuster AMD Deal Is a Bet on Near-Limitless Demand for AI

WIRED

OpenAI's latest move in the race to build massive data centers in the US shows it believes demand for AI will keep surging--even as skeptics warn of a bubble. OpenAI announced on Monday that it will acquire several data centers' worth of chips from AMD in a blockbuster deal that could also give OpenAI the option to acquire a roughly 10 percent stake in the chipmaker. It's another bold bet from OpenAI that demand for generative artificial intelligence will continue rising--bubble be damned. "Excited to partner with AMD to use their chips to serve our users!" OpenAI CEO Sam Altman said on X, adding that the company will also ramp up its investments in Nvidia chips. He added: "The world needs much more compute." OpenAI said in a blog post this morning that it would commit to purchasing 6 gigawatts' worth of AMD chips over the next several years.


Tesla sales jump as buyers scramble before EV tax credit expires

Al Jazeera

Tesla sales have surged in the third quarter as buyers in the United States rushed to take advantage of electric vehicle (EV) tax credits that were eliminated under President Donald Trump's sweeping tax bill passed this year. On Thursday, the automaker reported a 7.4 percent increase in sales compared with the same period last year as demand was driven by customers looking to buy before the credits officially expired at the end of September. Tesla also delivered 481,166 units of its Model 3 compact sedan and Model Y crossover in the quarter, well above Wall Street expectations. The Elon Musk-led carmaker frequently talked up the expiry of the tax credits, using it alongside discounts and financing deals to spur sales and leases of its EVs. Investors are worried because sales are now expected to slump as the $7,500 federal tax credit disappears.


Gaming giant Electronic Arts bought in unprecedented $55bn deal

BBC News

Electronic Arts (EA), one of the biggest gaming companies in the world, has agreed a deal to sell the company for $55bn (£41bn). The consortium of buyers includes Saudi Arabia's Public Investment Fund (PIF), Silver Lake and Jared Kushner's Affinity Partners. EA is known for making and publishing best-selling games such as EA FC, formerly known as Fifa, The Sims and Mass Effect. It is understood to be the largest leveraged buyout in history - where a significant amount of the purchase is financed by borrowing money. The deal will take EA private - meaning all of its public shares will be purchased and it will no longer be traded on a stock exchange.


MASS: Multi-agent simulation scaling for portfolio construction

arXiv.org Artificial Intelligence

The application of LLM-based agents in financial investment has shown significant promise, yet existing approaches often require intermediate steps like predicting individual stock movements or rely on predefined, static workflows. These limitations restrict their adaptability and effectiveness in constructing optimal portfolios. In this paper, we introduce the Multi-Agent Scaling Simulation (MASS), a novel framework that leverages multi-agent simulation for direct, end-to-end portfolio construction. At its core, MASS employs a backward optimization process to dynamically learn the optimal distribution of heterogeneous agents, enabling the system to adapt to evolving market regimes. A key finding enabled by our framework is the exploration of the scaling effect for portfolio construction: we demonstrate that as the number of agents increases exponentially (up to 512), the aggregated decisions yield progressively higher excess returns. Extensive experiments on a challenging, self-collected dataset from the 2023 Chinese A-share market show that MASS consistently outperforms seven state-of-the-art baselines. Further backtesting, stability analyses and the experiment on data leakage concerns validate its enhanced profitability and robustness. We have open-sourced our code, dataset, and training snapshots at https://github.com/gta0804/MASS/ to foster further research.