Financial News
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Wan, Fanqi, Shen, Weizhou, Liao, Shengyi, Shi, Yingcheng, Li, Chenliang, Yang, Ziyi, Zhang, Ji, Huang, Fei, Zhou, Jingren, Yan, Ming
Recent large reasoning models (LRMs) have demonstrated strong reasoning capabilities through reinforcement learning (RL). These improvements have primarily been observed on short-context reasoning tasks. In contrast, extending LRMs to effectively process and reason over long-context inputs via RL remains a critical unsolved challenge. To bridge this gap, we first formalize the paradigm of long-context reasoning RL and identify two key challenges: suboptimal training efficiency and an unstable optimization process. To address these issues, we propose QwenLong-L1, a framework that adapts short-context LRMs to long-context scenarios via progressive context scaling. Specifically, we utilize a warm-up supervised fine-tuning (SFT) stage to establish a robust initial policy, followed by a curriculum-guided phased RL technique to stabilize the policy evolution, enhanced with a difficulty-aware retrospective sampling strategy to incentivize policy exploration. Experiments on seven long-context document question-answering benchmarks demonstrate that QwenLong-L1-32B outperforms flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B and achieves performance on par with Claude-3.7-Sonnet-Thinking, placing it among the leading state-of-the-art LRMs. This work advances the development of practical long-context LRMs capable of robust reasoning across information-intensive environments.
SubjECTive-QA: Measuring Subjectivity in Earnings Call Transcripts' QA Through Six-Dimensional Feature Analysis
Fact-checking is extensively studied in the context of misinformation and disinformation, addressing objective inaccuracies. However, a softer form of misinformation involves responses that are factually correct but lack certain features such as clarity and relevance. This challenge is prevalent in formal Question-Answer (QA) settings such as press conferences in finance, politics, sports, and other domains, where subjective answers can obscure transparency. Despite this, there is a lack of manually annotated datasets for subjective features across multiple dimensions. To address this gap, we introduce SubjECTive-QA, a human annotated dataset on Earnings Call Transcripts' (ECTs) QA sessions as the answers given by company representatives are often open to subjective interpretations and scrutiny.
Vague Knowledge: Evidence from Analyst Reports
People in the real world often possess vague knowledge of future payoffs, for which quantification is not feasible or desirable. We argue that language, with differing ability to convey vague information, plays an important but less-known role in representing subjective expectations. Empirically, we find that in their reports, analysts include useful information in linguistic expressions but not in numerical forecasts. Specifically, the textual tone of analyst reports has predictive power for forecast errors and subsequent revisions in numerical forecasts, and this relation becomes stronger when analysts' language is vaguer, when uncertainty is higher, and when analysts are busier. Overall, our theory and evidence suggest that some useful information is vaguely known and only communicated through language.
Can AI Read Between The Lines? Benchmarking LLMs On Financial Nuance
Kubica, Dominick, Gordon, Dylan T., Emura, Nanami, Saini, Derleen, Goldenberg, Charlie
As of 2025, Generative Artificial Intelligence (GenAI) has become a central tool for productivity across industries. Beyond text generation, GenAI now plays a critical role in coding, data analysis, and research workflows. As large language models (LLMs) continue to evolve, it is essential to assess the reliability and accuracy of their outputs, especially in specialized, high-stakes domains like finance. Most modern LLMs transform text into numerical vectors, which are used in operations such as cosine similarity searches to generate responses. However, this abstraction process can lead to misinterpretation of emotional tone, particularly in nuanced financial contexts. While LLMs generally excel at identifying sentiment in everyday language, these models often struggle with the nuanced, strategically ambiguous language found in earnings call transcripts. Financial disclosures frequently embed sentiment in hedged statements, forward-looking language, and industry-specific jargon, making it difficult even for human analysts to interpret consistently, let alone AI models. This paper presents findings from the Santa Clara Microsoft Practicum Project, led by Professor Charlie Goldenberg, which benchmarks the performance of Microsoft's Copilot, OpenAI's ChatGPT, Google's Gemini, and traditional machine learning models for sentiment analysis of financial text. Using Microsoft earnings call transcripts, the analysis assesses how well LLM-derived sentiment correlates with market sentiment and stock movements and evaluates the accuracy of model outputs. Prompt engineering techniques are also examined to improve sentiment analysis results. Visualizations of sentiment consistency are developed to evaluate alignment between tone and stock performance, with sentiment trends analyzed across Microsoft's lines of business to determine which segments exert the greatest influence.
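The cosine similarity operation mentioned in the abstract above can be illustrated with a minimal sketch. The three-dimensional vectors and their names (`hedged`, `upbeat`, `downbeat`) are made-up stand-ins for text embeddings, not values from the paper; real models use far higher-dimensional vectors.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (assumed values, for illustration only).
hedged = [0.2, 0.7, 0.1]
upbeat = [0.3, 0.6, 0.2]
downbeat = [-0.5, 0.1, 0.8]

if __name__ == "__main__":
    print("hedged vs upbeat:  ", cosine_similarity(hedged, upbeat))
    print("hedged vs downbeat:", cosine_similarity(hedged, downbeat))
```

Because the score depends only on the direction of the vectors, two sentences whose embeddings point the same way score close to 1 regardless of how differently they are worded, which is one way subtle differences in tone can be lost.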
FinMaster: A Holistic Benchmark for Mastering Full-Pipeline Financial Workflows with LLMs
Jiang, Junzhe, Yang, Chang, Cui, Aixin, Jin, Sihan, Wang, Ruiyu, Li, Bo, Huang, Xiao, Sun, Dongning, Wang, Xinrun
Financial tasks are pivotal to global economic stability; however, their execution faces challenges including labor-intensive processes, low error tolerance, data fragmentation, and tool limitations. Although large language models (LLMs) have succeeded in various natural language processing tasks and have shown potential in automating workflows through reasoning and contextual understanding, current benchmarks for evaluating LLMs in finance lack sufficient domain-specific data, have simplistic task designs, and rely on incomplete evaluation frameworks. To address these gaps, this article presents FinMaster, a comprehensive financial benchmark designed to systematically assess the capabilities of LLMs in financial literacy, accounting, auditing, and consulting. Specifically, FinMaster comprises three main modules: i) FinSim, which builds simulators that generate synthetic, privacy-compliant financial data for companies to replicate market dynamics; ii) FinSuite, which provides tasks in core financial domains, spanning 183 tasks of various types and difficulty levels; and iii) FinEval, which develops a unified interface for evaluation. Extensive experiments over state-of-the-art LLMs reveal critical capability gaps in financial reasoning, with accuracy dropping from over 90% on basic tasks to merely 40% on complex scenarios requiring multi-step reasoning. This degradation reflects the propagation of computational errors: single-metric calculations that initially show 58% accuracy fall to 37% in multi-metric scenarios. To the best of our knowledge, FinMaster is the first benchmark that covers full-pipeline financial workflows with challenging tasks. We hope that FinMaster can bridge the gap between research and industry practitioners, driving the adoption of LLMs in real-world financial practices to enhance efficiency and accuracy.
SoftBank profit doubles as AI demand boosts chip sales and startups
SoftBank reported a 124% jump in quarterly profit on resilient AI demand that's supporting startup valuations and chip unit sales, a boost to its aggressive data center investment plans. The Tokyo-based company reported net income of ¥517.18 billion ($3.5 billion) in its fiscal fourth quarter. It was helped by the Vision Fund, which swung to a profit of ¥26.1 billion mainly on a surge in the value of TikTok owner ByteDance and its strong international sales. The earnings come at a critical juncture for SoftBank as it plans to invest $30 billion in OpenAI while leading a $100 billion foray into building AI hardware in the United States. Maintaining a healthy cash flow and balance sheet is key to securing the billions of dollars needed at minimum cost.
SoftBank profit doubles as AI demand boosts chip sales and startup valuations
SoftBank Group reported a 124% jump in quarterly profit on resilient AI demand that's supporting startup valuations and chip unit sales, a boost to its aggressive data center investment plans. The Tokyo-based company reported net income of ¥517.18 billion ($3.5 billion) in its fiscal fourth quarter. It was helped by the Vision Fund, which swung to a profit of ¥26.1 billion. The earnings come at a critical juncture for SoftBank as it plans to invest $30 billion in OpenAI while leading a $100 billion foray into building AI hardware in the U.S. Maintaining a healthy cash flow and balance sheet is key to securing the billions of dollars needed at minimum cost.
NTT to launch $16.5 billion tender offer for NTT Data in AI push
NTT plans to make its AI powerhouse NTT Data Group a wholly owned subsidiary in a deal worth ¥2.37 trillion ($16.5 billion), the latest in a series of Japanese parent companies absorbing their listed units. The country's biggest telecom operator is launching a tender offer of ¥4,000 per share for all stock it doesn't own in NTT Data. That represents a premium of 34% to its close the previous day. The tender will take place from Friday to June 19, with NTT Data to delist after the tender. News of the deal sent NTT Data shares up by their daily limit of 17% on Thursday, its highest in 25 years.
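As a rough check on the figures above, the offer price and premium imply a prior closing price, and the two deal values imply an exchange rate. The sketch below assumes the amounts are in yen and US dollars and uses the rounded numbers from the article, so the results are approximate; the function names are illustrative.

```python
def implied_prior_close(offer_price, premium):
    """Prior close implied by an offer price and a fractional premium (0.34 = 34%)."""
    return offer_price / (1 + premium)


def implied_exchange_rate(value_local, value_usd):
    """Units of local currency per US dollar implied by two quotes of one amount."""
    return value_local / value_usd


if __name__ == "__main__":
    # Offer of 4,000 per share at a 34% premium (rounded figures from the article).
    print(round(implied_prior_close(4000, 0.34), 2))
    # Deal value of 2.37 trillion quoted alongside 16.5 billion US dollars.
    print(round(implied_exchange_rate(2.37e12, 16.5e9), 2))
```

With these inputs the implied prior close is a little under 3,000 per share and the implied rate is in the low 140s per dollar.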
Meta to report quarterly earnings amid tariff uncertainty and AI investment
Meta is set to report its first quarter earnings on Wednesday after the bell, and investors will be looking for news on whether the company met its quarterly revenue goals of somewhere between $39.5bn and $41.8bn. Wall Street is projecting the company will post $41.36bn in revenue on $5.21 in earnings per share. While Meta has repeatedly beaten Wall Street expectations in the past few quarters, analysts were disappointed by the first quarter revenue outlook Meta chief executive Mark Zuckerberg shared at the end of 2024. The company is also planning on spending up to $65bn on AI infrastructure by the end of 2025. Uncertainty over Donald Trump's sweeping tariffs may yet roil ad markets, clouding the company's financial outlook for near future quarters.
FinNLI: Novel Dataset for Multi-Genre Financial Natural Language Inference Benchmarking
Magomere, Jabez, Kochkina, Elena, Mensah, Samuel, Kaur, Simerjot, Smiley, Charese H.
We introduce FinNLI, a benchmark dataset for Financial Natural Language Inference (FinNLI) across diverse financial texts like SEC Filings, Annual Reports, and Earnings Call transcripts. Our dataset framework ensures diverse premise-hypothesis pairs while minimizing spurious correlations. FinNLI comprises 21,304 pairs, including a high-quality test set of 3,304 instances annotated by finance experts. Evaluations show that domain shift significantly degrades general-domain NLI performance. The highest Macro F1 scores for pre-trained language model (PLM) and large language model (LLM) baselines are 74.57% and 78.62%, respectively, highlighting the dataset's difficulty. Surprisingly, instruction-tuned financial LLMs perform poorly, suggesting limited generalizability. FinNLI exposes weaknesses in current LLMs for financial reasoning, indicating room for improvement.
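For readers unfamiliar with the Macro F1 metric reported above, here is a minimal sketch of how it is commonly computed: the F1 score is calculated per class and then averaged with equal weight. The three NLI labels and the toy predictions are assumptions for illustration, not data from FinNLI, and the paper's exact evaluation code may differ.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical gold labels and predictions for four premise-hypothesis pairs.
    gold = ["entailment", "entailment", "contradiction", "neutral"]
    pred = ["entailment", "contradiction", "contradiction", "neutral"]
    print(macro_f1(gold, pred))
```

Because every class counts equally, a model that does well on a frequent label but poorly on a rare one is penalized more under Macro F1 than under plain accuracy.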