 portfolio


Uncertainty-Adjusted Sorting for Asset Pricing with Machine Learning

Liu, Yan, Luo, Ye, Wang, Zigan, Zhang, Xiaowei

arXiv.org Machine Learning

A large and rapidly expanding literature demonstrates that machine learning (ML) methods substantially improve out-of-sample asset return prediction relative to conventional linear benchmarks, and that these statistical gains often translate into economically meaningful portfolio performance. Seminal contributions such as Gu et al. (2020) document large Sharpe ratio improvements from nonlinear learners in U.S. equities, while subsequent work extends these findings to stochastic discount factor estimation (Chen et al. 2024), international equity markets (Leippold et al. 2022), and bond return forecasting (Kelly et al. 2019, Bianchi et al. 2020). Collectively, this literature establishes ML as a powerful tool for extracting conditional expected returns in environments characterized by noisy signals, nonlinear interactions, and pervasive multicollinearity.


The Nonstationarity-Complexity Tradeoff in Return Prediction

Capponi, Agostino, Huang, Chengpiao, Sidaoui, J. Antonio, Wang, Kaizheng, Zou, Jiacheng

arXiv.org Machine Learning

We investigate machine learning models for stock return prediction in non-stationary environments, revealing a fundamental nonstationarity-complexity tradeoff: complex models reduce misspecification error but require longer training windows that introduce stronger non-stationarity. We resolve this tension with a novel model selection method that jointly optimizes model class and training window size using a tournament procedure that adaptively evaluates candidates on non-stationary validation data. Our theoretical analysis demonstrates that this approach balances misspecification error, estimation variance, and non-stationarity, performing close to the best model in hindsight. Applying our method to 17 industry portfolio returns, we consistently outperform standard rolling-window benchmarks, improving out-of-sample $R^2$ by 14-23% on average. During NBER-designated recessions, improvements are substantial: our method achieves positive $R^2$ during the Gulf War recession while benchmarks are negative, improves $R^2$ in absolute terms by at least 80bps during the 2001 recession, and delivers superior performance during the 2008 Financial Crisis. Economically, a trading strategy based on our selected model generates 31% higher cumulative returns averaged across the industries.
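For readers unfamiliar with the metric, the following is a minimal sketch of how an out-of-sample $R^2$ can be computed against a rolling-window forecast. The function names `oos_r2` and `rolling_mean_forecast` are hypothetical, and the zero-forecast default benchmark is an assumption; the abstract does not state which benchmark or forecasting model the authors use.

```python
import numpy as np

def oos_r2(actual, predicted, benchmark=None):
    """Out-of-sample R^2 of `predicted` relative to a benchmark forecast.

    benchmark=None uses a zero forecast (an assumption for this sketch);
    a historical-mean forecast can be passed in instead.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if benchmark is None:
        benchmark = np.zeros_like(actual)
    benchmark = np.asarray(benchmark, dtype=float)
    sse_model = np.sum((actual - predicted) ** 2)   # model squared errors
    sse_bench = np.sum((actual - benchmark) ** 2)   # benchmark squared errors
    return 1.0 - sse_model / sse_bench

def rolling_mean_forecast(returns, window):
    """Forecast each period with the mean of the previous `window` returns."""
    returns = np.asarray(returns, dtype=float)
    return np.array(
        [returns[t - window:t].mean() for t in range(window, len(returns))]
    )
```

A short `window` adapts quickly to regime changes but yields noisier estimates, which is the tension the abstract describes. A positive value from `oos_r2` means the forecast beats the chosen benchmark on held-out data.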


SoftBank to acquire DigitalBridge for $4bn in move to deepen ties to AI

The Guardian

Acquisition would further expand SoftBank's investments in artificial intelligence as it tries to center itself in the boom. SoftBank Group will acquire digital infrastructure investor DigitalBridge Group in a deal valued at $4bn, the companies said on Monday, as the Japanese investment firm looks to deepen its AI-related portfolio. The acquisition would expand SoftBank's exposure to digital infrastructure as the Japanese conglomerate is positioning its portfolio to focus on artificial intelligence. SoftBank's billionaire founder Masayoshi Son is seeking to capitalize on surging demand for the computing capacity that underpins artificial intelligence applications. DigitalBridge invests in digital infrastructure sectors such as datacenters, cell towers, fiber networks, small-cell systems and edge infrastructure, with a portfolio including companies such as Vantage Data Centers, Zayo, Switch and AtlasEdge. Founded in 1991 as real estate-focused Colony Capital, the firm pivoted under CEO Marc Ganzi into digital infrastructure and rebranded as DigitalBridge in 2021 after shedding most of its legacy property assets.


Robust Portfolio Optimization

Neural Information Processing Systems

We propose a robust portfolio optimization approach based on quantile statistics. The proposed method is robust to extreme events in asset returns, and accommodates large portfolios under limited historical data. Specifically, we show that the risk of the estimated portfolio converges to the oracle optimal risk with parametric rate under weakly dependent asset returns. The theory does not rely on higher order moment assumptions, thus allowing for heavy-tailed asset returns. Moreover, the rate of convergence quantifies that the size of the portfolio under management is allowed to scale exponentially with the sample size of the historical data. The empirical effectiveness of the proposed method is demonstrated under both synthetic and real stock data. Our work extends existing ones by achieving robustness in high dimensions, and by allowing serial dependence.


Regret Bounds for Online Portfolio Selection with a Cardinality Constraint

Neural Information Processing Systems

Online portfolio selection is a sequential decision-making problem in which a learner repeatedly selects a portfolio over a set of assets, aiming to maximize long-term return. In this paper, we study the problem with the cardinality constraint that the number of assets in a portfolio is restricted to be at most k, and consider two scenarios: (i) in the full-feedback setting, the learner can observe price relatives (rates of return to cost) for all assets, and (ii) in the bandit-feedback setting, the learner can observe price relatives only for invested assets. We propose efficient algorithms for these scenarios that achieve sublinear regrets. We also provide regret (statistical) lower bounds for both scenarios which nearly match the upper bounds when k is a constant. In addition, we give a computational lower bound which implies that no algorithm can maintain both computational efficiency and a small regret upper bound.
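The setting described above can be illustrated with a small sketch. It only shows how wealth evolves under a cardinality constraint and what the learner sees under bandit feedback; it is not the paper's algorithm. The names `cumulative_wealth` and `bandit_feedback` are hypothetical, and the long-only, fully invested weights are an assumption.

```python
import numpy as np

def cumulative_wealth(price_relatives, portfolios, k):
    """Wealth multiple after trading a sequence of portfolios.

    price_relatives[t][i] is the price relative of asset i in round t;
    portfolios[t] holds non-negative weights summing to 1 with at most
    k non-zero entries (the cardinality constraint).
    """
    wealth = 1.0
    for x_t, w_t in zip(np.asarray(price_relatives, dtype=float),
                        np.asarray(portfolios, dtype=float)):
        if np.count_nonzero(w_t) > k:
            raise ValueError("portfolio holds more than k assets")
        if np.any(w_t < 0) or not np.isclose(w_t.sum(), 1.0):
            raise ValueError("weights must be non-negative and sum to 1")
        wealth *= float(w_t @ x_t)  # wealth compounds multiplicatively
    return wealth

def bandit_feedback(x_t, w_t):
    """Price relatives visible to the learner: invested assets only."""
    x_t = np.asarray(x_t, dtype=float)
    w_t = np.asarray(w_t, dtype=float)
    return np.where(w_t > 0, x_t, np.nan)  # NaN marks unobserved assets
```

Regret in this setting compares the logarithm of the learner's wealth with that of the best fixed k-sparse portfolio in hindsight.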


A Globally Optimal Portfolio for m-Sparse Sharpe Ratio Maximization

Neural Information Processing Systems

The Sharpe ratio is an important and widely-used risk-adjusted return in financial engineering. In modern portfolio management, one may require an m-sparse (no more than m active assets) portfolio to save managerial and financial costs.


LLM-Generated Counterfactual Stress Scenarios for Portfolio Risk Simulation via Hybrid Prompt-RAG Pipeline

Soleimani, Masoud

arXiv.org Artificial Intelligence

We develop a transparent and fully auditable LLM-based pipeline for macro-financial stress testing, combining structured prompting with optional retrieval of country fundamentals and news. The system generates machine-readable macroeconomic scenarios for the G7, which cover GDP growth, inflation, and policy rates, and are translated into portfolio losses through a factor-based mapping that enables Value-at-Risk and Expected Shortfall assessment relative to classical econometric baselines. Across models, countries, and retrieval settings, the LLMs produce coherent and country-specific stress narratives, yielding stable tail-risk amplification with limited sensitivity to retrieval choices. Comprehensive plausibility checks, scenario diagnostics, and ANOVA-based variance decomposition show that risk variation is driven primarily by portfolio composition and prompt design rather than by the retrieval mechanism. The pipeline incorporates snapshotting, deterministic modes, and hash-verified artifacts to ensure reproducibility and auditability. Overall, the results demonstrate that LLM-generated macro scenarios, when paired with transparent structure and rigorous validation, can provide a scalable and interpretable complement to traditional stress-testing frameworks.
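As a rough illustration of the risk step, the sketch below maps scenario shocks to portfolio losses and then computes empirical Value-at-Risk and Expected Shortfall. The linear mapping in `scenario_losses`, the function names, and the 95% default level are assumptions for illustration; the abstract does not specify the form of the factor-based mapping.

```python
import numpy as np

def scenario_losses(shocks, exposures):
    """Portfolio loss per scenario under a linear factor mapping (assumed).

    shocks: (n_scenarios, n_factors) factor moves, e.g. GDP growth,
    inflation, policy rate; exposures: (n_factors,) portfolio sensitivities.
    """
    shocks = np.asarray(shocks, dtype=float)
    exposures = np.asarray(exposures, dtype=float)
    return -(shocks @ exposures)  # loss is the negative of the return

def var_es(losses, alpha=0.95):
    """Empirical Value-at-Risk and Expected Shortfall at level alpha."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)   # alpha-quantile of the loss sample
    tail = losses[losses >= var]       # scenarios at least as bad as VaR
    return float(var), float(tail.mean())
```

Expected Shortfall averages the losses beyond the VaR threshold, so it is never smaller than VaR at the same level and is more sensitive to the severity of tail scenarios.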


Learning to Hedge Swaptions

Ahmadi, Zaniar, Godin, Frédéric

arXiv.org Artificial Intelligence

This paper investigates the deep hedging framework, based on reinforcement learning (RL), for the dynamic hedging of swaptions, contrasting its performance with traditional sensitivity-based rho-hedging. We design agents under three distinct objective functions (mean squared error, downside risk, and Conditional Value-at-Risk) to capture alternative risk preferences and evaluate how these objectives shape hedging styles. Relying on a three-factor arbitrage-free dynamic Nelson-Siegel model for our simulation experiments, our findings show that near-optimal hedging effectiveness is achieved when using two swaps as hedging instruments. Deep hedging strategies dynamically adapt the hedging portfolio's exposure to risk factors across states of the market. In our experiments, their outperformance over rho-hedging strategies persists even in the presence of some model misspecification. These results highlight RL's potential to deliver more efficient and resilient swaption hedging strategies.


Statistical Arbitrage in Polish Equities Market Using Deep Learning Techniques

Adamczyk, Marek, Dąbrowski, Michał

arXiv.org Artificial Intelligence

We study a systematic approach to a popular Statistical Arbitrage technique: Pairs Trading. Instead of relying on two highly correlated assets, we replace the second asset with a replication of the first using risk factor representations. These factors are obtained through Principal Components Analysis (PCA), exchange traded funds (ETFs), and, as our main contribution, Long Short Term Memory networks (LSTMs). Residuals between the main asset and its replication are examined for mean reversion properties, and trading signals are generated for sufficiently fast mean reverting portfolios. Beyond introducing a deep learning based replication method, we adapt the framework of Avellaneda and Lee (2008) to the Polish market. Accordingly, components of WIG20, mWIG40, and selected sector indices replace the original S&P500 universe, and market parameters such as the risk free rate and transaction costs are updated to reflect local conditions. We outline the full strategy pipeline: risk factor construction, residual modeling via the Ornstein Uhlenbeck process, and signal generation. Each replication technique is described together with its practical implementation. Strategy performance is evaluated over two periods: 2017-2019 and the recessive year 2020. All methods yield profits in 2017-2019, with PCA achieving roughly 20 percent cumulative return and an annualized Sharpe ratio of up to 2.63. Despite multiple adaptations, our conclusions remain consistent with those of the original paper. During the COVID-19 recession, only the ETF based approach remains profitable (about 5 percent annual return), while PCA and LSTM methods underperform. LSTM results, although negative, are promising and indicate potential for future optimization.
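The residual-modeling step can be sketched as follows: fit a discretely sampled Ornstein-Uhlenbeck process to the residual series through an AR(1) regression, then standardize the latest residual into a score. This is one common estimation approach and is an assumption here; the function names, the daily step `dt = 1/252`, and the AR(1) fit are illustrative and not taken from the abstract.

```python
import numpy as np

def fit_ou(residual, dt=1.0 / 252):
    """Estimate OU parameters from a residual series via AR(1) regression.

    Fits X[n+1] = a + b * X[n] + noise and converts (a, b) into the
    mean-reversion speed, long-run mean and equilibrium volatility.
    """
    x = np.asarray(residual, dtype=float)
    b, a = np.polyfit(x[:-1], x[1:], 1)        # slope, then intercept
    if not 0.0 < b < 1.0:
        raise ValueError("series does not look mean reverting")
    noise = x[1:] - (a + b * x[:-1])           # regression residuals
    kappa = -np.log(b) / dt                    # annualized reversion speed
    mean = a / (1.0 - b)                       # long-run level
    sigma_eq = np.sqrt(noise.var() / (1.0 - b ** 2))
    return float(kappa), float(mean), float(sigma_eq)

def s_score(x_last, mean, sigma_eq):
    """Distance of the latest residual from its mean, in equilibrium std units."""
    return (x_last - mean) / sigma_eq
```

A larger `kappa` means faster mean reversion, so a minimum `kappa` can serve as the filter for "sufficiently fast" portfolios, and signals can then be triggered when the score moves far from zero. Any specific thresholds would need to be calibrated and are not given here.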


Molecular Embedding-Based Algorithm Selection in Protein-Ligand Docking

Wang, Jiabao Brad, Cao, Siyuan, Wu, Hongxuan, Yuan, Yiliang, Misir, Mustafa

arXiv.org Artificial Intelligence

Selecting an effective docking algorithm is highly context-dependent, and no single method performs reliably across structural, chemical, or protocol regimes. We introduce MolAS, a lightweight algorithm selection system that predicts per-algorithm performance from pretrained protein-ligand embeddings using attentional pooling and a shallow residual decoder. With only hundreds to a few thousand labelled complexes, MolAS achieves up to 15% absolute improvement over the single-best solver (SBS) and closes 17-66% of the Virtual Best Solver (VBS)-SBS gap across five diverse docking benchmarks. Analyses of reliability, embedding geometry, and solver-selection patterns show that MolAS succeeds when the oracle landscape exhibits low entropy and separable solver behaviour, but collapses under protocol-induced hierarchy shifts. These findings indicate that the main barrier to robust docking AS is not representational capacity but instability in solver rankings across pose-generation regimes, positioning MolAS as both a practical in-domain selector and a diagnostic tool for assessing when AS is feasible.