AITopics

Empirical Bayes (EB) estimators can match the first-order asymptotic risk of maximum likelihood (ML) while behaving very differently at second order: recent excess mean squared error (XMSE) analysis shows that kernel-based EB estimation may be worse than ML when the kernel is poorly aligned with the true parameter. This paper turns that diagnostic into a design principle. We propose an XMSE-aware mixed estimator that interpolates between ML and EB shrinkage. Its fixed-weight XMSE is a scalar quadratic, yielding a closed-form oracle mixing weight that is no worse than both ML and the base EB estimator at the XMSE scale. A plug-in implementation based on finite-sample XMSE approximations is proved consistent, with a second-order oracle regret rate for an interior oracle weight. We further establish a transfer of the regret bound to the fixed-weight risk curve evaluated at the selected weight, a thresholded boundary rule, and extensions to compact kernel families and to finite and growing kernel dictionaries with high-probability oracle bounds. Finite impulse response simulations with SURE-tuned, hard-selection, and trace-corrected baselines, together with the public Silverbox and Cascaded Tanks benchmarks, show that the proposed estimator retains most of the benefit of regularization when it is helpful and retreats toward ML under kernel misspecification, with an identified finite-de analyzed on the benchmarks.

artificial intelligence, estimator, machine learning, (18 more...)

2606.26975

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

The Geometry of Updates: Fisher Alignment at Vocabulary Scale

Sweeney, John

Training-free source selection for LLM families with shared vocabularies arises in scientific string domains such as SMILES, protein, and genomic sequences, where candidate corpora share a tokenizer but differ in prediction targets. This creates an activation-dark regime: representation-similarity metrics can be uninformative without assumptions about label-conditioned error geometry, while classical update-geometry metrics are computationally prohibitive at vocabulary scale. We show that, in a shared-output head setting, representation metrics (e.g., CKA) are non-identifiable for transfer; models can share identical representations yet have orthogonal head updates. The key identity is that head Fisher alignment is exactly a cosine between kernel mean embeddings in the joint activation-error space, exposing activation, error, and coupling factors rather than requiring a materialized Fisher matrix. FisherSketch estimates this cosine directly in a single streaming pass, making K=128,256 head Fisher alignment practical with a 16 KB task signature (m=4096) and a 192 KB per-task streaming state, small enough to store next to a model hash, but encoding transfer-relevant update structure. Beyond source selection, the same signatures and marginals provide a diagnostic instrument for studying whether LLM task similarity is driven by activations, errors, or their coupling; shared-parameter and internal-layer validations, together with Llama-3.1-8B verbalizer-shift experiments, show that FisherSketch remains informative when activation similarity cannot distinguish tasks.

large language model, machine learning, natural language, (17 more...)

2606.27242

Genre:

Research Report > New Finding (0.87)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine (0.65)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lausser, Tobias, Vuolo, Joao Eduardo, Zagst, Rudi

Data-Driven Duration Management -- Term Structure Forecasting Using Machine Learning

This paper compares different methods for forecasting the term structure of U.S. and European zero-coupon government bonds using both traditional econometric and Machine Learning (ML) approaches. We compare classical models (e.g., Dynamic Nelson-Siegel (DNS) and Principal Component Analysis (PCA)) with different Neural Network (NN) architectures, including those inspired by the classical models, on the U.S. Treasury market and bonds issued by the European Central Bank (ECB). To enhance predictive performance, macroeconomic variables are incorporated. The findings for both markets are separately analyzed and compared. To this end, we propose a robust model evaluation framework combining statistical accuracy metrics - such as RMSE, MAE, and directional accuracy - with the economic relevance of a quantitative bond trading strategy. Results show that NNs consistently outperform traditional models in both forecasting accuracy and portfolio performance. For the U.S., the most effective approach is a direct-forecasting NN that incorporates DNS factors to reduce the dimensionality of zero-rate data and an Autoencoder (AE) to extract macroeconomic features, while for Europe, the optimal model is a factor-based NN using PCA-derived zero-rate factors without the integration of macroeconomic variables. Overall, the paper demonstrates how combining traditional modeling approaches with modern ML techniques and evaluation can improve yield curve forecasts and support applications in fixed-income portfolio construction.

artificial intelligence, deep learning, machine learning, (18 more...)

2606.26815

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Banking & Finance > Trading (1.00)
Banking & Finance > Economy (1.00)
Government > Regional Government > North America Government > United States Government (0.88)
Government > Regional Government > Europe Government (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Learning Probabilistic Filters with Strictly Proper Scoring Rules

Bach, Eviatar, Baptista, Ricardo, Bröcker, Jochen, Chen, Bohan, Stuart, Andrew

Bayesian filtering of partially and noisily observed dynamical systems seeks to infer the evolving conditional distribution of the state of a dynamical system, given observations, in an online fashion. This Bayesian filtering distribution is the natural object for uncertainty quantification, but it is rarely available as a supervised learning target. However, one can often use the forecast model to generate synthetic system trajectories, along with synthetic observations. We introduce the proper scoring ensemble filter (PSEF), an ensemble data assimilation method based on training an analysis map to approximate the filtering distribution using only synthetic state--observation trajectories. The analysis step is represented as a permutation-invariant, transformer-based map that takes as input a forecast ensemble and observations, producing an analysis ensemble. Training is based on strictly proper scoring rules -- with the energy score used in our implementation -- so that probabilistic accuracy is rewarded over the whole probability distribution. We prove that, under a realizability assumption, the population objective is minimized by the true Bayesian filtering distribution. We also derive the finite-ensemble empirical objective used in training and relate its single state--observation trajectory form to the population objective, using a mean-field consistency argument. Numerical experiments show that the learned filter accurately approximates challenging filtering distributions, including nonlinear, non-Gaussian, and multi-modal posteriors, and achieves stronger performance in data assimilation tasks than classical methods or learning-based methods with mean-squared-error objectives. For close-to-Gaussian problems, learning a correction to the EnKF is the best approach, while for highly non-Gaussian problems an end-to-end approach that discards this inductive bias is superior.

artificial intelligence, data mining, machine learning, (18 more...)

2606.26497

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report > New Finding (0.65)

Industry:

Leisure & Entertainment > Games (0.63)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.92)
(2 more...)

Gupta, Naveen, Silmana, Vaibhav, Sivananthan, S.

Scalable Operator Learning via Nyström Approximation With Denoising Applications

In this paper, we study Nyström subsampling for vector-valued regression in vector-valued reproducing kernel Hilbert spaces. Standard kernel methods often suffer from prohibitive computational costs due to the construction and inversion of large kernel matrices, which limits their scalability to large datasets. To overcome this bottleneck, we propose an efficient operator learning algorithm based on Nyström subsampling that accommodates functional outputs. Under general source conditions characterized by index functions-extending beyond the classical Hölder-type and operator-monotone frameworks-we establish minimax-optimal convergence rates for the proposed estimator. As an application of the proposed framework, we consider function denoising problems. Unlike classical denoising methods, which are typically tailored to specific signal representations or noise models, our approach formulates denoising within a general operator learning framework. Numerical experiments on signal denoising, real-time audio denoising, image denoising, inverse Radon transform reconstruction, and energy-efficiency prediction confirm that the proposed method achieves performance comparable to full kernel methods while substantially reducing computational cost.

artificial intelligence, machine learning, operator, (18 more...)

2606.26652

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Nishikawa, Naoki, Takakura, Shokichi, Hasegawa, Satoshi

Escaping Iterative Parameter-Space Noise: Differentially Private Learning with a Hypernetwork

Differentially private (DP) training of neural networks is often hindered by the large amount of noise required by gradient-based methods such as DP-SGD, which repeatedly inject high-dimensional noise in parameter space throughout training. In this paper, we propose a new framework for DP learning that avoids iterative optimization in parameter space. Instead of updating the target model using privatized gradients, we employ a hypernetwork trained on public datasets to map a private dataset to the parameters of the target model. Specifically, each example is embedded into a low-dimensional representation, the embeddings are aggregated and perturbed to obtain a DP dataset embedding, and the hypernetwork generates the target model parameters from this noisy embedding. Because privacy noise is injected only once into a low-dimensional dataset representation, our approach can significantly reduce the adverse effect of noise. We theoretically show in a synthetic setting that, under a fixed privacy budget, models produced by our approach achieve higher utility than those trained with DP-SGD. Moreover, we apply our approach to LoRA fine-tuning of diffusion models and show that it achieves lower FID than LoRA models trained with DP-SGD and other public-data-guided methods.

artificial intelligence, machine learning, natural language, (18 more...)

2606.26772

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Statistical and Structural Approaches to Algorithmic Fairness

Ferrara, Antonio

Modern machine learning systems have outgrown their origins as isolated predictive constructs, evolving into complex socio-technical architectures that actively mediate human opportunity. As algorithms increasingly determine access to economic and social opportunities, it has become widely recognized that these systems are deeply embedded with the structural inequalities and prejudices of their environments. The field of algorithmic fairness emerged in response to the growing recognition that models optimized for predictive accuracy can systematically disadvantage marginalized groups. Early mitigation strategies, however, rested on fragile simplifications that limited their effectiveness in complex sociotechnical environments. This thesis identifies and addresses two fundamental limitations of contemporary fairness paradigms: the reliance on deterministic point estimates for auditing and the treatment of individuals as isolated entities devoid of structural context. First, the diagnosis of algorithmic unfairness has traditionally depended on scalar metrics that fail to capture the nuances of real-world deployment. This deterministic approach ignores the high statistical variance inherent in small, intersectional groups, often leading to false alarms or missed detections of bias. Furthermore, standard auditing struggles with the opacity of black-box models, frequently conflating unjustifiable bias with the influence of legitimate features.

information retrieval, machine learning, natural language, (21 more...)

2606.262

Country:

North America > United States (1.00)
Europe > United Kingdom > England (1.00)
Europe > Germany (1.00)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Leisure & Entertainment (1.00)
Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Security & Privacy (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
(7 more...)

Ongie, Greg, Parhi, Rahul

Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

We develop a general framework for analyzing representation costs of parametric data-fitting methods through their parameter-space regularizers. From this abstract perspective, we define representation costs for arbitrary parametric models and reveal their induced (native) function spaces. This unifies recent function-space views of data-fitting methods. We also prove that many natural results hold in this abstract setting, including representer theorems for parametric methods on their native spaces. The framework also rigorously connects parametric methods with their equivalent nonparametric descriptions under sufficient overparameterization. Classical methods and their native spaces, such as kernel methods / reproducing kernel Hilbert spaces, wavelets / Besov spaces, and shallow neural networks / variation spaces emerge as special cases of our abstract framework. A byproduct of "axiomatizing" the study of representation costs is that we also immediately obtain new results for deep neural networks: For depth-$L$ feedforward ReLU networks, their induced native spaces are $p$-normable quasi-Banach spaces with $p = 2/L$. This reveals that the inductive bias of deep neural networks (as given by the representation cost) cannot be captured by norms for depths $L > 2$.

artificial intelligence, machine learning, representation cost, (18 more...)

2606.14954

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.27)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Beyond Global Divergences: A Local-Mass Perspective on Bayesian Inference

Xu, Hanli, He, Fengxiang, Moka, Sarat

Global objectives, such as KL divergence and ELBO, are widely used in Bayesian inference for measuring distributional discrepancy. This paper studies their local-mass behaviour that is not directly captured by such objectives. We introduce and use two mathematical tools: (1) Mass Index for recording the polynomial and logarithmic decay scales of local mass, and (2) regularised extended KL (RE-KL), a set-localised divergence that can be formulated in the presence of singular components. Mass Indices help characterise how Bayesian updating changes local mass: (1) power-log likelihood factors shift it explicitly, and (2) parameter-dependent supports, or their smooth softenings, may change the local scale through the amount of mass that remains near the parameter value. Using local RE-KL, we prove absolute, relative, and directional inequalities for comparing local small-ball masses under the two KL directions. Together, these results provide a local theoretical account of local mass behaviour. Experiments provide controlled illustrations of the local behaviour. Code is available at https://github.com/Forsythia0604/Local-Mass-Framework.

artificial intelligence, machine learning, mipow, (16 more...)

2606.2709

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Liu, Haitong, Sridharan, Deepak Narayanan, Steurer, David, Wiedmer, Manuel

Fast algorithms for learning a Gaussian under halfspace truncation with optimal sample complexity

We study the fundamental problem of learning a high-dimensional Gaussian truncated to an unknown halfspace. Lee, Mehrotra and Zampetakis (FOCS'24) recently obtained the first polynomial time algorithm for this problem, but their resulting sample and time complexity bounds are not optimal. Under non-trivial truncation, for any target accuracy $\varepsilon > 0$ and dimension $d$ we give an efficient algorithm that uses $n = \tilde{O}(d^2/\varepsilon^2)$ samples and learns the underlying Gaussian to error $\varepsilon$ in total variation distance. Our algorithm is also fast: its runtime is dominated by the cost of computing the empirical covariance matrix. Both our sample and time complexity are optimal in terms of $d$ and $\varepsilon$ even without truncation: in this regard, we can learn a Gaussian under halfspace truncation for free. The key ingredient behind our result is a novel reinterpretation of the low-degree moments of the truncated Gaussian in terms of a relative truncation parameter. This relative truncation parameter uniquely determines the parameters of the untruncated Gaussian and enables direct parameter recovery. This reinterpretation allows us to circumvent the time intensive projected stochastic gradient descent procedure that is widely used in learning under truncation.

artificial intelligence, claim 3, machine learning, (17 more...)

2606.27298

Country: North America > United States (1.00)

Genre:

Research Report (0.70)
Instructional Material (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)