September 18
Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus
Lamrani, Lamia, Collins, Benoît, Bouchaud, Jean-Philippe
Cross-validation is one of the most widely used methods for model selection and evaluation; its efficiency for large covariance matrix estimation appears robust in practice, but little is known about the theoretical behavior of its error. In this paper, we derive the expected Frobenius error of the holdout method, a particular cross-validation procedure that involves a single train and test split, for a generic rotationally invariant multiplicative noise model, therefore extending previous results to non-Gaussian data distributions. Our approach involves using the Weingarten calculus and the Ledoit-Péché formula to derive the oracle eigenvalues in the high-dimensional limit. When the population covariance matrix follows an inverse Wishart distribution, we approximate the expected holdout error, first with a linear shrinkage, then with a quadratic shrinkage to approximate the oracle eigenvalues. Under the linear approximation, we find that the optimal train-test split ratio is proportional to the square root of the matrix dimension. Then we compute Monte Carlo simulations of the holdout error for different distributions of the norm of the noise, such as the Gaussian, Student, and Laplace distributions and observe that the quadratic approximation yields a substantial improvement, especially around the optimal train-test split ratio. We also observe that a higher fourth-order moment of the Euclidean norm of the noise vector sharpens the holdout error curve near the optimal split and lowers the ideal train-test ratio, making the choice of the train-test ratio more important when performing the holdout method.
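A minimal sketch of a single train/test holdout estimator, for illustration only. It assumes the common construction in which eigenvectors come from the train-split sample covariance and eigenvalues are re-estimated by projecting the test-split sample covariance onto them; the abstract does not spell out these details. The function names, the zero-mean assumption, and the normalisation of the error by the dimension are all choices made here, not taken from the paper.

```python
import numpy as np


def holdout_covariance(X, n_train):
    """Holdout covariance estimate from zero-mean data X of shape (n_samples, dim).

    Assumption: eigenvectors are taken from the train split and the
    eigenvalues are the quadratic forms u_i^T E_test u_i on the test split.
    """
    X_train, X_test = X[:n_train], X[n_train:]
    E_train = X_train.T @ X_train / X_train.shape[0]
    E_test = X_test.T @ X_test / X_test.shape[0]

    # Columns of U are the eigenvectors of the train-split sample covariance.
    _, U = np.linalg.eigh(E_train)

    # xi[i] = u_i^T E_test u_i, the out-of-sample variance along u_i.
    xi = np.einsum("ji,jk,ki->i", U, E_test, U)

    # Reassemble sum_i xi[i] * u_i u_i^T.
    return (U * xi) @ U.T


def frobenius_error(estimate, truth):
    """Squared Frobenius distance, divided by the dimension (a choice made here)."""
    return np.linalg.norm(estimate - truth, ord="fro") ** 2 / truth.shape[0]
```

Sweeping `n_train` for a fixed sample size and plotting `frobenius_error` against it is one way to look at how the error depends on the train-test split in a simulation.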
Breaking the Cycle of Incarceration With Targeted Mental Health Outreach: A Case Study in Machine Learning for Public Policy
Rodolfa, Kit T., Salomon, Erika, Yao, Jin, Yoder, Steve, Sullivan, Robert, McGuire, Kevin, Dickinson, Allie, MacDougall, Rob, Seidler, Brian, Sung, Christina, Herdeman, Claire, Ghani, Rayid
Many incarcerated individuals face significant and complex challenges, including mental illness, substance dependence, and homelessness, yet jails and prisons are often poorly equipped to address these needs. With little support from the existing criminal justice system, these needs can remain untreated and worsen, often leading to further offenses and a cycle of incarceration with adverse outcomes both for the individual and for public safety, with particularly large impacts on communities of color that continue to widen the already extensive racial disparities in criminal justice outcomes. Responding to these failures, a growing number of criminal justice stakeholders are seeking to break this cycle through innovative approaches such as community-driven and alternative approaches to policing, mentoring, community building, restorative justice, pretrial diversion, holistic defense, and social service connections. Here we report on a collaboration between Johnson County, Kansas, and Carnegie Mellon University to perform targeted, proactive mental health outreach in an effort to reduce reincarceration rates. This paper describes the data used, our predictive modeling approach and results, as well as the design and analysis of a field trial conducted to confirm our model's predictive power, evaluate the impact of this targeted outreach, and understand at what level of reincarceration risk outreach might be most effective. Through this trial, we find that our model is highly predictive of new jail bookings, with more than half of individuals in the trial's highest-risk group returning to jail in the following year. Outreach was most effective among these highest-risk individuals, with impacts on mental health utilization, EMS dispatches, and criminal justice involvement.
Exploring Major Transitions in the Evolution of Biological Cognition With Artificial Neural Networks
Voudouris, Konstantinos, Barron, Andrew, Halina, Marta, Klein, Colin, Patel, Matishalin
Transitional accounts of evolution emphasise a few changes that shape what is evolvable, with dramatic consequences for derived lineages. More recently it has been proposed that cognition might also have evolved via a series of major transitions that manipulate the structure of biological neural networks, fundamentally changing the flow of information. We used idealised models of information flow, artificial neural networks (ANNs), to evaluate whether changes in information flow in a network can yield a transitional change in cognitive performance. We compared networks with feed-forward, recurrent and laminated topologies, and tested their performance learning artificial grammars that differed in complexity, controlling for network size and resources. We documented a qualitative expansion in the types of input that recurrent networks can process compared to feed-forward networks, and a related qualitative increase in performance for learning the most complex grammars. We also noted how the difficulty in training recurrent networks poses a form of transition barrier and contingent irreversibility -- other key features of evolutionary transitions. Not all changes in network topology confer a performance advantage in this task set. Laminated networks did not outperform non-laminated networks in grammar learning. Overall, our findings show how some changes in information flow can yield transitions in cognitive performance.
DSCC-HS: A Dynamic Self-Reinforcing Framework for Hallucination Suppression in Large Language Models
Large Language Model (LLM) hallucination is a significant barrier to their reliable deployment. Current methods like Retrieval-Augmented Generation (RAG) are often reactive. We introduce **Dynamic Self-reinforcing Calibration for Hallucination Suppression (DSCC-HS)**, a novel, proactive framework that intervenes during autoregressive decoding. Inspired by dual-process cognitive theory, DSCC-HS uses a compact proxy model, trained in adversarial roles as a Factual Alignment Proxy (FAP) and a Hallucination Detection Proxy (HDP). During inference, these proxies dynamically steer a large target model by injecting a real-time steering vector, which is the difference between FAP and HDP logits, at each decoding step. This plug-and-play approach requires no modification to the target model. Our experiments on TruthfulQA and BioGEN show DSCC-HS achieves state-of-the-art performance. On TruthfulQA, it reached a 99.2% Factual Consistency Rate (FCR). On the long-form BioGEN benchmark, it attained the highest FActScore of 46.50. These results validate DSCC-HS as a principled and efficient solution for enhancing LLM factuality.
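The steering step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the steering vector (FAP logits minus HDP logits) is added to the target model's logits, that all three models share a vocabulary, and that a scalar `strength` scales the injection. The function names and the `strength` parameter are hypothetical.

```python
import math


def steer_logits(target_logits, fap_logits, hdp_logits, strength=1.0):
    """Add the steering vector (FAP minus HDP logits) to the target logits.

    Assumption: the injection is additive and scaled by `strength`.
    """
    return [
        t + strength * (f - h)
        for t, f, h in zip(target_logits, fap_logits, hdp_logits)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_step(target_logits, fap_logits, hdp_logits, strength=1.0):
    """Return the index of the highest-scoring token after steering."""
    steered = steer_logits(target_logits, fap_logits, hdp_logits, strength)
    return max(range(len(steered)), key=steered.__getitem__)
```

With `strength=0.0` the target model's own greedy choice is recovered, which makes the effect of the proxies easy to isolate in this toy setting.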
Annotating Satellite Images of Forests with Keywords from a Specialized Corpus in the Context of Change Detection
Neptune, Nathalie, Mothe, Josiane
The Amazon rain forest is a vital ecosystem that plays a crucial role in regulating the Earth's climate and providing habitat for countless species. Deforestation in the Amazon is a major concern as it has a significant impact on global carbon emissions and biodiversity. In this paper, we present a method for detecting deforestation in the Amazon using image pairs from Earth observation satellites. Our method leverages deep learning techniques to compare the images of the same area at different dates and identify changes in the forest cover. We also propose a visual semantic model that automatically annotates the detected changes with relevant keywords. The candidate annotations for images are extracted from scientific documents related to the Amazon region. We evaluate our approach on a dataset of Amazon image pairs and demonstrate its effectiveness in detecting deforestation and generating relevant annotations. Our method provides a useful tool for monitoring and studying the impact of deforestation in the Amazon. While we focus on environmental applications of our work by using images of deforestation in the Amazon rain forest to demonstrate the effectiveness of our proposed approach, it is generic enough to be applied to other domains.
Complexity Bounds for Smooth Convex Multiobjective Optimization
We study the oracle complexity of finding $\varepsilon$-Pareto stationary points in smooth multiobjective optimization with $m$ objectives. The progress metric is the Pareto stationarity gap $\mathcal{G}(x)$ (the norm of an optimal convex combination of gradients). Our contributions are fourfold. (i) For strongly convex objectives, any span first-order method (iterates lie in the span of past gradients) exhibits linear convergence no faster than $\exp(-\Theta(T/\sqrt{\kappa}))$ after $T$ oracle calls, where $\kappa$ is the condition number, implying $\Theta(\sqrt{\kappa}\log(1/\varepsilon))$ iterations; this matches classical accelerated upper bounds. (ii) For convex problems and oblivious one-step methods (a fixed scalarization with pre-scheduled step sizes), we prove a lower bound of order $1/T$ on the best gradient norm among the first $T$ iterates. (iii) Although accelerated gradient descent is outside this restricted class, it is an oblivious span method and attains the same $1/T$ upper rate on a fixed scalarization. (iv) For convex problems and general span methods with adaptive scalarizations, we establish a universal lower bound of order $1/T^{2}$ on the gradient norm of the final iterate after $T$ steps, highlighting a gap between known upper bounds and worst-case guarantees. All bounds hold on non-degenerate instances with distinct objectives and non-singleton Pareto fronts; rates are stated up to universal constants and natural problem scaling.
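To make the stationarity gap concrete, here is a small sketch for the special case of $m = 2$ objectives, where the minimum-norm convex combination of the two gradients has a closed form. It reads the "optimal" combination as the minimum-norm one, which is the usual convention but is an interpretation of the abstract; the function name and the restriction to two objectives are choices made for this example.

```python
import math


def pareto_gap_two_objectives(g1, g2):
    """Minimum norm of lam * g1 + (1 - lam) * g2 over lam in [0, 1].

    g1 and g2 are the gradients of the two objectives at the same point,
    given as equal-length sequences of floats.
    """
    diff = [a - b for a, b in zip(g1, g2)]
    diff_sq = sum(d * d for d in diff)

    if diff_sq == 0.0:
        # Identical gradients: every convex combination equals g1.
        lam = 0.0
    else:
        # Unconstrained minimiser of ||g2 + lam * (g1 - g2)||^2, then clipped.
        lam = -sum(b * d for b, d in zip(g2, diff)) / diff_sq
        lam = min(1.0, max(0.0, lam))

    combo = [lam * a + (1.0 - lam) * b for a, b in zip(g1, g2)]
    return math.sqrt(sum(c * c for c in combo))
```

A gap of zero means some convex combination of the gradients vanishes, so no direction decreases both objectives to first order. For general $m$ the same quantity requires solving a small quadratic program over the simplex.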
Graph representation forecasting of patient's medical conditions: towards a digital twin
Barbiero, Pietro, Torné, Ramon Viñas, Lió, Pietro
Objective: Modern medicine needs to shift from a wait-and-react, curative discipline to a preventative, interdisciplinary science aiming at providing personalised, systemic and precise treatment plans to patients. The aim of this work is to present how the integration of machine learning approaches with mechanistic computational modelling could yield a reliable infrastructure to run probabilistic simulations where the entire organism is considered as a whole. Methods: We propose a general framework that composes advanced AI approaches and integrates mathematical modelling in order to provide a panoramic view over current and future physiological conditions. The proposed architecture is based on a graph neural network (GNN) forecasting clinically relevant endpoints (such as blood pressure) and a generative adversarial network (GAN) providing a proof of concept of transcriptomic integrability. Results: We show the results of the investigation of pathological effects of overexpression of ACE2 across different signalling pathways in multiple tissues on cardiovascular functions. We provide a proof of concept of integrating a large set of composable clinical models using molecular data to drive local and global clinical parameters and derive future trajectories representing the evolution of the physiological state of the patient. Significance: We argue that the graph representation of a computational patient has potential to solve important technological challenges in integrating multiscale computational modelling with AI. We believe that this work represents a step forward towards a healthcare digital twin.
The Grammar of Interactive Explanatory Model Analysis
Baniecki, Hubert, Biecek, Przemyslaw
When analysing a complex system, very often an answer to one question raises new questions. This also applies to the explanatory analysis of machine learning models. We cannot sufficiently explain a complex model using a single method that gives only one perspective. Isolated explanations are prone to misunderstanding, which inevitably leads to wrong reasoning. Surprisingly, the majority of methods developed for Explainable Artificial Intelligence (XAI) focus on a single aspect of the model behaviour. In this paper, we frame the problem of model explainability as an interactive and sequential analysis of a model. We show how different XAI methods complement each other and why it is essential to juxtapose them. The proposed process of Interactive Explanatory Model Analysis (IEMA) derives from the theoretical, algorithmic side of the model explanation and aims to embrace ideas developed in cognitive sciences. Its grammar is implemented in the modelStudio framework that adopts interactivity, customisability and automation as its main traits.
Spectral Flow on the Manifold of SPD Matrices for Multimodal Data Processing
Katz, Ori, Lederman, Roy R., Talmon, Ronen
In this paper, we consider data acquired by multimodal sensors capturing complementary aspects and features of a measured phenomenon. We focus on a scenario in which the measurements share mutual sources of variability but might also be contaminated by other measurement-specific sources such as interferences or noise. Our approach combines manifold learning, which is a class of nonlinear data-driven dimension reduction methods, with the well-known Riemannian geometry of symmetric and positive-definite (SPD) matrices. Manifold learning typically includes the spectral analysis of a kernel built from the measurements. Here, we take a different approach, utilizing the Riemannian geometry of the kernels. In particular, we study the way the spectrum of the kernels changes along geodesic paths on the manifold of SPD matrices. We show that this change enables us, in a purely unsupervised manner, to derive a compact, yet informative, description of the relations between the measurements, in terms of their underlying components. Based on this result, we present new algorithms for extracting the common latent components and for identifying common and measurement-specific components.
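As a rough illustration of tracking a spectrum along a geodesic between two SPD matrices, the sketch below assumes the affine-invariant metric, under which the geodesic from $A$ to $B$ is $\gamma(t) = A^{1/2}\,(A^{-1/2} B A^{-1/2})^{t}\,A^{1/2}$. The abstract does not state which metric or which kernels are used, so both the metric and the function names are assumptions made for this example.

```python
import numpy as np


def spd_power(M, p):
    """Raise a symmetric positive-definite matrix to a real power p."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T


def spd_geodesic(A, B, t):
    """Point at parameter t on the affine-invariant geodesic from A to B."""
    A_half = spd_power(A, 0.5)
    A_inv_half = spd_power(A, -0.5)
    middle = A_inv_half @ B @ A_inv_half
    middle = (middle + middle.T) / 2.0  # remove round-off asymmetry
    return A_half @ spd_power(middle, t) @ A_half


def spectrum_along_geodesic(A, B, ts):
    """Eigenvalues (ascending) of the geodesic point for each t in ts."""
    rows = []
    for t in ts:
        G = spd_geodesic(A, B, t)
        rows.append(np.linalg.eigvalsh((G + G.T) / 2.0))
    return np.array(rows)
```

Calling `spectrum_along_geodesic(K1, K2, np.linspace(0.0, 1.0, 11))` on two kernel matrices `K1` and `K2` (hypothetical inputs here) returns one row of eigenvalues per point on the path, which is the kind of object whose variation in $t$ the abstract refers to.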
sktime: A Unified Interface for Machine Learning with Time Series
Löning, Markus, Bagnall, Anthony, Ganesh, Sajaysurya, Kazakov, Viktor, Lines, Jason, Király, Franz J.
Our goal is to extend existing machine learning capabilities, most notably scikit-learn [16], to the temporal data setting by providing a unified interface for several time series learning tasks. Time series data is ubiquitous in many applications. Examples include sensor readings from industrial processes, spectroscopy wavelength data from chemical samples, or bedside monitor medical data from patients. There is a broad variety of distinct but closely related learning tasks that arise in such contexts, including time series classification, forecasting and annotation among others.