predictive performance
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Israel (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Denmark (0.05)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
- North America > United States > Wisconsin (0.04)
- North America > United States > Florida > Broward County (0.04)
- Information Technology > Data Science > Data Mining (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.30)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.30)
The Powers of Precision: Structure-Informed Detection in Complex Systems -- From Customer Churn to Seizure Onset
Santos, Augusto, Santos, Teresa, Rodrigues, Catarina, Moura, José M. F.
Emergent phenomena -- onset of epileptic seizures, sudden customer churn, or pandemic outbreaks -- often arise from hidden causal interactions in complex systems. We propose a machine learning method for their early detection that addresses a core challenge: unveiling and harnessing a system's latent causal structure despite the data-generating process being unknown and partially observed. The method learns an optimal feature representation from a one-parameter family of estimators -- powers of the empirical covariance or precision matrix -- offering a principled way to tune in to the underlying structure driving the emergence of critical events. A supervised learning module then classifies the learned representation. We prove structural consistency of the family and demonstrate the empirical soundness of our approach on seizure detection and churn prediction, attaining competitive results in both. Beyond prediction, and toward explainability, we ascertain that the optimal covariance power exhibits evidence of good identifiability while capturing structural signatures, thus reconciling predictive performance with interpretable statistical structure.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Health & Medicine > Health Care Technology (0.93)
- Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.34)
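The abstract above describes a one-parameter family of estimators built from powers of the empirical covariance or precision matrix. A minimal sketch of how such a power could be computed is given below, assuming a symmetric eigendecomposition and a small eigenvalue floor for numerical safety. The function names, the floor `eps`, and the use of upper-triangular entries as features are illustrative assumptions, not the authors' implementation; the abstract does not say how the power is selected.

```python
import numpy as np

def covariance_power(X, alpha, eps=1e-8):
    """Return C**alpha for the empirical covariance C of X (n_samples, n_features).

    Negative alpha yields powers of the precision matrix (alpha = -1 is the inverse).
    eps is an assumed floor on the eigenvalues, not something taken from the paper.
    """
    C = np.cov(X, rowvar=False)
    w, V = np.linalg.eigh(C)          # C is symmetric, so eigh applies
    w = np.clip(w, eps, None)         # guard against zero or negative eigenvalues
    return (V * w**alpha) @ V.T       # V diag(w**alpha) V^T

def upper_triangle_features(M):
    """Flatten the upper triangle (including the diagonal) into a feature vector."""
    return M[np.triu_indices(M.shape[0])]

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
C = np.cov(X, rowvar=False)
features = upper_triangle_features(covariance_power(X, -0.5))
```

In a pipeline of the kind the abstract outlines, `alpha` would be treated as a tunable parameter and `features` would be passed to a supervised classifier.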
How Well Do LLMs Predict Human Behavior? A Measure of their Pretrained Knowledge
Gao, Wayne, Han, Sukjin, Liang, Annie
Large language models (LLMs) are increasingly used in economics as predictive tools--both to generate synthetic responses in place of human subjects (Horton, 2023; Anthis et al., 2025), and to forecast economic outcomes directly (Hewitt et al., 2024a; Faria-e Castro and Leibovici, 2024; Chan-Lau et al., 2025). Their appeal in these roles is obvious: A pretrained LLM embeds a vast amount of information and can be deployed at negligible cost, often in settings where collecting new, domain-specific human data would be expensive or infeasible. What remains unclear is how to assess the quality of these predictions. This paper proposes a measure that quantifies the domain-specific value of LLMs in an interpretable unit: the amount of human data they substitute for. Specifically, we ask how much human data would be required for a conventional model trained on that data to match the predictive performance of the pretrained LLM in that domain.
- North America > United States > Tennessee (0.04)
- North America > United States > Pennsylvania (0.04)
- Asia > Middle East > Israel (0.04)
- Health & Medicine (0.93)
- Government > Regional Government > North America Government > United States Government (0.92)
- Banking & Finance > Economy (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
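The measure proposed in the abstract above asks how much human data a conventional model needs in order to match the pretrained LLM. One simple way to read that off a learning curve is sketched below. The function name and the numbers are hypothetical and chosen only to show the lookup; the paper's actual estimator may differ.

```python
def equivalent_sample_size(learning_curve, llm_error):
    """Smallest training-set size at which the conventional model matches the LLM.

    learning_curve: iterable of (n_samples, prediction_error) pairs.
    Returns None if the conventional model never reaches the LLM's error.
    """
    for n, err in sorted(learning_curve):
        if err <= llm_error:
            return n
    return None

# Hypothetical learning curve and LLM error, for illustration only.
curve = [(10, 0.40), (50, 0.31), (200, 0.25), (1000, 0.22)]
n_equiv = equivalent_sample_size(curve, llm_error=0.27)
```

With these made-up values, the conventional model first matches the LLM at 200 samples, so the LLM would be said to substitute for roughly 200 human observations in this domain.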
A brief note on learning problem with global perspectives
In this brief note, we consider the problem of learning in a dynamic-optimizing principal-agent setting, in which the agents are allowed to have global perspectives about the learning process, i.e., the ability to view things according to their relative importance or in their true relations, based on some aggregated information shared by the principal. The principal, which exerts an influence on the learning process of the agents in the aggregation, is primarily tasked with solving a high-level optimization problem posed as an empirical-likelihood estimator under a conditional moment restrictions model that also accounts for information about the agents' out-of-sample predictive performance as well as a set of private datasets available only to the principal (e.g., see [1], [2], [3], [4] and [5] for further discussions on empirical likelihood methods with moment restrictions). Here, we provide a coherent mathematical argument which is necessary for characterizing the learning process behind this abstract dynamic-optimizing principal-agent learning framework. Note that, due to the inherent feedback behavior among the agents, the proposed learning framework remarkably offers some advantages in terms of stability and consistency, even though neither the principal nor the agents need to have any knowledge of the sample distributions or the quality of each other's datasets. Finally, it is worth remarking that such a learning framework can provide new insights in the context of collaborative learning problems with global perspectives that exploit the principal-agent setting (e.g., see [6], [7], [8] or [9] for related discussions), although we acknowledge that a number of conceptual and theoretical problems, such as small sample properties, still need to be addressed.
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Multi-task Modeling for Engineering Applications with Sparse Data
Comlek, Yigitcan, Krishnan, R. Murali, Ravi, Sandipp Krishnan, Moghaddas, Amin, Giorjao, Rafael, Eff, Michael, Samaddar, Anirban, Ramachandra, Nesar S., Madireddy, Sandeep, Wang, Liping
Modern engineering and scientific workflows frequently require simultaneous prediction across related tasks and fidelity levels [1-6]. In such contexts, some outputs are scarce and expensive to obtain, while others are cheaper and more abundant. Multi-task Gaussian processes (MTGPs), also known as multi-output Gaussian processes, offer a principled Bayesian framework to exploit inter-task correlations, enabling knowledge sharing that improves predictive accuracy and reduces the demand for large high-fidelity datasets [7-9]. Over decades of development, MTGPs have been applied across diverse domains, including time series forecasting, multitask optimization, and multifidelity classification, demonstrating their broad utility wherever data cost asymmetries and cross-task dependencies are present [10-16]. The central motivation for MTGPs is to leverage dependencies among related tasks to enhance predictive quality when high-fidelity information is limited [17]. For example, predicting an airfoil's lift coefficient from limited, expensive high-fidelity computational fluid dynamics (CFD) simulations can benefit from correlating with sufficient low-fidelity simulations [3]. Recent work in joint multi-objective and multifidelity optimization has also utilized MTGPs to balance exploration and exploitation across tasks, improving predictive performance and decision-making by explicitly modeling relationships among outputs and fidelities [12].
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Government > Regional Government > North America Government > United States Government (1.00)
- Energy (1.00)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)
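To make concrete how an MTGP, as described in the abstract above, can encode inter-task correlations, the sketch below builds a joint covariance with the intrinsic coregionalization model (ICM): a task covariance matrix combined with an input kernel through a Kronecker product. ICM is one common construction and is assumed here for illustration; the abstract does not state which multi-task kernel the authors use, and the task matrix `B`, the lengthscale, and the inputs are made up.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel between two sets of inputs of shape (n, d)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def icm_kernel(X, B, lengthscale=1.0):
    """Joint covariance over all tasks at shared inputs X: kron(B, k(X, X))."""
    return np.kron(B, rbf(X, X, lengthscale))

# Two tasks (e.g. a low- and a high-fidelity output) sharing five inputs.
X = np.linspace(0.0, 1.0, 5)[:, None]
W = np.array([[1.0], [0.8]])              # assumed shared latent loading
B = W @ W.T + np.diag([0.1, 0.1])         # positive-definite task covariance
K = icm_kernel(X, B)                      # shape (2 * 5, 2 * 5)
```

The off-diagonal blocks of `K` are what allow abundant observations of one task to inform predictions for the scarce one.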
ROOFS: RObust biOmarker Feature Selection
Bakhmach, Anastasiia, Dufossé, Paul, Vaglio, Andrea, Monville, Florence, Greillier, Laurent, Barlési, Fabrice, Benzekry, Sébastien
Feature selection (FS) is essential for biomarker discovery and in the analysis of biomedical datasets. However, challenges such as high-dimensional feature space, low sample size, multicollinearity, and missing values make FS non-trivial. Moreover, FS performance varies across datasets and predictive tasks. We propose roofs, a Python package available at https://gitlab.inria.fr/compo/roofs, designed to help researchers choose an FS method suited to their problem. Roofs benchmarks multiple FS methods on the user's data and generates reports that summarize a comprehensive set of evaluation metrics, including downstream predictive performance estimated using optimism correction, stability, reliability of individual features, and true positive and false positive rates assessed on semi-synthetic data with a simulated outcome. We demonstrate the utility of roofs on data from the PIONeeR clinical trial, aimed at identifying predictors of resistance to anti-PD-(L)1 immunotherapy in lung cancer. The PIONeeR dataset contained 374 multi-source blood and tumor biomarkers from 435 patients. A reduced subset of 214 features was obtained through iterative variance inflation factor pre-filtering. Of the 34 FS methods gathered in roofs, we evaluated 23 in combination with 11 classifiers (253 models in total) and identified a filter based on the union of Benjamini-Hochberg false discovery rate-adjusted p-values from t-test and logistic regression as the optimal approach, outperforming other methods including the widely used LASSO. We conclude that comprehensive benchmarking with roofs has the potential to improve the robustness and reproducibility of FS discoveries and increase the translational value of clinical models.
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.48)
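The filter singled out in the ROOFS abstract above takes the union of features whose Benjamini-Hochberg-adjusted p-values are significant under either of two tests. A plain-Python sketch of that idea follows. It does not use the roofs API; the function names, the 0.05 threshold, and the p-values are assumptions made for illustration.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def union_filter(p_ttest, p_logreg, alpha=0.05):
    """Indices of features significant after BH adjustment in either test."""
    adj_t = bh_adjust(p_ttest)
    adj_l = bh_adjust(p_logreg)
    return [j for j in range(len(p_ttest))
            if adj_t[j] <= alpha or adj_l[j] <= alpha]

# Made-up per-feature p-values from the two tests.
p_ttest = [0.01, 0.04, 0.03, 0.20]
p_logreg = [0.50, 0.001, 0.60, 0.70]
selected = union_filter(p_ttest, p_logreg)
```

Here feature 0 passes only through the t-test and feature 1 only through logistic regression, which shows why the union retains features that a single test would miss.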