AITopics

Recent work in machine learning increasingly attributes human-like capabilities such as reasoning or theory of mind to large language models (LLMs) on the basis of benchmark performance. This paper examines this practice through the lens of construct validity, understood as the problem of linking theoretical capabilities to their empirical measurements. It contrasts three influential frameworks: the nomological account developed by Cronbach and Meehl, the inferential account proposed by Messick and refined by Kane, and Borsboom's causal account. I argue that the nomological account provides the most suitable foundation for current LLM capability research. It avoids the strong ontological commitments of the causal account while offering a more substantive framework for articulating construct meaning than the inferential account. I explore the conceptual implications of adopting the nomological account for LLM research through a concrete case: the assessment of reasoning capabilities in LLMs.

constructvalidity, large language model, natural language, (17 more...)

2603.15121

Country: Europe > France (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Kosko, Matthew, J, Falco, Bargagli-Stoffi, null, Wang, Lin, Santacatterina, Michele

Fast Uncertainty Quantification for Kernel-Based Estimators in Large-Scale Causal Inference

Kernel methods are widely used in causal inference for tasks such as treatment effect estimation, policy evaluation, and policy learning. The bootstrap is a standard tool for uncertainty quantification because of its broad applicability. As increasingly large datasets become available, such as the 2023 U.S. Natality data from the National Vital Statistics System (NVSS), which includes 3,596,017 registered births, the computational demands of these methods increase substantially. Kernel methods are known to scale poorly with sample size, and this limitation is further exacerbated by the repeated re-fitting required by the bootstrap. As a result, bootstrap-based inference for kernel-based estimators can become computationally infeasible in large-scale settings. In this paper, we address these challenges by extending the causal Bag of Little Bootstraps (cBLB) algorithm to kernel methods. Our approach achieves computational scalability by combining subsampling and resampling while preserving first-order uncertainty quantification and asymptotically correct coverage. We evaluate the method across three representative implementations: kernelized augmented outcome-weighted learning, kernel-based minimax weighting, and double machine learning with kernel support vector machines. We show in simulations that our method yields confidence intervals with nominal coverage at a fraction of the computational cost. We further demonstrate its utility in a real-world application by estimating the effect of any amount of smoking on birth weight, as well as the optimal treatment regime, using the NVSS dataset, where the standard bootstrap is prohibitively expensive computationally and effectively infeasible at this scale.

artificial intelligence, estimator, machine learning, (17 more...)

2603.13662

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (0.67)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)

Gloeckler, Manuel, Manzano-Patrón, J. P., Sotiropoulos, Stamatios N., Schröder, Cornelius, Macke, Jakob H.

Scalable Simulation-Based Model Inference with Test-Time Complexity Control

Simulation plays a central role in scientific discovery. In many applications, the bottleneck is no longer running a simulator; it is choosing among large families of plausible simulators, each corresponding to different forward models/hypotheses consistent with observations. Over large model families, classical Bayesian workflows for model selection are impractical. Furthermore, amortized model selection methods typically hard-code a fixed model prior or complexity penalty at training time, requiring users to commit to a particular parsimony assumption before seeing the data. We introduce PRISM, a simulation-based encoder-decoder that infers a joint posterior over both discrete model structures and associated continuous parameters, while enabling test-time control of model complexity via a tunable model prior that the network is conditioned on. We show that PRISM scales to families with combinatorially many (up to billions) of model instantiations on a synthetic symbolic regression task. As a scientific application, we evaluate PRISM on biophysical modeling for diffusion MRI data, showing the ability to perform model selection across several multi-compartment models, on both synthetic and in vivo neuroimaging data.

artificial intelligence, machine learning, posterior, (18 more...)

2603.15292

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.86)
Health & Medicine > Diagnostic Medicine > Imaging (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Kuo, En-Ya, Motsch, Sebastien

EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection

Imbalanced datasets pose a difficulty in fraud detection, as classifiers are often biased toward the majority class and perform poorly on rare fraudulent transactions. Synthetic data generation is therefore commonly used to mitigate this problem. In this work, we propose the Clustered Embedding Diffusion-Transformer (EmDT), a diffusion model designed to generate fraudulent samples. Our key innovation is to leverage UMAP clustering to identify distinct fraudulent patterns, and train a Transformer denoising network with sinusoidal positional embeddings to capture feature relationships throughout the diffusion process. Once the synthetic data has been generated, we employ a standard decision-tree-based classifier (e.g., XGBoost) for classification, as this type of model remains better suited to tabular datasets. Experiments on a credit card fraud detection dataset demonstrate that EmDT significantly improves downstream classification performance compared to existing oversampling and generative methods, while maintaining comparable privacy protection and preserving feature correlations present in the original data.

artificial intelligence, deep learning, machine learning, (18 more...)

2603.13566

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Industry: Law Enforcement & Public Safety > Fraud (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Saad, El Mehdi, Thuot, Victor, Verzelen, Nicolas

The Sampling Complexity of Condorcet Winner Identification in Dueling Bandits

We study best-arm identification in stochastic dueling bandits under the sole assumption that a Condorcet winner exists, i.e., an arm that wins each noisy pairwise comparison with probability at least $1/2$. We introduce a new identification procedure that exploits the full gap matrix $Δ_{i,j}=q_{i,j}-\tfrac12$ (where $q_{i,j}$ is the probability that arm $i$ beats arm $j$), rather than only the gaps between the Condorcet winner and the other arms. We derive high-probability, instance-dependent sample-complexity guarantees that (up to logarithmic factors) improve the best known ones by leveraging informative comparisons beyond those involving the winner. We complement these results with new lower bounds which, to our knowledge, are the first for Condorcet-winner identification in stochastic dueling bandits. Our lower-bound analysis isolates the intrinsic cost of locating informative entries in the gap matrix and estimating them to the required confidence, establishing the optimality of our non-asymptotic bounds. Overall, our results reveal new regimes and trade-offs in the sample complexity that are not captured by asymptotic analyses based only on the expected budget.

artificial intelligence, data mining, machine learning, (21 more...)

2603.15189

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.04)
North America > United States > New York (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Dai, Yanning, Wang, Yuhui, Ashley, Dylan R., Schmidhuber, Jürgen

Efficient Morphology-Control Co-Design via Stackelberg Proximal Policy Optimization

Morphology-control co-design concerns the coupled optimization of an agent's body structure and control policy. This problem exhibits a bi-level structure, where the control dynamically adapts to the morphology to maximize performance. Existing methods typically neglect the control's adaptation dynamics by adopting a single-level formulation that treats the control policy as fixed when optimizing morphology. This can lead to inefficient optimization, as morphology updates may be misaligned with control adaptation. In this paper, we revisit the co-design problem from a game-theoretic perspective, modeling the intrinsic coupling between morphology and control as a novel variant of a Stackelberg game. We propose Stackelberg Proximal Policy Optimization (Stackelberg PPO), which explicitly incorporates the control's adaptation dynamics into morphology optimization. By modeling this intrinsic coupling, our method aligns morphology updates with control adaptation, thereby stabilizing training and improving learning efficiency. Experiments across diverse co-design tasks demonstrate that Stackelberg PPO outperforms standard PPO in both stability and final performance, opening the way for dramatically more efficient robotics designs.

artificial intelligence, machine learning, morphology, (18 more...)

2603.15388

Country:

Europe > Switzerland (0.04)
Europe > Denmark (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Game Theory (0.88)
Information Technology > Artificial Intelligence > Robots (0.67)

High-Probability Bounds for SGD under the Polyak-Lojasiewicz Condition with Markovian Noise

Kar, Avik, Chandak, Siddharth, Singh, Rahul, Moulines, Eric, Bhatnagar, Shalabh, Bambos, Nicholas

We present the first uniform-in-time high-probability bound for SGD under the PL condition, where the gradient noise contains both Markovian and martingale difference components. This significantly broadens the scope of finite-time guarantees, as the PL condition arises in many machine learning and deep learning models while Markovian noise naturally arises in decentralized optimization and online system identification problems. We further allow the magnitude of noise to grow with the function value, enabling the analysis of many practical sampling strategies. In addition to the high-probability guarantee, we establish a matching $1/k$ decay rate for the expected suboptimality. Our proof technique relies on the Poisson equation to handle the Markovian noise and a probabilistic induction argument to address the lack of almost-sure bounds on the objective. Finally, we demonstrate the applicability of our framework by analyzing three practical optimization problems: token-based decentralized linear regression, supervised learning with subsampling for privacy amplification, and online system identification.

artificial intelligence, assumption 2, machine learning, (19 more...)

2603.14514

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

When Should Humans Step In? Optimal Human Dispatching in AI-Assisted Decisions

Tan, Lezhi, Sagan, Naomi, Lei, Lihua, Blanchet, Jose

AI systems increasingly assist human decision making by producing preliminary assessments of complex inputs. However, such AI-generated assessments can often be noisy or systematically biased, raising a central question: how should costly human effort be allocated to correct AI outputs where it matters the most for the final decision? We propose a general decision-theoretic framework for human-AI collaboration in which AI assessments are treated as factor-level signals and human judgments as costly information that can be selectively acquired. We consider cases where the optimal selection problem reduces to maximizing a reward associated with each candidate subset of factors, and turn policy design into reward estimation. We develop estimation procedures under both nonparametric and linear models, covering contextual and non-contextual selection rules. In the linear setting, the optimal rule admits a closed-form expression with a clear interpretation in terms of factor importance and residual variance. We apply our framework to AI-assisted peer review. Our approach substantially outperforms LLM-only predictions and achieves performance comparable to full human review while using only 20-30% of the human information. Across different selection rules, we find that simpler rules derived under linear models can significantly reduce computational cost without harming final prediction performance. Our results highlight both the value of human intervention and the efficiency of principled dispatching.

large language model, machine learning, natural language, (21 more...)

2603.13688

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Asia > Taiwan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)