AITopics | Government

Collaborating Authors

Government

Sample-Efficient Tabular Self-Play for Offline Robust Reinforcement Learning

Li, Na, Zheng, Zewu, Ni, Wei, Shan, Hangguan, Zhang, Wenjie, Li, Xinyu

arXiv.org Machine LearningDec-2-2025

Multi-agent reinforcement learning (MARL), as a thriving field, explores how multiple agents independently make decisions in a shared dynamic environment. Due to environmental uncertainties, policies in MARL must remain robust to tackle the sim-to-real gap. We focus on robust two-player zero-sum Markov games (TZMGs) in offline settings, specifically on tabular robust TZMGs (RTZMGs). We propose a model-based algorithm (\textit{RTZ-VI-LCB}) for offline RTZMGs, which is optimistic robust value iteration combined with a data-driven Bernstein-style penalty term for robust value estimation. By accounting for distribution shifts in the historical dataset, the proposed algorithm establishes near-optimal sample complexity guarantees under partial coverage and environmental uncertainty. An information-theoretic lower bound is developed to confirm the tightness of our algorithm's sample complexity, which is optimal regarding both state and action spaces. To the best of our knowledge, RTZ-VI-LCB is the first to attain this optimality, sets a new benchmark for offline RTZMGs, and is validated experimentally.

algorithm, probability, rtzmg, (16 more...)

arXiv.org Machine Learning

2512.00352

Country:

North America > United States (0.14)
Oceania > Australia > New South Wales (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Government (0.46)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Crowdsourcing the Frontier: Advancing Hybrid Physics-ML Climate Simulation via a $50,000 Kaggle Competition

Lin, Jerry, Hu, Zeyuan, Beucler, Tom, Frields, Katherine, Christensen, Hannah, Hannah, Walter, Heuer, Helge, Ukkonnen, Peter, Mansfield, Laura A., Zheng, Tian, Peng, Liran, Gupta, Ritwik, Gentine, Pierre, Al-Naher, Yusef, Duan, Mingjiang, Hattori, Kyo, Ji, Weiliang, Li, Chunhan, Matsuda, Kippei, Murakami, Naoki, Ron, Shlomo, Serlin, Marec, Song, Hongjian, Tanabe, Yuma, Yamamoto, Daisuke, Zhou, Jianyao, Pritchard, Mike

arXiv.org Artificial IntelligenceDec-2-2025

Subgrid machine-learning (ML) parameterizations have the potential to introduce a new generation of climate models that incorporate the effects of higher-resolution physics without incurring the prohibitive computational cost associated with more explicit physics-based simulations. However, important issues, ranging from online instability to inconsistent online performance, have limited their operational use for long-term climate projections. To more rapidly drive progress in solving these issues, domain scientists and machine learning researchers opened up the offline aspect of this problem to the broader machine learning and data science community with the release of ClimSim, a NeurIPS Datasets and Benchmarks publication, and an associated Kaggle competition. This paper reports on the downstream results of the Kaggle competition by coupling emulators inspired by the winning teams' architectures to an interactive climate model (including full cloud microphysics, a regime historically prone to online instability) and systematically evaluating their online performance. Our results demonstrate that online stability in the low-resolution, real-geography setting is reproducible across multiple diverse architectures, which we consider a key milestone. All tested architectures exhibit strikingly similar offline and online biases, though their responses to architecture-agnostic design choices (e.g., expanding the list of input variables) can differ significantly. Multiple Kaggle-inspired architectures achieve state-of-the-art (SOTA) results on certain metrics such as zonal mean bias patterns and global RMSE, indicating that crowdsourcing the essence of the offline problem is one path to improving online performance in hybrid physics-AI climate simulation.

artificial intelligence, machine learning, social media, (18 more...)

arXiv.org Artificial Intelligence

2511.20963

Country:

Europe (1.00)
North America > United States > California (0.46)
North America > United States > Maryland (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.67)
Energy (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.70)

Add feedback

Domain-Decomposed Graph Neural Network Surrogate Modeling for Ice Sheets

Propp, Adrienne M., Perego, Mauro, Cyr, Eric C., Gruber, Anthony, Howard, Amanda A., Heinlein, Alexander, Stinis, Panos, Tartakovsky, Daniel M.

arXiv.org Artificial IntelligenceDec-2-2025

Accurate yet efficient surrogate models are essential for large-scale simulations of partial differential equations (PDEs), particularly for uncertainty quantification (UQ) tasks that demand hundreds or thousands of evaluations. We develop a physics-inspired graph neural network (GNN) surrogate that operates directly on unstructured meshes and leverages the flexibility of graph attention. To improve both training efficiency and generalization properties of the model, we introduce a domain decomposition (DD) strategy that partitions the mesh into subdomains, trains local GNN surrogates in parallel, and aggregates their predictions. We then employ transfer learning to fine-tune models across subdomains, accelerating training and improving accuracy in data-limited settings. Applied to ice sheet simulations, our approach accurately predicts full-field velocities on high-resolution meshes, substantially reduces training time relative to training a single global surrogate model, and provides a ripe foundation for UQ objectives. Our results demonstrate that graph-based DD, combined with transfer learning, provides a scalable and reliable pathway for training GNN surrogates on massive PDE-governed systems, with broad potential for application beyond ice sheet dynamics.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Artificial Intelligence

2512.01888

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.86)

Industry:

Energy (1.00)
Government > Regional Government > North America Government > United States Government (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Probabilistic Neuro-Symbolic Reasoning for Sparse Historical Data: A Framework Integrating Bayesian Inference, Causal Models, and Game-Theoretic Allocation

Kublashvili, Saba

arXiv.org Artificial IntelligenceDec-2-2025

Modeling historical events poses fundamental challenges for machine learning: extreme data scarcity (N << 100), heterogeneous and noisy measurements, missing counterfactuals, and the requirement for human interpretable explanations. We present HistoricalML, a probabilistic neuro-symbolic framework that addresses these challenges through principled integration of (1) Bayesian uncertainty quantification to separate epistemic from aleatoric uncertainty, (2) structural causal models for counterfactual reasoning under confounding, (3) cooperative game theory (Shapley values) for fair allocation modeling, and (4) attention based neural architectures for context dependent factor weighting. We provide theoretical analysis showing that our approach achieves consistent estimation in the sparse data regime when strong priors from domain knowledge are available, and that Shapley based allocation satisfies axiomatic fairness guarantees that pure regression approaches cannot provide. We instantiate the framework on two historical case studies: the 19th century partition of Africa (N = 7 colonial powers) and the Second Punic War (N = 2 factions). Our model identifies Germany's +107.9 percent discrepancy as a quantifiable structural tension preceding World War I, with tension factor 36.43 and 0.79 naval arms race correlation. For the Punic Wars, Monte Carlo battle simulations achieve a 57.3 percent win probability for Carthage at Cannae and 57.8 percent for Rome at Zama, aligning with historical outcomes. Counterfactual analysis reveals that Carthaginian political support (support score 6.4 vs Napoleon's 7.1), rather than military capability, was the decisive factor.

artificial intelligence, hannibal, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2512.01723

Country: Europe > Germany (0.27)

Genre: Research Report (0.83)

Industry: Government (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Add feedback

Bayesian Ambiguity Contraction-based Adaptive Robust Markov Decision Processes for Adversarial Surveillance Missions

Choi, Jimin, Li, Max Z.

arXiv.org Artificial IntelligenceDec-2-2025

Collaborative Combat Aircraft (CCAs) are envisioned to enable autonomous Intelligence, Surveillance, and Reconnaissance (ISR) missions in contested environments, where adversaries may act strategically to deceive or evade detection. These missions pose challenges due to model uncertainty and the need for safe, real-time decision-making. Robust Markov Decision Processes (RMDPs) provide worst-case guarantees but are limited by static ambiguity sets that capture initial uncertainty without adapting to new observations. This paper presents an adaptive RMDP framework tailored to ISR missions with CCAs. We introduce a mission-specific formulation in which aircraft alternate between movement and sensing states. Adversarial tactics are modeled as a finite set of transition kernels, each capturing assumptions about how adversarial sensing or environmental conditions affect rewards. Our approach incrementally refines policies by eliminating inconsistent threat models, allowing agents to shift from conservative to aggressive behaviors while maintaining robustness. We provide theoretical guarantees showing that the adaptive planner converges as credible sets contract to the true threat and maintains safety under uncertainty. Experiments under Gaussian and non-Gaussian threat models across diverse network topologies show higher mission rewards and fewer exposure events compared to nominal and static robust planners.

ambiguity, machine learning, real time system, (20 more...)

arXiv.org Artificial Intelligence

2512.0166

Country: North America > United States > Michigan (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)
Government > Military > Air Force (0.48)
(2 more...)

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

Add feedback

SynthStrategy: Extracting and Formalizing Latent Strategic Insights from LLMs in Organic Chemistry

Armstrong, Daniel, Jončev, Zlatko, Bran, Andres M, Schwaller, Philippe

arXiv.org Artificial IntelligenceDec-2-2025

Modern computer-assisted synthesis planning (CASP) systems show promises at generating chemically valid reaction steps but struggle to incorporate strategic considerations such as convergent assembly, protecting group minimization, and optimal ring-forming sequences. We introduce a methodology that leverages Large Language Models to distill synthetic knowledge into code. Our system analyzes synthesis routes and translates strategic principles into Python functions representing diverse strategic and tactical rules, such as strategic functional group interconversions and ring construction strategies. By formalizing this knowledge as verifiable code rather than simple heuristics, we create testable, interpretable representations of synthetic strategy. We release the complete codebase and the USPTO-ST dataset -- synthesis routes annotated with strategic tags. This framework unlocks a novel capability for CASP: natural language-based route retrieval, achieving 75\% Top-3 accuracy on our benchmark. We further validate our library through temporal analysis of historical trends and chemically intuitive route clustering that offers more granular partitioning than common previous methods. This work bridges the tactical-strategic divide in CASP, enabling specification, search, and evaluation of routes by strategic criteria rather than structure alone.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.01507

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Benchmarking Overton Pluralism in LLMs

Poole-Dayan, Elinor, Wu, Jiayi, Sorensen, Taylor, Pei, Jiaxin, Bakker, Michiel A.

arXiv.org Artificial IntelligenceDec-2-2025

We introduce a novel framework for measuring Overton pluralism in LLMs--the extent to which diverse viewpoints are represented in model outputs. We (i) formalize Overton pluralism as a set coverage metric (OvertonScore), (ii) conduct a large-scale U.S.-representative human study (N = 1209; 60 questions; 8 LLMs), and (iii) develop an automated benchmark that closely reproduces human judgments. On average, models achieve OvertonScores of 0.35--0.41, with DeepSeek V3 performing best; yet all models remain far below the theoretical maximum of 1.0, revealing substantial headroom for improvement. Because repeated large-scale human studies are costly and slow, scalable evaluation tools are essential for model development. Hence, we propose an automated benchmark that achieves high rank correlation with human judgments ($ρ=0.88$), providing a practical proxy without replacing human assessment. By turning pluralistic alignment from a normative aim into a measurable benchmark, our work establishes a foundation for systematic progress toward more pluralistic LLMs.

benchmark, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2512.01351

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (0.93)

Industry: Government > Regional Government > North America Government > United States Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

EmoRAG: Evaluating RAG Robustness to Symbolic Perturbations

Zhou, Xinyun, Li, Xinfeng, Peng, Yinan, Xu, Ming, Zhang, Xuanwang, Yu, Miao, Wang, Yidong, Jia, Xiaojun, Wang, Kun, Wen, Qingsong, Wang, XiaoFeng, Dong, Wei

arXiv.org Artificial IntelligenceDec-2-2025

Retrieval-Augmented Generation (RAG) systems are increasingly central to robust AI, enhancing large language model (LLM) faithfulness by incorporating external knowledge. However, our study unveils a critical, overlooked vulnerability: their profound susceptibility to subtle symbolic perturbations, particularly through near-imperceptible emoticon tokens such as "(@_@)" that can catastrophically mislead retrieval, termed EmoRAG. We demonstrate that injecting a single emoticon into a query makes it nearly 100% likely to retrieve semantically unrelated texts that contain a matching emoticon. Our extensive experiment across general question-answering and code domains, using a range of state-of-the-art retrievers and generators, reveals three key findings: (I) Single-Emoticon Disaster: Minimal emoticon injections cause maximal disruptions, with a single emoticon almost 100% dominating RAG output. (II) Positional Sensitivity: Placing an emoticon at the beginning of a query can cause severe perturbation, with F1-Scores exceeding 0.92 across all datasets. (III) Parameter-Scale Vulnerability: Counterintuitively, models with larger parameters exhibit greater vulnerability to the interference. We provide an in-depth analysis to uncover the underlying mechanisms of these phenomena. Furthermore, we raise a critical concern regarding the robustness assumption of current RAG systems, envisioning a threat scenario where an adversary exploits this vulnerability to manipulate the RAG system. We evaluate standard defenses and find them insufficient against EmoRAG. To address this, we propose targeted defenses, analyzing their strengths and limitations in mitigating emoticon-based perturbations. Finally, we outline future directions for building robust RAG systems.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.01335

Country:

Asia > China (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Social Media Data Mining of Human Behaviour during Bushfire Evacuation

Wu, Junfeng, Zhou, Xiangmin, Kuligowski, Erica, Singh, Dhirendra, Ronchi, Enrico, Kinateder, Max

arXiv.org Artificial IntelligenceDec-2-2025

Traditional data sources on bushfire evacuation behaviour, such as quantitative surveys and manual observations have severe limitations. Mining social media data related to bushfire evacuations promises to close this gap by allowing the collection and processing of a large amount of behavioural data, which are low-cost, accurate, possibly including location information and rich contextual information. However, social media data have many limitations, such as being scattered, incomplete, informal, etc. Together, these limitations represent several challenges to their usefulness to better understand bushfire evacuation. To overcome these challenges and provide guidance on which and how social media data can be used, this scoping review of the literature reports on recent advances in relevant data mining techniques. In addition, future applications and open problems are discussed. We envision future applications such as evacuation model calibration and validation, emergency communication, personalised evacuation training, and resource allocation for evacuation preparedness. We identify open problems such as data quality, bias and representativeness, geolocation accuracy, contextual understanding, crisis-specific lexicon and semantics, and multimodal data interpretation.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.01262

Country:

North America > United States (1.00)
Oceania > Australia (0.93)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Transportation > Infrastructure & Services (0.93)
Information Technology > Services (0.69)
(5 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
(2 more...)

Add feedback

On the Tension Between Optimality and Adversarial Robustness in Policy Optimization

Li, Haoran, Lv, Jiayu, Han, Congying, Zhang, Zicheng, Li, Anqi, Liu, Yan, Guo, Tiande, Jiang, Nan

arXiv.org Artificial IntelligenceDec-2-2025

Achieving optimality and adversarial robustness in deep reinforcement learning has long been regarded as conflicting goals. Nonetheless, recent theoretical insights presented in CAR suggest a potential alignment, raising the important question of how to realize this in practice. This paper first identifies a key gap between theory and practice by comparing standard policy optimization (SPO) and adversarially robust policy optimization (ARPO). Although they share theoretical consistency, a fundamental tension between robustness and optimality arises in practical policy gradient methods. SPO tends toward convergence to vulnerable first-order stationary policies (FOSPs) with strong natural performance, whereas ARPO typically favors more robust FOSPs at the expense of reduced returns. Furthermore, we attribute this tradeoff to the reshaping effect of the strongest adversary in ARPO, which significantly complicates the global landscape by inducing deceptive sticky FOSPs. This improves robustness but makes navigation more challenging. To alleviate this, we develop the BARPO, a bilevel framework unifying SPO and ARPO by modulating adversary strength, thereby facilitating navigability while preserving global optima. Extensive empirical results demonstrate that BARPO consistently outperforms vanilla ARPO, providing a practical approach to reconcile theoretical and empirical performance.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2512.01228

Country: North America > United States > Illinois (0.27)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.93)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback