Algorithmic Guarantees for Distilling Supervised and Offline RL Datasets
Gupta, Aaryan, Saket, Rishi, Raghuveer, Aravindan
Given a training dataset, the goal of dataset distillation is to derive a synthetic dataset such that models trained on the latter perform as well as those trained on the former. In this work, we develop and analyze an efficient dataset distillation algorithm for supervised learning, specifically regression in $\mathbb{R}^d$, based on matching the losses on the training and synthetic datasets with respect to a fixed set of randomly sampled regressors, without any model training. Our first key contribution is a novel performance guarantee proving that our algorithm needs only $\tilde{O}(d^2)$ sampled regressors to derive a synthetic dataset on which the MSE loss of any bounded linear model is nearly the same as its MSE loss on the given training data. In particular, the model optimized on the synthetic data has close to minimum loss on the training data, thus performing nearly as well as the model optimized on the latter. Complementing this, we also prove a matching lower bound of $\Omega(d^2)$ on the number of sampled regressors, showing the tightness of our analysis. Our second contribution is to extend our algorithm to offline RL dataset distillation by matching the Bellman loss, unlike previous works which used a behavioral cloning objective. This is the first such method to leverage both the rewards and the next-state information available in offline RL datasets, without any policy model optimization. Our algorithm generates a synthetic dataset whose Bellman loss with respect to any linear action-value predictor is close to that predictor's Bellman loss on the offline RL training dataset. Therefore, a policy associated with an action-value predictor optimized on the synthetic dataset performs nearly as well as one derived from a predictor optimized on the training data. We conduct experiments to validate our theoretical guarantees and observe performance gains.
Learning From Simulators: A Theory of Simulation-Grounded Learning
Dudley, Carson, Eisenberg, Marisa
Simulation-Grounded Neural Networks (SGNNs) are predictive models trained entirely on synthetic data from mechanistic simulations. They have achieved state-of-the-art performance in domains where real-world labels are limited or unobserved, but lack a formal underpinning. We place SGNNs in a unified statistical framework. Under standard loss functions, they can be interpreted as amortized Bayesian predictors trained under a simulator-induced prior. Empirical risk minimization then yields convergence to the Bayes-optimal predictor under the synthetic distribution. We employ classical results on distribution shift to characterize how performance degrades when the simulator diverges from reality. Beyond these consequences, we develop SGNN-specific results: (i) conditions under which unobserved scientific parameters are learnable via simulation, and (ii) a back-to-simulation attribution method that provides mechanistic explanations of predictions by linking them to the simulations the model deems similar, with guarantees of posterior consistency. We provide numerical experiments to validate theoretical predictions. SGNNs recover latent parameters, remain robust under mismatch, and outperform classical tools: in a model selection task, SGNNs achieve half the error of AIC in distinguishing mechanistic dynamics. These results establish SGNNs as a principled and practical framework for scientific prediction in data-limited regimes.
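A minimal instance of the "amortized Bayesian predictor" view can be checked in a conjugate linear-Gaussian model where the Bayes-optimal predictor is known in closed form. The toy model and all numbers below are assumptions for illustration, not from the paper: a regressor fit purely to simulator draws converges to the Bayes shrinkage estimator induced by the simulator's prior.

```python
import numpy as np

rng = np.random.default_rng(0)
k, sigma2, n_sims = 10, 4.0, 50_000

# Simulator: draw the latent parameter from the prior, then the sample mean
# of k noisy observations of it.
theta = rng.normal(size=n_sims)                               # theta ~ N(0, 1)
xbar = theta + rng.normal(size=n_sims) * np.sqrt(sigma2 / k)  # xbar | theta

# "Network": a least-squares fit predicting theta from xbar, trained entirely
# on synthetic (simulated) pairs -- an amortized posterior-mean estimator.
a = (xbar @ theta) / (xbar @ xbar)

# Bayes-optimal predictor in this conjugate model is the shrinkage estimator
# E[theta | xbar] = xbar * (k/sigma2) / (1 + k/sigma2).
bayes = (k / sigma2) / (1 + k / sigma2)
print(a, bayes)   # nearly equal
```

Under the simulator-induced prior, empirical risk minimization drives the learned coefficient to the Bayes coefficient; with a misspecified simulator (wrong prior or noise scale), the same fit would converge to the Bayes predictor of the wrong model, which is the distribution-shift degradation the theory characterizes.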
Projection-based multifidelity linear regression for data-scarce applications
Sella, Vignesh, Pham, Julie, Willcox, Karen, Chaudhuri, Anirban
An important challenge in scientific machine learning is to develop methods that can exploit and maximize the amount of learning possible from scarce data [1-4]. The need for such methods arises often in science and engineering, especially in computational fluid dynamics (CFD), where expensive-to-evaluate high-fidelity (HF) models make many-query problems such as uncertainty quantification, risk analysis, optimization, and optimization under uncertainty computationally prohibitive [5]. Surrogate models that approximate the solutions of HF models can facilitate the design and analysis process; however, the lack of sufficient HF data, in tandem with high-dimensional quantities of interest, adversely affects surrogate model accuracy. We propose multifidelity (MF) linear regression methods that leverage abundant low-cost, lower-fidelity (LF) data alongside limited HF data to construct linear regression models. These models operate within a reduced-dimensional subspace, obtained through principal component analysis (PCA), to effectively handle both training data scarcity and the high dimensionality (on the order of tens of thousands of quantities of interest) inherent in our problem setting. Linear regression has been widely utilized as a surrogate modeling approach in aerospace applications due to its simplicity and interpretability. We note that linear regression encompasses a broad class of models that are linear in their parameters but can include features that are arbitrarily nonlinear functions of the input variables [6].
Simple yet Effective Graph Distillation via Clustering
Lai, Yurui, Zhang, Taiyan, Yang, Renchi
Despite plentiful successes achieved by graph representation learning in various domains, the training of graph neural networks (GNNs) still remains tenaciously challenging due to the tremendous computational overhead needed for sizable graphs in practice. Recently, graph data distillation (GDD), which seeks to distill large graphs into compact and informative ones, has emerged as a promising technique to enable efficient GNN training. However, most existing GDD works rely on heuristics that align model gradients or representation distributions on condensed and original graphs, leading to compromised result quality, expensive training for distilling large graphs, or both. Motivated by these limitations, this paper presents an efficient and effective GDD approach, ClustGDD. Under the hood, ClustGDD synthesizes the condensed graph and node attributes through fast and theoretically-grounded clustering that minimizes the within-cluster sum of squares and maximizes the homophily on the original graph. The fundamental idea is inspired by our empirical and theoretical findings unveiling the connection between clustering and empirical condensation quality using Fréchet Inception Distance, a well-known quality metric for synthetic images. Furthermore, to mitigate the adverse effects caused by the homophily-based clustering, ClustGDD refines the nodal attributes of the condensed graph with a small augmentation learned via class-aware graph sampling and consistency loss. Our extensive experiments show that GNNs trained over condensed graphs output by ClustGDD consistently achieve superior or comparable performance to state-of-the-art GDD methods in terms of node classification on five benchmark datasets, while being orders of magnitude faster.
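A schematic sketch of the clustering-based condensation idea (not ClustGDD itself, which adds FID-motivated analysis and an attribute-refinement stage): cluster nodes by attributes with k-means, then use centroids as condensed node attributes and normalized inter-cluster edge counts as the condensed adjacency. The toy graph below is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 300, 8, 6   # nodes, attribute dim, condensed graph size

# Toy homophilous graph: k ground-truth groups, dense within, sparse across.
labels = rng.integers(0, k, size=n)
centers = 3 * rng.normal(size=(k, d))
X = centers[labels] + rng.normal(size=(n, d))
P = np.where(labels[:, None] == labels[None, :], 0.2, 0.01)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T                 # symmetric, no self-loops

# Lloyd's k-means on node attributes (minimizes within-cluster sum of squares).
C = X[rng.choice(n, k, replace=False)]
for _ in range(20):
    assign = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
    for j in range(k):
        if (assign == j).any():
            C[j] = X[assign == j].mean(axis=0)

# Condensed graph: centroids as node attributes; edge weights are the
# normalized inter-cluster edge counts of the original graph.
S = np.zeros((n, k)); S[np.arange(n), assign] = 1.0
sizes = np.maximum(S.sum(axis=0), 1.0)
A_syn = S.T @ A @ S / np.outer(sizes, sizes)
X_syn = C
print(A_syn.shape, X_syn.shape)
```

On a homophilous graph, the diagonal of `A_syn` (within-cluster density) dominates the off-diagonal entries, which is exactly the structure a GNN trained on the condensed graph can exploit; no model gradients are matched anywhere in this pipeline.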
The Value of Information in Multi-Scale Feedback Systems
Di Felice, Louisa Jane, Diaconescu, Ada, Zahadat, Payam, Mellodge, Patricia
Complex adaptive systems (CAS) can be described as systems of information flows dynamically interacting across scales in order to adapt and survive. CAS often consist of many components that work towards a shared goal, and interact across different informational scales through feedback loops, leading to their adaptation. In this context, understanding how information is transmitted among system components and across scales becomes crucial for understanding the behavior of CAS. Shannon entropy, a measure of syntactic information, is often used to quantify the size and rarity of messages transmitted between objects and observers, but it does not measure the value that information has for each specific observer. For this, semantic and pragmatic information have been conceptualized as describing the influence on an observer's knowledge and actions. Building on this distinction, we describe the architecture of multi-scale information flows in CAS through the concept of Multi-Scale Feedback Systems, and propose a series of syntactic, semantic and pragmatic information measures to quantify the value of information flows. While the measurement of values is necessarily context-dependent, we provide general guidelines on how to calculate semantic and pragmatic measures, and concrete examples of their calculation through four case studies: a robotic collective model, a collective decision-making model, a task distribution model, and a hierarchical oscillator model. Our results contribute to an informational theory of complexity, aiming to better understand the role played by information in the behavior of Multi-Scale Feedback Systems.
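The distinction drawn above starts from Shannon entropy being observer-independent: two observers receiving the same message stream measure identical syntactic information even if the messages are useful to only one of them. A minimal computation of that baseline quantity (the message strings are arbitrary examples):

```python
import math
from collections import Counter

def shannon_entropy(messages):
    """Shannon entropy (bits) of the empirical message distribution."""
    counts = Counter(messages)
    n = len(messages)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Entropy depends only on message frequencies, not on what any observer
# can do with the messages -- hence the need for semantic/pragmatic measures.
print(shannon_entropy("AABB"))   # 1.0 bit  (two equiprobable symbols)
print(shannon_entropy("ABCD"))   # 2.0 bits (four equiprobable symbols)
```

The semantic and pragmatic measures proposed in the paper are, by construction, context-dependent and cannot be reduced to a frequency count like this.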
TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research
Quirke, Philip, Neo, Clement, Harrasse, Abir, Nathawani, Dhruv, Abdullah, Amir
Mechanistic interpretability research faces a gap between analyzing simple circuits in toy tasks and discovering features in large models. To bridge this gap, we propose text-to-SQL generation as an ideal task to study, as it combines the formal structure of toy tasks with real-world complexity. We introduce TinySQL, a synthetic dataset progressing from basic to advanced SQL operations, and train models ranging from 33M to 1B parameters to establish a comprehensive testbed for interpretability. We apply multiple complementary interpretability techniques, including edge attribution patching and sparse autoencoders, to identify minimal circuits and components supporting SQL generation. Our analysis reveals both the potential and limitations of current interpretability methods, showing how circuits can vary even across similar queries. Lastly, we demonstrate how mechanistic interpretability can identify flawed heuristics in models and improve synthetic dataset design. Our work provides a comprehensive framework for evaluating and advancing interpretability techniques while establishing clear boundaries for their reliable application.
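The progressive structure of such a text-to-SQL curriculum can be pictured with a tiny runnable example; the schema, prompts, and query levels below are hypothetical illustrations, not taken from TinySQL:

```python
import sqlite3

# Illustrative basic-to-advanced progression of (prompt, SQL) pairs,
# in the spirit of a synthetic text-to-SQL curriculum.
levels = [
    ("show all users",          "SELECT * FROM users"),
    ("names of users over 30",  "SELECT name FROM users WHERE age > 30"),
    ("count users per city",    "SELECT city, COUNT(*) FROM users GROUP BY city"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT, age INT, city TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?, ?)",
                [("Ana", 34, "Lisbon"), ("Bo", 28, "Oslo"), ("Cy", 41, "Oslo")])
for prompt, sql in levels:
    print(prompt, "->", con.execute(sql).fetchall())
```

Because each level adds exactly one SQL construct (projection, filtering, aggregation), a circuit found at one level can be compared against the circuit for the next, which is the kind of controlled progression the dataset is designed to enable.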
JurisTCU: A Brazilian Portuguese Information Retrieval Dataset with Query Relevance Judgments
Fernandes, Leandro Carísio, Ribeiro, Leandro dos Santos, de Castro, Marcos Vinícius Borela, Pacheco, Leonardo Augusto da Silva, Sandes, Edans Flávius de Oliveira
This paper introduces JurisTCU, a Brazilian Portuguese dataset for legal information retrieval (LIR). The dataset is freely available and consists of 16,045 jurisprudential documents from the Brazilian Federal Court of Accounts, along with 150 queries annotated with relevance judgments. It addresses the scarcity of Portuguese-language LIR datasets with query relevance annotations. The queries are organized into three groups: real user keyword-based queries, synthetic keyword-based queries, and synthetic question-based queries. Relevance judgments were produced through a hybrid approach combining LLM-based scoring with expert domain validation. We use JurisTCU in 14 experiments covering lexical search (document expansion methods) and semantic search (BERT-based and OpenAI embeddings). We show that the document expansion methods significantly improve the performance of standard BM25 search on this dataset, with improvements exceeding 45% in P@10, R@10, and nDCG@10 when evaluating short keyword-based queries. Among the embedding models, the OpenAI models produced the best results, with improvements of approximately 70% in P@10, R@10, and nDCG@10 for short keyword-based queries, suggesting that these dense embeddings capture semantic relationships in this domain, surpassing the reliance on lexical terms. Besides offering the Portuguese-language IR research community a dataset suitable for evaluating search systems, these results also contribute to improving a search system that is highly relevant to Brazilian citizens.
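Why document expansion helps short keyword queries under BM25 can be seen in a toy example. The scoring function below is a standard Okapi BM25 implementation, and the documents and hand-picked expansion terms are illustrative assumptions (real expansion methods generate these terms with a model rather than by hand):

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each pre-tokenized doc against a tokenized query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# A short keyword query that shares no surface terms with the relevant doc
# scores zero; appending expansion terms to that doc fixes the lexical gap.
docs = [["acordao", "licitacao", "pregao"],
        ["contrato", "auditoria"]]
expanded = [docs[0] + ["compra", "publica"], docs[1]]
print(bm25_scores(["compra", "publica"], docs))       # no lexical overlap
print(bm25_scores(["compra", "publica"], expanded))   # first doc now matches
```

This is the failure mode that the dense-embedding results address differently: embeddings match on meaning rather than surface tokens, so no expansion step is needed.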