AITopics | theorem iv

Collaborating Authors

theorem iv

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Distributionally Robust K-Means Clustering

Malik, Vikrant, Kargin, Taylan, Hassibi, Babak

arXiv.org Machine LearningApr-14-2026

In recent years, the widespreadavailability of large-scale, high-dimensionaldatasets has driven significant interest in clustering algorithms that are both computationally efficient and robust to distributional shifts and outliers. The classical clustering method, K-means, can be seen as an application of the Lloyd-Max quantization algorithm, in which the distribution being quantized is the empirical distribution of the points to be clustered. This empirical distribution generally differs from the true underlying distribution, especially when the number of points to be clustered is small. This induces a distributional shift, which can also arise in many real-world settings, such as image segmentation, biological data analysis, and sensor networks, due to noise variations, sensor inaccuracies, or environmental changes. Distributional shifts can severely impact the performance of clustering algorithms, leading to degraded cluster assignments and unreliable downstream analysis. The field of clustering has a rich history. One of the most popular algorithms in this field is theK-means (KM) algorithm, introduced by [1], which computes centroids by iteratively updating the conditional mean of the data in the Voronoi regions induced by the centroids. However, standardK-means is sensitive to initialization and, in general, converges only to a local minimum.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2604.11118

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Dropout Neural Network Training Viewed from a Percolation Perspective

Devlin, Finley, Sanders, Jaron

arXiv.org Machine LearningDec-17-2025

In this work, we investigate the existence and effect of percolation in training deep Neural Networks (NNs) with dropout. Dropout methods are regularisation techniques for training NNs, first introduced by G. Hinton et al. (2012). These methods temporarily remove connections in the NN, randomly at each stage of training, and update the remaining subnetwork with Stochastic Gradient Descent (SGD). The process of removing connections from a network at random is similar to percolation, a paradigm model of statistical physics. If dropout were to remove enough connections such that there is no path between the input and output of the NN, then the NN could not make predictions informed by the data. We study new percolation models that mimic dropout in NNs and characterise the relationship between network topology and this path problem. The theory shows the existence of a percolative effect in dropout. We also show that this percolative effect can cause a breakdown when training NNs without biases with dropout; and we argue heuristically that this breakdown extends to NNs with biases.

dropout, probability, theorem iv, (17 more...)

arXiv.org Machine Learning

2512.13853

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
Europe > Hungary > Hajdú-Bihar County > Debrecen (0.04)
Europe > Czechia (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Early science acceleration experiments with GPT-5

Bubeck, Sébastien, Coester, Christian, Eldan, Ronen, Gowers, Timothy, Lee, Yin Tat, Lupsasca, Alexandru, Sawhney, Mehtaab, Scherrer, Robert, Sellke, Mark, Spears, Brian K., Unutmaz, Derya, Weil, Kevin, Yin, Steven, Zhivotovskiy, Nikita

arXiv.org Artificial IntelligenceNov-21-2025

AI models like GPT-5 are an increasingly valuable tool for scientists, but many remain unaware of the capabilities of frontier AI. We present a collection of short case studies in which GPT-5 produced new, concrete steps in ongoing research across mathematics, physics, astronomy, computer science, biology, and materials science. In these examples, the authors highlight how AI accelerated their work, and where it fell short; where expert time was saved, and where human input was still key. We document the interactions of the human authors with GPT-5, as guiding examples of fruitful collaboration with AI. Of note, this paper includes four new results in mathematics (carefully verified by the human authors), underscoring how GPT-5 can help human mathematicians settle previously unsolved problems. These contributions are modest in scope but profound in implication, given the rate at which frontier AI is progressing.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2511.16072

Country:

North America > United States (0.92)
Europe > United Kingdom > England (0.27)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Transformers Provably Learn Directed Acyclic Graphs via Kernel-Guided Mutual Information

Cheng, Yuan, Huang, Yu, Xiong, Zhe, Liang, Yingbin, Tan, Vincent Y. F.

arXiv.org Artificial IntelligenceOct-31-2025

Uncovering hidden graph structures underlying real-world data is a critical challenge with broad applications across scientific domains. Recently, transformer-based models leveraging the attention mechanism have demonstrated strong empirical success in capturing complex dependencies within graphs. However, the theoretical understanding of their training dynamics has been limited to tree-like graphs, where each node depends on a single parent. Extending provable guarantees to more general directed acyclic graphs (DAGs) -- which involve multiple parents per node -- remains challenging, primarily due to the difficulty in designing training objectives that enable different attention heads to separately learn multiple different parent relationships. In this work, we address this problem by introducing a novel information-theoretic metric: the kernel-guided mutual information (KG-MI), based on the $f$-divergence. Our objective combines KG-MI with a multi-head attention framework, where each head is associated with a distinct marginal transition kernel to model diverse parent-child dependencies effectively. We prove that, given sequences generated by a $K$-parent DAG, training a single-layer, multi-head transformer via gradient ascent converges to the global optimum in polynomial time. Furthermore, we characterize the attention score patterns at convergence. In addition, when particularizing the $f$-divergence to the KL divergence, the learned attention scores accurately reflect the ground-truth adjacency matrix, thereby provably recovering the underlying graph structure. Experimental results validate our theoretical findings.

machine learning, natural language, node, (17 more...)

arXiv.org Artificial Intelligence

2510.25542

Country:

North America > United States (0.28)
Asia > Singapore (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Lift What You Can: Green Online Learning with Heterogeneous Ensembles

Köbschall, Kirsten, Buschjäger, Sebastian, Fischer, Raphael, Hartung, Lisa, Kramer, Stefan

arXiv.org Artificial IntelligenceOct-30-2025

Ensemble methods for stream mining necessitate managing multiple models and updating them as data distributions evolve. Considering the calls for more sustainability, established methods are however not sufficiently considerate of ensemble members' computational expenses and instead overly focus on predictive capabilities. To address these challenges and enable green online learning, we propose heterogeneous online ensembles (HEROS). For every training step, HEROS chooses a subset of models from a pool of models initialized with diverse hyperparameter choices under resource constraints to train. We introduce a Markov decision process to theoretically capture the trade-offs between predictive performance and sustainability constraints. Based on this framework, we present different policies for choosing which models to train on incoming data. Most notably, we propose the novel $ζ$-policy, which focuses on training near-optimal models at reduced costs. Using a stochastic model, we theoretically prove that our $ζ$-policy achieves near optimal performance while using fewer resources compared to the best performing policy. In our experiments across 11 benchmark datasets, we find empiric evidence that our $ζ$-policy is a strong contribution to the state-of-the-art, demonstrating highly accurate performance, in some cases even outperforming competitors, and simultaneously being much more resource-friendly.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2509.18962

Country:

North America > United States (0.68)
Europe > Germany (0.68)
South America > Brazil (0.46)

Genre: Research Report > New Finding (0.48)

Industry:

Education > Educational Setting > Online (1.00)
Government (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.54)
(2 more...)

Add feedback

Discrete Functional Geometry of ReLU Networks via ReLU Transition Graphs

Dhayalkar, Sahil Rajesh

arXiv.org Artificial IntelligenceSep-5-2025

We extend the ReLU Transition Graph (RTG) framework into a comprehensive graph-theoretic model for understanding deep ReLU networks. In this model, each node represents a linear activation region, and edges connect regions that differ by a single ReLU activation flip, forming a discrete geometric structure over the network's functional behavior. We prove that RTGs at random initialization exhibit strong expansion, binomial degree distributions, and spectral properties that tightly govern generalization. These structural insights enable new bounds on capacity via region entropy and on generalization via spectral gap and edge-wise KL divergence. Empirically, we construct RTGs for small networks, measure their smoothness and connectivity properties, and validate theoretical predictions. Our results show that region entropy saturates under overparameterization, spectral gap correlates with generalization, and KL divergence across adjacent regions reflects functional smoothness. This work provides a unified framework for analyzing ReLU networks through the lens of discrete functional geometry, offering new tools to understand, diagnose, and improve generalization.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2509.03056

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

An Asynchronous Decentralised Optimisation Algorithm for Nonconvex Problems

Mafakheri, Behnam, Manton, Jonathan H., Shames, Iman

arXiv.org Artificial IntelligenceJul-31-2025

In this paper, we consider nonconvex decentralised optimisation and learning over a network of distributed agents. We develop an ADMM algorithm based on the Randomised Block Coordinate Douglas-Rachford splitting method which enables agents in the network to distributedly and asynchronously compute a set of first-order stationary solutions of the problem. To the best of our knowledge, this is the first decentralised and asynchronous algorithm for solving nonconvex optimisation problems with convergence proof. The numerical examples demonstrate the efficiency of the proposed algorithm for distributed Phase Retrieval and sparse Principal Component Analysis problems.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.22311

Country: Oceania > Australia (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Add feedback

Diffusion-Based Hypothesis Testing and Change-Point Detection

Moushegian, Sean, Banerjee, Taposh, Tarokh, Vahid

arXiv.org Machine LearningJun-23-2025

Score-based methods have recently seen increasing popularity in modeling and generation. Methods have been constructed to perform hypothesis testing and change-point detection with score functions, but these methods are in general not as powerful as their likelihood-based peers. Recent works consider generalizing the score-based Fisher divergence into a diffusion-divergence by transforming score functions via multiplication with a matrix-valued function or a weight matrix. In this paper, we extend the score-based hypothesis test and change-point detection stopping rule into their diffusion-based analogs. Additionally, we theoretically quantify the performance of these diffusion-based algorithms and study scenarios where optimal performance is achievable. We propose a method of numerically optimizing the weight matrix and present numerical simulations to illustrate the advantages of diffusion-based algorithms.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2506.16089

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > North Carolina > Durham County > Durham (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.62)

Add feedback

Proximal Operators of Sorted Nonconvex Penalties

Gagneux, Anne, Massias, Mathurin, Soubies, Emmanuel

arXiv.org Artificial IntelligenceJun-19-2025

--This work studies the problem of sparse signal recovery with automatic grouping of variables. T o this end, we investigate sorted nonsmooth penalties as a regularization approach for generalized linear models. These penalties are designed to promote clustering of variables due to their sorted nature, while the nonconvexity reduces the shrinkage of coefficients. Our goal is to provide efficient ways to compute their proximal operator, enabling the use of popular proximal algorithms to solve composite optimization problems with this choice of sorted penalties. We distinguish between two classes of problems: the weakly convex case where computing the proximal operator remains a convex problem, and the nonconvex case where computing the proximal operator becomes a challenging nonconvex combinatorial problem. We demonstrate the practical interest of using such penalties on several experiments. R is a data-fidelity term and the penalty Ψ is a regularization term that should embed some properties of the solution. Among them, sparsity and structure are particularly useful for a model as they improve its in-terpretability and decrease its complexity. Sparsity is most usually enforced through a penalty term favoring variable selection, i.e. solutions that use only a subset of features.

artificial intelligence, machine learning, penalty, (18 more...)

arXiv.org Artificial Intelligence

2506.15315

Genre: Research Report (0.64)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Add feedback

Grothendieck Graph Neural Networks Framework: An Algebraic Platform for Crafting Topology-Aware GNNs

Langari, Amirreza Shiralinasab, Yeganeh, Leila, Nguyen, Kim Khoa

arXiv.org Artificial IntelligenceDec-11-2024

Due to the structural limitations of Graph Neural Networks (GNNs), in particular with respect to conventional neighborhoods, alternative aggregation strategies have recently been investigated. This paper investigates graph structure in message passing, aimed to incorporate topological characteristics. While the simplicity of neighborhoods remains alluring, we propose a novel perspective by introducing the concept of 'cover' as a generalization of neighborhoods. We design the Grothendieck Graph Neural Networks (GGNN) framework, offering an algebraic platform for creating and refining diverse covers for graphs. This framework translates covers into matrix forms, such as the adjacency matrix, expanding the scope of designing GNN models based on desired message-passing strategies. Leveraging algebraic tools, GGNN facilitates the creation of models that outperform traditional approaches. Based on the GGNN framework, we propose Sieve Neural Networks (SNN), a new GNN model that leverages the notion of sieves from category theory. SNN demonstrates outstanding performance in experiments, particularly on benchmarks designed to test the expressivity of GNNs, and exemplifies the versatility of GGNN in generating novel architectures.

artificial intelligence, graph, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.08835

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Iran > Kerman Province > Kerman (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback