AITopics | sgn

sgn

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Bayesian Best-Arm Identification with Abstention: A Polynomial-to-Exponential Phase Transition

Huang, Yuqi, Hou, Yunlong, Tan, Vincent Y. F.

arXiv.org Machine Learning, Jun-30-2026

We study the Bayesian fixed-budget best-arm identification problem in which a learner can abstain from making a terminal recommendation. Subject to an abstention budget $α$, we analyze the probability of undetected error--the risk of recommending a suboptimal arm without abstaining. Our central finding is that abstention induces a phase transition: without abstention, the error probability decays polynomially in the sampling budget $T$; in contrast, introducing any small positive abstention budget shifts this to an exponential decay. For Gaussian priors and rewards, in the regime $T\to\infty$ followed by $α\downarrow0$, we establish exact matching information-theoretic lower bounds and algorithmic upper bounds on the optimal error exponent, which takes the form $\exp(-\frac{α^{2}T}{8κ_ν^{2}})$. The hardness parameter $κ_ν$ represents the prior density of the top-two gap at zero, highlighting that nearly tied instances drive the fundamental error. We introduce an adaptive algorithm, PGWS, that successfully achieves this optimal exponent by expending its abstention budget on statistically ambiguous instances. We further demonstrate that this polynomial-to-exponential improvement is exclusively a Bayesian phenomenon--in the frequentist setting, abstention only affects lower-order exponent terms. We also extend our results beyond the Gaussian model.
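The exponential form quoted in the abstract can be evaluated numerically to get a feel for how the abstention budget and the sampling budget interact. The sketch below only evaluates the leading-order expression exp(-α²T / (8κ_ν²)) as stated above; the function name and argument names are hypothetical, and the abstract's result holds in the regime T → ∞ followed by α ↓ 0, so finite-T values are indicative at best.

```python
import math


def abstention_error_scale(alpha: float, budget_T: int, kappa: float) -> float:
    """Evaluate exp(-alpha^2 * T / (8 * kappa^2)).

    alpha    -- abstention budget (assumed to lie in [0, 1])
    budget_T -- sampling budget T
    kappa    -- hardness parameter kappa_nu (prior density of the top-two gap at zero)

    This is only the leading-order form quoted in the abstract, not a
    finite-sample guarantee.
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    return math.exp(-(alpha ** 2) * budget_T / (8.0 * kappa ** 2))


if __name__ == "__main__":
    # With alpha = 0 the expression is constant in T (no exponential decay).
    for T in (100, 400, 1600):
        print(T, abstention_error_scale(0.0, T, 1.0), abstention_error_scale(0.2, T, 1.0))
```

With α = 0 the expression stays at 1 for every T, which is consistent with the abstract's point that the exponential regime only appears once the abstention budget is positive.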

artificial intelligence, bayesian inference, machine learning, (19 more...)

2606.29203

Country: North America > United States > New York (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)

SGD Provably Prioritizes a Shortcut Spurious Feature in the XOR Model

LaBonte, Tyler, Muthukumar, Vidya

arXiv.org Machine Learning, Jun-30-2026

Neural networks are known to be susceptible to over-reliance on spurious correlations. However, the precise mechanism by which models exploit shortcut features is not fully understood, and algorithms to mitigate this behavior rely on as yet unjustified assumptions about the learned representations. In this work, we provide the first end-to-end theoretical characterization of spurious feature learning for two-layer ReLU neural networks trained by online minibatch SGD on the logistic loss. We consider data drawn from the high-dimensional Boolean hypercube with a quadratic signal function (namely XOR) and a linear spurious correlation. We show that SGD learns the spurious feature first, and exponentially fast. Moreover, the optimization dynamics couple the spurious and signal features, with a stronger spurious component inhibiting signal feature learning. Our analysis reveals precise phase transitions in the learning dynamics. In the first phase, alignment between the signs of the spurious feature and second-layer weight drives rapid growth of the spurious feature. In the second phase, large majority group margin slows learning and the signal feature remains suppressed. When the spurious correlation is maximally strong, we show theoretically that the spurious feature dominates even at the sample complexity threshold where XOR would be learned in isolation (i.e., if the spurious feature was absent). In contrast, when the correlation strength is constant, we provide preliminary empirical evidence that the model can eventually learn the XOR signal, although the spurious feature is not forgotten.
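The data model described above (Boolean hypercube inputs, an XOR signal, and a linear spurious correlation) can be illustrated with a small sampler. This is a minimal sketch under assumptions not stated in the abstract: the XOR signal is taken to be the product of the first two ±1 coordinates, the spurious feature is the third coordinate, and `corr` is the probability that the spurious coordinate agrees with the label. The function name and parameters are hypothetical.

```python
import random


def sample_xor_with_spurious(d: int, corr: float, rng: random.Random):
    """Draw one (x, y) pair from an assumed XOR-plus-spurious-feature model.

    x is a point on the Boolean hypercube {-1, +1}^d.
    y = x[0] * x[1] is the XOR label in the +/-1 encoding (a quadratic function of x).
    x[2] is overwritten so that it equals y with probability `corr`
    (a linear, label-correlated shortcut feature).
    """
    if d < 3:
        raise ValueError("need at least 3 coordinates")
    x = [rng.choice((-1, 1)) for _ in range(d)]
    y = x[0] * x[1]
    x[2] = y if rng.random() < corr else -y
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_xor_with_spurious(10, 0.9, rng) for _ in range(10000)]
    agree = sum(1 for x, y in samples if x[2] == y) / len(samples)
    print(f"spurious coordinate agrees with label on {agree:.1%} of samples")
```

A linear model can exploit `x[2]` directly, whereas the label itself depends only on the product `x[0] * x[1]`, which is the tension the abstract analyzes.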

artificial intelligence, deep learning, machine learning, (20 more...)

2606.30444

Genre: Research Report (0.50)

Industry: Health & Medicine (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Regret Lower Bounds for Decentralized Multi-Agent Stochastic Shortest Path Problems

Neural Information Processing Systems, Jun-23-2026, 04:21:32 GMT

Multi-agent systems (MAS) are central to applications such as swarm robotics and traffic routing, where agents must coordinate in a decentralized manner to achieve a common objective. Stochastic Shortest Path (SSP) problems provide a natural framework for modeling decentralized control in such settings. While the problem of learning in SSP has been extensively studied in single-agent settings, the decentralized multi-agent variant remains largely unexplored. In this work, we take a step towards addressing that gap. We study decentralized multi-agent SSPs (Dec-MASSPs) under linear function approximation, where the transition dynamics and costs are represented using linear models. Applying novel symmetry-based arguments, we identify the structure of optimal policies. Our main contribution is the first regret lower bound for this setting based on the construction of hard-to-learn instances for any number of agents, n. Our regret lower bound of Ω(√K), over K episodes, highlights the inherent learning difficulty in Dec-MASSPs. These insights clarify the learning complexity of decentralized control and can further guide the design of efficient learning algorithms in multi-agent systems.

artificial intelligence, equation, sgn, (14 more...)

Country: Asia > India (0.28)

Genre: Research Report > Experimental Study (0.92)

Industry:

Information Technology (0.45)
Transportation (0.45)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.49)

Online Strategic Classification with Noise and Partial Feedback

Neural Information Processing Systems, Jun-23-2026, 00:26:02 GMT

In this paper, we study an online strategic classification problem, where a principal aims to learn an accurate binary linear classifier from interactions with sequentially arriving agents. For each agent, the principal announces a classifier. The agent can strategically exercise costly manipulations on his features to be classified as the favorable positive class. The principal is unaware of the true feature-label relationship, but observes all reported features and only labels of positively classified agents. We assume that the true feature-label relationship is given by a halfspace model subject to arbitrary feature-dependent but bounded noise (i.e., Massart noise). This problem faces the combined challenges of agents' strategic feature manipulations, partial feedback observations, and label noise. We tackle these challenges by a novel learning algorithm. We show that the proposed algorithm yields classifiers that converge to the clairvoyant optimal classifier and attains a regret rate of O(√T) up to poly-logarithmic and constant factors over T cycles.
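The agent's side of the interaction described above can be sketched as a best response to an announced linear classifier. The abstract does not specify the manipulation cost, so this sketch assumes a Euclidean cost and a fixed manipulation budget, which is a common modeling choice rather than the paper's stated one; `best_response`, `budget`, and the tie-breaking rule are all hypothetical.

```python
import math


def best_response(x, w, b, budget):
    """Return the features an agent reports against the classifier sign(w.x + b).

    Assumptions (not from the abstract): manipulation cost is the Euclidean
    distance moved, and the agent manipulates only if reaching the decision
    boundary costs at most `budget`. An agent already classified positive,
    or one who cannot afford to reach the boundary, reports truthfully.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score >= 0:
        return list(x)  # already positive, no need to move
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        return list(x)  # degenerate classifier, moving cannot help
    distance = -score / norm  # Euclidean distance to the hyperplane
    if distance > budget:
        return list(x)  # manipulation too costly
    # Move along w by the minimal amount that reaches the boundary.
    return [xi + distance * wi / norm for xi, wi in zip(x, w)]


if __name__ == "__main__":
    w, b = [1.0, 0.0], 0.0
    print(best_response([-0.5, 2.0], w, b, budget=1.0))   # moves to the boundary
    print(best_response([-0.5, 2.0], w, b, budget=0.25))  # stays put
```

Because the principal only observes the reported features, two agents with different true features can look identical after manipulation, which is one source of the difficulty the abstract lists.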

agent, artificial intelligence, machine learning, (18 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)

Global Minimizers of ℓp-Regularized Objectives Yield the Sparsest ReLU Neural Networks

Neural Information Processing Systems, Jun-19-2026, 05:22:26 GMT

Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of finding the sparsest interpolating ReLU network--i.e., the network with the fewest nonzero parameters or neurons--a goal with wide-ranging implications for efficiency, generalization, interpretability, theory, and model compression. Unlike post hoc pruning approaches, we propose a continuous, almost-everywhere differentiable training objective whose global minima are guaranteed to correspond to the sparsest single-hidden-layer ReLU networks that fit the data. This result marks a conceptual advance: it recasts the combinatorial problem of sparse interpolation as a smooth optimization task, potentially enabling the use of gradient-based training methods. Our objective is based on minimizing ℓp quasinorms of the weights for 0 < p < 1, a classical sparsity-promoting strategy in finite-dimensional settings. However, applying these ideas to neural networks presents new challenges: the function class is infinite-dimensional, and the weights are learned using a highly nonconvex objective. We prove that, under our formulation, global minimizers correspond exactly to sparsest solutions. Our work lays a foundation for understanding when and how continuous sparsity-inducing objectives can be leveraged to recover sparse networks through training.
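To see why an ℓp quasinorm with 0 < p < 1 promotes sparsity, it helps to compare two weight vectors with the same ℓ1 norm. The sketch below computes the plain penalty Σ|w_i|^p; it is only an illustration of the classical penalty the abstract refers to, not the paper's full training objective, and the function name is hypothetical.

```python
def lp_quasinorm_penalty(weights, p=0.5):
    """Return sum(|w_i|^p) for 0 < p < 1.

    For p in (0, 1) this penalty is concave in |w_i|, so spreading a fixed
    l1 mass across many coordinates costs more than concentrating it on few.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return sum(abs(w) ** p for w in weights)


if __name__ == "__main__":
    sparse = [1.0, 0.0]   # l1 norm 1, one nonzero weight
    dense = [0.5, 0.5]    # l1 norm 1, two nonzero weights
    print(lp_quasinorm_penalty(sparse), lp_quasinorm_penalty(dense))
```

Both vectors have ℓ1 norm 1, yet the dense one incurs a strictly larger penalty for p = 0.5, so minimizing the penalty favors the sparse vector where an ℓ1 penalty would be indifferent.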

artificial intelligence, machine learning, sout, (19 more...)

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks

Neural Information Processing Systems, Jun-15-2026, 07:41:49 GMT

Adam is a popular and widely used adaptive gradient method in deep learning, which has also received tremendous focus in theoretical research. However, most existing theoretical work primarily analyzes its full-batch version, which differs fundamentally from the stochastic variant used in practice. Unlike SGD, stochastic Adam does not converge to its full-batch counterpart even with infinitesimal learning rates. We present the first theoretical characterization of how batch size affects Adam's generalization, analyzing two-layer over-parameterized CNNs on image data. Our results reveal that while both Adam and AdamW with proper weight decay λ converge to poor test error solutions, their mini-batch variants can achieve near-zero test error. We further prove Adam has a strictly smaller effective weight decay bound than AdamW, theoretically explaining why Adam requires more sensitive λ tuning.
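The distinction between Adam with weight decay and AdamW comes down to where λ enters the update. The sketch below writes both as a single scalar step using the textbook update rules (L2-coupled decay added to the gradient for Adam, decoupled decay applied directly to the parameter for AdamW). It is an illustration of those standard definitions, not of the paper's analysis; the function name and the `decoupled` flag are hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One scalar Adam (decoupled=False) or AdamW (decoupled=True) step.

    Coupled:   the decay term weight_decay * theta is added to the gradient,
               so it passes through the adaptive normalization.
    Decoupled: the decay term is applied to theta directly and is not normalized.
    `t` is the 1-based step count used for bias correction.
    """
    g = grad if decoupled else grad + weight_decay * theta
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    step = m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        step += weight_decay * theta
    return theta - lr * step, m, v


if __name__ == "__main__":
    # Zero loss gradient, so only the decay term acts on theta.
    coupled, _, _ = adam_step(2.0, 0.0, 0.0, 0.0, 1, lr=0.1, weight_decay=0.01)
    decoupled, _, _ = adam_step(2.0, 0.0, 0.0, 0.0, 1, lr=0.1,
                                weight_decay=0.01, decoupled=True)
    print(coupled, decoupled)
```

In the coupled case the first step has magnitude close to `lr` regardless of how small λ is, because the decay term is normalized along with the gradient, whereas the decoupled step shrinks the parameter by exactly `lr * weight_decay * theta`. This gives some intuition for why the two methods respond differently to the same λ.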

adamw, artificial intelligence, machine learning, (17 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ec4f0b0a7557d6a51c42308800f2c23a-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 05:08:18 GMT

Let (x, y) be a binary classification task that admits a smooth separator as in Assumption 1. Then, there exists an RLC with neural network fθ and absolutely continuous randomness source u (Assumption 2) that is universal in the limit, i.e., Fθ(x) = y(x), ∀x ∈ X, and makes random predictions that are correct with probability P(maj({sgn( a [...] Further, if p is the number of parameters used by a deterministic neural network with one hidden layer to achieve zero-error in the task, fθ has at most p p + O(1) parameters. Since Assumption 1 holds, there exists a single hidden-layer neural network N that, like s, achieves zero-error in this task [8]. Further, since sgn is nonpolynomial, we can use it as the nonlinearity of this network [21]. Putting it all together, there exists a number of hidden units M and parameters bj, oj ∈ ℝ, wj ∈ ℝ^d for j = 1, ..., M such that N(x) := [...] Note that this means we can achieve zero-error in classification, N(x) = y(x), ∀x ∈ X.

artificial intelligence, machine learning, proposition 3, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data

Neural Information Processing Systems, Apr-25-2026, 04:00:28 GMT

In this paper, we study gradient methods for training deep linear neural networks with binary cross-entropy loss. In particular, we show global directional convergence guarantees from a polynomial rate to a linear rate for (deep) linear networks with spherically symmetric data distribution, which can be viewed as a specific zero-margin dataset. Our results do not require the assumptions in other works such as small initial loss, presumed convergence of weight direction, or overparameterization. We also characterize our findings in experiments.

artificial intelligence, convergence, machine learning, (14 more...)

Country: Asia (0.28)

Genre: Research Report > New Finding (1.00)

Technology: