AITopics | phase ii

Neural networks are known to be susceptible to over-reliance on spurious correlations. However, the precise mechanism by which models exploit shortcut features is not fully understood, and algorithms to mitigate this behavior rely on as yet unjustified assumptions about the learned representations. In this work, we provide the first end-to-end theoretical characterization of spurious feature learning for two-layer ReLU neural networks trained by online minibatch SGD on the logistic loss. We consider data drawn from the high-dimensional Boolean hypercube with a quadratic signal function (namely XOR) and a linear spurious correlation. We show that SGD learns the spurious feature first, and exponentially fast. Moreover, the optimization dynamics couple the spurious and signal features, with a stronger spurious component inhibiting signal feature learning. Our analysis reveals precise phase transitions in the learning dynamics. In the first phase, alignment between the signs of the spurious feature and second-layer weight drives rapid growth of the spurious feature. In the second phase, large majority group margin slows learning and the signal feature remains suppressed. When the spurious correlation is maximally strong, we show theoretically that the spurious feature dominates even at the sample complexity threshold where XOR would be learned in isolation (i.e., if the spurious feature was absent). In contrast, when the correlation strength is constant, we provide preliminary empirical evidence that the model can eventually learn the XOR signal, although the spurious feature is not forgotten.
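The setting in the abstract above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the choice of coordinates 0 and 1 for the XOR signal, coordinate 2 for the spurious feature, the fixed random-sign second layer, and all hyperparameters (`d`, `m`, `p_spur`, `lr`, `batch`) are assumptions made for illustration. It tracks how much second-layer-weighted first-layer mass accumulates on the spurious coordinate during online minibatch SGD on the logistic loss.

```python
import numpy as np

def sample_batch(rng, n, d, p_spur):
    """Draw n fresh points from {-1,+1}^d (online setting: no sample reuse)."""
    x = rng.choice([-1.0, 1.0], size=(n, d))
    y = x[:, 0] * x[:, 1]                  # XOR signal: product of two coordinates
    agree = rng.random(n) < p_spur         # spurious coordinate equals y w.p. p_spur
    x[:, 2] = np.where(agree, y, -y)
    return x, y

def sgd_step(W, a, x, y, lr):
    """One minibatch SGD step on the logistic loss log(1 + exp(-y f(x)))."""
    pre = x @ W.T                          # (n, m) pre-activations
    out = np.maximum(pre, 0.0) @ a         # (n,) network output f(x)
    g_out = -y / (1.0 + np.exp(y * out))   # derivative of the loss w.r.t. f(x)
    grad_W = ((g_out[:, None] * (pre > 0)) * a[None, :]).T @ x / len(y)
    return W - lr * grad_W

def spurious_score(W, a):
    """Second-layer-weighted sum of first-layer weights on the spurious coordinate."""
    return float(a @ W[:, 2])

def train(d=20, m=64, p_spur=0.95, steps=200, batch=128, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1e-2, size=(m, d))            # small random first layer
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed second layer (assumption)
    history = [spurious_score(W, a)]
    for _ in range(steps):
        x, y = sample_batch(rng, batch, d, p_spur)     # fresh minibatch every step
        W = sgd_step(W, a, x, y, lr)
        history.append(spurious_score(W, a))
    return history

if __name__ == "__main__":
    hist = train()
    print(f"spurious score: start={hist[0]:.3f}, end={hist[-1]:.3f}")
```

Under these assumptions, neurons whose second-layer sign agrees with the spurious direction gain weight on coordinate 2, so the score rises from near zero. The sketch does not measure the XOR component and makes no claim about the rates or phase boundaries stated in the abstract.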

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Machine Learning

2606.30444

Genre: Research Report (0.50)

Industry: Health & Medicine (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

5f1cb1d23261b19cbd45f90f7b4f251f-Paper-Conference.pdf

Neural Information Processing Systems | Jun-17-2026, 16:11:13 GMT

Recent work suggests that large language models (LLMs) can perform multi-hop reasoning implicitly, producing correct answers without explicitly verbalizing intermediate steps, but the underlying mechanisms remain poorly understood. In this paper, we study how such implicit reasoning emerges by training transformers from scratch in a controlled symbolic environment. Our analysis reveals a three-stage developmental trajectory: early memorization, followed by in-distribution generalization, and eventually cross-distribution generalization. We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures. To interpret these behaviors, we introduce two diagnostic tools: cross-query semantic patching, which identifies semantically reusable intermediate representations, and a cosine-based representational lens, which reveals that successful reasoning correlates with cosine-based clustering in hidden space. This clustering phenomenon in turn provides a coherent explanation for the behavioral dynamics observed across training, linking representational structure to reasoning capability. These findings provide new insights into the interpretability of implicit multi-hop reasoning in LLMs, helping to clarify how complex reasoning processes unfold internally and offering pathways to enhance the transparency of such models.
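One simple way to quantify clustering by cosine similarity in hidden space is to compare the average similarity within groups of hidden states against the average across groups. The sketch below is a hypothetical illustration of that idea, not the diagnostic defined in the paper: the function name `cosine_clustering_score`, the grouping of hidden states by a label such as the intermediate entity, and the within-versus-between summary are all assumptions.

```python
import numpy as np

def cosine_clustering_score(hidden, groups):
    """Return (mean within-group, mean between-group) cosine similarity.

    hidden: (n, d) array of hidden-state vectors (assumed nonzero).
    groups: length-n array of group labels, e.g. a hypothetical
            intermediate-entity id for each query.
    """
    hidden = np.asarray(hidden, dtype=float)
    groups = np.asarray(groups)
    unit = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    sim = unit @ unit.T                          # pairwise cosine similarities
    same = groups[:, None] == groups[None, :]    # True where labels match
    off_diag = ~np.eye(len(groups), dtype=bool)  # exclude self-similarity
    within = sim[same & off_diag].mean()
    between = sim[~same].mean()
    return float(within), float(between)

if __name__ == "__main__":
    h = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
    g = [0, 0, 1, 1]
    print(cosine_clustering_score(h, g))
```

A large gap between the two numbers indicates that states sharing a label point in similar directions. The function assumes at least one group with two or more members and at least two distinct groups.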

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Europe > Austria (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

21c426323068204f4199c490d730e88e-Paper-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 20:32:03 GMT

artificial intelligence, machine learning, probability, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

7058bc192a37f5e5a57398887b05f6f6-Paper-Conference.pdf

Neural Information Processing Systems | Feb-14-2026, 03:25:18 GMT

artificial intelligence, dp-randp, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

7016d7b7b6e3c05b2128ac5b3aae492d-Paper-Conference.pdf

Neural Information Processing Systems | Feb-14-2026, 02:06:09 GMT

artificial intelligence, machine learning, pt ii, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.27)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

a70145bf8b173e4496b554ce57969e24-Supplemental.pdf

Neural Information Processing Systems | Feb-9-2026, 17:05:57 GMT

assumption, decoder, probability, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > Canada (0.04)

Genre:

Workflow (0.46)
Overview (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.92)

Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization

Neural Information Processing Systems | Feb-8-2026, 21:30:30 GMT

Gradient Descent (GD) is a simple yet efficient baseline algorithm for solving nonconvex optimization problems.

artificial intelligence, machine learning, nullx, (17 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

5a378f8490c8d6af8647a753812f6e31-Supplemental.pdf

Neural Information Processing Systems | Feb-8-2026, 13:05:11 GMT

algorithm, prediction, secretary problem, (16 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Saarland (0.04)
North America > United States > Washington > King County > Seattle (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

36ac8e558ac7690b6f44e2cb5ef93322-Supplemental.pdf

Neural Information Processing Systems | Feb-8-2026, 02:04:52 GMT

In our comparative study, we choose 4 challenging benchmark datasets for feature selection evaluation.

artificial intelligence, dataset, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

SCALER: SAM-Enhanced Collaborative Learning for Label-Deficient Concealed Object Segmentation

He, Chunming, Zhang, Rihan, Tang, Longxiang, Yang, Ziyun, Li, Kai, Fan, Deng-Ping, Farsiu, Sina

arXiv.org Artificial Intelligence | Nov-25-2025

Existing methods for label-deficient concealed object segmentation (LDCOS) either rely on consistency constraints or Segment Anything Model (SAM)-based pseudo-labeling. However, their performance remains limited due to the intrinsic concealment of targets and the scarcity of annotations. This study investigates two key questions: (1) Can consistency constraints and SAM-based supervision be jointly integrated to better exploit complementary information and enhance the segmenter? and (2) beyond that, can the segmenter in turn guide SAM through reciprocal supervision, enabling mutual improvement? To answer these questions, we present SCALER, a unified collaborative framework toward LDCOS that jointly optimizes a mean-teacher segmenter and a learnable SAM. SCALER operates in two alternating phases. In Phase I, the segmenter is optimized under fixed SAM supervision using entropy-based image-level and uncertainty-based pixel-level weighting to select reliable pseudo-label regions and emphasize harder examples. In Phase II, SAM is updated via augmentation invariance and noise resistance losses, leveraging its inherent robustness to perturbations. Experiments demonstrate that SCALER yields consistent performance gains across eight semi- and weakly-supervised COS tasks. The results further suggest that SCALER can serve as a general training paradigm to enhance both lightweight segmenters and large foundation models under label-scarce conditions. Code will be released.

artificial intelligence, machine learning, segmenter, (17 more...)

arXiv.org Artificial Intelligence

2511.18136

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

SGD Provably Prioritizes a Shortcut Spurious Feature in the XOR Model
