AITopics

2606.28432

Country: Asia > Azerbaijan (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning Jun-30-2026

Optimization Dynamics Imprint Semantic Specificity in Contrastive Embedding Norms

Su, Ziwei, Ren, Junyu, Veitch, Victor

Contrastive embedding models trained with scale-invariant losses are typically paired with similarity measures such as cosine similarity, which ignore embedding magnitudes. Surprisingly, empirical studies show that these "discarded" norms nonetheless appear to correlate with semantic properties such as concept specificity, token frequency, and human uncertainty. In this work, we provide a formal theoretical framework explaining this phenomenon. By analyzing the optimization dynamics, we derive an analytic formula demonstrating that embedding length naturally encodes this information as a byproduct of the training process. We also show how this gives rise to signals that can serve as "free" calibration tools in specific models and retrieval tasks, providing a grounded explanation for a previously heuristic observation.
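The premise of the abstract above, that cosine similarity discards embedding magnitude, can be checked with a few lines of code. The sketch below is illustrative only: the helper names `norm`, `cosine`, and `scale` are hypothetical, and the toy vectors are not taken from the paper. It shows that rescaling an embedding leaves its cosine similarity unchanged, so the norm is a separate quantity that a cosine-based pipeline never reads.

```python
import math

def norm(x):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in x))

def cosine(x, y):
    """Cosine similarity: depends only on direction, not on length."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (norm(x) * norm(y))

def scale(x, c):
    """Return the vector x multiplied by the scalar c."""
    return [c * a for a in x]

# Toy embeddings (made up for illustration).
query = [2.0, 0.0, 1.0]
doc = [1.0, 2.0, 3.0]
doc_longer = scale(doc, 3.0)  # same direction, three times the norm

same_score = math.isclose(cosine(query, doc), cosine(query, doc_longer))
```

Here `same_score` is true even though `norm(doc_longer)` is three times `norm(doc)`, which is why any information carried by the norm has to be read out separately.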

large language model, machine learning, natural language, (18 more...)

2606.30625

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Neural Information Processing Systems Jun-23-2026, 00:44:31 GMT

Geometry-Aware Edge Pooling for Graph Neural Networks

Graph Neural Networks (GNNs) have shown significant success for graph-based tasks. Given the prevalence of large datasets in real-world applications, pooling layers are crucial components of GNNs. By reducing the size of input graphs, pooling enables faster training and potentially better generalisation.

artificial intelligence, graph, machine learning, (17 more...)

Country:

North America (0.45)
Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.45)
Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning Jun-23-2026

Weibull Weight-Scale Parameter Evolution under AdamW Training Dynamics

Ding, Tiexin

Building on a two-parameter Weibull framework for diagnosing transformer weight distributions, we study why the Weibull weight-scale parameter $λ$ grows, overshoots, and then relaxes during AdamW training. We derive a leading-order three-force decomposition of the squared weight norm from the AdamW update: an alignment force measuring the correlation between weights and the adaptive update direction, an injection force from adaptive step magnitude, and a decay force from decoupled weight decay. On self-trained Pythia-70M models with ground-truth optimizer moments, alignment dominates the rise phase, contributing 88-94% of the absolute force budget across four random seeds and remaining robust to super-weight removal. Near saturation, alignment and decay approach balance, explaining the transition from weight-scale growth to relaxation. These force dynamics directly govern the squared-norm component underlying $λ(t)$; the remaining RMS-to-Weibull reconstruction offset is measurable and decomposes into bridge and integration components, totaling approximately 5-6% in densely sampled regions. To extend the analysis to real models where optimizer moments are unavailable, we introduce a spline displacement method that recovers the alignment force from sparse checkpoints with approximately 92-94% accuracy, about twice the naive two-point baseline. We further observe that the peak value of $λ(t)$ varies with training-data coherence in our experiments, suggesting a data-dependent component of weight-scale growth that we leave to a controlled follow-up study. Code and data are available at https://github.com/tiexinding/NPM-Weibull-public.
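One way to read the three-force decomposition described above is to expand the squared weight norm after a single decoupled-weight-decay step. The sketch below is an assumption-laden illustration, not the paper's code: it assumes the update `w_new = w - lr * (u + wd * w)`, where `u` is the adaptive direction, and the function name `adamw_step_forces` and the exact grouping of terms are hypothetical. The weight-decay coefficient is called `wd` to avoid confusion with the Weibull scale parameter $λ$.

```python
def adamw_step_forces(w, u, lr, wd):
    """Split the one-step change in ||w||^2 for w_new = w - lr * (u + wd * w).

    w  : list of weights
    u  : adaptive update direction (for Adam: m_hat / (sqrt(v_hat) + eps))
    lr : learning rate
    wd : decoupled weight-decay coefficient
    """
    dot_wu = sum(wi * ui for wi, ui in zip(w, u))
    norm_w2 = sum(wi * wi for wi in w)
    norm_u2 = sum(ui * ui for ui in u)

    # Correlation between the weights and the update direction (either sign).
    alignment = -2.0 * lr * dot_wu
    # Norm added by the step itself; never negative.
    injection = lr ** 2 * norm_u2
    # Shrinkage from decoupled weight decay; never positive.
    decay = -2.0 * lr * wd * norm_w2
    # Cross terms of order lr^2 * wd and (lr * wd)^2, dropped at leading order.
    remainder = 2.0 * lr ** 2 * wd * dot_wu + (lr * wd) ** 2 * norm_w2
    return alignment, injection, decay, remainder
```

The four returned terms sum exactly to `||w_new||^2 - ||w||^2`, so the size of `remainder` indicates how much a leading-order budget of this form would miss at a given learning rate and decay setting.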

artificial intelligence, machine learning, trajectory, (15 more...)

2606.19367

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Shirodkar, Tejas Pradeep, Narayanan, P. J.

Dead-Direction Signatures: A Cheap Spectral Reading of Singular Complexity

arXiv.org Machine Learning Jun-23-2026

Singular learning theory characterises the complexity of a deep network through the geometry of its loss singularities. The local learning coefficient (LLC), the standard estimator of Watanabe's real log canonical threshold (RLCT, $λ$), reads this geometry as an integrated Bayesian scalar through SGLD, which needs per-task calibration and $10^4$-$10^6$ forward-backward passes per checkpoint. We introduce Dead-Direction Signatures (DDS), a family of cheap closed-form spectral readings of singular structure: each reads a network's activation matrix or per-sample-gradient Fisher-Gram at a chosen layer, replacing the SGLD posterior chain with spectral linear algebra. The readings rest on a dead-direction framework that predicts a structural correlation between activation- and Fisher-side spectra at any singular minimum, and a rank-multiplicative volume identity that single-eigenvalue monitors cannot produce: the active-volume $\log\det^{+}(G)$ slope counts the dead directions, tracking the rank-deficit $r$ across $r \in \{1,2,3,4\}$ (slope ratios $2.0, 3.1, 4.0$ at $r{=}2,3,4$ against the predicted $2,3,4$), where the smallest eigenvalue is rank-blind. On reduced-rank regression with closed-form $λ$, calibrated LLC recovers $λ$ at $99\%$ mean and the DDS observables rank-track it at the framework-predicted sign; on a non-linear modular-addition transformer DDS separates $d_{\mathrm{model}}$ across eighteen orders of magnitude where calibrated LLC at the protocol budget is rank-flat. Complementary to LLC's integrated posterior reading, DDS gives a directional, layer-local handle on a network's dead directions, read in closed form from its activation and gradient spectra.

artificial intelligence, correlation, machine learning, (19 more...)

2606.21158

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Neural Information Processing Systems Jun-22-2026, 22:04:24 GMT

Pay Attention to Small Weights

Finetuning large pretrained neural networks is known to be resource-intensive, both in terms of memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, the criterion is gradient-free--the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; third, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.
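The selection rule described above, updating only small-magnitude weights and choosing them without looking at gradients, can be sketched as a masked Adam step. This is a minimal illustration under stated assumptions, not the NANOADAM implementation: the function names `small_weight_mask` and `masked_adam_step`, the `fraction` parameter, and the choice to freeze optimizer moments for unselected weights are all hypothetical.

```python
import math

def small_weight_mask(weights, fraction):
    """Select the `fraction` of entries with the smallest |w| (gradient-free)."""
    k = max(1, int(len(weights) * fraction))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    chosen = set(order[:k])
    return [i in chosen for i in range(len(weights))]

def masked_adam_step(w, g, m, v, mask, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step applied only where mask is True; other entries are frozen."""
    w_new, m_new, v_new = list(w), list(m), list(v)
    for i, selected in enumerate(mask):
        if not selected:
            continue  # large-magnitude weight: no moment or parameter update
        m_new[i] = b1 * m[i] + (1 - b1) * g[i]
        v_new[i] = b2 * v[i] + (1 - b2) * g[i] ** 2
        m_hat = m_new[i] / (1 - b1 ** t)  # bias correction, t counts from 1
        v_hat = v_new[i] / (1 - b2 ** t)
        w_new[i] = w[i] - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w_new, m_new, v_new

# Usage: the mask is built from the weights alone, before any gradient exists.
weights = [0.01, -2.0, 0.5, -0.02]
mask = small_weight_mask(weights, fraction=0.5)
grads = [1.0, 1.0, -1.0, -1.0]
zeros = [0.0] * len(weights)
updated, m1, v1 = masked_adam_step(weights, grads, zeros, zeros, mask, t=1)
```

Because the mask depends only on `weights`, it can be recomputed periodically during finetuning to make the selection dynamic; how often to refresh it is left open here.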

large language model, machine learning, natural language, (20 more...)

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)
Overview (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Neural Information Processing Systems Jun-22-2026, 18:50:55 GMT

AI Progress Should Be Measured by Capability-Per-Resource, Not Scale Alone: A Framework for Gradient-Guided Resource Allocation in LLMs

This position paper challenges the "scaling fundamentalism" dominating AI research, where unbounded growth in model size and computation has led to unsustainable environmental impacts and widening resource inequality. We argue that LLM development should be fundamentally reoriented toward capability-per-resource rather than capability alone. We present a theoretical framework demonstrating that resource-allocation decisions guided by gradient influence patterns can dramatically improve efficiency throughout the AI lifecycle. Our analysis shows that in transformer-based models, where a small fraction of parameters exert outsized influence (following heavy-tailed distributions), three critical insights emerge: (1) updating only high-influence parameters strictly outperforms full-parameter tuning on a performance-per-resource basis; (2) simple gradient norms provide computationally efficient proxies for identifying these high-influence components; and (3) coordinated parameter and data selection yields multiplicative efficiency gains, potentially reducing resource requirements by orders of magnitude. Building on these theoretical foundations, we propose a two-stage paradigm--marginal-return pretraining for foundation developers and influence-guided adaptation for downstream users--bridged by gradient blueprints, metadata describing which parameters matter most for various tasks. This capability-per-resource perspective transforms what were once considered pragmatic hardware workarounds into theoretically optimal strategies, democratizing access to cutting-edge AI capabilities while significantly reducing environmental impact. By embedding resource consciousness into how we develop, adapt, and evaluate models, we can reshape AI progress toward a more sustainable and equitable future.

large language model, machine learning, natural language, (19 more...)

Industry: Law (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems Jun-22-2026, 18:48:45 GMT

cb463f73a35802996546ac8e8b1b2743-Supplemental-Datasets_and_Benchmarks_Track.pdf

A.1 Behavioral Task

A male nonhuman primate (NHP, Macaca mulatta), Monkey N (age 7 at the beginning of the dataset, age 11 at the end), was trained to perform a trial-based, two degree-of-freedom (DOF) dexterous finger movement task, shown in Figure 1. During all sessions, Monkey N sat in a primate chair (Crist Instruments, Hagerstown, MD) in a shielded chamber, with his arms fixed at his sides and flexed 90 degrees at the elbow, resting on a table. The left hand was positioned securely in a manipulandum, which used bend sensors (FS-L-0073-103-ST, Spectra Symbol, Salt Lake City, UT) to measure the flexion of two finger groups, index (IDX) and middle-ring-small (MRS). At the beginning of each experimental session (and as needed throughout a session), these flexion sensors were calibrated such that a reading of 1 indicated full flexion of a finger group and a reading of 0 indicated full extension. These readings were used to update the positions of the corresponding finger groups of a virtual hand presented on a screen in front of Monkey N. Bend sensor values were sampled at 1000 Hz. Updates to the virtual hand were limited to the refresh rate of the monitor (120 Hz). The task itself involved trial-based target acquisitions. At the beginning of each trial, two color-coded spherical targets, one for each DOF, were placed on the screen, covering 15% of the full arc of motion (see Figure 1A). Monkey N then acquired the targets by moving his fingers to the correct positions and holding his position for 750 ms.

artificial intelligence, deep learning, machine learning, (19 more...)

Country: North America > United States > Utah > Salt Lake County > Salt Lake City (0.25)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Vimalajeewa, Dixon, Lakmini, Vijini, Vidakovic, Brani

Machine Learning Integrated in Wavelet Shrinkage (MLShrink)

arXiv.org Machine Learning Jun-19-2026

Data encountered in practice are frequently contaminated by additive noise, and wavelet shrinkage remains a fundamental tool for recovering underlying signals in nonparametric estimation. Classical procedures such as hard and soft thresholding decide whether to retain a wavelet coefficient almost entirely from its magnitude. Although effective in many settings, these rules can be too rigid for coefficients whose magnitudes fall in an intermediate region where the distinction between signal and noise is uncertain. We propose MLShrink, a two-threshold wavelet denoising procedure that combines wavelet shrinkage with machine learning. Coefficients below a lower threshold are discarded, coefficients above an upper threshold are retained, and coefficients in the intermediate band are classified using local wavelet-domain features. In this way, MLShrink preserves the simplicity of classical thresholding away from the decision boundary while allowing data-adaptive decisions for ambiguous coefficients. The paper also develops a theoretical framework tailored to this architecture. We show that MLShrink is a nonexpansive support-selection rule, derive an oracle-based risk decomposition showing that excess denoising risk is determined by classification errors on the undecided band, and establish an oracle-consistency result under suitable assumptions on classifier performance. Simulation experiments on standard benchmark signals indicate that MLShrink is competitive with several established wavelet shrinkage methods and is especially effective for signals with irregular, edge-rich, or non-smooth structure. These findings suggest that learned decisions on the intermediate threshold band provide a useful and interpretable connection between classical wavelet denoising and modern statistical learning.
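The two-threshold rule described above can be sketched as a keep-or-discard decision per coefficient, with a classifier consulted only in the intermediate band. The code below is a minimal illustration, not the authors' implementation: `two_threshold_select` and `make_neighbor_classifier` are hypothetical names, and the neighbour-magnitude rule is a stand-in for whatever learned classifier and local wavelet-domain features MLShrink actually uses.

```python
def two_threshold_select(coeffs, lower, upper, classify):
    """Keep or zero each wavelet coefficient using two thresholds.

    |c| < lower  -> discarded
    |c| > upper  -> retained unchanged
    otherwise    -> decided by classify(coeffs, index) -> bool
    """
    out = []
    for i, c in enumerate(coeffs):
        magnitude = abs(c)
        if magnitude < lower:
            keep = False
        elif magnitude > upper:
            keep = True
        else:
            keep = bool(classify(coeffs, i))  # ambiguous band: ask the classifier
        out.append(c if keep else 0.0)
    return out

def make_neighbor_classifier(level):
    """Stand-in classifier (assumption): keep a coefficient if an adjacent
    coefficient has magnitude of at least `level`."""
    def classify(coeffs, i):
        neighbors = []
        if i > 0:
            neighbors.append(abs(coeffs[i - 1]))
        if i + 1 < len(coeffs):
            neighbors.append(abs(coeffs[i + 1]))
        return any(n >= level for n in neighbors)
    return classify

# Usage on made-up detail coefficients from a single resolution level.
detail = [0.1, 5.0, 1.5, 0.2, 1.5, 0.1]
denoised = two_threshold_select(detail, lower=1.0, upper=3.0,
                                classify=make_neighbor_classifier(3.0))
```

In this toy run the two coefficients of magnitude 1.5 fall in the undecided band; the one adjacent to the large coefficient is kept and the isolated one is discarded, which mirrors the idea of using local context where magnitude alone is ambiguous.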

artificial intelligence, coefficient, machine learning, (17 more...)

2606.1958

Country: North America > United States > Nebraska (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems Jun-17-2026, 10:23:37 GMT

56bdf726a96d43ee1e66172d14c63a61-Supplemental-Datasets_and_Benchmarks_Track.pdf

By leveraging neural rendering technologies based on NeRF and 3DGS, we create a wide array of realistic 3D scene representations and generate a multitude of synthesized 2D images from different perspectives. Moreover, through the combination of generative models with these advanced neural rendering methods, we generate highly sophisticated but fake images that incorporate combined artifacts. Unlike existing datasets that largely focus on fake images generated by traditional generative models such as GANs or diffusion models, our NeuroRenderedFake dataset fills a much-needed gap in data for sophisticated fake image detection. This benchmark consists of over 2 million images, i.e., 512,972 authentic images and 1,653,881 highly sophisticated fake images. Therefore, it can serve as the largest collection of diverse images generated through advanced synthesis and neural rendering techniques. This work is expected to have a significant positive societal impact, particularly benefiting the forensic community and media outlets. Our method can enhance the accurate and timely identification of realistic-looking but fake images that are often found in our mailboxes or social media platforms. The development of accurate techniques to detect these images is crucial for addressing concerns related to security, privacy, and preserving harmony within our community.

artificial intelligence, machine learning, natural language, (15 more...)

Country: Asia (0.67)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)