AITopics

2511.11592

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.84)

arXiv.org Machine Learning, Jun-25-2025

Rare dense solutions clusters in asymmetric binary perceptrons -- local entropy via fully lifted RDT

Stojnic, Mihailo

We study the classical asymmetric binary perceptron (ABP) and the associated local entropy (LE) as a potential source of its algorithmic hardness. Isolation of typical ABP solutions in the SAT phase seemingly suggests a universal algorithmic hardness. Paradoxically, efficient algorithms do exist even for constraint densities $\alpha$ fairly close to, but at a finite distance (the computational gap) from, the capacity. In recent years, the existence of rare large dense clusters and the magical ability of fast algorithms to find them have been posited as the conceptual resolution of this paradox. Monotonicity or breakdown of the LEs associated with such atypical clusters is predicted to play a key role in their thinning-out or even complete defragmentation. The invention of fully lifted random duality theory (fl RDT) [90,93,94] allows studying typical features of random structures. A large deviation upgrade, sfl LD RDT [96,97], moves things further and enables the characterization of atypical features as well. Utilizing the machinery of [96,97], we here develop a generic framework to study LE as an atypical feature of the ABP. Already on the second level of lifting we discover that the LE results closely match those obtained through replica methods. For the classical zero-threshold ABP, we obtain that LE breaks down for $\alpha$ in the $(0.77,0.78)$ interval, which basically matches the $\alpha\sim 0.75-0.77$ range that the currently best ABP solvers can handle and effectively indicates that LE's behavior might indeed be among the key reflections of the presumable existence of the ABP's computational gaps.
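
As a rough illustration of the quantity this abstract discusses, the sketch below brute-forces a tiny zero-threshold ABP instance and counts solutions by Hamming distance from a reference solution. It is not the fl RDT computation from the paper; the function names, the Gaussian patterns, and the sizes (n = 10, alpha = 0.5) are assumptions chosen only so that the enumeration stays feasible.

```python
import random
from itertools import product
from math import log

def abp_solutions(patterns):
    """Enumerate all w in {-1,+1}^n with <x, w> >= 0 for every pattern x (zero threshold)."""
    n = len(patterns[0])
    return [w for w in product((-1, 1), repeat=n)
            if all(sum(x * wi for x, wi in zip(row, w)) >= 0 for row in patterns)]

def solutions_by_distance(reference, solutions):
    """hist[d] = number of solutions at Hamming distance d from the reference."""
    hist = [0] * (len(reference) + 1)
    for w in solutions:
        hist[sum(a != b for a, b in zip(reference, w))] += 1
    return hist

def local_entropy_profile(reference, solutions):
    """(1/n) * log(count) per distance; None where no solution exists at that distance."""
    n = len(reference)
    return [log(c) / n if c > 0 else None
            for c in solutions_by_distance(reference, solutions)]

# Toy instance (hypothetical sizes): n = 10 weights, alpha = 0.5, so 5 Gaussian patterns.
rng = random.Random(0)
n, alpha = 10, 0.5
patterns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(int(alpha * n))]
solutions = abp_solutions(patterns)
if solutions:
    print(local_entropy_profile(solutions[0], solutions))
```

The enumeration is exponential in n, so this only conveys the definition: a reference solution sitting in a dense cluster would show many solutions at small distances, while an isolated one would show none.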

artificial intelligence, machine learning, perceptron

2506.19276

Country:

North America > United States > Colorado > Denver County > Denver (0.04)
Africa > Sudan (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.63)

Someya, Taiga, Svete, Anej, DuSell, Brian, O'Donnell, Timothy J., Giulianelli, Mario, Cotterell, Ryan

Information Locality as an Inductive Bias for Neural Language Models

arXiv.org Artificial Intelligence, Jun-6-2025

Inductive biases are inherent in every machine learning system, shaping how models generalize from finite data. In the case of neural language models (LMs), debates persist as to whether these biases align with or diverge from human processing constraints. To address this issue, we propose a quantitative framework that allows for controlled investigations into the nature of these biases. Within our framework, we introduce $m$-local entropy (an information-theoretic measure derived from average lossy-context surprisal) that captures the local uncertainty of a language by quantifying how effectively the $m-1$ preceding symbols disambiguate the next symbol. In experiments on both perturbed natural language corpora and languages defined by probabilistic finite-state automata (PFSAs), we show that languages with higher $m$-local entropy are more difficult for Transformer and LSTM LMs to learn. These results suggest that neural LMs, much like humans, are highly sensitive to the local statistical structure of a language.
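
One plausible reading of the measure described above is the conditional entropy of the next symbol given the $m-1$ preceding symbols. The sketch below computes a plug-in (count-based) estimate of that quantity from sample strings. The function name, the `<s>` padding symbol, and the estimator itself are assumptions for illustration and may differ from the paper's exact definition.

```python
from collections import Counter
from math import log2

def m_local_entropy(sequences, m):
    """Plug-in estimate of H(next symbol | previous m-1 symbols), in bits."""
    context_counts = Counter()
    ngram_counts = Counter()
    for seq in sequences:
        # Pad the start so that early symbols also get a full-length context.
        padded = ["<s>"] * (m - 1) + list(seq)
        for i in range(m - 1, len(padded)):
            context = tuple(padded[i - m + 1:i])
            ngram_counts[(context, padded[i])] += 1
            context_counts[context] += 1
    total = sum(ngram_counts.values())
    entropy = 0.0
    for (context, _symbol), count in ngram_counts.items():
        p_joint = count / total                   # P(context, symbol)
        p_cond = count / context_counts[context]  # P(symbol | context)
        entropy -= p_joint * log2(p_cond)
    return entropy

# A strictly alternating string is fully predictable from one preceding symbol.
print(m_local_entropy(["abababab"], 1), m_local_entropy(["abababab"], 2))
```

With `m = 1` the context is empty and the estimate reduces to the unigram entropy; larger `m` can only help disambiguation on average, which is the sense in which lower values indicate a more locally predictable language.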

artificial intelligence, machine learning, natural language

2506.05136

Country:

North America > United States (1.00)
Europe (0.93)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Nov-12-2024

Zero-Shot NAS via the Suppression of Local Entropy Decrease

Wu, Ning, Huang, Han, Xu, Yueting, Hao, Zhifeng

Architecture performance evaluation is the most time-consuming part of neural architecture search (NAS). Zero-Shot NAS accelerates the evaluation by utilizing zero-cost proxies instead of training. Though effective, existing zero-cost proxies require invoking backpropagations or running networks on input data, making it difficult to further accelerate the computation of proxies. To alleviate this issue, architecture topologies are used to evaluate the performance of networks in this study. We prove that particular architectural topologies decrease the local entropy of feature maps, which degrades specific features to a bias, thereby reducing network performance. Based on this proof, architectural topologies are utilized to quantify the suppression of local entropy decrease (SED) as a data-free and running-free proxy. Experimental results show that SED outperforms most state-of-the-art proxies in terms of architecture selection on five benchmarks, with computation time reduced by three orders of magnitude. We further compare the SED-based NAS with state-of-the-art proxies. SED-based NAS selects the architecture with higher accuracy and fewer parameters in only one second. The theoretical analyses of local entropy and experimental results demonstrate that the suppression of local entropy decrease facilitates selecting optimal architectures in Zero-Shot NAS.

architecture, entropy, proxy

2411.06236

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Guangdong Province > Shantou (0.04)
Africa > Rwanda > Kigali > Kigali (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Orsingher, Marco, Dell'Eva, Anthony, Zani, Paolo, Medici, Paolo, Bertozzi, Massimo

Informative Rays Selection for Few-Shot Neural Radiance Fields

arXiv.org Artificial Intelligence, Dec-29-2023

Neural Radiance Fields (NeRF) have recently emerged as a powerful method for image-based 3D reconstruction, but the lengthy per-scene optimization limits their practical usage, especially in resource-constrained settings. Existing approaches solve this issue by reducing the number of input views and regularizing the learned volumetric representation with either complex losses or additional inputs from other modalities. In this paper, we present KeyNeRF, a simple yet effective method for training NeRF in few-shot scenarios by focusing on key informative rays. Such rays are first selected at camera level by a view selection algorithm that promotes baseline diversity while guaranteeing scene coverage, then at pixel level by sampling from a probability distribution based on local image entropy. Our approach performs favorably against state-of-the-art methods, while requiring minimal changes to existing NeRF codebases.
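
The pixel-level step mentioned above (sampling from a distribution based on local image entropy) can be sketched as follows. This is not the KeyNeRF code: the window radius, the number of quantisation levels, the `eps` floor, and both function names are assumptions, and the image is taken to be a grayscale grid with intensities in [0, 1].

```python
import random
from collections import Counter
from math import log2

def local_entropy_map(image, radius=1, levels=8):
    """Shannon entropy (bits) of quantised intensities in a window around each pixel."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            counts = Counter()
            for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                    counts[min(levels - 1, int(image[rr][cc] * levels))] += 1
            n = sum(counts.values())
            out[r][c] = -sum(k / n * log2(k / n) for k in counts.values())
    return out

def sample_pixels(image, num_samples, eps=1e-3, seed=0):
    """Draw pixel coordinates with probability proportional to local entropy + eps."""
    ent = local_entropy_map(image)
    coords = [(r, c) for r in range(len(image)) for c in range(len(image[0]))]
    weights = [ent[r][c] + eps for r, c in coords]
    rng = random.Random(seed)
    return rng.choices(coords, weights=weights, k=num_samples)

# Toy image: dark left half, bright right half; entropy is non-zero only near the edge.
image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
print(sample_pixels(image, 5))
```

The `eps` floor keeps flat regions from being excluded entirely, so rays there still have a small chance of being selected.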

nerf, ray, reconstruction

2312.17561

Country:

Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Oct-9-2023

Entropy-MCMC: Sampling from Flat Basins with Ease

Li, Bolian, Zhang, Ruqi

Bayesian deep learning counts on the quality of posterior distribution estimation. However, the posterior of deep neural networks is highly multi-modal in nature, with local modes exhibiting varying generalization performance. Given a practical budget, sampling from the original posterior can lead to suboptimal performance, as some samples may become trapped in "bad" modes and suffer from overfitting. Leveraging the observation that "good" modes with low generalization error often reside in flat basins of the energy landscape, we propose to bias sampling on the posterior toward these flat regions. Specifically, we introduce an auxiliary guiding variable, the stationary distribution of which resembles a smoothed posterior free from sharp modes, to lead the MCMC sampler to flat basins. By integrating this guiding variable with the model parameter, we create a simple joint distribution that enables efficient sampling with minimal computational overhead. We prove the convergence of our method and further show that it converges faster than several existing flatness-aware methods in the strongly convex setting. Empirical results demonstrate that our method can successfully sample from flat basins of the posterior, and outperforms all compared baselines on multiple benchmarks including classification, calibration, and out-of-distribution detection.

artificial intelligence, flat basin, machine learning

2310.05401

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Sep-17-2023

High-dimensional manifold of solutions in neural networks: insights from statistical physics

Malatesta, Enrico M.

In these pedagogic notes I review the statistical mechanics approach to neural networks, focusing on the paradigmatic example of the perceptron architecture with binary and continuous weights, in the classification setting. I will review Gardner's approach based on the replica method and the derivation of the SAT/UNSAT transition in the storage setting. Then, I discuss some recent works that unveiled how the zero training error configurations are geometrically arranged, and how this arrangement changes as the size of the training set increases. I also illustrate how different regions of solution space can be explored analytically and how the landscape in the vicinity of a solution can be characterized. I give evidence that, in binary weight models, algorithmic hardness is a consequence of the disappearance of a clustered region of solutions that extends to very large distances. Finally, I demonstrate how the study of linear mode connectivity between solutions can give insights into the average shape of the solution manifold.

entropy, local entropy, perceptron

2309.0924

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Lombardy > Milan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Baldassi, Carlo, Malatesta, Enrico M., Perugini, Gabriele, Zecchina, Riccardo

Typical and atypical solutions in non-convex neural networks with discrete and continuous weights

arXiv.org Artificial Intelligence, Jul-24-2023

We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions which are composed of an exponential number of algorithmically inaccessible small clusters for the binary case (the frozen 1-RSB phase) or a hierarchical structure of clusters of different sizes for the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes non-monotonic, indicating a break-up of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behaviour is different, since even beyond the disappearance of the wide flat minima the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers even when trained in the highly underconstrained regime of very negative margins.

configuration, entropy, transition

2304.13871

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Lombardy > Milan (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report (0.64)

Industry: Energy (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.36)

arXiv.org Artificial Intelligence, Mar-28-2023

Learning to Generalize Provably in Learning to Optimize

Yang, Junjie, Chen, Tianlong, Zhu, Mingkang, He, Fengxiang, Tao, Dacheng, Liang, Yingbin, Wang, Zhangyang

Learning to optimize (L2O), which automates the design of optimizers by data-driven approaches, has gained increasing popularity. However, current L2O methods often suffer from poor generalization performance in at least two respects: (i) applying the L2O-learned optimizer to unseen optimizees, in terms of lowering their loss function values (optimizer generalization, or "generalizable learning of optimizers"); and (ii) the test performance of an optimizee (itself a machine learning model), trained by the optimizer, in terms of the accuracy over unseen data (optimizee generalization, or "learning to generalize"). While the optimizer generalization has been recently studied, the optimizee generalization (or learning to generalize) has not been rigorously studied in the L2O context, which is the aim of this paper. We first theoretically establish an implicit connection between the local entropy and the Hessian, and hence unify their roles in the handcrafted design of generalizable optimizers as equivalent metrics of the landscape flatness of loss functions. We then propose to incorporate these two metrics as flatness-aware regularizers into the L2O framework in order to meta-train optimizers to learn to generalize, and theoretically show that such generalization ability can be learned during the L2O meta-training process and then transformed to the optimizee loss function. Extensive experiments consistently validate the effectiveness of our proposals with substantially improved generalization on multiple sophisticated L2O models and diverse optimizees. Our code is available at: https://github.com/VITA-Group/Open-L2O/tree/main/Model_Free_L2O/L2O-Entropy.

artificial intelligence, machine learning, optimization problem

2302.11085

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Ohio (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Pittorino, Fabrizio, Lucibello, Carlo, Feinauer, Christoph, Perugini, Gabriele, Baldassi, Carlo, Demyanenko, Elizaveta, Zecchina, Riccardo

Entropic gradient descent algorithms and wide flat minima

arXiv.org Machine Learning, Oct-7-2020

The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities with respect to sharp ones. First, we discuss Gaussian mixture classification models and show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide flat regions. These estimators can be found by applying maximum flatness algorithms either directly on the classifier (which is norm independent) or on the differentiable loss function used in learning. Next, we extend the analysis to the deep learning scenario by extensive numerical validations. Using two algorithms, Entropy-SGD and Replicated-SGD, that explicitly include in the optimization objective a non-local flatness measure known as local entropy, we consistently improve the generalization error for common architectures (e.g. ResNet, EfficientNet). An easy to compute flatness measure shows a clear correlation with test accuracy.
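
To make the flatness measure named above concrete, the sketch below evaluates a commonly used form of local entropy, log ∫ exp(-beta·L(w') - (gamma/2)·(w - w')²) dw', on a one-dimensional toy loss by numerical integration. The toy landscape, the parameter values, and the Riemann-sum integration are assumptions for illustration; this is not the Entropy-SGD or Replicated-SGD procedure itself.

```python
from math import exp, log

def loss(w):
    """Toy landscape: a sharp minimum at w = -2 and a wide one at w = +2, both with loss 0."""
    sharp = 50.0 * (w + 2.0) ** 2
    wide = 0.5 * (w - 2.0) ** 2
    return min(sharp, wide)

def local_entropy(w, beta=1.0, gamma=1.0, half_width=6.0, steps=6000):
    """log of the integral of exp(-beta*L(w') - gamma/2*(w - w')^2) over w', via a midpoint Riemann sum."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        wp = w - half_width + (i + 0.5) * dx
        total += exp(-beta * loss(wp) - 0.5 * gamma * (w - wp) ** 2) * dx
    return log(total)

# Both minima have the same loss value, but the wide one scores a higher local entropy.
print(local_entropy(-2.0), local_entropy(2.0))
```

Because the integral weighs the whole neighbourhood of `w` rather than the point alone, two minima with identical loss are separated by how much low-loss volume surrounds them, which is the property flatness-seeking optimizers exploit.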

algorithm, artificial intelligence, machine learning

2006.07897

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Europe > Italy > Lombardy > Milan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)