AITopics

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

Neural Information Processing SystemsJun-16-2026, 16:35:05 GMT

Dimensionality Mismatch Between Brains and Artificial Neural Networks

Biological and artificial vision systems both rely on hierarchical architectures, yet it remains unclear how their representational geometry evolves across processing stages, and what functional consequences may arise from potential differences. In this work, we systematically quantify and compare the linear and nonlinear dimensionality of human brain activity (fMRI) and artificial neural networks (ANNs) during natural image viewing. In the human ventral visual stream, both dimensionality measures increase along the visual hierarchy, supporting the emergence of semantic and abstract representations. For linear dimensionality, most ANNs show a similar increase, but only for pooled features, emphasizing the importance of appropriate feature readouts in brain-model comparisons. In contrast, nonlinear dimensionality shows a collapse in the later layers of ANNs, pointing at a mismatch in representational geometry between the human and artificial visual systems. This mismatch may have functional consequences: while high-dimensional brain representations support flexible generalization to abstract features, ANNs appear to lose this capacity in later layers, where their representations become overly compressed. Overall, our findings propose dimensionality alignment as a benchmark for building more flexible and biologically grounded vision models.

artificial intelligence, dimensionality, machine learning, (18 more...)

Country: North America (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Ramachandran, Rahul Uma, Massar, Serge

Information Processing Capacity of Stationary Physical Systems: Theory, Data-efficient Estimation Methods, and Photonic Demonstration

arXiv.org Machine LearningMay-22-2026

Physical computing systems provide a promising route toward hardware-native machine learning, but their computational capabilities remain difficult to characterize in a principled, task-independent, and data-efficient way. We extend the Information Processing Capacity (IPC) framework to stationary physical computing systems and establish several fundamental results: individual capacities are bounded between zero and one, their sum over a complete basis is bounded by the number of readouts, and noise strictly reduces this bound. We address the finite-sample estimation of IPC and derive the asymptotic form of the systematic positive bias affecting naive estimators. Building on these results, we introduce data-efficient estimation methods based on Richardson extrapolation and Sobol quasi-random sampling. We validate the framework experimentally using a photonic computing system based on picosecond laser pulses propagating through a nonlinear optical fibre. By varying the laser power and fibre length, we observe systematic shifts of the IPC distribution toward higher-order nonlinear capacities induced by the Kerr effect. Finally, we demonstrate that the total IPC strongly correlates with performance on benchmark machine-learning tasks and provides a reliable estimate of the effective dimensionality of the system. These results establish IPC as a practical bridge between the intrinsic dynamics of physical computing systems and their machine-learning performance.

artificial intelligence, machine learning, stationary system, (16 more...)

2605.19152

Country: Europe (0.67)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsFeb-15-2026, 07:43:17 GMT

61c00c07e6d27285e4b952e96cc65666-Paper-Conference.pdf

However, in practice, new reconstruction methods could improve performance for at least three other reasons: learning more about the distribution of stimuli, becoming better at reconstructing text or images in general, or exploiting weaknesses in current image and/or text evaluation metrics. Here we disentangle how much of the reconstruction is due to these other factors vs. productively using the neural recordings.

artificial intelligence, machine learning, natural language, (19 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Middle East (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.68)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.95)
Education (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Gulati, Paarth, Abdelaleem, Eslam, Sederberg, Audrey, Nemenman, Ilya

Mutual information and task-relevant latent dimensionality

arXiv.org Machine LearningFeb-10-2026

Estimating the dimensionality of the latent representation needed for prediction-- the task-relevant dimension--is a difficult, largely unsolved problem with broad scientific applications. We cast it as an Information Bottleneck question: what embedding bottleneck dimension is sufficient to compress predictor and predicted views while preserving their mutual information (MI). We show that standard neural estimators with separable/bilinear critics systematically inflate the inferred dimension, and we address this by introducing a hybrid critic that retains an explicit dimensional bottleneck while allowing flexible nonlinear cross-view interactions, thereby preserving the latent geometry. We further propose a one-shot protocol that reads off the effective dimension from a single over-parameterized hybrid model, without sweeping over bottleneck sizes. We validate the approach on synthetic problems with known task-relevant dimension. We extend the approach to intrinsic dimensionality by constructing paired views of a single dataset, enabling comparison with classical geometric dimension estimators. In noisy regimes where those estimators degrade, our approach remains reliable. Finally, we demonstrate the utility of the method on multiple physics datasets. Before "low-dimensional latent embeddings" became a rallying cry of AI, they were already a basic aim of science: identify a low-dimensional state--a small set of degrees of freedom constructed from observations--that suffices to predict the quantities of interest. The long road from Aristotelian to Newtonian mechanics illustrates that determining the number of such state variables--the relevant latent dimensionality--can be hard, even before one argues about the right variables or the laws that relate them.

artificial intelligence, dimensionality, machine learning, (16 more...)

2602.08105

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science (0.93)

arXiv.org Machine LearningFeb-3-2026

Over-Alignment vs Over-Fitting: The Role of Feature Learning Strength in Generalization

Yeom, Taesun, Ha, Taehyeok, Lee, Jaeho

Feature learning strength (FLS), i.e., the inverse of the effective output scaling of a model, plays a critical role in shaping the optimization dynamics of neural nets. While its impact has been extensively studied under the asymptotic regimes -- both in training time and FLS -- existing theory offers limited insight into how FLS affects generalization in practical settings, such as when training is stopped upon reaching a target training risk. In this work, we investigate the impact of FLS on generalization in deep networks under such practical conditions. Through empirical studies, we first uncover the emergence of an $\textit{optimal FLS}$ -- neither too small nor too large -- that yields substantial generalization gains. This finding runs counter to the prevailing intuition that stronger feature learning universally improves generalization. To explain this phenomenon, we develop a theoretical analysis of gradient flow dynamics in two-layer ReLU nets trained with logistic loss, where FLS is controlled via initialization scale. Our main theoretical result establishes the existence of an optimal FLS arising from a trade-off between two competing effects: An excessively large FLS induces an $\textit{over-alignment}$ phenomenon that degrades generalization, while an overly small FLS leads to $\textit{over-fitting}$.

artificial intelligence, machine learning, over-alignment vs over-fitting, (14 more...)

2602.00827

Country:

North America > United States > Illinois (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)

Genre: Research Report > New Finding (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Zacharopoulos, Christos-Nikolaos, Kyriakoglou, Revekka

In Machina N400: Pinpointing Where a Causal Language Model Detects Semantic Violations

arXiv.org Artificial IntelligenceNov-25-2025

How and where does a transformer notice that a sentence has gone semantically off the rails? To explore this question, we evaluated the causal language model (phi-2) using a carefully curated corpus, with sentences that concluded plausibly or implausibly. Our analysis focused on the hidden states sampled at each model layer. To investigate how violations are encoded, we utilized two complementary probes. First, we conducted a per-layer detection using a linear probe. Our findings revealed that a simple linear decoder struggled to distinguish between plausible and implausible endings in the lowest third of the model's layers. However, its accuracy sharply increased in the middle blocks, reaching a peak just before the top layers. Second, we examined the effective dimensionality of the encoded violation. Initially, the violation widens the representational subspace, followed by a collapse after a mid-stack bottleneck. This might indicate an exploratory phase that transitions into rapid consolidation. Taken together, these results contemplate the idea of alignment with classical psycholinguistic findings in human reading, where semantic anomalies are detected only after syntactic resolution, occurring later in the online processing sequence.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2511.19232

Genre:

Research Report > New Finding (0.50)
Research Report > Experimental Study (0.30)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Neural Information Processing SystemsOct-10-2025, 04:19:58 GMT

61c00c07e6d27285e4b952e96cc65666-Paper-Conference.pdf

bottleneck, reconstruction, reconstruction method, (15 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Middle East (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.68)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.95)
Education (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Budnikov, Mikhail, Yamshchikov, Ivan

Transfer of Structural Knowledge from Synthetic Languages

arXiv.org Artificial IntelligenceMay-22-2025

This work explores transfer learning from several synthetic languages to English. We investigate the structure of the embeddings in the fine-tuned models, the information they contain, and the capabilities of the fine-tuned models on simple linguistic tasks. We also introduce a new synthetic language that leads to better transfer to English than the languages used in previous research. Finally, we introduce Tiny-Cloze Benchmark - a new synthetic benchmark for natural language understanding that is more informative for less powerful models. We use Tiny-Cloze Benchmark to evaluate fine-tuned models in several domains demonstrating that fine-tuning on a new synthetic language allows for better performance on a variety of tasks.

large language model, machine learning, synthetic language, (17 more...)

arXiv.org Artificial Intelligence

2505.15769

Country: Europe > Germany (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Machine LearningMar-3-2025

Deep Learning is Not So Mysterious or Different

Wilson, Andrew Gordon

Deep neural networks are often seen as different from other model classes by defying conventional notions of generalization. Popular examples of anomalous generalization behaviour include benign overfitting, double descent, and the success of overparametrization. We argue that these phenomena are not distinct to neural networks, or particularly mysterious. Moreover, this generalization behaviour can be intuitively understood, and rigorously characterized using long-standing generalization frameworks such as PAC-Bayes and countable hypothesis bounds. We present soft inductive biases as a key unifying principle in explaining these phenomena: rather than restricting the hypothesis space to avoid overfitting, embrace a flexible hypothesis space, with a soft preference for simpler solutions that are consistent with the data. This principle can be encoded in many model classes, and thus deep learning is not as mysterious or different from other model classes as it might seem. However, we also highlight how deep learning is relatively distinct in other ways, such as its ability for representation learning, phenomena such as mode connectivity, and its relative universality.

generalization, learning, neural network, (12 more...)

2503.02113

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)