Collaborating Authors

 Patel, Niket


Layer by Layer: Uncovering Hidden Representations in Language Models

arXiv.org Artificial Intelligence

From extracting features to generating text, the outputs of large language models (LLMs) typically rely on their final layers, following the conventional wisdom that earlier layers capture only low-level cues. However, our analysis shows that intermediate layers can encode even richer representations, often improving performance on a wide range of downstream tasks. To explain and quantify these hidden-layer properties, we propose a unified framework of representation quality metrics based on information theory, geometry, and invariance to input perturbations. Our framework highlights how each model layer balances information compression and signal preservation, revealing why mid-depth embeddings can exceed the last layer's performance. Through extensive experiments on 32 text-embedding tasks and comparisons across model architectures (transformers, state-space models) and domains (language, vision), we demonstrate that intermediate layers consistently provide stronger features. These findings challenge the standard focus on final-layer embeddings and open new directions for model analysis and optimization, including strategic use of mid-layer representations for more robust and accurate AI systems.
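
As a minimal illustration of the practical takeaway (a sketch, not code from the paper), the snippet below shows one way to pool sentence embeddings from an intermediate transformer layer using the Hugging Face transformers library; the model name and the layer index are placeholders.

```python
# Sketch: mean-pool token embeddings from an intermediate layer instead of the
# final one. Model name and layer_index are illustrative placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def embed(texts, layer_index=6):
    """Mean-pool token embeddings from a chosen hidden layer."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # hidden_states is a tuple of (num_layers + 1) tensors, each [batch, seq, dim]
    hidden = outputs.hidden_states[layer_index]
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

vectors = embed(["an example sentence", "another example"])
print(vectors.shape)  # [2, hidden_dim]
```

Downstream, these mid-layer vectors can be fed to the same linear probes or retrieval pipelines normally applied to final-layer embeddings.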


On the Local Complexity of Linear Regions in Deep ReLU Networks

arXiv.org Artificial Intelligence

We define the local complexity of a neural network with continuous piecewise linear activations as a measure of the density of linear regions over an input data distribution. We show theoretically that ReLU networks that learn low-dimensional feature representations have a lower local complexity. This allows us to connect recent empirical observations on feature learning at the level of the weight matrices with concrete properties of the learned functions. In particular, we show that the local complexity serves as an upper bound on the total variation of the function over the input data distribution and thus that feature learning can be related to adversarial robustness. Lastly, we consider how optimization drives ReLU networks towards solutions with lower local complexity. Overall, this work contributes a theoretical framework towards relating geometric properties of ReLU networks to different aspects of learning such as feature learning and representation cost.

Despite the numerous achievements of deep learning, many of the mechanisms by which deep neural networks learn and generalize remain unclear. An "Occam's Razor" style heuristic is that we want our neural network to parameterize a simple solution after training, but it can be challenging to establish a useful metric of the complexity of a deep neural network (Hu et al., 2021). A growing body of research has sought to gain insights into the complexity of deep neural networks in the case where we use piecewise linear activation functions, such as ReLU, LeakyReLU, or Maxout. In this work we aim to advance a theoretical framework towards better understanding the local distribution of linear regions near the data distribution and how it relates to other relevant aspects of learning such as robustness and representation learning. In the kernel regime, neural networks with piecewise linear activations are observed to follow lazy training (Chizat et al., 2019) and bias towards smooth interpolants which do not significantly change the structure of linear regions during training (see, e.g., Williams et al., 2019; Jin & Montúfar, 2023).
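
To make the notion of linear-region density concrete, here is a rough empirical proxy (a sketch, not the paper's formal definition of local complexity): count the distinct ReLU activation patterns a small MLP realizes in a neighborhood of an input point. More distinct patterns indicate more linear regions locally.

```python
# Rough empirical proxy for local linear-region density: count distinct ReLU
# activation patterns among random perturbations of a single input point.
import torch
import torch.nn as nn

torch.manual_seed(0)
mlp = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

def activation_pattern(x):
    """Return the on/off pattern of all ReLU units as a hashable tuple."""
    pattern = []
    h = x
    for layer in mlp:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            pattern.append((h > 0).flatten())
    return tuple(torch.cat(pattern).tolist())

def local_region_count(x0, radius=0.1, n_samples=1000):
    """Count distinct activation patterns among random points near x0."""
    noise = radius * (2 * torch.rand(n_samples, x0.shape[-1]) - 1)
    samples = x0 + noise
    return len({activation_pattern(s.unsqueeze(0)) for s in samples})

x0 = torch.zeros(1, 2)
print(local_region_count(x0))
```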


Learning to Compress: Local Rank and Information Compression in Deep Neural Networks

arXiv.org Artificial Intelligence

Deep neural networks tend to exhibit a bias toward low-rank solutions during training, implicitly learning low-dimensional feature representations. This paper investigates how deep multilayer perceptrons (MLPs) encode these feature manifolds and connects this behavior to the Information Bottleneck (IB) theory. We introduce the concept of local rank as a measure of feature manifold dimensionality and demonstrate, both theoretically and empirically, that this rank decreases during the final phase of training. We argue that networks that reduce the rank of their learned representations also compress mutual information between inputs and intermediate layers.
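
One simple way to probe the local rank idea empirically (an illustrative sketch, not necessarily the paper's estimator) is to compute the numerical rank of the Jacobian of an intermediate representation with respect to the input at sampled data points; a lower rank suggests the layer maps a neighborhood of the input onto a lower-dimensional feature manifold.

```python
# Sketch: estimate a "local rank" of an intermediate representation as the
# numerical rank of its input-Jacobian at a sampled point. Architecture and
# tolerance are illustrative.
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

torch.manual_seed(0)
encoder = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())

def local_rank(x, tol=1e-3):
    """Numerical rank of d encoder(x) / d x at a single input x."""
    J = jacobian(lambda inp: encoder(inp), x)          # shape [out_dim, in_dim]
    singular_values = torch.linalg.svdvals(J)
    return int((singular_values > tol * singular_values.max()).sum())

x = torch.randn(10)
print(local_rank(x))
```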


Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance

arXiv.org Artificial Intelligence

Diffusion models have emerged as a pivotal advancement in generative modeling, setting new standards for the quality of generated instances. In this paper we highlight a discrepancy between conventional training methods and the desired conditional sampling behavior of these models. While the prevalent classifier-free guidance technique works well, it is not without flaws: at higher values of the guidance scale parameter $w$, we often get out-of-distribution samples and mode collapse, whereas at lower values of $w$ we may not obtain the desired specificity. To address these challenges, we introduce an updated loss function that better aligns training objectives with sampling behavior. Experimental validation with FID scores on CIFAR-10 demonstrates our method's ability to produce higher-quality samples with fewer sampling timesteps and to be more robust to the choice of guidance scale $w$. We also experiment with fine-tuning Stable Diffusion on the proposed loss, providing early evidence that large diffusion models may also benefit from this refined loss function.
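
For context, the standard classifier-free guidance combination applied at sampling time is sketched below (the paper's modified training loss is not reproduced here); `eps_model` and the toy usage are placeholders.

```python
# Standard classifier-free guidance step at sampling time, shown for reference.
# `eps_model` is a placeholder denoiser predicting noise from x_t, timestep t,
# and an optional conditioning signal.
import torch

def cfg_noise_prediction(eps_model, x_t, t, cond, w):
    """Combine conditional and unconditional predictions with guidance scale w."""
    eps_cond = eps_model(x_t, t, cond)      # conditioned prediction
    eps_uncond = eps_model(x_t, t, None)    # unconditional prediction (dropped condition)
    # Extrapolate away from the unconditional estimate by the guidance scale.
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy usage with a dummy denoiser:
dummy = lambda x, t, c: torch.zeros_like(x) if c is None else torch.ones_like(x)
x_t = torch.randn(4, 3, 32, 32)
print(cfg_noise_prediction(dummy, x_t, t=torch.tensor([10]), cond="cat", w=3.0).mean())
```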