Collaborating Authors

 Patel, Ankit


The Spectral Bias of Shallow Neural Network Learning is Shaped by the Choice of Non-linearity

arXiv.org Artificial Intelligence

Despite classical statistical theory predicting severe overfitting, modern massively overparameterized neural networks still generalize well. This unexpected property is attributed to the network's so-called implicit bias, which describes its propensity to converge to solutions that generalize effectively, among the many possible solutions that correctly label the training data. The aim of our research is to explore this bias from a new perspective, focusing on how non-linear activation functions contribute to shaping it. First, we introduce a reparameterization that removes a continuous weight rescaling symmetry. Second, in the kernel regime, we leverage this reparameterization to generalize recent findings that relate shallow neural networks to the Radon transform, deriving an explicit formula for the implicit bias induced by a broad class of activation functions. Specifically, by utilizing the connection between the Radon transform and the Fourier transform, we interpret the kernel regime's inductive bias as minimizing a spectral seminorm that penalizes high-frequency components, in a manner dependent on the activation function. Finally, in the adaptive regime, we demonstrate the existence of local dynamical attractors that facilitate the formation of clusters of hyperplanes where the input to a neuron's activation function is zero, yielding alignment between many neurons' response functions. We confirm these theoretical results with simulations. Altogether, our work provides a deeper understanding of the mechanisms underlying the generalization capabilities of overparameterized neural networks and their relation to the implicit bias, offering potential pathways for designing more efficient and robust models.
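
A minimal sketch of the spectral-bias phenomenon the abstract refers to (an illustration only, not the paper's method or reparameterization): a shallow ReLU network trained by plain gradient descent on a two-frequency 1D target fits the low-frequency component well before the high-frequency one. The width, learning rate, and target below are arbitrary choices.

```python
# Illustration of spectral bias: track the residual's Fourier amplitude at a
# low (k=1) and a high (k=8) frequency while training a shallow ReLU net.
import numpy as np

rng = np.random.default_rng(0)
n, width, lr, steps = 256, 512, 5e-3, 3000

x = np.linspace(0.0, 1.0, n, endpoint=False).reshape(-1, 1)
y = np.sin(2 * np.pi * x) + 0.5 * np.sin(2 * np.pi * 8 * x)   # low + high frequency target

W = rng.normal(0.0, 1.0, (1, width))                     # input weights
b = rng.normal(0.0, 1.0, (1, width))                     # biases
a = rng.normal(0.0, 0.1 / np.sqrt(width), (width, 1))    # small output weights: near-zero initial function

for t in range(steps + 1):
    h = np.maximum(x @ W + b, 0.0)                       # hidden activations, shape (n, width)
    err = h @ a - y                                      # residual of the current fit
    if t % 1000 == 0:
        spec = 2.0 * np.abs(np.fft.rfft(err[:, 0])) / n
        print(f"step {t:4d}   residual amplitude at k=1: {spec[1]:.3f}   at k=8: {spec[8]:.3f}")
    g = err / n                                          # gradient of 0.5 * mean squared error
    grad_a = h.T @ g
    grad_h = (g @ a.T) * (h > 0)
    a -= lr * grad_a
    W -= lr * (x.T @ grad_h)
    b -= lr * grad_h.sum(axis=0, keepdims=True)
```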


Navigating protein landscapes with a machine-learned transferable coarse-grained model

arXiv.org Machine Learning

The most popular and universally predictive protein simulation models employ all-atom molecular dynamics (MD), but they come at extreme computational cost. The development of a universal, computationally efficient coarse-grained (CG) model with similar prediction performance has been a long-standing challenge. By combining recent deep learning methods with a large and diverse training set of all-atom protein simulations, here we develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences not used during model parametrization. We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins, while being several orders of magnitude faster than an all-atom model. This showcases the feasibility of a universal and computationally efficient machine-learned CG model for proteins.
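
As a hedged illustration of the bottom-up idea behind such models (not the paper's deep-learning force field, its CG mapping, or its data), the sketch below fits a coarse-grained potential by force matching: the parameters are chosen so that the CG forces reproduce reference forces, here synthesized from a toy double-well potential standing in for mapped all-atom forces.

```python
# Toy force matching: fit U_theta(r) so that -dU/dr matches reference forces.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "reference" data: CG distances and forces from a double-well
# potential U(r) = (r - 1.5)^4 - (r - 1.5)^2, plus a little noise.
r_ref = rng.uniform(0.8, 2.2, 2000)
f_ref = -(4 * (r_ref - 1.5) ** 3 - 2 * (r_ref - 1.5)) + rng.normal(0, 0.1, r_ref.size)

# CG potential: polynomial basis U_theta(r) = sum_k theta_k * r**k, k = 1..6
powers = np.arange(1, 7)

def neg_dU_dr(r):
    """Design matrix of -d(r**k)/dr = -k * r**(k-1), one column per basis function."""
    return -(powers * r[:, None] ** (powers - 1))

# Force matching is linear in theta here, so ordinary least squares suffices.
theta, *_ = np.linalg.lstsq(neg_dU_dr(r_ref), f_ref, rcond=None)

# The fitted CG force field can now drive much cheaper CG dynamics.
r_test = np.linspace(0.9, 2.1, 5)
print("predicted CG forces:", neg_dU_dr(r_test) @ theta)
```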


Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, & Gradient Flow Dynamics

arXiv.org Machine Learning

Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented. We propose reparametrizing ReLU NNs as continuous piecewise linear splines. Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum. We also show that standard weight initializations yield very flat functions, and that this flatness, together with overparametrization and the initial weight scale, is responsible for the strength and type of implicit regularization, consistent with recent work (arXiv:1906.05827). Our implicit regularization results are complementary to independent recent work (arXiv:1906.07842), which showed that initialization scale critically controls implicit regularization via a kernel-based argument. Our spline-based approach reproduces their key implicit regularization results, but in a far more intuitive and transparent manner. Going forward, our spline-based approach is likely to extend naturally to the multivariate and deep settings, and will play a foundational role in efforts to understand neural networks. Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2.
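
A minimal sketch of the spline view (illustrative only; the paper's full reparameterization and analysis are not reproduced here): a shallow univariate ReLU network is a continuous piecewise-linear function whose breakpoints ("knots") sit at x_i = -b_i / w_i, and whose slope increases by v_i * |w_i| as x crosses each knot from left to right.

```python
# Spline view of f(x) = sum_i v_i * relu(w_i * x + b_i) + c.
import numpy as np

rng = np.random.default_rng(0)
width = 6
w, b, v = rng.normal(size=width), rng.normal(size=width), rng.normal(size=width)
c = 0.3

def f(x):
    """The network itself."""
    return np.maximum(np.outer(x, w) + b, 0.0) @ v + c

knots = -b / w                       # breakpoints of the spline
delta_slopes = v * np.abs(w)         # slope change when crossing each knot left-to-right
order = np.argsort(knots)
print("knots:       ", knots[order])
print("delta slopes:", delta_slopes[order])

# Sanity check: on each linear piece, the slope of f equals the sum of
# v_i * w_i over the neurons that are active there.
mid = (np.sort(knots)[:-1] + np.sort(knots)[1:]) / 2      # one probe point per interior piece
numeric_slope = (f(mid + 1e-4) - f(mid - 1e-4)) / 2e-4
active = (np.outer(mid, w) + b) > 0
print(np.allclose(numeric_slope, active @ (v * w)))
```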


Using Learning Dynamics to Explore the Role of Implicit Regularization in Adversarial Examples

arXiv.org Machine Learning

Recent work (Ilyas et al., 2019) suggests that adversarial examples are features, not bugs. If adversarial perturbations are indeed useful but non-robust features, then what is their origin? To answer this question, we systematically examine the learning dynamics of adversarial perturbations in both the pixel and frequency domains. We find that: (1) adversarial examples are not present at initialization but instead emerge very early in training, typically within the first few epochs, as verified by a novel breakpoint-based analysis; (2) the low-amplitude, high-frequency nature of common adversarial perturbations in natural images is critically dependent on an implicit bias towards sparsity in the frequency domain; and (3) the origin of this bias is the locality and translation invariance of convolutional filters, together with (4) the existence of useful frequency-domain features in natural images. We give a simple theoretical explanation for these observations, providing a clear and minimalist target for theorists in future work. Looking forward, our findings suggest that analyzing the learning dynamics of perturbations can provide useful insights for understanding the origin of adversarial sensitivities and developing robust solutions.
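
The kind of frequency-domain analysis described can be sketched on a toy example (a hypothetical stand-in, not the paper's models or datasets): compute a gradient-sign (FGSM-style) perturbation for a tiny linear classifier on a synthetic image and inspect its 2D Fourier spectrum.

```python
# Toy pipeline: gradient-sign perturbation for a linear logit, then a 2D FFT
# of the perturbation to look at its frequency content.
import numpy as np

rng = np.random.default_rng(0)
H = W = 16                                   # toy "image" size
w = rng.normal(size=(H, W))                  # weights of a linear logit: logit = <w, x>
x = rng.normal(size=(H, W))                  # a toy input image
y = 1.0                                      # its label in {0, 1}

# Logistic loss gradient with respect to the input, for logit = sum(w * x)
logit = np.sum(w * x)
p = 1.0 / (1.0 + np.exp(-logit))
grad_x = (p - y) * w                         # d(loss)/dx for the linear model

# FGSM-style perturbation: a small step in the sign of the input gradient
eps = 0.05
delta = eps * np.sign(grad_x)
x_adv = x + delta                            # the adversarial example

# Frequency-domain view of the perturbation (what the paper studies at scale)
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(delta)))
low = spectrum[H//2 - 2:H//2 + 3, W//2 - 2:W//2 + 3].mean()   # energy near DC
print(f"low-frequency energy: {low:.2f}, average energy: {spectrum.mean():.2f}")
```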


Neural Rendering Model: Joint Generation and Prediction for Semi-Supervised Learning

arXiv.org Artificial Intelligence

Unsupervised and semi-supervised learning are important problems that are especially challenging with complex data like natural images. Progress on these problems would accelerate if we had access to appropriate generative models under which to pose the associated inference tasks. Inspired by the success of Convolutional Neural Networks (CNNs) for supervised prediction on images, we design the Neural Rendering Model (NRM), a new probabilistic generative model whose inference calculations correspond to those in a given CNN architecture. The NRM uses the given CNN to design the prior distribution in the probabilistic model. Furthermore, the NRM generates images from coarse to finer scales. It introduces a small set of latent variables at each level, and enforces dependencies among all the latent variables via a conjugate prior distribution. This conjugate prior yields a new regularizer for training CNNs, based on paths rendered in the generative model: the Rendering Path Normalization (RPN). We demonstrate that this regularizer improves generalization, both in theory and in practice. In addition, likelihood estimation in the NRM yields training losses for CNNs, and inspired by this, we design a new loss, termed the Max-Min cross-entropy, which outperforms the traditional cross-entropy loss for object classification. The Max-Min cross-entropy suggests a new deep network architecture, namely the Max-Min network, which can learn from less labeled data while maintaining good prediction performance. Our experiments demonstrate that the NRM with the RPN and the Max-Min architecture exceeds or matches the state of the art on benchmarks including SVHN, CIFAR10, and CIFAR100 for semi-supervised and supervised learning tasks.


JR-GAN: Jacobian Regularization for Generative Adversarial Networks

arXiv.org Machine Learning

Generative adversarial networks (GANs) are notoriously difficult to train, and the reasons for their (non-)convergence behaviors are still not completely understood. Using a simple GAN example, we mathematically analyze the local convergence behavior of its training dynamics in a non-asymptotic way. We find that, in order to ensure a good convergence rate, two factors of the Jacobian must be avoided simultaneously: (1) the Phase Factor, i.e., the Jacobian has complex eigenvalues with a large imaginary-to-real ratio, and (2) the Conditioning Factor, i.e., the Jacobian is ill-conditioned. Previous methods of regularizing the Jacobian can only alleviate one of these two factors, while making the other more severe. Based on our theoretical analysis, we propose the Jacobian Regularized GANs (JR-GANs), which ensure that both factors are alleviated by construction. With extensive experiments on several popular datasets, we show that JR-GAN training is highly stable and achieves near state-of-the-art results both qualitatively and quantitatively.
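
A minimal sketch of the Phase Factor on a standard toy problem (the bilinear min-max game often used to study GAN dynamics, not the paper's construction): the Jacobian of the simultaneous gradient field has purely imaginary eigenvalues, so plain simultaneous gradient descent-ascent oscillates and slowly drifts outward instead of converging.

```python
# Bilinear game V(theta, psi) = theta * psi: the generator descends in theta,
# the discriminator ascends in psi, giving the vector field v = (-psi, +theta).
import numpy as np

J = np.array([[0.0, -1.0],     # Jacobian of v with respect to (theta, psi)
              [1.0,  0.0]])
print("eigenvalues of the game Jacobian:", np.linalg.eigvals(J))   # purely imaginary

# A few steps of simultaneous gradient descent-ascent: the iterate circles the
# equilibrium (0, 0) and its radius slowly grows.
theta, psi, lr = 1.0, 0.0, 0.1
for _ in range(5):
    theta, psi = theta - lr * psi, psi + lr * theta
    print(f"theta={theta:+.3f}, psi={psi:+.3f}, radius={np.hypot(theta, psi):.3f}")
```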


A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations

arXiv.org Artificial Intelligence

Backpropagation-based visualizations have been proposed to interpret convolutional neural networks (CNNs); however, a theory justifying their behavior has been missing: Guided Backpropagation (GBP) and the deconvolutional network (DeconvNet) generate more human-interpretable but less class-sensitive visualizations than the saliency map. Motivated by this, we develop a theoretical explanation revealing that GBP and DeconvNet are essentially doing (partial) image recovery and thus are unrelated to the network's decisions. Specifically, our analysis shows that the backward ReLU introduced by GBP and DeconvNet, and the local connections in CNNs, are the two main causes of the compelling visualizations. Extensive experiments are provided that support the theoretical analysis.
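
The three backward-ReLU rules being contrasted can be written down directly; the sketch below (illustrative values, a single ReLU layer) shows how the saliency map, DeconvNet, and GBP each gate the signal flowing backward.

```python
# Backward-ReLU rules for a single layer, applied to toy values.
import numpy as np

forward_input = np.array([ 1.5, -0.3,  0.2, -2.0])   # pre-activation from the forward pass
upstream_grad = np.array([-0.7,  0.4, -0.1,  0.9])   # signal arriving from the layer above

# Saliency map / plain backprop: pass the signal where the forward input was positive.
saliency = upstream_grad * (forward_input > 0)

# DeconvNet: pass the signal where the *backward* signal is positive,
# ignoring the forward activation pattern.
deconvnet = upstream_grad * (upstream_grad > 0)

# Guided backpropagation: apply both gates, keeping only positive backward
# signal through positively activated units.
gbp = upstream_grad * (forward_input > 0) * (upstream_grad > 0)

print("saliency :", saliency)
print("deconvnet:", deconvnet)
print("guided BP:", gbp)
```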


Overcomplete Frame Thresholding for Acoustic Scene Analysis

arXiv.org Machine Learning

In this work, we derive a generic overcomplete frame thresholding scheme based on risk minimization. Since overcomplete frames are favored for analysis tasks such as classification, regression, and anomaly detection, we provide a way to leverage these representations in real-world applications through thresholding. We validate the method on a large-scale bird activity detection task using a scattering network architecture implemented with continuous wavelets, which are known to form an adequate dictionary for audio environments.
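
A hedged sketch of the general analyze-threshold-synthesize pipeline (generic soft-thresholding of overcomplete frame coefficients; the paper's actual contribution, deriving the threshold by risk minimization, is not reproduced here, and the frame, signal, and threshold below are arbitrary choices).

```python
# Analyze a signal with a redundant frame, soft-threshold the coefficients,
# and synthesize with the canonical dual frame (pseudo-inverse).
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 192                                  # signal length, number of atoms (3x overcomplete)

Phi = rng.normal(size=(m, n)) / np.sqrt(n)      # rows are frame atoms
x = np.sin(2 * np.pi * np.arange(n) / 16)       # a clean toy signal
x_noisy = x + 0.1 * rng.normal(size=n)

coeffs = Phi @ x_noisy                          # analysis (overcomplete) coefficients
tau = 0.05                                      # threshold level, chosen ad hoc here
coeffs_thr = np.sign(coeffs) * np.maximum(np.abs(coeffs) - tau, 0.0)   # soft threshold

x_hat = np.linalg.pinv(Phi) @ coeffs_thr        # synthesis with the canonical dual frame
print("coefficients kept:", int(np.count_nonzero(coeffs_thr)), "of", m)
print("reconstruction error:", float(np.linalg.norm(x_hat - x)))
```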