
Collaborating Authors

Rangamani, Akshay


Low Rank and Sparse Fourier Structure in Recurrent Networks Trained on Modular Addition

arXiv.org Machine Learning

Akshay Rangamani, Dept. of Data Science, New Jersey Institute of Technology, Newark, NJ, USA. akshay.rangamani@njit.edu

Abstract--Modular addition tasks serve as a useful test bed for observing empirical phenomena in deep learning, including the phenomenon of grokking. Prior work has shown that one-layer transformer architectures learn Fourier Multiplication circuits to solve modular addition tasks. In this paper, we show that Recurrent Neural Networks (RNNs) trained on modular addition tasks also use a Fourier Multiplication strategy. We identify low rank structures in the model weights and attribute model components to specific Fourier frequencies, resulting in a sparse representation in Fourier space. We also show empirically that the RNN is robust to removing individual frequencies, while performance degrades drastically as more frequencies are ablated from the model.
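The Fourier Multiplication strategy the abstract refers to can be illustrated independently of any network: representing each residue as a phase on the unit circle turns addition mod p into multiplication of complex numbers. A minimal pure-Python sketch (the modulus and the phase-decoding scheme are illustrative choices, not taken from the paper):

```python
import cmath

def fourier_add(a, b, p):
    # Encode each residue at the frequency-1 Fourier component: e^{2*pi*i*a/p}.
    za = cmath.exp(2j * cmath.pi * a / p)
    zb = cmath.exp(2j * cmath.pi * b / p)
    # Multiplying the two phases adds the angles, i.e. adds a and b mod p.
    z = za * zb
    # Decode the sum by matching the resulting phase back to a residue class.
    angle = cmath.phase(z) % (2 * cmath.pi)
    return round(angle * p / (2 * cmath.pi)) % p

# The Fourier route agrees with ordinary modular addition for every pair.
assert all(fourier_add(a, b, 7) == (a + b) % 7
           for a in range(7) for b in range(7))
```

A trained model uses several such frequencies at once, which is what makes the ablation experiments in the paper possible: each frequency is an independent copy of this mechanism.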


On Generalization Bounds for Neural Networks with Low Rank Layers

arXiv.org Machine Learning

Deep learning has achieved remarkable success across a wide range of applications, including computer vision [2, 3], natural language processing [4, 5], decision-making in novel environments [6], and code generation [7], among others. Understanding the reasons behind the effectiveness of deep learning is a multifaceted challenge that involves questions about architectural choices, optimizer selection, and the types of inductive biases that can guarantee generalization. A long-standing question in this field is how deep learning finds solutions that generalize well. Good generalization by overparameterized models is not unique to deep learning: in linear models and kernel machines it can be explained by the implicit bias of learning algorithms towards low-norm solutions [8, 9]. In the case of deep learning, however, identifying the right implicit bias and obtaining generalization bounds that depend on this bias are still open questions. In recent years, Rademacher bounds have been developed to explain the complexity control induced by an important bias in deep network training: the minimization of weight matrix norms, which occurs due to explicit or implicit regularization [10, 11, 12, 13]. For rather general network architectures, Golowich et al. [14] showed that the Rademacher complexity is linear in the product of the Frobenius norms of the layers. Although the associated bounds are usually orders of magnitude larger than the generalization gap for dense networks, recent results by Galanti et al. [15] demonstrate that for networks with structural sparsity in their weight matrices, such as convolutional networks, norm-based Rademacher bounds approach non-vacuity.
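The quantity these norm-based bounds scale with is straightforward to compute for a concrete network. A short sketch (the layer shapes and random weights are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-layer network weights; dimensions are illustrative.
layers = [rng.standard_normal((64, 32)),
          rng.standard_normal((32, 32)),
          rng.standard_normal((32, 10))]

def frobenius_product(weights):
    """Product of per-layer Frobenius norms: the quantity that
    Golowich et al.-style Rademacher bounds scale with (up to constants)."""
    prod = 1.0
    for W in weights:
        prod *= np.linalg.norm(W, ord="fro")
    return prod

# The bound is multiplicative in the layer norms: scaling any single
# layer by c scales the whole complexity measure by c.
scaled = [2.0 * layers[0]] + layers[1:]
assert np.isclose(frobenius_product(scaled), 2.0 * frobenius_product(layers))
```

The multiplicative structure is exactly why such bounds are often vacuous for dense networks, and why low rank or sparse structure in the weight matrices (which shrinks each factor) can bring them toward non-vacuity.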


Neural-guided, Bidirectional Program Search for Abstraction and Reasoning

arXiv.org Artificial Intelligence

One of the challenges facing artificial intelligence research today is designing systems capable of using systematic reasoning to generalize to new tasks. The Abstraction and Reasoning Corpus (ARC) measures such a capability through a set of visual reasoning tasks. In this paper we report incremental progress on ARC and lay the foundations for two approaches to abstraction and reasoning that are not based on brute-force search. We first apply an existing program synthesis system called DreamCoder to create symbolic abstractions from tasks solved so far, and show how this enables the solving of progressively more challenging ARC tasks. Second, we design a reasoning algorithm motivated by the way humans approach ARC. Our algorithm constructs a search graph and reasons over this graph structure to discover task solutions. More specifically, we extend existing execution-guided program synthesis approaches with deductive reasoning based on function inverse semantics to enable a neural-guided bidirectional search algorithm. We demonstrate the effectiveness of the algorithm on three domains: ARC, 24-Game tasks, and a 'double-and-add' arithmetic puzzle.
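The role of function inverse semantics is easiest to see on the 'double-and-add' domain: instead of enumerating programs forward, one can deduce backward from the target using the inverses of the two operations. A minimal sketch (this assumes a start value of 1 and the operations x2 and +1; the paper's exact puzzle rules may differ):

```python
def double_and_add_plan(target):
    # Deduce a program backward from the target using function inverses:
    # the inverse of "double" is "halve" (only legal on even values),
    # the inverse of "add one" is "subtract one".
    ops = []
    n = target
    while n > 1:
        if n % 2 == 0:
            ops.append("double")
            n //= 2
        else:
            ops.append("add1")
            n -= 1
    return list(reversed(ops))  # forward program taking 1 to target

def run(ops):
    # Forward (execution) semantics, used here to check the deduction.
    n = 1
    for op in ops:
        n = n * 2 if op == "double" else n + 1
    return n

assert run(double_and_add_plan(41)) == 41
```

In the full algorithm the same idea is combined with forward execution-guided synthesis, so that a neural model can guide the search from both ends of the graph until the frontiers meet.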


For interpolating kernel machines, minimizing the norm of the ERM solution minimizes stability

arXiv.org Machine Learning

We study the average $\mbox{CV}_{loo}$ stability of kernel ridge-less regression and derive corresponding risk bounds. We show that the interpolating solution with minimum norm minimizes a bound on $\mbox{CV}_{loo}$ stability, which in turn is controlled by the condition number of the empirical kernel matrix. The latter can be characterized in the asymptotic regime where both the dimension and cardinality of the data go to infinity. Under the assumption of random kernel matrices, the corresponding test error should be expected to follow a double descent curve.
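The two objects in this abstract, the minimum-norm interpolating solution and the condition number of the empirical kernel matrix, can be sketched in a few lines (the Gaussian kernel, bandwidth, and data sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))   # 20 training points in 5 dimensions
y = rng.standard_normal(20)

# Empirical Gaussian (RBF) kernel matrix on the training points.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)

# Ridge-less solution: among all interpolating coefficient vectors,
# the pseudoinverse returns the one of minimum (RKHS) norm.
alpha = np.linalg.pinv(K) @ y

# Interpolation: the fitted function passes through every training label.
assert np.allclose(K @ alpha, y)

# Stability (and hence the risk bound) is controlled by this quantity.
cond = np.linalg.cond(K)
```

As the abstract notes, it is the behavior of `cond` in the joint limit of dimension and sample size that governs the stability bound and produces the expected double descent curve.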


A Scale Invariant Flatness Measure for Deep Network Minima

arXiv.org Machine Learning

It has been empirically observed that the flatness of minima obtained from training deep networks seems to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of sharpness/flatness are not invariant to rescalings of the network parameters that correspond to the same function. This means that the measure of flatness/sharpness can be made as small or as large as desired through rescaling, rendering the quantitative measures meaningless. In this paper we show that for deep networks with positively homogeneous activations, these rescalings constitute equivalence relations, and that these equivalence relations induce a quotient manifold structure in the parameter space. Using this manifold structure and an appropriate metric, we propose a Hessian-based measure for flatness that is invariant to rescaling. We use this new measure to confirm the proposition that Large-Batch SGD minima are indeed sharper than Small-Batch SGD minima.
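The rescaling problem is easy to demonstrate: for a positively homogeneous activation such as ReLU, scaling one layer by a factor and the next by its reciprocal leaves the network function unchanged while changing norm-based quantities that sharpness proxies are built on. A minimal sketch (the shapes and the norm proxy are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((4, 16))
x = rng.standard_normal(8)

def net(W1, W2, x):
    # Two-layer ReLU network; ReLU is positively homogeneous:
    # relu(c * z) == c * relu(z) for any c > 0.
    return W2 @ np.maximum(W1 @ x, 0.0)

alpha = 10.0
# The rescaled parameters represent exactly the same function...
assert np.allclose(net(W1, W2, x), net(alpha * W1, W2 / alpha, x))

# ...but a naive proxy such as the total squared weight norm changes,
# so any non-invariant sharpness measure can be pushed arbitrarily
# high or low without changing the function at all.
total_sq_norm = lambda A, B: np.linalg.norm(A) ** 2 + np.linalg.norm(B) ** 2
assert total_sq_norm(alpha * W1, W2 / alpha) != total_sq_norm(W1, W2)
```

A meaningful flatness measure therefore has to be defined on equivalence classes of parameters under these rescalings, which is exactly the quotient manifold construction in the paper.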


Automated software vulnerability detection with machine learning

arXiv.org Machine Learning

Thousands of security vulnerabilities are discovered in production software each year, either reported publicly to the Common Vulnerabilities and Exposures database or discovered internally in proprietary code. Vulnerabilities often manifest themselves in subtle ways that are not obvious to code reviewers or the developers themselves. With the wealth of open source code available for analysis, there is an opportunity to learn the patterns of bugs that can lead to security vulnerabilities directly from data. In this paper, we present a data-driven approach to vulnerability detection using machine learning, specifically applied to C and C++ programs. We first compile a large dataset of hundreds of thousands of open-source functions labeled with the outputs of a static analyzer. We then compare methods applied directly to source code with methods applied to artifacts extracted from the build process, finding that source-based models perform better. We also compare the application of deep neural network models with more traditional models such as random forests and find the best performance comes from combining features learned by deep models with tree-based models. Ultimately, our highest performing model achieves an area under the precision-recall curve of 0.49 and an area under the ROC curve of 0.87.
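The headline metric, area under the ROC curve, has a simple rank interpretation: the probability that a randomly chosen vulnerable function receives a higher score than a randomly chosen benign one. A dependency-free sketch of that computation (the toy labels and scores here are made up for illustration):

```python
def roc_auc(labels, scores):
    # AUC as the Wilcoxon-Mann-Whitney statistic: fraction of
    # positive/negative pairs ranked correctly, counting ties as 1/2.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# One of four positive/negative pairs is mis-ranked -> AUC of 0.75.
assert roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) == 0.75
```

On heavily imbalanced data such as vulnerability labels, the precision-recall curve (the paper's 0.49 figure) is the more informative companion, since ROC AUC can look strong even when precision at useful recall levels is low.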


Sparse Coding and Autoencoders

arXiv.org Machine Learning

In "Dictionary Learning" one tries to recover incoherent matrices $A^* \in \mathbb{R}^{n \times h}$ (typically overcomplete and whose columns are assumed to be normalized) and sparse vectors $x^* \in \mathbb{R}^h$ with a small support of size $h^p$ for some $0 < p < 1$.
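The generative model in this setup is easy to instantiate: an overcomplete dictionary with unit-norm columns and a code supported on roughly $h^p$ coordinates. A minimal sketch (the dimensions, the value of the sparsity exponent, and the Gaussian dictionary are illustrative assumptions; a Gaussian dictionary is only incoherent with high probability, not by construction):

```python
import numpy as np

rng = np.random.default_rng(0)
n, h = 32, 64          # overcomplete regime: h > n
p_exp = 0.5            # sparsity exponent, illustrative choice

# Dictionary with unit-norm columns (incoherent w.h.p. for random Gaussians).
A = rng.standard_normal((n, h))
A /= np.linalg.norm(A, axis=0)

# Sparse code with support of size h**p_exp.
k = int(h ** p_exp)
x = np.zeros(h)
support = rng.choice(h, size=k, replace=False)
x[support] = rng.standard_normal(k)

# Observation the autoencoder is trained to reconstruct.
y = A @ x
assert np.count_nonzero(x) == k and y.shape == (n,)
```

Samples `y` drawn this way are what an autoencoder would be trained on when studying whether its learned weights recover the columns of $A^*$.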