Arpit, Devansh
On Optimality Conditions for Auto-Encoder Signal Recovery
Arpit, Devansh, Zhou, Yingbo, Ngo, Hung Q., Napp, Nils, Govindaraju, Venu
Auto-Encoders are unsupervised models that aim to learn patterns from observed data by minimizing a reconstruction cost. The useful representations they learn are often found to be sparse and distributed. On the other hand, compressed sensing and sparse coding assume a data-generating process in which the observed data is generated from some true latent signal source, and try to recover the corresponding signal from measurements. Looking at auto-encoders from this \textit{signal recovery perspective} enables us to have a more coherent view of these techniques. In particular, we show that the \textit{true} hidden representation can be approximately recovered if the weight matrices are highly incoherent with unit $ \ell^{2} $ row length and the bias vectors take values (approximately) equal to the negative of the data mean. The recovery also becomes increasingly accurate as the hidden signals become sparser. Additionally, we empirically demonstrate that auto-encoders are capable of recovering the data-generating dictionary when only data samples are given.
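A minimal NumPy sketch of this recovery setting (an illustration, not the paper's exact construction): rows of $ W $ are drawn at random and normalized to unit $ \ell^{2} $ length, which makes them nearly incoherent in high dimension; sparse non-negative hidden signals generate the data as $ x = W^{\top}h $; and the encoder bias is set to cancel the projected data mean, which is one reading of the "negative of the data mean" condition. All names and sparsity levels below are illustrative.

# Hedged toy sketch (NumPy), not the paper's exact setup: recover sparse
# hidden signals with a single ReLU encoder whose weight matrix is the
# data-generating dictionary itself.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 1000, 5        # hidden dim, data dim, nonzeros per hidden signal

# Random rows of unit l2 norm are nearly incoherent when d is large.
W = rng.standard_normal((n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Sparse, non-negative hidden signals and the observed data x = W^T h.
H = np.zeros((n, 200))
for j in range(H.shape[1]):
    H[rng.choice(n, k, replace=False), j] = rng.uniform(0.5, 1.0, k)
X = W.T @ H

# Encoder bias chosen to cancel the projected data mean -- one reading of the
# "bias approximately equal to the negative of the data mean" condition.
b = -W @ X.mean(axis=1, keepdims=True)
H_hat = np.maximum(0.0, W @ X + b)

# Approximate recovery; the error shrinks as k (the number of nonzeros) decreases.
err = np.linalg.norm(H_hat - H, axis=0) / np.linalg.norm(H, axis=0)
print("median relative recovery error:", np.median(err))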
A Closer Look at Memorization in Deep Networks
Arpit, Devansh, Jastrzębski, Stanisław, Ballas, Nicolas, Krueger, David, Bengio, Emmanuel, Kanwal, Maxinder S., Maharaj, Tegan, Fischer, Asja, Courville, Aaron, Bengio, Yoshua, Lacoste-Julien, Simon
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that with appropriately tuned explicit regularization (e.g., dropout), we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that dataset-independent notions of effective capacity are unlikely to explain the generalization performance of deep networks trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.
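A toy probe of the "noise vs. real data" contrast, vastly smaller than the experiments in the paper and only suggestive: the same small MLP is fit to the scikit-learn digits data with its true labels and with randomly shuffled labels; fitting the shuffled labels typically requires more epochs and yields a poorer fit, consistent with networks preferring simple patterns first.

# Toy probe (vastly smaller than the paper's experiments): the same small MLP
# is fit to the digits data with true labels and with shuffled ("noise")
# labels; the noise labels typically take more epochs and fit less well.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X = X / 16.0                                        # scale pixel values to [0, 1]
y_noise = np.random.default_rng(0).permutation(y)   # destroy input-label patterns

for name, labels in [("real labels", y), ("shuffled labels", y_noise)]:
    clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500, random_state=0)
    clf.fit(X, labels)
    print(f"{name}: epochs = {clf.n_iter_}, "
          f"training accuracy = {clf.score(X, labels):.3f}")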
Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
Arpit, Devansh, Zhou, Yingbo, Kota, Bhargava U., Govindaraju, Venu
While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks -- Internal Covariate Shift -- the current solution has certain drawbacks. Specifically, BN depends on batch statistics for layerwise input normalization during training, which makes the estimates of the mean and standard deviation of the input (distribution) to hidden layers inaccurate at validation time due to shifting parameter values (especially during the initial training epochs). Also, BN cannot be used with a batch size of 1 during training. We address these drawbacks by proposing a non-adaptive normalization technique for removing internal covariate shift, which we call Normalization Propagation. Our approach does not depend on batch statistics; instead, it uses a data-independent parametric estimate of the mean and standard deviation in every layer and is thus computationally faster than BN. We exploit the observation that the pre-activations before Rectified Linear Units follow a Gaussian distribution in deep networks, and that once the first- and second-order statistics of any given dataset are normalized, we can forward-propagate this normalization without recalculating the approximate statistics for the hidden layers.
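A minimal sketch of the core idea, assuming the layer input is already normalized and the weight rows are rescaled to unit $ \ell^{2} $ norm so that each pre-activation is approximately standard normal: the post-ReLU mean and standard deviation then have closed forms ($ 1/\sqrt{2\pi} $ and $ \sqrt{1/2 - 1/(2\pi)} $), so the normalization can be propagated forward without any batch statistics. This illustrates the principle only, not the full layerwise scheme from the paper.

# Minimal sketch of the idea, assuming the layer input is already normalized
# and that the pre-activation of each unit is approximately N(0, 1): the
# post-ReLU mean and standard deviation then have closed forms, so the
# normalization can be forward-propagated without batch statistics.
import numpy as np

RELU_MEAN = 1.0 / np.sqrt(2.0 * np.pi)          # E[max(0, z)] for z ~ N(0, 1)
RELU_STD = np.sqrt(0.5 - 1.0 / (2.0 * np.pi))   # std of max(0, z) for z ~ N(0, 1)

def normprop_layer(x, W):
    # Scale rows of W to unit l2 norm so the pre-activation stays approx. N(0, 1).
    W = W / np.linalg.norm(W, axis=1, keepdims=True)
    a = x @ W.T                                 # pre-activation, approx. N(0, 1) per unit
    h = np.maximum(0.0, a)                      # ReLU
    return (h - RELU_MEAN) / RELU_STD           # data-independent normalization

rng = np.random.default_rng(0)
x = rng.standard_normal((10000, 64))            # stand-in for an already-normalized input
h = normprop_layer(x, rng.standard_normal((128, 64)))
print("post-layer mean/std (should be ~0 and ~1):", h.mean().round(3), h.std().round(3))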
Why Regularized Auto-Encoders learn Sparse Representation?
Arpit, Devansh, Zhou, Yingbo, Ngo, Hung, Govindaraju, Venu
Dimensionality Reduction with Subspace Structure Preservation
Arpit, Devansh, Nwogu, Ifeoma, Govindaraju, Venu
Modeling data as samples from a union of independent subspaces has been widely applied to a number of real-world problems. However, dimensionality reduction approaches that theoretically preserve this independence assumption have not been well studied. Our key contribution is to show that $2K$ projection vectors are sufficient to preserve the independence of any $K$-class data sampled from a union of independent subspaces. We use this non-trivial observation to design a novel dimensionality reduction algorithm that theoretically preserves this structure for a given dataset. We support our theoretical analysis with empirical results on both synthetic and real-world data, achieving \textit{state-of-the-art} results compared to popular dimensionality reduction techniques.
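To make "independence preservation" concrete, the hedged sketch below spells out the property the guarantee refers to: subspaces are independent exactly when the dimension of their sum equals the sum of their dimensions. The paper's construction of the $2K$ projection vectors is not reproduced here; the code only checks the property on synthetic data.

# What "independence preservation" asserts, in code: class-wise subspaces are
# independent exactly when the dimension of their sum equals the sum of their
# dimensions. The paper's construction of the 2K projection vectors is not
# reproduced here; this only checks the property on synthetic data.
import numpy as np

def subspaces_are_independent(class_data):
    """class_data: list of (d x n_k) arrays, one per class."""
    ranks = [np.linalg.matrix_rank(Xk) for Xk in class_data]
    joint_rank = np.linalg.matrix_rank(np.hstack(class_data))
    return joint_rank == sum(ranks)

# Two classes drawn from independent 3-dimensional subspaces of R^50.
rng = np.random.default_rng(0)
basis = rng.standard_normal((50, 6))
X1 = basis[:, :3] @ rng.standard_normal((3, 40))
X2 = basis[:, 3:] @ rng.standard_normal((3, 40))
print(subspaces_are_independent([X1, X2]))      # True (almost surely)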
An Analysis of Random Projections in Cancelable Biometrics
Arpit, Devansh, Nwogu, Ifeoma, Srivastava, Gaurav, Govindaraju, Venu
With increasing concerns about security, the need for highly secure physical biometrics-based authentication systems utilizing \emph{cancelable biometric} technologies is on the rise. Because cancelable template generation must trade off template security against matching performance, many state-of-the-art algorithms that succeed in generating high-quality cancelable biometrics use random projection as one of their early processing steps. This paper therefore presents a formal analysis of why random projection is an essential step in cancelable biometrics. By formally defining the notion of an \textit{Independent Subspace Structure} for datasets, we show that random projection preserves the subspace structure of data vectors generated from a union of independent linear subspaces. We also derive a bound on the minimum number of random vectors required for this to hold and show that it depends logarithmically on the number of data samples, not only in the independent subspace setting but in the disjoint subspace setting as well. The theoretical analysis is supported in detail with empirical results on real-world face recognition datasets.
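A toy demonstration of the preservation claim (illustrative only; it does not reproduce the paper's logarithmic bound): data sampled from a union of independent linear subspaces retains its independent subspace structure after a Gaussian random projection, provided the number of random vectors is at least the total subspace dimension.

# Toy demonstration (illustrative only; it does not reproduce the paper's
# logarithmic bound): data from a union of independent linear subspaces keeps
# its independent subspace structure after a Gaussian random projection, as
# long as enough random vectors are used.
import numpy as np

rng = np.random.default_rng(1)
d, m = 200, 25            # ambient dimension, number of random projection vectors

# K = 3 classes, each sampled from its own independent 4-dim subspace of R^d.
bases = rng.standard_normal((d, 12))
classes = [bases[:, 4 * k:4 * (k + 1)] @ rng.standard_normal((4, 60)) for k in range(3)]

def independent(class_data):
    ranks = [np.linalg.matrix_rank(X) for X in class_data]
    return np.linalg.matrix_rank(np.hstack(class_data)) == sum(ranks)

R = rng.standard_normal((m, d)) / np.sqrt(m)    # random projection matrix
projected = [R @ X for X in classes]

print("independent before projection:", independent(classes))     # True
print("independent after projection: ", independent(projected))   # True (m >= 12 here)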