
Collaborating Authors

 Kumar, Pawan


Adaptive Consensus Optimization Method for GANs

arXiv.org Artificial Intelligence

We propose a second-order gradient-based method, used with ADAM and RMSprop, for training generative adversarial networks. The proposed method is the fastest among prominent second-order methods at reaching comparable accuracy. Unlike recent state-of-the-art methods, it requires neither solving a linear system nor additional mixed second-derivative terms. We derive the fixed-point iteration corresponding to the proposed method and show that it is convergent. The method produces better or comparable inception scores, and images of comparable quality, relative to other recently proposed state-of-the-art second-order methods. Compared to first-order methods such as ADAM, it produces significantly better inception scores. The method is compared and validated on popular datasets such as FFHQ, LSUN, CIFAR10, MNIST, and Fashion MNIST for image generation tasks\footnote{Accepted in IJCNN 2023}. Code: \url{https://github.com/misterpawan/acom}
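A minimal, hypothetical sketch of the consensus-style idea this abstract works within: each player penalises the squared norm of its own gradient before an ADAM step, so no mixed (cross-player) second derivatives or linear solves appear. The function name, penalty weight, and update shown are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def consensus_style_step(loss, params, optimizer, gamma=0.1):
    # second-order term: gradient of the player's own squared gradient norm
    grads = torch.autograd.grad(loss, params, create_graph=True)
    penalty = 0.5 * sum((g ** 2).sum() for g in grads)
    optimizer.zero_grad()
    (loss + gamma * penalty).backward()
    optimizer.step()

# usage on a toy single-parameter player
w = torch.randn(3, requires_grad=True)
opt = torch.optim.Adam([w], lr=1e-3)
consensus_style_step((w ** 2).sum(), [w], opt)
```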


Angle based dynamic learning rate for gradient descent

arXiv.org Artificial Intelligence

In our work, we propose a novel yet simple approach to obtain an adaptive learning rate for gradient-based descent methods on classification tasks. Instead of the traditional approach of selecting the adaptive learning rate via a decayed expectation of gradient-based terms, we use the angle between the current gradient and a new gradient, where the new gradient is computed along the direction orthogonal to the current gradient. The angle history then determines a better adaptive learning rate, leading to comparatively better accuracy than existing state-of-the-art optimizers. On a wide variety of benchmark datasets with prominent image classification architectures such as ResNet, DenseNet, EfficientNet, and VGG, we find that our method achieves the highest accuracy on most datasets. Moreover, we prove that our method is convergent.
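A hypothetical sketch of an angle-driven step size: the cosine of the angle between two gradients, smoothed over a short history, rescales the base learning rate (small or negative cosine, smaller steps). The paper probes along the direction orthogonal to the current gradient; this simplified variant compares successive gradients, and the schedule and constants here are assumptions.

```python
import torch

def angle_scaled_lr(prev_grad, grad, base_lr, history, window=10):
    # cosine of the angle between the two gradient directions
    cos = torch.dot(prev_grad, grad) / (prev_grad.norm() * grad.norm() + 1e-12)
    history.append(cos.item())
    recent = history[-window:]
    scale = 0.5 * (1.0 + sum(recent) / len(recent))  # maps [-1, 1] -> [0, 1]
    return base_lr * scale

# usage with flattened gradients from two successive steps
g0, g1, hist = torch.randn(100), torch.randn(100), []
print(angle_scaled_lr(g0, g1, base_lr=0.1, history=hist))
```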


Effects of Spectral Normalization in Multi-agent Reinforcement Learning

arXiv.org Artificial Intelligence

A reliable critic is central to on-policy actor-critic learning, but learning a reliable critic in a multi-agent sparse-reward scenario is challenging for two reasons: (1) the joint action space grows exponentially with the number of agents, and (2) this, combined with reward sparseness and environment noise, leads to large sample requirements for accurate learning. We show that regularising the critic with spectral normalization (SN) enables it to learn more robustly, even in multi-agent on-policy sparse-reward scenarios. Our experiments show that the regularised critic quickly learns from sparse-reward experience in the complex SMAC and RWARE domains. These findings highlight the importance of critic regularisation for stable learning.
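A minimal sketch of the regularisation described above: PyTorch's spectral_norm wraps each linear layer of a centralised critic over the joint observation. The layer sizes and architecture are illustrative assumptions; only the use of spectral normalization itself comes from the abstract.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SNCritic(nn.Module):
    def __init__(self, obs_dim, n_agents, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(obs_dim * n_agents, hidden)), nn.ReLU(),
            spectral_norm(nn.Linear(hidden, hidden)), nn.ReLU(),
            spectral_norm(nn.Linear(hidden, 1)),   # state-value estimate
        )

    def forward(self, joint_obs):
        return self.net(joint_obs)
```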


Light-weight Deep Extreme Multilabel Classification

arXiv.org Artificial Intelligence

Extreme multi-label (XML) classification refers to the task of supervised multi-label learning that involves a very large number of labels; the scalability of the classifier with increasing label dimension is therefore an important consideration. In this paper, we develop a method called LightDXML, which modifies the recently developed deep-learning-based XML framework by using label embeddings instead of feature embeddings for negative sampling and by iterating cyclically through three major phases: (1) proxy training of label embeddings, (2) shortlisting of labels for negative sampling, and (3) final classifier training using the negative samples. Consequently, LightDXML also removes the need for a re-ranker module, leading to further savings in time and memory. The proposed method achieves the best of both worlds: its training time, model size, and prediction time are on par with or better than those of tree-based methods, while its prediction accuracy is on par with deep-learning-based methods. Moreover, the proposed approach achieves the best tail-label prediction accuracy among most state-of-the-art XML methods on some of the large datasets\footnote{Accepted in IJCNN 2023; partial funding from MAPG grant and IIIT Seed grant at IIIT, Hyderabad, India. Code: \url{https://github.com/misterpawan/LightDXML}}
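A toy sketch of phase (2) above: given label embeddings from phase (1), shortlist the most similar other labels to each positive label as hard negatives. The shapes, the dot-product similarity, and the function name are assumptions, not the repository's API.

```python
import numpy as np

def shortlist_negatives(label_emb, pos_labels, k=5):
    sims = label_emb @ label_emb.T            # (L, L) label-label similarity
    np.fill_diagonal(sims, -np.inf)           # never shortlist a label itself
    shortlist = set()
    for l in pos_labels:
        shortlist.update(np.argpartition(-sims[l], k)[:k].tolist())
    return shortlist - set(pos_labels)        # keep only true negatives

# usage: unit-norm embeddings from phase (1), one sample's positive labels
emb = np.random.randn(1000, 64).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
print(sorted(shortlist_negatives(emb, pos_labels=[3, 17])))
```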


Review of Extreme Multilabel Classification

arXiv.org Artificial Intelligence

Extreme multilabel classification, or XML, is an active area of interest in machine learning. Compared to traditional multilabel classification, the number of labels is extremely large, hence the name. Classical one-versus-all classification will not scale in this setting because of the large number of labels, and the same is true of other classical classifiers. Embedding the labels, as well as the features, into a smaller space is an essential first step. A further issue is the existence of head and tail labels, where tail labels are those that appear in relatively few samples; their presence complicates the embedding. The area has attracted a wide range of approaches: bit compression motivated by compressed sensing, tree-based embeddings, deep-learning-based latent-space embeddings (including ones using attention weights), linear-algebra-based embeddings such as the SVD, clustering, and hashing, to name a few. The community has also developed a useful set of metrics to correctly assess predictions on head and tail labels.
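A toy sketch of one of the linear-algebraic embeddings surveyed above: a truncated SVD compresses a sparse binary label matrix into a low-dimensional space. The matrix sizes and sparsity here are illustrative only.

```python
import numpy as np

# Y is a sparse binary label matrix (samples x labels) in the extreme regime
rng = np.random.default_rng(0)
Y = (rng.random((500, 2000)) < 0.01).astype(np.float32)
U, S, Vt = np.linalg.svd(Y, full_matrices=False)
k = 64
Z = Y @ Vt[:k].T        # per-sample embedding in the compressed label space
Y_hat = Z @ Vt[:k]      # approximate reconstruction used at prediction time
```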


SynGraphy: Succinct Summarisation of Large Networks via Small Synthetic Representative Graphs

arXiv.org Artificial Intelligence

We describe SynGraphy, a method for visually summarising the structure of large network datasets by drawing smaller graphs generated to have structural properties similar to those of the input graphs. Visualising complex networks is crucial for understanding networked data and the relationships it represents. Due to the large size of many networks, visualisation is extremely difficult: naively drawing large networks such as those of Facebook or Twitter yields graphics that convey little or no information. While modern graph layout algorithms can scale computationally to large networks, their output tends toward a common "hairball" look, which makes it difficult even to distinguish different graphs from each other. Graph sampling and graph coarsening techniques partially address these limitations, but they preserve only a subset of the original graph's properties. In this paper we take the problem of visualising large graphs from a novel perspective: we leave the original graph's nodes and edges behind and instead summarise its properties, such as the clustering coefficient and bipartivity, by generating a completely new graph whose structural properties match those of the original. To verify the utility of this approach compared to other graph visualisation algorithms, we perform an experimental evaluation in which we repeatedly ask experimental subjects (professionals in graph mining and related areas) to determine which of two given graphs has a given structural property, and then assess which visualisation algorithm helped in identifying the correct answer. Our summarisation approach SynGraphy compares favourably to other techniques on a variety of networks.
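A toy illustration of the central idea under strong simplifying assumptions: measure one structural statistic of the input graph (here only average clustering) and search a family of small synthetic graphs for the closest match, which could then be drawn in place of the original. The generator and single-statistic match are stand-ins for SynGraphy's richer property-matching procedure.

```python
import networkx as nx

def small_surrogate(G, n_small=60, m=2, tries=50):
    target = nx.average_clustering(G)
    best, best_gap = None, float("inf")
    for i in range(tries):
        p = i / max(tries - 1, 1)                      # triangle probability
        H = nx.powerlaw_cluster_graph(n_small, m, p, seed=i)
        gap = abs(nx.average_clustering(H) - target)
        if gap < best_gap:
            best, best_gap = H, gap
    return best

G = nx.les_miserables_graph()
H = small_surrogate(G)
print(nx.average_clustering(G), nx.average_clustering(H))
```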


Qualitative Data Augmentation for Performance Prediction in VLSI circuits

arXiv.org Artificial Intelligence

Various studies have shown the advantages of using Machine Learning (ML) techniques for analog and digital IC design automation and optimization, but data scarcity remains an obstacle to training highly accurate ML models for electronic designs. This work proposes generating and evaluating artificial circuit data using generative adversarial networks (GANs) to improve the accuracy of ML models trained on small datasets. The training data is obtained from various simulations in the Cadence Virtuoso, HSPICE, and Microcap design environments with TSMC 180nm and 22nm CMOS technology nodes. The artificial data is generated and tested for an appropriate set of analog and digital circuits. The experimental results show that the proposed artificial data generation significantly improves ML models previously trained with insufficient data, reducing their percentage error by more than 50\%. Furthermore, this research aims to contribute to the wider application of AI/ML in VLSI design and technology by relieving challenges related to training-data availability.
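A minimal tabular-GAN sketch of the augmentation idea: a generator learns to emit synthetic rows resembling (stand-in) simulated circuit samples, which can then pad a small training set. Layer sizes, latent dimension, and the random training data are illustrative assumptions; the paper's GAN architecture may differ.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 8))
D = nn.Sequential(nn.Linear(8, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()
real = torch.randn(256, 8)          # stand-in for simulated circuit rows

for step in range(200):
    fake = G(torch.randn(64, 16))
    # discriminator step: real rows -> 1, generated rows -> 0
    d_loss = (bce(D(real[:64]), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator step: try to fool the discriminator
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

synthetic_rows = G(torch.randn(1000, 16)).detach()   # augmentation data
```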


DXML: Distributed Extreme Multilabel Classification

arXiv.org Artificial Intelligence

As a big-data application, extreme multilabel classification has emerged as an important research topic, with applications in ranking and recommendation of products and items. We propose a scalable hybrid distributed- and shared-memory implementation of extreme classification for large-scale ranking and recommendation. In particular, the implementation mixes message passing across nodes using MPI with multithreading within each node using OpenMP. Expressions for the communication latency and communication volume are derived, and the shared-memory parallelism is analysed using the work-span model. This sheds light on the expected scalability of similar extreme classification methods. Experiments show that the implementation trains and tests comparatively quickly on some large datasets, and in some cases the model size is comparatively small.
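A hedged sketch of the hybrid pattern above, with mpi4py standing in for MPI and a thread pool standing in for OpenMP: each rank trains the classifiers for its shard of the label space using local threads, then the ranks exchange summary information. The contiguous label-sharding scheme and the per-label training stub are assumptions, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_labels = 100_000
lo = rank * n_labels // size          # this rank's contiguous label shard
hi = (rank + 1) * n_labels // size

def train_label(label):
    return label                      # placeholder for per-label training

with ThreadPoolExecutor() as pool:    # intra-node (OpenMP-like) parallelism
    local = list(pool.map(train_label, range(lo, hi)))

counts = comm.allgather(len(local))   # inter-node (MPI) communication
if rank == 0:
    print("labels trained per rank:", counts)
```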


Optimizing Non-decomposable Measures with Deep Networks

arXiv.org Machine Learning

We present a class of algorithms capable of directly training deep neural networks with respect to large families of task-specific performance measures, such as the F-measure and the Kullback-Leibler divergence, that are structured and non-decomposable. This is a departure from standard deep learning techniques, which typically use squared or cross-entropy loss functions (that are decomposable) to train neural networks. We demonstrate that directly training with task-specific loss functions yields much faster and more stable convergence across problems and datasets. Our proposed algorithms and implementations have several novel features, including (i) convergence to first-order stationary points despite optimizing complex objective functions; (ii) use of fewer training samples to achieve a desired level of convergence; (iii) a substantial reduction in training time; and (iv) a seamless integration of our implementation into existing symbolic gradient frameworks. We implement our techniques on a variety of deep architectures, including multi-layer perceptrons and recurrent neural networks, and show that on a variety of benchmark and real datasets, our algorithms outperform traditional approaches to training deep networks, as well as some recent approaches to task-specific training of neural networks.
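One common differentiable surrogate for such a measure, shown as an assumption rather than the paper's exact algorithm: a "soft" F1 computed from predicted probabilities, which lets a network be trained directly against an F-measure-like objective.

```python
import torch

def soft_f1_loss(logits, targets, eps=1e-8):
    p = torch.sigmoid(logits)
    tp = (p * targets).sum()          # soft true positives
    fp = (p * (1 - targets)).sum()    # soft false positives
    fn = ((1 - p) * targets).sum()    # soft false negatives
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return 1 - f1                     # minimise 1 - F1

# usage inside an ordinary training step
logits = torch.randn(32, requires_grad=True)
targets = torch.randint(0, 2, (32,)).float()
soft_f1_loss(logits, targets).backward()
```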