AITopics | Mishra, Bamdev

Collaborating Authors

Mishra, Bamdev

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Submodular Framework for Structured-Sparse Optimal Transport

Manupriya, Piyushi, Jawanpuria, Pratik, Gurumoorthy, Karthik S., Jagarlapudi, SakethaNath, Mishra, Bamdev

arXiv.org Artificial IntelligenceJun-7-2024

Unbalanced optimal transport (UOT) has recently gained much attention due to its flexible framework for handling un-normalized measures and its robustness properties. In this work, we explore learning (structured) sparse transport plans in the UOT setting, i.e., transport plans have an upper bound on the number of non-sparse entries in each column (structured sparse pattern) or in the whole plan (general sparse pattern). We propose novel sparsity-constrained UOT formulations building on the recently explored maximum mean discrepancy based UOT. We show that the proposed optimization problem is equivalent to the maximization of a weakly submodular function over a uniform matroid or a partition matroid. We develop efficient gradient-based discrete greedy algorithms and provide the corresponding theoretical guarantees. Empirically, we observe that our proposed greedy algorithms select a diverse support set and we illustrate the efficacy of the proposed approach in various applications.

constraint, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2406.04914

Country:

Asia (0.67)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Riemannian coordinate descent algorithms on matrix manifolds

Han, Andi, Jawanpuria, Pratik, Mishra, Bamdev

arXiv.org Machine LearningJun-4-2024

Many machine learning applications are naturally formulated as optimization problems on Riemannian manifolds. The main idea behind Riemannian optimization is to maintain the feasibility of the variables while moving along a descent direction on the manifold. This results in updating all the variables at every iteration. In this work, we provide a general framework for developing computationally efficient coordinate descent (CD) algorithms on matrix manifolds that allows updating only a few variables at every iteration while adhering to the manifold constraint. In particular, we propose CD algorithms for various manifolds such as Stiefel, Grassmann, (generalized) hyperbolic, symplectic, and symmetric positive (semi)definite. While the cost per iteration of the proposed CD algorithms is low, we further develop a more efficient variant via a first-order approximation of the objective function. We analyze their convergence and complexity, and empirically illustrate their efficacy in several applications.

machine learning, manifold, natural language, (17 more...)

arXiv.org Machine Learning

2406.02225

Country:

Asia (0.28)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining

Han, Andi, Li, Jiaxiang, Huang, Wei, Hong, Mingyi, Takeda, Akiko, Jawanpuria, Pratik, Mishra, Bamdev

arXiv.org Artificial IntelligenceJun-4-2024

Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank structures on weights for efficient fine-tuning in terms of parameters and memory, either through low-rank adaptation or factorization. While effective for fine-tuning, low-rank structures are generally less suitable for pretraining because they restrict parameters to a low-dimensional subspace. In this work, we propose to parameterize the weights as a sum of low-rank and sparse matrices for pretraining, which we call SLTrain. The low-rank component is learned via matrix factorization, while for the sparse component, we employ a simple strategy of uniformly selecting the sparsity support at random and learning only the non-zero entries with the fixed support. While being simple, the random fixed-support sparse learning strategy significantly enhances pretraining when combined with low-rank learning. Our results show that SLTrain adds minimal extra parameters and memory costs compared to pretraining with low-rank parameterization, yet achieves substantially better performance, which is comparable to full-rank training. Remarkably, when combined with quantization and per-layer updates, SLTrain can reduce memory requirements by up to 73% when pretraining the LLaMA 7B model.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.02214

Country:

Asia (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Federated Learning on Riemannian Manifolds with Differential Privacy

Huang, Zhenwei, Huang, Wen, Jawanpuria, Pratik, Mishra, Bamdev

arXiv.org Artificial IntelligenceApr-15-2024

In recent years, federated learning (FL) has emerged as a prominent paradigm in distributed machine learning. Despite the partial safeguarding of agents' information within FL systems, a malicious adversary can potentially infer sensitive information through various means. In this paper, we propose a generic private FL framework defined on Riemannian manifolds (PriRFed) based on the differential privacy (DP) technique. We analyze the privacy guarantee while establishing the convergence properties. To the best of our knowledge, this is the first federated learning framework on Riemannian manifold with a privacy guarantee and convergence results. Numerical simulations are performed on synthetic and real-world datasets to showcase the efficacy of the proposed PriRFed approach.

artificial intelligence, gradf, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2404.10029

Country:

Asia (0.67)
North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report (0.81)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Add feedback

A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial Networks

Mishra, Neel, Mishra, Bamdev, Jawanpuria, Pratik, Kumar, Pawan

arXiv.org Artificial IntelligenceApr-10-2024

A novel first-order method is proposed for training generative adversarial networks (GANs). It modifies the Gauss-Newton method to approximate the min-max Hessian and uses the Sherman-Morrison inversion formula to calculate the inverse. The method corresponds to a fixed-point method that ensures necessary contraction. To evaluate its effectiveness, numerical experiments are conducted on various datasets commonly used in image generation tasks, such as MNIST, Fashion MNIST, CIFAR10, FFHQ, and LSUN. Our method is capable of generating high-fidelity images with greater diversity across multiple datasets. It also achieves the highest inception score for CIFAR10 among all compared methods, including state-of-the-art second-order methods. Additionally, its execution time is comparable to that of first-order min-max methods.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2404.07172

Country:

Asia > India (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A Framework for Bilevel Optimization on Riemannian Manifolds

Han, Andi, Mishra, Bamdev, Jawanpuria, Pratik, Takeda, Akiko

arXiv.org Artificial IntelligenceFeb-6-2024

Bilevel optimization has seen an increasing presence in various domains of applications. In this work, we propose a framework for solving bilevel optimization problems where variables of both lower and upper level problems are constrained on Riemannian manifolds. We provide several hypergradient estimation strategies on manifolds and study their estimation error. We provide convergence and complexity analysis for the proposed hypergradient descent algorithm on manifolds. We also extend the developments to stochastic bilevel optimization and to the use of general retraction. We showcase the utility of the proposed framework on various applications.

artificial intelligence, machine learning, optimization, (17 more...)

arXiv.org Artificial Intelligence

2402.03883

Country:

Asia (0.14)
Europe (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Riemannian Hamiltonian methods for min-max optimization on manifolds

Han, Andi, Mishra, Bamdev, Jawanpuria, Pratik, Kumar, Pawan, Gao, Junbin

arXiv.org Artificial IntelligenceAug-24-2023

In this paper, we study min-max optimization problems on Riemannian manifolds. We introduce a Riemannian Hamiltonian function, minimization of which serves as a proxy for solving the original min-max problems. Under the Riemannian Polyak--{\L}ojasiewicz condition on the Hamiltonian function, its minimizer corresponds to the desired min-max saddle point. We also provide cases where this condition is satisfied. For geodesic-bilinear optimization in particular, solving the proxy problem leads to the correct search direction towards global optimality, which becomes challenging with the min-max formulation. To minimize the Hamiltonian function, we propose Riemannian Hamiltonian methods (RHM) and present their convergence analyses. We extend RHM to include consensus regularization and to the stochastic setting. We illustrate the efficacy of the proposed RHM in applications such as subspace robust Wasserstein distance, robust training of neural networks, and generative adversarial networks.

artificial intelligence, machine learning, manifold, (18 more...)

arXiv.org Artificial Intelligence

2204.11418

Country:

Asia (0.46)
Europe (0.27)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Add feedback

ProtoBandit: Efficient Prototype Selection via Multi-Armed Bandits

Chaudhuri, Arghya Roy, Jawanpuria, Pratik, Mishra, Bamdev

arXiv.org Artificial IntelligenceAug-23-2023

In this work, we propose a multi-armed bandit-based framework for identifying a compact set of informative data instances (i.e., the prototypes) from a source dataset $S$ that best represents a given target set $T$. Prototypical examples of a given dataset offer interpretable insights into the underlying data distribution and assist in example-based reasoning, thereby influencing every sphere of human decision-making. Current state-of-the-art prototype selection approaches require $O(|S||T|)$ similarity comparisons between source and target data points, which becomes prohibitively expensive for large-scale settings. We propose to mitigate this limitation by employing stochastic greedy search in the space of prototypical examples and multi-armed bandits for reducing the number of similarity comparisons. Our randomized algorithm, ProtoBandit, identifies a set of $k$ prototypes incurring $O(k^3|S|)$ similarity comparisons, which is independent of the size of the target set. An interesting outcome of our analysis is for the $k$-medoids clustering problem $T = S$ setting) in which we show that our algorithm ProtoBandit approximates the BUILD step solution of the partitioning around medoids (PAM) method in $O(k^3|S|)$ complexity. Empirically, we observe that ProtoBandit reduces the number of similarity computation calls by several orders of magnitudes ($100-1000$ times) while obtaining solutions similar in quality to those from state-of-the-art approaches.

artificial intelligence, data mining, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2210.0186

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Light-weight Deep Extreme Multilabel Classification

Mishra, Istasis, Dasgupta, Arpan, Jawanpuria, Pratik, Mishra, Bamdev, Kumar, Pawan

arXiv.org Artificial IntelligenceApr-20-2023

Extreme multi-label (XML) classification refers to the task of supervised multi-label learning that involves a large number of labels. Hence, scalability of the classifier with increasing label dimension is an important consideration. In this paper, we develop a method called LightDXML which modifies the recently developed deep learning based XML framework by using label embeddings instead of feature embedding for negative sampling and iterating cyclically through three major phases: (1) proxy training of label embeddings (2) shortlisting of labels for negative sampling and (3) final classifier training using the negative samples. Consequently, LightDXML also removes the requirement of a re-ranker module, thereby, leading to further savings on time and memory requirements. The proposed method achieves the best of both worlds: while the training time, model size and prediction times are on par or better compared to the tree-based methods, it attains much better prediction accuracy that is on par with the deep learning based methods. Moreover, the proposed approach achieves the best tail-label prediction accuracy over most state-of-the-art XML methods on some of the large datasets\footnote{accepted in IJCNN 2023, partial funding from MAPG grant and IIIT Seed grant at IIIT, Hyderabad, India. Code: \url{https://github.com/misterpawan/LightDXML}

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2304.11045

Country: Asia > India > Telangana > Hyderabad (0.24)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generalised Spherical Text Embedding

Banerjee, Souvik, Mishra, Bamdev, Jawanpuria, Pratik, Shrivastava, Manish

arXiv.org Artificial IntelligenceNov-30-2022

This paper aims to provide an unsupervised modelling approach that allows for a more flexible representation of text embeddings. It jointly encodes the words and the paragraphs as individual matrices of arbitrary column dimension with unit Frobenius norm. The representation is also linguistically motivated with the introduction of a novel similarity metric. The proposed modelling and the novel similarity metric exploits the matrix structure of embeddings. We then go on to show that the same matrices can be reshaped into vectors of unit norm and transform our problem into an optimization problem over the spherical manifold. We exploit manifold optimization to efficiently train the matrix embeddings. We also quantitatively verify the quality of our text embeddings by showing that they demonstrate improved results in document classification, document clustering, and semantic textual similarity benchmark tests.

machine learning, natural language, text classification, (19 more...)

arXiv.org Artificial Intelligence

2211.16801

Country:

North America > United States (0.47)
Europe (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.49)
(2 more...)

Add feedback