Collaborating Authors: Karnataka

Solving 2-D Helmholtz equation in the rectangular, circular, and elliptical domains using neural networks

arXiv.org Artificial Intelligence

Physics-informed neural networks offer an alternative way to solve several of the differential equations that govern complicated physics. However, their success in predicting the acoustic field is limited by the vanishing-gradient problem that occurs when solving the Helmholtz equation. In this paper, a formulation is presented that addresses this difficulty. The problem of solving the two-dimensional Helmholtz equation with prescribed boundary conditions is posed as an unconstrained optimization problem using the trial solution method. According to this method, a trial neural network that satisfies the given boundary conditions prior to the training process is constructed using the technique of transfinite interpolation and the theory of R-functions. This ansatz is first applied to the rectangular domain and later extended to the circular and elliptical domains. The acoustic field predicted by the proposed formulation is compared with that obtained from two-dimensional finite element methods, and good agreement is observed in all three domains considered. Minor limitations of the proposed formulation and their remedies are also discussed.
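A minimal sketch of the trial-solution construction for the rectangular case, assuming a unit square with homogeneous Dirichlet conditions, a hypothetical forcing term f, and PyTorch as the framework; the boundary factor D below plays the role the paper assigns to R-functions and transfinite interpolation, and none of these names come from the paper's code:

    import torch
    import torch.nn as nn

    # Assumed problem: solve u_xx + u_yy + k^2 u = f on the unit square with u = 0
    # on the boundary (illustrative choices, not the paper's test case).
    k = 2.0 * torch.pi
    def f(x, y):
        return torch.sin(torch.pi * x) * torch.sin(torch.pi * y)

    net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                        nn.Linear(64, 64), nn.Tanh(),
                        nn.Linear(64, 1))

    def trial_u(x, y):
        # D vanishes on the boundary, so the trial network satisfies the Dirichlet
        # condition by construction, before any training takes place.
        D = x * (1.0 - x) * y * (1.0 - y)
        return D * net(torch.stack([x, y], dim=-1)).squeeze(-1)

    def residual(x, y):
        u = trial_u(x, y)
        ux, uy = torch.autograd.grad(u.sum(), (x, y), create_graph=True)
        uxx = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
        uyy = torch.autograd.grad(uy.sum(), y, create_graph=True)[0]
        return uxx + uyy + k ** 2 * u - f(x, y)

    # Unconstrained optimization of the interior residual only; no boundary loss
    # term is needed because the ansatz already satisfies the boundary conditions.
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for step in range(5000):
        x = torch.rand(1024, requires_grad=True)
        y = torch.rand(1024, requires_grad=True)
        loss = residual(x, y).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()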


Near-Optimal Lower Bounds For Convex Optimization For All Orders of Smoothness

Robin Kothari, Microsoft Research India / Microsoft Quantum and Microsoft Research, Bengaluru, KA, India

Neural Information Processing Systems

We study the complexity of optimizing highly smooth convex functions. For a positive integer p, we want to find an ɛ-approximate minimum of a convex function f, given oracle access to the function and its first p derivatives, assuming that the pth derivative of f is Lipschitz. Recently, three independent research groups (Jiang et al., PMLR 2019; Gasnikov et al., PMLR 2019; Bubeck et al., PMLR 2019) developed a new algorithm that
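In symbols, the setting is a Lipschitz condition on the p-th derivative, and the optimal oracle-complexity rate this line of work centres on, up to logarithmic factors, is the following (a standard statement of the result, not quoted from the paper):

    \[
    \|\nabla^{p} f(x) - \nabla^{p} f(y)\| \le L_{p}\,\|x - y\|
    \qquad\Longrightarrow\qquad
    N(\varepsilon) \;=\; \tilde{\Theta}\!\left(\varepsilon^{-2/(3p+1)}\right)
    \]

where N(ε) denotes the number of oracle calls needed to find an ε-approximate minimum.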



Efficient and Effective Augmentation Strategy for Adversarial Training

Samyak Jain, R. Venkatesh Babu, Video Analytics Lab, Indian Institute of Science, Bangalore

Neural Information Processing Systems

Adversarial training of Deep Neural Networks is known to be significantly more data-hungry than standard training. Furthermore, complex data augmentations such as AutoAugment, which have led to substantial gains in standard training of image classifiers, have not been successful with Adversarial Training. We first explain this contrasting behavior by viewing augmentation during training as a problem of domain generalization, and further propose Diverse Augmentation-based Joint Adversarial Training (DAJAT) to use data augmentations effectively in adversarial training. We aim to handle the conflicting goals of enhancing the diversity of the training dataset and training with data that is close to the test distribution by using a combination of simple and complex augmentations with separate batch normalization layers during training. We further utilize the popular Jensen-Shannon divergence loss to encourage the joint learning of the diverse augmentations, thereby allowing simple augmentations to guide the learning of complex ones. Lastly, to improve the computational efficiency of the proposed method, we propose and utilize a two-step defense, Ascending Constraint Adversarial Training (ACAT), that uses an increasing epsilon schedule and weight-space smoothing to prevent gradient masking. The proposed method DAJAT achieves a substantially better robustness-accuracy trade-off than existing methods on the RobustBench Leaderboard on ResNet-18 and WideResNet-34-10. The code for implementing DAJAT is available here: https://github.com/val-iisc/DAJAT.
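A hedged sketch of the two ingredients named above, assuming PyTorch; the Jensen-Shannon consistency term and the linear epsilon ramp are written generically and are not the authors' released implementation:

    import torch
    import torch.nn.functional as F

    def js_consistency(logits_list):
        # Jensen-Shannon-style consistency across predictions on differently
        # augmented views of the same batch (e.g. simple and complex augmentations,
        # each routed through its own batch-normalization layers).
        probs = [F.softmax(l, dim=1) for l in logits_list]
        mean = torch.stack(probs).mean(dim=0).clamp_min(1e-8)
        return sum(F.kl_div(mean.log(), p, reduction="batchmean") for p in probs) / len(probs)

    def acat_epsilon(step, total_steps, eps_max=8.0 / 255.0):
        # Ascending constraint: the perturbation budget grows over training
        # (a linear ramp is used here purely for illustration).
        return eps_max * min(1.0, step / total_steps)

In a training loop, js_consistency would couple the predictions on the simply and complexly augmented views so the former can guide the latter, while acat_epsilon supplies the attack budget for the two-step defense.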


Leveraging Retrieval Augmented Generative LLMs For Automated Metadata Description Generation to Enhance Data Catalogs

arXiv.org Artificial Intelligence

Data catalogs serve as repositories for organizing and accessing diverse collections of data assets, but their effectiveness hinges on the ease with which business users can look up relevant content. Unfortunately, many data catalogs within organizations suffer from limited searchability due to inadequate metadata such as asset descriptions. Hence, there is a need for a content generation solution to enrich and curate metadata in a scalable way. This paper explores the challenges associated with metadata creation and proposes a unique prompt enrichment idea of leveraging existing metadata content using a retrieval-based few-shot technique tied with generative large language models (LLMs). The paper also considers fine-tuning an LLM on existing content and studies the behavior of few-shot pretrained LLMs (Llama, GPT-3.5) vis-à-vis a few-shot fine-tuned LLM (Llama2-7b) by evaluating their performance based on accuracy, factual grounding, and toxicity. Our preliminary results exhibit more than 80% Rouge-1 F1 for the generated content, which implied that 87%-88% of instances were accepted as is or curated with minor edits by data stewards. By automatically generating accurate descriptions for tables and columns, the research attempts to provide an overall framework for enterprises to effectively scale metadata curation and enrich their data catalogs, thereby vastly improving data catalog searchability and overall usability. In the modern digital ecosystem, locating relevant data has become increasingly challenging due to the rapid expansion of data assets.
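A minimal sketch of the retrieval-based few-shot prompt construction described above; the record fields, the precomputed embeddings, and the prompt wording are hypothetical placeholders, and the call to the generative LLM itself is omitted:

    import numpy as np

    def retrieve_examples(query_embedding, corpus_embeddings, corpus_records, k=3):
        # Cosine-similarity retrieval of the k most similar already-curated
        # metadata records, used as in-context (few-shot) examples.
        sims = corpus_embeddings @ query_embedding / (
            np.linalg.norm(corpus_embeddings, axis=1) * np.linalg.norm(query_embedding) + 1e-9)
        top = np.argsort(-sims)[:k]
        return [corpus_records[i] for i in top]

    def build_prompt(table_name, column_names, examples):
        # Few-shot prompt: existing curated descriptions first, then the target asset.
        shots = "\n\n".join(
            "Table: {}\nColumns: {}\nDescription: {}".format(
                ex["table"], ", ".join(ex["columns"]), ex["description"])
            for ex in examples)
        return ("Write a concise, business-friendly description for the data asset below, "
                "in the style of the examples.\n\n"
                + shots
                + "\n\nTable: " + table_name
                + "\nColumns: " + ", ".join(column_names)
                + "\nDescription:")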


Understanding the role of autoencoders for stiff dynamical systems using information theory

arXiv.org Artificial Intelligence

Using information theory, this study provides insight into how the construction of the latent space of an autoencoder (AE) by deep neural network (DNN) training finds a smooth low-dimensional manifold for a stiff dynamical system. Our recent study [1] reported that an AE combined with a neural ODE (NODE), used as a surrogate reduced-order model (ROM) for the integration of stiff chemically reacting systems, led to a significant reduction in temporal stiffness, and the behavior was attributed to the identification of a slow invariant manifold by the nonlinear projection of the AE. The present work offers a fundamental understanding of this mechanism by employing concepts from information theory and better mixing. The learning mechanisms of both the encoder and the decoder are explained by plotting the evolution of mutual information and identifying two distinct phases. Subsequently, the density distributions of the physical and latent variables are plotted, showing the transformation of a rare event in the physical space into a highly likely (more probable) event in the latent space provided by the nonlinear autoencoder. Finally, the nonlinear transformation leading to this density redistribution is explained using concepts from information theory and probability.
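A small sketch of the kind of mutual-information estimate used to track the encoder and decoder over training, assuming a simple histogram (binned) estimator over sampled values of one physical variable x and one latent variable z; this is a generic estimator, not the paper's exact procedure:

    import numpy as np

    def mutual_information(x, z, bins=30):
        # Binned estimate of I(X; Z); evaluating this at successive epochs gives
        # the mutual-information trajectories whose two phases are discussed above.
        joint, _, _ = np.histogram2d(x, z, bins=bins)
        pxz = joint / joint.sum()
        px = pxz.sum(axis=1, keepdims=True)
        pz = pxz.sum(axis=0, keepdims=True)
        nz = pxz > 0
        return float(np.sum(pxz[nz] * np.log(pxz[nz] / (px @ pz)[nz])))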


Enhanced A* Algorithm for Mobile Robot Path Planning with Non-Holonomic Constraints

arXiv.org Artificial Intelligence

In this paper, a novel method for path planning of mobile robots is proposed, taking into account the non-holonomic turn radius constraints and finite dimensions of the robot. The approach involves rasterizing the environment to generate a 2D map and utilizes an enhanced version of the A* algorithm that incorporates non-holonomic constraints while ensuring collision avoidance. Two new instantiations of the A* algorithm are introduced and tested across various scenarios and environments, with results demonstrating the effectiveness of the proposed method.
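A compact sketch of one way such an enhancement can be set up, assuming states of the form (x, y, heading) and successor arcs of bounded curvature so the minimum turn radius is respected; the expansion step, heuristic, and discretisation below are illustrative and are not the paper's two instantiations:

    import heapq, math

    def neighbors(state, step=1.0, r_min=2.0):
        # Expand with constant-curvature arcs (straight, max-left, max-right), so
        # every successor respects the non-holonomic minimum turn radius r_min.
        x, y, th = state
        succ = []
        for curv in (0.0, 1.0 / r_min, -1.0 / r_min):
            if curv == 0.0:
                nx, ny, nth = x + step * math.cos(th), y + step * math.sin(th), th
            else:
                nth = th + curv * step
                nx = x + (math.sin(nth) - math.sin(th)) / curv
                ny = y - (math.cos(nth) - math.cos(th)) / curv
            succ.append(((nx, ny, nth % (2 * math.pi)), step))
        return succ

    def a_star(start, goal_xy, collision_free, tol=1.0):
        h = lambda s: math.hypot(goal_xy[0] - s[0], goal_xy[1] - s[1])    # admissible heuristic
        key = lambda s: (round(s[0], 1), round(s[1], 1), round(s[2], 1))  # closed-set bin
        open_list = [(h(start), 0.0, start, [start])]
        closed = set()
        while open_list:
            f, g, s, path = heapq.heappop(open_list)
            if h(s) < tol:
                return path
            if key(s) in closed:
                continue
            closed.add(key(s))
            for ns, cost in neighbors(s):
                if collision_free(ns):
                    heapq.heappush(open_list, (g + cost + h(ns), g + cost, ns, path + [ns]))
        return None

    # Example: a short query in an empty rectangular workspace.
    path = a_star((0.0, 0.0, 0.0), (5.0, 0.0),
                  lambda s: -1.0 <= s[0] <= 7.0 and -1.0 <= s[1] <= 6.0)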


CRUPL: A Semi-Supervised Cyber Attack Detection with Consistency Regularization and Uncertainty-aware Pseudo-Labeling in Smart Grid

arXiv.org Artificial Intelligence

Modern power grids are integrated with digital technologies and automation systems. The inclusion of digital technologies has made smart grids vulnerable to cyber-attacks, which can compromise data integrity and jeopardize the reliability of the power supply. Traditional intrusion detection systems often struggle to detect novel and sophisticated attacks because they rely on labeled training data, which may cover only part of the spectrum of potential threats. This work proposes a semi-supervised method for cyber-attack detection in smart grids that leverages both labeled and unlabeled measurement data. We implement consistency regularization and pseudo-labeling to identify deviations from expected behavior and predict the attack classes, and we use a curriculum learning approach that captures model uncertainty to improve pseudo-labeling performance. We demonstrate the efficiency of the proposed method in detecting different types of cyberattacks with minimal false positives on publicly available datasets. The method offers a promising solution, improving detection accuracy to 99% in the presence of unknown samples while significantly reducing false positives.
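A minimal sketch of the consistency-regularization-plus-pseudo-labeling objective in the spirit described above, assuming PyTorch, weak and strong perturbations of the unlabeled measurements, and a confidence threshold tau that a curriculum can adjust during training; this is a generic formulation, not the authors' exact loss:

    import torch
    import torch.nn.functional as F

    def semi_supervised_loss(model, x_lab, y_lab, x_unl_weak, x_unl_strong, tau=0.95):
        # Supervised term on the labeled measurement data.
        sup = F.cross_entropy(model(x_lab), y_lab)

        # Uncertainty-aware pseudo-labels: keep predictions on the weakly
        # perturbed view only when the model is confident enough.
        with torch.no_grad():
            probs = F.softmax(model(x_unl_weak), dim=1)
            conf, pseudo = probs.max(dim=1)
            mask = conf.ge(tau).float()

        # Consistency regularization: the strongly perturbed view must agree
        # with the pseudo-label obtained from the weak view.
        unsup = (F.cross_entropy(model(x_unl_strong), pseudo, reduction="none") * mask).mean()
        return sup + unsup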


Multispectral to Hyperspectral using Pretrained Foundational model

arXiv.org Artificial Intelligence

Ruben Gonzalez, Conrad M. Albrecht, Nassim Ait Ali Braham (Remote Sensing Technology Institute, German Aerospace Center (DLR), Germany), Devyani Lambhate, Joao Lucas de Sousa Almeida, Paolo Fraccaro, Benedikt Blumenstiel, Thomas Brunschwiler, Ranjini Bangalore (IBM Research Labs, India, U.K., Zurich, Brazil). February 28, 2025.

Hyperspectral imaging provides detailed spectral information, offering significant potential for monitoring greenhouse gases like CH4 and NO2. However, its application is constrained by limited spatial coverage and infrequent revisit times. In contrast, multispectral imaging delivers broader spatial and temporal coverage but lacks the spectral granularity required for precise GHG detection. To address these challenges, this study proposes Spectral and Spatial-Spectral transformer models that reconstruct hyperspectral data from multispectral inputs. The models are pretrained on the EnMAP and EMIT datasets and fine-tuned on spatio-temporally aligned (Sentinel-2, EnMAP) and (HLS-S30, EMIT) image pairs, respectively. Our models have the potential to enhance atmospheric monitoring by combining the strengths of hyperspectral and multispectral imaging systems.
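A minimal sketch of a spectral transformer of the kind described, assuming PyTorch and illustrative band counts (13 multispectral bands in, 224 hyperspectral bands out); this toy per-pixel model stands in for, and is not, the pretrained foundation model:

    import torch
    import torch.nn as nn

    class SpectralReconstructor(nn.Module):
        # Treats each multispectral band of a pixel as a token, encodes the band
        # sequence with a small transformer, and regresses the hyperspectral bands.
        def __init__(self, ms_bands=13, hs_bands=224, d_model=64):
            super().__init__()
            self.embed = nn.Linear(1, d_model)
            self.band_pos = nn.Parameter(torch.zeros(ms_bands, d_model))
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(ms_bands * d_model, hs_bands)

        def forward(self, x):                       # x: (batch, ms_bands)
            tokens = self.embed(x.unsqueeze(-1)) + self.band_pos
            z = self.encoder(tokens)                # (batch, ms_bands, d_model)
            return self.head(z.flatten(1))          # (batch, hs_bands)

    ms = torch.rand(4, 13)
    hs_hat = SpectralReconstructor()(ms)            # reconstructed spectra: (4, 224)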


Comparative Analysis of Black Hole Mass Estimation in Type-2 AGNs: Classical vs. Quantum Machine Learning and Deep Learning Approaches

arXiv.org Artificial Intelligence

In the case of Type-2 AGNs, estimating the mass of the black hole is challenging, yet understanding how galaxies form and evolve requires considerable insight into black hole masses. This work compared different classical and quantum machine learning (QML) algorithms for black hole mass estimation. The classical algorithms are Linear Regression, XGBoost Regression, Random Forest Regressor, Support Vector Regressor (SVR), Lasso Regression, Ridge Regression, Elastic Net Regression, Bayesian Regression, Decision Tree Regressor, Gradient Boosting Regressor, Classical Neural Networks, Gated Recurrent Unit (GRU), LSTM, Deep Residual Networks (ResNets), and Transformer-Based Regression. The quantum algorithms tested include Hybrid Quantum Neural Networks (QNN), Quantum Long Short-Term Memory (Q-LSTM), Sampler-QNN, Estimator-QNN, Variational Quantum Regressor (VQR), Quantum Linear Regression (Q-LR), and QML with JAX optimization. The results revealed that the classical algorithms gave better R^2, MAE, MSE, and RMSE results than the quantum models. Among the classical models, LSTM has the best result, with an accuracy of 99.77%. Among the quantum algorithms, Estimator-QNN has the highest accuracy, with an MSE of 0.0124 and an accuracy of 99.75%. This study ascertains both the strengths and weaknesses of the classical and quantum approaches. To the best of our knowledge, this work could pave the way for the future application of quantum algorithms in astrophysical data analysis.
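A short sketch of the classical side of such a comparison, assuming scikit-learn and synthetic stand-in features (the paper's AGN sample is not reproduced here), reporting the same metrics quoted above:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
    from sklearn.model_selection import train_test_split

    # X: spectral/photometric features of the AGN sample, y: (log) black hole mass.
    # Synthetic placeholders so the snippet runs end to end.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))
    y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=500)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    for model in (LinearRegression(), RandomForestRegressor(n_estimators=200, random_state=0)):
        pred = model.fit(X_tr, y_tr).predict(X_te)
        mse = mean_squared_error(y_te, pred)
        print(type(model).__name__,
              "R2=%.3f" % r2_score(y_te, pred),
              "MAE=%.3f" % mean_absolute_error(y_te, pred),
              "MSE=%.3f" % mse,
              "RMSE=%.3f" % np.sqrt(mse))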