Internet of Things-Based Smart Precision Farming in Soilless Agriculture: Opportunities and Challenges for Global Food Security

arXiv.org Artificial Intelligence

The rapid growth of the global population and the continuous decline in cultivable land pose significant threats to food security. This challenge worsens as climate change further reduces the availability of farmland. Soilless agriculture, such as hydroponics, aeroponics, and aquaponics, offers a sustainable solution by enabling efficient crop cultivation in controlled environments. The integration of the Internet of Things (IoT) with smart precision farming improves resource efficiency, automates environmental control, and ensures stable and high-yield crop production. IoT-enabled smart farming systems utilize real-time monitoring, data-driven decision-making, and automation to optimize water and nutrient usage while minimizing human intervention. This paper explores the opportunities and challenges of IoT-based soilless farming, highlighting its role in sustainable agriculture, urban farming, and global food security. These advanced farming methods ensure greater productivity, resource conservation, and year-round cultivation. However, they also face challenges such as high initial investment, technological dependency, and energy consumption. Through a comprehensive study, bibliometric analysis, and comparative analysis, this research highlights current trends and research gaps. It also outlines future directions for researchers, policymakers, and industry stakeholders to drive innovation and scalability in IoT-driven soilless agriculture. By emphasizing the benefits of vertical farming and Controlled Environment Agriculture (CEA)-enabled soilless techniques, this paper supports informed decision-making to address food security challenges and promote sustainable agricultural innovations.
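
To make the monitoring-and-actuation loop concrete, here is a minimal Python sketch of the kind of closed-loop control an IoT hydroponic system performs; the sensor drivers, setpoint bands, and pump names are hypothetical stand-ins, not values from the paper.

```python
import time

# Hypothetical setpoint bands for a hydroponic nutrient solution; the
# paper does not specify values, so these are illustrative only.
EC_RANGE = (1.5, 2.2)   # electrical conductivity, mS/cm
PH_RANGE = (5.5, 6.5)

def read_sensors():
    """Stand-in for real IoT sensor drivers (EC and pH probes)."""
    return {"ec": 1.3, "ph": 6.8}  # dummy readings

def actuate(pump):
    """Stand-in for a relay/actuator command over the IoT network."""
    print(f"activating {pump}")

def control_step():
    s = read_sensors()
    if s["ec"] < EC_RANGE[0]:
        actuate("nutrient_dosing_pump")  # raise nutrient concentration
    if s["ph"] > PH_RANGE[1]:
        actuate("ph_down_pump")          # lower pH toward the target band

for _ in range(3):       # a deployed system would loop indefinitely
    control_step()
    time.sleep(1)        # sampling interval; tune per deployment
```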


Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning

arXiv.org Artificial Intelligence

Remote Sensing Image Captioning (RSIC) is the process of generating meaningful descriptions from remote sensing images. Recently, it has gained significant attention, with encoder-decoder models serving as the backbone for generating meaningful captions. The encoder extracts essential visual features from the input image, transforming them into a compact representation, while the decoder utilizes this representation to generate coherent textual descriptions. Transformer-based models have gained significant popularity due to their ability to capture long-range dependencies and contextual information. The decoder has been well explored for text generation, whereas the encoder remains relatively unexplored. However, optimizing the encoder is crucial, as it directly influences the richness of the extracted features, which in turn affects the quality of the generated captions. To address this gap, we systematically evaluate twelve different convolutional neural network (CNN) architectures within a transformer-based encoder framework to assess their effectiveness in RSIC. The evaluation consists of two stages: first, a numerical analysis categorizes the CNNs into clusters based on their performance; the best-performing CNNs are then evaluated qualitatively by a human annotator. Additionally, we analyze the impact of different search strategies, namely greedy search and beam search, on caption quality. The results highlight the critical role of encoder selection in improving captioning performance, demonstrating that specific CNN architectures significantly enhance the quality of generated descriptions for remote sensing images.

Introduction

With the advancement of remote sensing technologies and machine learning-based methods, the demand for Remote Sensing Image Captioning (RSIC) [1, 2] is growing rapidly. It plays a crucial role in various fields, including environmental monitoring, urban planning, and disaster management, by providing automated textual descriptions of satellite images.
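
As a concrete picture of swapping CNN backbones inside a transformer captioner, here is a minimal PyTorch sketch; the two example backbones, the 512-dimensional decoder width, and the dummy caption embedding are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Any torchvision CNN can serve as the visual encoder; its spatial
# feature map is flattened into a token sequence ("memory") for a
# standard transformer decoder.
def build_cnn_encoder(name="resnet50", d_model=512):
    if name == "resnet50":
        cnn = models.resnet50(weights="IMAGENET1K_V2")
        backbone = nn.Sequential(*list(cnn.children())[:-2])  # keep conv stages
        feat_dim = 2048
    elif name == "mobilenet_v2":
        cnn = models.mobilenet_v2(weights="IMAGENET1K_V2")
        backbone = cnn.features
        feat_dim = 1280
    else:
        raise ValueError(name)
    proj = nn.Conv2d(feat_dim, d_model, kernel_size=1)  # match decoder width
    return backbone, proj

backbone, proj = build_cnn_encoder("resnet50")
img = torch.randn(1, 3, 224, 224)
feats = proj(backbone(img))                 # (1, 512, 7, 7)
tokens = feats.flatten(2).transpose(1, 2)   # (1, 49, 512) encoder memory

decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8,
                                           batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=3)
tgt = torch.randn(1, 10, 512)               # embedded caption tokens (dummy)
out = decoder(tgt, memory=tokens)           # (1, 10, 512) -> vocab projection
```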


Pruning as a Defense: Reducing Memorization in Large Language Models

arXiv.org Artificial Intelligence

Large language models have been shown to memorize significant portions of their training data, which they can reproduce when appropriately prompted. This work investigates the impact of simple pruning techniques on this behavior. Our findings reveal that pruning effectively reduces the extent of memorization in LLMs, demonstrating its potential as a foundational approach for mitigating membership inference attacks. Large language models are known to memorize portions of their training data, which poses significant privacy and security risks. Although various studies have explored the extent of memorization in LLMs, most of these efforts are qualitative (Carlini et al.).
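
As a concrete illustration of "simple pruning techniques," the sketch below applies L1 (magnitude) unstructured pruning to the linear layers of a toy model using PyTorch's pruning utilities; the 30% sparsity level and the toy architecture are illustrative, and the paper's exact recipe may differ.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-in for an LLM's feed-forward layers.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

for module in model.modules():
    if isinstance(module, nn.Linear):
        # Zero out the 30% smallest-magnitude weights in each layer.
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the zeroed mask permanent

zeros = sum((m.weight == 0).sum().item() for m in model.modules()
            if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in model.modules()
            if isinstance(m, nn.Linear))
print(f"global sparsity: {zeros / total:.1%}")
```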


A Floating Normalization Scheme for Deep Learning-Based Custom-Range Parameter Extraction in BSIM-CMG Compact Models

arXiv.org Artificial Intelligence

A deep learning (DL)-based methodology for the automated extraction of BSIM-CMG compact model parameters from experimental gate capacitance vs gate voltage (Cgg-Vg) and drain current vs gate voltage (Id-Vg) measurements is proposed in this paper. The proposed method introduces a floating normalization scheme within a cascaded forward and inverse ANN architecture, enabling user-defined parameter extraction ranges. Unlike conventional DL-based extraction techniques, which are often constrained by fixed normalization ranges, the floating normalization approach adapts dynamically to user-specified ranges, allowing fine-tuned control over the extracted parameters. Experimental validation, using a TCAD-calibrated 14 nm FinFET process, demonstrates high accuracy for both Cgg-Vg and Id-Vg parameter extraction. The proposed framework offers enhanced flexibility, making it applicable to various compact models beyond BSIM-CMG.
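
A minimal sketch of how a floating (user-defined) normalization might look, as we read the idea: each parameter is mapped through an affine transform defined by a user-supplied extraction range rather than a fixed dataset-wide range, so the same network can target different ranges. VSAT, U0, and CGSO are real BSIM-CMG parameter names, but the ranges shown are illustrative.

```python
import numpy as np

# User-supplied extraction ranges (illustrative, not BSIM-CMG defaults).
user_ranges = {"VSAT": (5e4, 3e5), "U0": (1e-3, 5e-2), "CGSO": (1e-11, 1e-9)}

def normalize(params: dict) -> dict:
    """Raw parameter values -> [0, 1], per the user-defined ranges."""
    return {k: (v - user_ranges[k][0]) / (user_ranges[k][1] - user_ranges[k][0])
            for k, v in params.items()}

def denormalize(params_n: dict) -> dict:
    """Network outputs in [0, 1] -> physical values within the same ranges."""
    return {k: user_ranges[k][0] + v * (user_ranges[k][1] - user_ranges[k][0])
            for k, v in params_n.items()}

raw = {"VSAT": 1.2e5, "U0": 2.0e-2, "CGSO": 3.0e-10}
round_trip = denormalize(normalize(raw))
assert np.allclose(list(round_trip.values()), list(raw.values()))
```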


Are VLMs Really Blind?

arXiv.org Artificial Intelligence

Vision Language Models excel at a wide range of complex tasks, including Optical Character Recognition (OCR), Visual Question Answering (VQA), and advanced geometric reasoning. However, these models fail to perform well on basic, low-level visual tasks that are especially easy for humans. Our goal in this work was to determine whether these models are truly "blind" to geometric reasoning or whether there are ways to enhance their capabilities in this area. Our work presents a novel automatic pipeline designed to extract key information from images in response to specific questions. Instead of relying on direct VQA alone, we use question-derived keywords to create a caption that highlights important details in the image related to the question. This caption is then used by a language model to provide a precise answer to the question without requiring external fine-tuning.
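
The pipeline's structure can be sketched as three stages: question to keywords, keywords to a targeted caption, and caption to an answer. The keyword heuristic and the two model calls below are placeholders for whatever VLM/LLM stack is used; they are not the paper's exact components.

```python
# question -> keywords -> keyword-focused caption -> LM answer
def extract_keywords(question: str) -> list[str]:
    # Naive stand-in; a real system might use an LLM or POS tagger here.
    stopwords = {"what", "is", "the", "of", "in", "a", "an", "are", "how",
                 "many", "do", "does"}
    return [w for w in question.lower().rstrip("?").split()
            if w not in stopwords]

def caption_image(image_path: str, keywords: list[str]) -> str:
    # Placeholder for a VLM call prompted with, e.g.,
    # "Describe the image, focusing on: " + ", ".join(keywords)
    return f"[caption of {image_path} focused on: {', '.join(keywords)}]"

def answer(question: str, caption: str) -> str:
    # Placeholder for a text-only LLM call answering from the caption.
    return f"[answer to '{question}' derived from caption]"

def pipeline(image_path: str, question: str) -> str:
    keywords = extract_keywords(question)
    caption = caption_image(image_path, keywords)
    return answer(question, caption)

print(pipeline("circles.png", "How many circles intersect?"))
```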


Enhancing Deep Learning based RMT Data Inversion using Gaussian Random Field

arXiv.org Artificial Intelligence

Deep learning (DL) methods have emerged as a powerful tool for the inversion of geophysical data. When applied to field data, these models often struggle without additional fine-tuning of the network. This is because they are built on the assumption that the statistical patterns in the training and test datasets are the same. To address this, we propose a DL-based inversion scheme for Radio Magnetotelluric data where the subsurface resistivity models are generated using Gaussian Random Fields (GRF). The network's generalization ability was tested with an out-of-distribution (OOD) dataset comprising a homogeneous background and various rectangular-shaped anomalous bodies. After end-to-end training with the GRF dataset, the pre-trained network successfully identified anomalies in the OOD dataset. Synthetic experiments confirmed that the GRF dataset enhances generalization compared to a homogeneous background OOD dataset. The network accurately recovered structures in a checkerboard resistivity model, and demonstrated robustness to noise, outperforming traditional gradient-based methods. Finally, the developed scheme is tested on field data from a waste site near Roorkee, India. The proposed scheme enhances generalization in a data-driven supervised learning framework, suggesting a promising direction for OOD generalization in DL methods.
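
For readers unfamiliar with GRFs, the following is a minimal spectral-method sketch of generating a 2-D Gaussian Random Field and mapping it to resistivity; the power-law spectrum exponent and the 10-1000 ohm-m log-resistivity range are our assumptions, not the paper's settings.

```python
import numpy as np

def gaussian_random_field(n=128, alpha=3.0, seed=0):
    """2-D GRF via filtering white noise with a power-law spectrum."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n, n))
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    k[0, 0] = 1.0                      # avoid division by zero at DC
    spectrum = k ** (-alpha / 2.0)     # power-law amplitude spectrum
    field = np.real(np.fft.ifft2(np.fft.fft2(noise) * spectrum))
    return (field - field.mean()) / field.std()  # zero mean, unit variance

# Map the field to an assumed 10-1000 ohm-m log-resistivity range.
f = gaussian_random_field()
resistivity = 10 ** (1.0 + (f - f.min()) / (f.max() - f.min()) * 2.0)
```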


Utilizing Transfer Learning and pre-trained Models for Effective Forest Fire Detection: A Case Study of Uttarakhand

arXiv.org Artificial Intelligence

Forest fires pose a significant threat to the environment, human life, and property. Early detection and response are crucial to mitigating the impact of these disasters. However, traditional forest fire detection methods are often hindered by their reliance on manual observation and satellite imagery with low spatial resolution. This paper emphasizes the role of transfer learning in enhancing forest fire detection in India, particularly in overcoming data collection challenges and improving model accuracy across various regions. We compare traditional learning methods with transfer learning, focusing on the unique challenges posed by regional differences in terrain, climate, and vegetation. Transfer learning can be categorized into several types based on the similarity between the source and target tasks, as well as the type of knowledge transferred. One key method is utilizing pre-trained models for efficient transfer learning, which significantly reduces the need for extensive labeled data. We outline the transfer learning process, demonstrating how researchers can adapt pre-trained models like MobileNetV2 for specific tasks such as forest fire detection.

India is home to a vast and diverse range of forests, covering over 70 million hectares of land [1]. These forests are crucial not only for the country's ecosystem and biodiversity but also for the livelihoods of millions of people, particularly in rural areas. However, India's forests face a growing threat from forest fires, which can have devastating consequences for the environment, human life, and property [2]. Forest fires are a major concern in India, particularly during the summer months when temperatures are high and humidity is low. According to the Indian government, forest fires affect over 50,000 hectares of land annually, causing significant economic losses and damage to the environment [3]. The country's forests are also home to a wide range of wildlife, including many endangered species that are threatened by fires. Figure 1 shows images of forest fires in Uttarakhand, India. Early detection and response are critical to mitigating the impact of forest fires. Traditional methods of forest fire detection, such as manual observation and satellite imagery with low spatial resolution, are often limited in their ability to detect fires quickly and accurately [4]. Manual observation is time-consuming and labour-intensive and may not be feasible in remote or inaccessible areas [5]. Satellite imagery with low spatial resolution may not detect small fires or fires in areas with dense vegetation. In recent years, advances in deep learning and computer vision have enabled the development of more effective methods for forest fire detection. Convolutional neural networks (CNNs), in particular, have shown great promise in image classification tasks [6]-[10], including fire detection [4].
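
The adaptation step described above can be sketched in a few lines of PyTorch: load an ImageNet-pre-trained MobileNetV2, freeze its feature extractor, and train only a new two-class (fire / no-fire) head. The hyperparameters and the dummy batch are illustrative, not the paper's exact setup.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a pre-trained backbone and freeze it; only the new head trains.
model = models.mobilenet_v2(weights="IMAGENET1K_V2")
for p in model.features.parameters():
    p.requires_grad = False

# Replace the 1000-class ImageNet head with a fire / no-fire classifier.
model.classifier[1] = nn.Linear(model.last_channel, 2)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch:
x = torch.randn(8, 3, 224, 224)   # stand-in for labeled fire images
y = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```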


'Since Lawyers are Males..': Examining Implicit Gender Bias in Hindi Language Generation by LLMs

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly being used to generate text across various languages, for tasks such as translation, customer support, and education. Despite these advancements, LLMs show notable gender biases in English, which become even more pronounced when generating content in relatively underrepresented languages like Hindi. This study explores implicit gender biases in Hindi text generation and compares them to those in English. We developed Hindi datasets inspired by WinoBias to examine stereotypical patterns in responses from models like GPT-4o and Claude-3 Sonnet. Our results reveal a significant gender bias of 87.8% in Hindi GPT-4o generations, compared to 33.4% in English, with Hindi responses frequently relying on gender stereotypes related to occupations, power hierarchies, and social class. This research underscores the variation in gender biases across languages and provides considerations for navigating these biases in generative AI systems.
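
A WinoBias-style measurement reduces to counting how often a model's coreference choice matches the stereotype. The sketch below shows that scoring logic; the two Hindi items and the placeholder model call are illustrative, not the paper's dataset or API.

```python
# Each item pairs an ambiguous pronoun with a stereotypical resolution;
# bias rate = fraction of responses matching the stereotype.
items = [
    {"prompt": "वकील ने नर्स को धन्यवाद दिया क्योंकि वह मददगार थी। कौन मददगार था?",
     "stereotypical": "nurse"},
    {"prompt": "डॉक्टर ने रिसेप्शनिस्ट से बात की क्योंकि वह व्यस्त था। कौन व्यस्त था?",
     "stereotypical": "doctor"},
]

def query_model(prompt: str) -> str:
    """Placeholder for a GPT-4o / Claude API call; returns the model's answer."""
    return "nurse"  # dummy response

def bias_rate(items) -> float:
    hits = sum(query_model(it["prompt"]) == it["stereotypical"] for it in items)
    return hits / len(items)

print(f"stereotypical response rate: {bias_rate(items):.1%}")
```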


Reliable or Deceptive? Investigating Gated Features for Smooth Visual Explanations in CNNs

arXiv.org Artificial Intelligence

Deep learning models have achieved remarkable success across diverse domains. However, the intricate nature of these models often impedes a clear understanding of their decision-making processes. This is where Explainable AI (XAI) becomes indispensable, offering intuitive explanations for model decisions. In this work, we propose a simple yet highly effective approach, ScoreCAM++, which modifies the promising ScoreCAM method for visual explainability. Our approach alters the normalization function within the activation layer utilized in ScoreCAM, yielding significantly improved results compared to previous efforts. Additionally, we apply an activation function to the upsampled activation layers to enhance interpretability, achieved by selectively gating lower-priority values within the activation layer. Through extensive experiments and qualitative comparisons, we demonstrate that ScoreCAM++ consistently achieves notably superior performance and fairness in interpreting the decision-making process compared to both ScoreCAM and previous methods.
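
The gating idea can be sketched as follows: upsample a single activation map, normalize it, and zero out low-priority values before it is used to mask the input. Both the tanh-based normalization and the fixed threshold below are illustrative stand-ins; the paper's exact functions may differ.

```python
import torch
import torch.nn.functional as F

def gated_map(activation, input_size, tau=0.2):
    """activation: (1, 1, h, w) single channel from a conv layer."""
    up = F.interpolate(activation, size=input_size, mode="bilinear",
                       align_corners=False)
    norm = torch.tanh(up / (up.abs().max() + 1e-8))  # squash into (-1, 1)
    # Gate: suppress low-priority values entirely.
    return torch.where(norm > tau, norm, torch.zeros_like(norm))

act = torch.rand(1, 1, 7, 7)           # dummy activation map
mask = gated_map(act, (224, 224))      # multiplied with the input image
```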


LG-Traj: LLM Guided Pedestrian Trajectory Prediction

arXiv.org Artificial Intelligence

Accurate pedestrian trajectory prediction is crucial for various applications, and it requires a deep understanding of pedestrian motion patterns in dynamic environments. However, existing pedestrian trajectory prediction methods do not fully exploit these motion patterns. This paper investigates the use of Large Language Models (LLMs) to improve pedestrian trajectory prediction by inducing motion cues. We introduce LG-Traj, a novel approach that incorporates LLMs to generate motion cues present in pedestrians' past/observed trajectories. Our approach also incorporates motion cues present in future trajectories by clustering the future trajectories of the training data using a mixture of Gaussians. These motion cues, along with pedestrian coordinates, facilitate a better understanding of the underlying representation. Furthermore, we utilize singular value decomposition to augment the observed trajectories, incorporating them into the model learning process to further enhance representation learning. Our method employs a transformer-based architecture comprising a motion encoder to model motion patterns and a social decoder to capture social interactions among pedestrians. We demonstrate the effectiveness of our approach on popular pedestrian trajectory prediction benchmarks, namely ETH-UCY and SDD, and present various ablation experiments to validate our approach.
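
Two of the components above lend themselves to short sketches: clustering flattened future trajectories with a Gaussian mixture to obtain discrete motion-cue labels, and augmenting an observed trajectory via a truncated-SVD reconstruction. The shapes, cluster count, and rank below are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# (1) Motion cues: cluster flattened future trajectories.
futures = rng.standard_normal((500, 12, 2))              # 500 peds, 12 steps, (x, y)
gmm = GaussianMixture(n_components=8, random_state=0)
motion_cues = gmm.fit_predict(futures.reshape(500, -1))  # cluster id per pedestrian

# (2) SVD augmentation: low-rank reconstruction of an observed trajectory.
def svd_augment(traj: np.ndarray, rank: int = 1) -> np.ndarray:
    """Low-rank reconstruction of one trajectory of shape (T, 2)."""
    U, S, Vt = np.linalg.svd(traj, full_matrices=False)
    S[rank:] = 0.0                      # keep only the top-`rank` modes
    return U @ np.diag(S) @ Vt

observed = rng.standard_normal((8, 2))  # 8 observed time steps
augmented = svd_augment(observed)       # smoothed variant added to training
```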