
Collaborating Authors

Javed, Sajid


Underwater Image Enhancement by Convolutional Spiking Neural Networks

arXiv.org Artificial Intelligence

Underwater image enhancement (UIE) is fundamental for marine applications, including autonomous vision-based navigation. Deep learning methods based on convolutional neural networks (CNNs) and vision transformers have advanced UIE performance. Recently, spiking neural networks (SNNs) have gained attention for their lightweight design, energy efficiency, and scalability. This paper introduces UIE-SNN, the first SNN-based UIE algorithm to improve the visibility of underwater images. UIE-SNN is a 19-layer convolutional spiking encoder-decoder framework with skip connections, trained directly using a surrogate gradient-based backpropagation-through-time (BPTT) strategy. We explore and validate the influence of training datasets on energy reduction, a unique advantage of the UIE-SNN architecture, in contrast to conventional learning-based architectures, where energy consumption is model-dependent. UIE-SNN optimizes the loss function in a latent-space representation to reconstruct clear underwater images. Our algorithm performs on par with its non-spiking counterparts in terms of PSNR and structural similarity index (SSIM) at a reduced number of timesteps ($T=5$) and with an $85\%$ reduction in energy consumption. The algorithm is trained on two publicly available benchmark datasets, UIEB and EUVP, and tested on unseen images from UIEB, EUVP, LSUI, U45, and our custom UIE dataset. UIE-SNN achieves a PSNR of \(17.7801~dB\) and an SSIM of \(0.7454\) on UIEB, and a PSNR of \(23.1725~dB\) and an SSIM of \(0.7890\) on EUVP. UIE-SNN reaches this performance with fewer operations (\(147.49\) GSOPs) and less energy (\(0.1327~J\)) than its non-spiking counterpart (\(218.88\) GFLOPs and \(1.0068~J\)). Compared with existing SOTA UIE methods, UIE-SNN achieves an average \(6.5\times\) improvement in energy efficiency. The source code is available at \href{https://github.com/vidya-rejul/UIE-SNN.git}{UIE-SNN}.
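As a rough illustration of the training mechanism mentioned above, here is a minimal PyTorch sketch of a convolutional leaky integrate-and-fire (LIF) block unrolled over time and trained with a surrogate-gradient spike function; the layer sizes, decay factor, threshold, and surrogate shape are illustrative assumptions, not the UIE-SNN configuration.

```python
import torch
import torch.nn as nn

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike with a fast-sigmoid-style surrogate gradient for BPTT."""
    @staticmethod
    def forward(ctx, mem_pot, threshold=1.0):
        ctx.save_for_backward(mem_pot)
        ctx.threshold = threshold
        return (mem_pot >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (mem_pot,) = ctx.saved_tensors
        # d(spike)/d(mem) approximated by 1 / (1 + |v - theta|)^2
        surrogate = 1.0 / (1.0 + torch.abs(mem_pot - ctx.threshold)) ** 2
        return grad_output * surrogate, None

class SpikingConvLIF(nn.Module):
    """One convolutional LIF block unrolled over T timesteps."""
    def __init__(self, in_ch, out_ch, beta=0.9, threshold=1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.beta, self.threshold = beta, threshold

    def forward(self, x_seq):                     # x_seq: (T, B, C, H, W)
        mem, spikes = None, []
        for x_t in x_seq:
            cur = self.conv(x_t)
            mem = cur if mem is None else self.beta * mem + cur
            spk = SurrogateSpike.apply(mem, self.threshold)
            mem = mem - spk * self.threshold      # soft reset after firing
            spikes.append(spk)
        return torch.stack(spikes)                # spike train for the next layer

# Example: 5 timesteps (T=5), batch of 2 RGB frames repeated over time
x = torch.rand(5, 2, 3, 64, 64)
out = SpikingConvLIF(3, 16)(x)                    # (5, 2, 16, 64, 64)
```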


AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis

arXiv.org Artificial Intelligence

The preservation of aquatic biodiversity is critical in mitigating the effects of climate change. Aquatic scene understanding plays a pivotal role in aiding marine scientists in their decision-making processes. In this paper, we introduce AquaticCLIP, a novel contrastive language-image pre-training model tailored for aquatic scene understanding. AquaticCLIP presents a new unsupervised learning framework that aligns images and texts in aquatic environments, enabling tasks such as segmentation, classification, detection, and object counting. By leveraging our large-scale underwater image-text paired dataset without the need for ground-truth annotations, our model enriches existing vision-language models in the aquatic domain. For this purpose, we construct a dataset of 2 million underwater image-text pairs from heterogeneous sources, including YouTube, Netflix, and NatGeo. To fine-tune AquaticCLIP, we propose a prompt-guided vision encoder that progressively aggregates patch features via learnable prompts, while a vision-guided mechanism enhances the language encoder by incorporating visual context. The model is optimized with a contrastive pretraining loss that aligns the visual and textual modalities. AquaticCLIP achieves notable performance improvements in zero-shot settings across multiple underwater computer vision tasks, outperforming existing methods in both robustness and interpretability. Our model sets a new benchmark for vision-language applications in underwater environments. The code and dataset for AquaticCLIP are publicly available on GitHub at xxx.
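For readers unfamiliar with the contrastive pretraining objective mentioned above, the following is a minimal sketch of a symmetric CLIP-style image-text loss; the embedding dimension, temperature, and encoders are placeholder assumptions, not AquaticCLIP's prompt-guided components.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    loss_i2t = F.cross_entropy(logits, targets)       # match each image to its caption
    loss_t2i = F.cross_entropy(logits.t(), targets)   # and each caption to its image
    return 0.5 * (loss_i2t + loss_t2i)

# Toy usage with random 512-d embeddings for a batch of 8 pairs
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```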


Hybrid-Neuromorphic Approach for Underwater Robotics Applications: A Conceptual Framework

arXiv.org Artificial Intelligence

This paper introduces the concept of employing neuromorphic methodologies for task-oriented underwater robotics applications. In contrast to the increasing computational demands of conventional deep learning algorithms, neuromorphic technology, which leverages spiking neural network architectures, promises sophisticated artificial intelligence with significantly reduced computational requirements and power consumption by emulating the operational principles of the human brain. Although neuromorphic technology has documented applications in various robotic domains, its use in marine robotics remains largely unexplored. This article therefore proposes a unified framework for integrating neuromorphic technologies for perception, pose estimation, and haptic-guided conditional control of underwater vehicles, customized to specific user-defined objectives. This conceptual framework stands to revolutionize underwater robotics, enhancing efficiency and autonomy while reducing energy consumption. By enabling greater adaptability and robustness, this advancement could facilitate applications such as underwater exploration, environmental monitoring, and infrastructure maintenance, thereby contributing to significant progress in marine science and technology.


Dehazing-aided Multi-Rate Multi-Modal Pose Estimation Framework for Mitigating Visual Disturbances in Extreme Underwater Domain

arXiv.org Artificial Intelligence

This paper explores the potential of DU-VIO, a dehazing-aided hybrid multi-rate multi-modal Visual-Inertial Odometry (VIO) estimation framework designed for the challenging conditions of extreme underwater environments. DU-VIO incorporates a GAN-based pre-processing module and a hybrid CNN-LSTM module for precise pose estimation from visibility-enhanced underwater images and raw IMU data. Accurate pose estimation is paramount for various underwater robotics and exploration applications; however, underwater visibility is often compromised by suspended particles and attenuation effects, making visual-inertial pose estimation a formidable challenge. DU-VIO aims to overcome these limitations by removing visual disturbances from raw image data, improving the quality of the image features used for pose estimation. We demonstrate the effectiveness of DU-VIO by computing RMSE scores for translation and rotation vectors against their reference values and comparing them to those of a baseline model on a modified AQUALOC dataset. This study's significance lies in its potential to revolutionize underwater robotics and exploration: DU-VIO offers a robust solution to the persistent challenge of underwater visibility and significantly improves the accuracy of pose estimation. This research contributes valuable insights and tools for advancing underwater technology, with far-reaching implications for scientific research, environmental monitoring, and industrial applications.
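As a small illustration of the evaluation metric mentioned above, the sketch below computes RMSE for translation and rotation against reference values; the angle-wrapping convention and the Euler-angle representation are assumptions, since the abstract does not specify them.

```python
import numpy as np

def trajectory_rmse(est_t, ref_t, est_r, ref_r):
    """RMSE of translation (metres) and rotation (radians, wrapped to [-pi, pi))."""
    t_err = np.linalg.norm(est_t - ref_t, axis=1)                # per-frame position error
    r_diff = est_r - ref_r
    r_err = np.linalg.norm((r_diff + np.pi) % (2 * np.pi) - np.pi, axis=1)
    return np.sqrt(np.mean(t_err ** 2)), np.sqrt(np.mean(r_err ** 2))

# Toy usage with 100 frames of 3-D positions and Euler angles
rmse_t, rmse_r = trajectory_rmse(np.random.rand(100, 3), np.random.rand(100, 3),
                                 np.random.rand(100, 3), np.random.rand(100, 3))
```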


CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment

arXiv.org Artificial Intelligence

This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field. To encourage further research and replication, the code for CPLIP is available on GitHub at https://cplip.github.io/
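One common way to realize the many-to-many contrastive alignment described above is a multi-positive contrastive loss in which a binary mask marks all matching image-text pairs; the sketch below illustrates that idea under assumed embedding sizes and is not necessarily the exact CPLIP objective.

```python
import torch
import torch.nn.functional as F

def many_to_many_contrastive_loss(img_emb, txt_emb, pos_mask, temperature=0.07):
    """Contrastive loss where each image may match several texts (and vice versa).

    pos_mask: (N_img, N_txt) binary matrix marking all positive image-text pairs.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    log_prob_i2t = F.log_softmax(logits, dim=1)
    log_prob_t2i = F.log_softmax(logits.t(), dim=1)
    # Average log-likelihood over the positive set of each image / text
    loss_i2t = -(pos_mask * log_prob_i2t).sum(1) / pos_mask.sum(1).clamp(min=1)
    loss_t2i = -(pos_mask.t() * log_prob_t2i).sum(1) / pos_mask.t().sum(1).clamp(min=1)
    return 0.5 * (loss_i2t.mean() + loss_t2i.mean())

# Toy usage: 4 images, 6 captions, with a hand-made positive mask
mask = torch.zeros(4, 6); mask[0, :2] = 1; mask[1, 2] = 1; mask[2, 3:5] = 1; mask[3, 5] = 1
loss = many_to_many_contrastive_loss(torch.randn(4, 256), torch.randn(6, 256), mask)
```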


Accurate and Efficient Urban Street Tree Inventory with Deep Learning on Mobile Phone Imagery

arXiv.org Artificial Intelligence

Deforestation, a major contributor to climate change, has detrimental consequences such as agricultural disruption, global warming, flash floods, and landslides. Conventional approaches to urban street tree inventory suffer from inaccuracies and necessitate specialised equipment. To overcome these challenges, this paper proposes a method that leverages deep learning and mobile phone imaging for urban street tree inventory. Our approach utilises a pair of images captured by smartphone cameras to accurately segment tree trunks and compute the diameter at breast height (DBH). Compared to traditional methods, our approach exhibits several advantages, including superior accuracy, reduced dependency on specialised equipment, and applicability in hard-to-reach areas. We evaluated our method on a dataset of 400 trees and achieved DBH estimation with an error rate of less than 2.5%. Our method holds significant potential for substantially improving forest management practices: by enhancing the accuracy and efficiency of tree inventory, it empowers urban managers to mitigate the adverse effects of deforestation and climate change.
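As a back-of-the-envelope illustration of how a segmented trunk width can be converted to a DBH estimate, the sketch below applies the standard pinhole-camera relation; the focal length and depth value are assumed inputs (e.g., derived from the smartphone image pair), not the paper's actual pipeline.

```python
def estimate_dbh_cm(trunk_width_px, depth_m, focal_length_px):
    """Pinhole-camera back-projection of the trunk's pixel width to a metric diameter."""
    diameter_m = trunk_width_px * depth_m / focal_length_px
    return diameter_m * 100.0  # centimetres

# Toy example: 180-px-wide trunk at breast height, 3.2 m away, focal length ~1500 px
print(estimate_dbh_cm(180, 3.2, 1500))  # ~38.4 cm
```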


Unsupervised Mutual Transformer Learning for Multi-Gigapixel Whole Slide Image Classification

arXiv.org Artificial Intelligence

Classification of gigapixel Whole Slide Images (WSIs) is an important prediction task in the emerging area of computational pathology. There has been a surge of research on deep learning models for WSI classification, with clinical applications such as cancer detection or prediction of molecular mutations from WSIs. Most methods require expensive and labor-intensive manual annotations by expert pathologists. Weakly supervised Multiple Instance Learning (MIL) methods have recently demonstrated excellent performance; however, they still require large slide-level labeled training datasets that demand careful inspection of each slide by an expert pathologist. In this work, we propose a fully unsupervised WSI classification algorithm based on mutual transformer learning. Instances from a gigapixel WSI (i.e., image patches) are transformed into a latent space and then inverse-transformed to the original space. Using the transformation loss, pseudo-labels are generated and cleaned using a transformer label cleaner. The proposed transformer-based pseudo-label generation and cleaning modules mutually train each other iteratively in an unsupervised manner. A discriminative learning mechanism is introduced to improve normal-versus-cancerous instance labeling. In addition to unsupervised classification, we demonstrate the effectiveness of the proposed framework for weakly supervised cancer subtype classification as a downstream analysis. Extensive experiments on four publicly available datasets show excellent performance compared to state-of-the-art methods. We intend to make the source code of our algorithm publicly available soon.
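To illustrate the idea of generating pseudo-labels from a transformation (reconstruction) loss, here is a minimal sketch in which instances that reconstruct poorly after a forward and inverse transform are flagged; the linear encoder/decoder stand in for the paper's transformer modules, and the quantile threshold is an assumption.

```python
import torch

def pseudo_labels_from_reconstruction(features, encoder, decoder, quantile=0.8):
    """Label instances by transformation (reconstruction) error:
    patches that reconstruct poorly are flagged as candidate abnormal instances."""
    with torch.no_grad():
        latent = encoder(features)                 # forward transform to latent space
        recon = decoder(latent)                    # inverse transform back
        err = ((features - recon) ** 2).mean(dim=1)
    threshold = torch.quantile(err, quantile)      # data-driven cut-off
    return (err > threshold).long(), err           # 1 = candidate cancerous instance

# Toy usage: 1000 patch embeddings of dim 384 with linear stand-in transforms
feats = torch.randn(1000, 384)
labels, errors = pseudo_labels_from_reconstruction(
    feats, encoder=torch.nn.Linear(384, 128), decoder=torch.nn.Linear(128, 384))
```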


DFR-FastMOT: Detection Failure Resistant Tracker for Fast Multi-Object Tracking Based on Sensor Fusion

arXiv.org Artificial Intelligence

Persistent multi-object tracking (MOT) allows autonomous vehicles to navigate safely in highly dynamic environments. One of the well-known challenges in MOT is occlusion, when an object becomes unobservable for subsequent frames. Current MOT methods store object information, such as trajectories, in internal memory to recover objects after occlusions. However, they retain only short-term memory to save computational time and avoid slowing down the tracker. As a result, they lose track of objects in some occlusion scenarios, particularly long ones. In this paper, we propose DFR-FastMOT, a lightweight MOT method that fuses data from camera and LiDAR sensors and relies on an algebraic formulation for object association and fusion. The formulation reduces computation time and permits long-term memory, which handles more occlusion scenarios. Our method shows outstanding tracking performance over recent learning-based and non-learning benchmarks, with margins of about 3% and 4% in MOTA, respectively. We also conduct extensive experiments that simulate occlusion by employing detectors with various distortion levels; the proposed solution outperforms current state-of-the-art methods at every distortion level. Our framework processes about 7,763 frames in 1.48 seconds, roughly seven times faster than recent benchmarks. The framework will be available at https://github.com/MohamedNagyMostafa/DFR-FastMOT.
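As a generic illustration of the association step in sensor-fusion MOT, the sketch below matches tracks to detections with a 3-D centroid cost matrix and the Hungarian algorithm; this is not the paper's algebraic formulation, and the distance gate is an assumed parameter.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks_xyz, detections_xyz, max_dist=2.0):
    """Match tracks to detections by 3-D centroid distance (e.g., fused camera/LiDAR)."""
    cost = np.linalg.norm(tracks_xyz[:, None, :] - detections_xyz[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)                 # minimum-cost matching
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
    unmatched_dets = set(range(len(detections_xyz))) - {c for _, c in matches}
    return matches, sorted(unmatched_dets)                   # new tracks start here

# Toy usage: 3 existing tracks, 4 new detections
matches, new_dets = associate(np.random.rand(3, 3) * 10, np.random.rand(4, 3) * 10)
```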


Higher-order Sparse Convolutions in Graph Neural Networks

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have been applied to many problems in computer science. Capturing higher-order relationships between nodes is crucial to increasing the expressive power of GNNs. However, existing methods for capturing these relationships can be infeasible for large-scale graphs. In this work, we introduce a new higher-order sparse convolution based on the Sobolev norm of graph signals. Our Sparse Sobolev GNN (S-SobGNN) computes a cascade of filters in each layer with increasing Hadamard powers to obtain a more diverse set of functions, and a linear combination layer then weights the embeddings of each filter. We evaluate S-SobGNN on several semi-supervised learning applications, where it shows competitive performance compared to several state-of-the-art methods.
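A dense toy sketch of the filter cascade described above is given below: Hadamard (element-wise) powers of a Sobolev-style term (L + εI) produce a bank of filtered node features, which a learned linear layer would then combine; the dense matrices, ε value, and number of powers are simplifying assumptions for illustration, not the paper's sparse implementation.

```python
import torch

def sparse_sobolev_filters(adj, x, epsilon=1.0, max_power=3):
    """Cascade of filters from Hadamard powers of the Sobolev term (L + eps*I)."""
    deg = torch.diag(adj.sum(dim=1))
    laplacian = deg - adj
    sobolev = laplacian + epsilon * torch.eye(adj.size(0))
    outputs, filt = [], torch.ones_like(sobolev)
    for _ in range(max_power):
        filt = filt * sobolev                   # Hadamard power keeps the sparsity pattern
        outputs.append(filt @ x)                # filtered node features for this power
    return outputs                              # combined by a learned linear layer in S-SobGNN

# Toy usage: random undirected 5-node graph, 8-dimensional node features
A = (torch.rand(5, 5) > 0.6).float(); A = ((A + A.t()) > 0).float(); A.fill_diagonal_(0)
filters = sparse_sobolev_filters(A, torch.randn(5, 8))
```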


Robust PCA, Subspace Learning, and Tracking

arXiv.org Machine Learning

PCA is one of the most widely used dimension reduction techniques. A related, easier problem is "subspace learning" or "subspace estimation". Given relatively clean data, both are easily solved via the singular value decomposition (SVD). The problem of subspace learning or PCA in the presence of outliers is called robust subspace learning or robust PCA (RPCA). For long data sequences, if one tries to use a single lower-dimensional subspace to represent the data, the required subspace dimension may end up being quite large. For such data, a better model is to assume that they lie in a low-dimensional subspace that changes over time, albeit gradually. The problem of tracking such data (and the underlying subspaces) while being robust to outliers is called robust subspace tracking (RST). This article provides a magazine-style overview of the entire field of robust subspace learning and tracking. In particular, solutions for three problems are discussed in detail: RPCA via sparse + low-rank matrix decomposition (S+LR), RST via S+LR, and "robust subspace recovery", which assumes that an entire data vector is either an outlier or an inlier. The S+LR formulation instead assumes that outliers occur on only a few data-vector indices and hence are well modeled as sparse corruptions.
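To make the S+LR idea concrete, here is a toy alternating scheme that splits a matrix into a low-rank part via singular value thresholding and a sparse part via soft thresholding; the threshold choices are heuristic assumptions, and practical RPCA solvers (e.g., principal component pursuit via ALM) are considerably more refined.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca_s_plus_lr(M, n_iter=50, tau=None, lam=None):
    """Toy alternating scheme for the sparse + low-rank (S+LR) decomposition M = L + S."""
    m, n = M.shape
    tau = tau or 0.1 * np.linalg.norm(M, 2)        # singular-value threshold (heuristic)
    lam = lam or 1.0 / np.sqrt(max(m, n))          # classical sparsity weight
    L, S = np.zeros_like(M), np.zeros_like(M)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U * np.maximum(sig - tau, 0.0)) @ Vt  # singular value thresholding
        S = soft_threshold(M - L, lam)             # sparse outlier estimate
    return L, S

# Toy usage: rank-2 matrix plus 5% sparse corruptions
rng = np.random.default_rng(0)
M = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 40))
M[rng.random(M.shape) < 0.05] += 10.0
L, S = rpca_s_plus_lr(M)
```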