Hussain, Muhammad
YOLOv12: A Breakdown of the Key Architectural Features
Alif, Mujadded Al Rabbani, Hussain, Muhammad
This paper presents an architectural analysis of YOLOv12, a significant advancement in single-stage, real-time object detection building upon the strengths of its predecessors while introducing key improvements. The model incorporates an optimised backbone (R-ELAN), 7x7 separable convolutions, and FlashAttention-driven area-based attention, improving feature extraction, enhanced efficiency, and robust detections. With multiple model variants, similar to its predecessors, YOLOv12 offers scalable solutions for both latency-sensitive and high-accuracy applications. Experimental results manifest consistent gains in mean average precision (mAP) and inference speed, making YOLOv12 a compelling choice for applications in autonomous systems, security, and real-time analytics. By achieving an optimal balance between computational efficiency and performance, YOLOv12 sets a new benchmark for real-time computer vision, facilitating deployment across diverse hardware platforms, from edge devices to high-performance clusters.
Diabetic Retinopathy Screening Using Custom-Designed Convolutional Neural Network
Saeed, Fahman, Hussain, Muhammad, Member, Senior, IEEE, null, Aboalsamh, Hatim A, Member, Senior, IEEE, null, Adel, Fadwa Al, Owaifeer, Adi Mohammed Al
The prevalence of diabetic retinopathy (DR) has reached 34.6% worldwide and is a major cause of blindness among middle-aged diabetic patients. Regular DR screening using fundus photography helps detect its complications and prevent its progression to advanced levels. As manual screening is time-consuming and subjective, machine learning (ML) and deep learning (DL) have been employed to aid graders. However, the existing CNN-based methods use either pre-trained CNN models or a brute force approach to design new CNN models, which are not customized to the complexity of fundus images. To overcome this issue, we introduce an approach for custom-design of CNN models, whose architectures are adapted to the structural patterns of fundus images and better represent the DR-relevant features. It takes the leverage of k-medoid clustering, principal component analysis (PCA), and inter-class and intra-class variations to automatically determine the depth and width of a CNN model. The designed models are lightweight, adapted to the internal structures of fundus images, and encode the discriminative patterns of DR lesions. The technique is validated on a local dataset from King Saud University Medical City, Saudi Arabia, and two challenging benchmark datasets from Kaggle: EyePACS and APTOS2019. The custom-designed models outperform the famous pre-trained CNN models like ResNet152, Densnet121, and ResNeSt50 with a significant decrease in the number of parameters and compete well with the state-of-the-art CNN-based DR screening methods. The proposed approach is helpful for DR screening under diverse clinical settings and referring the patients who may need further assessment and treatment to expert ophthalmologists.
Alignment-Free Cross-Sensor Fingerprint Matching based on the Co-Occurrence of Ridge Orientations and Gabor-HoG Descriptor
AlShehri, Helala, Hussain, Muhammad, AboAlSamh, Hatim, Emad-ul-Haq, Qazi, Azmi, Aqil M.
The existing automatic fingerprint verification methods are designed to work under the assumption that the same sensor is installed for enrollment and authentication (regular matching). There is a remarkable decrease in efficiency when one type of contact-based sensor is employed for enrolment and another type of contact-based sensor is used for authentication (cross-matching or fingerprint sensor interoperability problem,). The ridge orientation patterns in a fingerprint are invariant to sensor type. Based on this observation, we propose a robust fingerprint descriptor called the co-occurrence of ridge orientations (Co-Ror), which encodes the spatial distribution of ridge orientations. Employing this descriptor, we introduce an efficient automatic fingerprint verification method for cross-matching problem. Further, to enhance the robustness of the method, we incorporate scale based ridge orientation information through Gabor-HoG descriptor. The two descriptors are fused with canonical correlation analysis (CCA), and the matching score between two fingerprints is calculated using city-block distance. The proposed method is alignment-free and can handle the matching process without the need for a registration step. The intensive experiments on two benchmark databases (FingerPass and MOLF) show the effectiveness of the method and reveal its significant enhancement over the state-of-the-art methods such as VeriFinger (a commercial SDK), minutia cylinder-code (MCC), MCC with scale, and the thin-plate spline (TPS) model. The proposed research will help security agencies, service providers and law-enforcement departments to overcome the interoperability problem of contact sensors of different technology and interaction types.
Automatic Emotion Recognition (AER) System based on Two-Level Ensemble of Lightweight Deep CNN Models
Qazi, Emad-ul-Haq, Hussain, Muhammad, AboAlsamh, Hatim, Ullah, Ihsan
Emotions play a crucial role in human interaction, health care and security investigations and monitoring. Automatic emotion recognition (AER) using electroencephalogram (EEG) signals is an effective method for decoding the real emotions, which are independent of body gestures, but it is a challenging problem. Several automatic emotion recognition systems have been proposed, which are based on traditional hand-engineered approaches and their performances are very poor. Motivated by the outstanding performance of deep learning (DL) in many recognition tasks, we introduce an AER system (Deep-AER) based on EEG brain signals using DL. A DL model involves a large number of learnable parameters, and its training needs a large dataset of EEG signals, which is difficult to acquire for AER problem. To overcome this problem, we proposed a lightweight pyramidal one-dimensional convolutional neural network (LP-1D-CNN) model, which involves a small number of learnable parameters. Using LP-1D-CNN, we build a two level ensemble model. In the first level of the ensemble, each channel is scanned incrementally by LP-1D-CNN to generate predictions, which are fused using majority vote. The second level of the ensemble combines the predictions of all channels of an EEG signal using majority vote for detecting the emotion state. We validated the effectiveness and robustness of Deep-AER using DEAP, a benchmark dataset for emotion recognition research. The results indicate that FRONT plays dominant role in AER and over this region, Deep-AER achieved the accuracies of 98.43% and 97.65% for two AER problems, i.e., high valence vs low valence (HV vs LV) and high arousal vs low arousal (HA vs LA), respectively. The comparison reveals that Deep-AER outperforms the state-of-the-art systems with large margin. The Deep-AER system will be helpful in monitoring for health care and security investigations.
Eigen Values Features for the Classification of Brain Signals corresponding to 2D and 3D Educational Contents
Bamatraf, Saeed, Hussain, Muhammad, Qazi, Emad-ul-Haq, Aboalsamh, Hatim
In this paper, we have proposed a brain signal classification method, which uses eigenvalues of the covariance matrix as features to classify images (topomaps) created from the brain signals. The signals are recorded during the answering of 2D and 3D questions. The system is used to classify the correct and incorrect answers for both 2D and 3D questions. Using the classification technique, the impacts of 2D and 3D multimedia educational contents on learning, memory retention and recall will be compared. The subjects learn similar 2D and 3D educational contents. Afterwards, subjects are asked 20 multiple-choice questions (MCQs) associated with the contents after thirty minutes (Short-Term Memory) and two months (Long-Term Memory). Eigenvalues features extracted from topomaps images are given to K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers, in order to identify the states of the brain related to incorrect and correct answers. Excellent accuracies obtained by both classifiers and by applying statistical analysis on the results, no significant difference is indicated between 2D and 3D multimedia educational contents on learning, memory retention and recall in both STM and LTM.
An Efficient Intelligent System for the Classification of Electroencephalography (EEG) Brain Signals using Nuclear Features for Human Cognitive Tasks
Qazi, Emad-ul-Haq, Hussain, Muhammad, Aboalsamh, Hatim
Representation and classification of Electroencephalography (EEG) brain signals are critical processes for their analysis in cognitive tasks. Particularly, extraction of discriminative features from raw EEG signals, without any pre-processing, is a challenging task. Motivated by nuclear norm, we observed that there is a significant difference between the variances of EEG signals captured from the same brain region when a subject performs different tasks. This observation lead us to use singular value decomposition for computing dominant variances of EEG signals captured from a certain brain region while performing a certain task and use them as features (nuclear features). A simple and efficient class means based minimum distance classifier (CMMDC) is enough to predict brain states. This approach results in the feature space of significantly small dimension and gives equally good classification results on clean as well as raw data. We validated the effectiveness and robustness of the technique using four datasets of different tasks: fluid intelligence clean data (FICD), fluid intelligence raw data (FIRD), memory recall task (MRT), and eyes open / eyes closed task (EOEC). For each task, we analyzed EEG signals over six (06) different brain regions with 8, 16, 20, 18, 18 and 100 electrodes. The nuclear features from frontal brain region gave the 100% prediction accuracy. The discriminant analysis of the nuclear features has been conducted using intra-class and inter-class variations. Comparisons with the state-of-the-art techniques showed the superiority of the proposed system.