Zulkernine, Farhana
Self-Supervised Keypoint Detection with Distilled Depth Keypoint Representation
Anand, Aman, Rashno, Elyas, Eskandari, Amir, Zulkernine, Farhana
Existing unsupervised keypoint detection methods apply artificial deformations to images, such as masking a significant portion of the image, and use reconstruction of the original image as a learning objective to detect keypoints. However, this approach lacks depth information and often detects keypoints on the background. To address this, we propose Distill-DKP, a novel cross-modal knowledge distillation framework that leverages depth maps and RGB images for keypoint detection in a self-supervised setting. During training, Distill-DKP extracts embedding-level knowledge from a depth-based teacher model to guide an image-based student model, with inference restricted to the student. Experiments show that Distill-DKP significantly outperforms previous unsupervised methods, reducing mean L2 error by 47.15% on Human3.6M, mean average error by 5.67% on Taichi, and improving keypoint accuracy by 1.3% on the DeepFashion dataset. Detailed ablation studies demonstrate the sensitivity of knowledge distillation across different layers of the network. Project Page: https://23wm13.github.io/distill-dkp/
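To make the embedding-level distillation idea concrete, here is a minimal PyTorch sketch: a frozen depth-based teacher guides an RGB-based student by aligning their embeddings. The encoder architecture and the cosine-based alignment loss are illustrative assumptions, not the exact Distill-DKP design.

```python
# Minimal sketch of embedding-level cross-modal distillation.
# Module names and the cosine loss are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy convolutional encoder producing a spatial embedding."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

teacher = Encoder(in_ch=1)   # depth-based teacher, frozen during distillation
student = Encoder(in_ch=3)   # RGB-based student, used alone at inference
for p in teacher.parameters():
    p.requires_grad = False

def distill_loss(rgb, depth):
    # Align student and teacher embeddings; cosine distance is one common choice.
    t = teacher(depth).flatten(1)
    s = student(rgb).flatten(1)
    return 1.0 - F.cosine_similarity(s, t.detach(), dim=1).mean()

rgb = torch.randn(4, 3, 128, 128)
depth = torch.randn(4, 1, 128, 128)
loss = distill_loss(rgb, depth)  # combined with the keypoint learning objective
loss.backward()
```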
SDA-GRIN for Adaptive Spatial-Temporal Multivariate Time Series Imputation
Eskandari, Amir, Anand, Aman, Sharma, Drishti, Zulkernine, Farhana
In various applications, multivariate time series often suffer from missing data, which can significantly disrupt systems that rely on them. Spatial and temporal dependencies can be leveraged to impute the missing samples, but existing imputation methods often ignore dynamic changes in spatial dependencies. We propose the Spatial Dynamic Aware Graph Recurrent Imputation Network (SDA-GRIN), which captures dynamic changes in spatial dependencies. SDA-GRIN leverages a multi-head attention mechanism to adapt graph structures over time, models multivariate time series as a sequence of temporal graphs, and uses a recurrent message-passing architecture for imputation. We evaluate SDA-GRIN on four real-world datasets: it improves MSE by 9.51% on AQI and 9.40% on AQI-36, and achieves a 1.94% MSE improvement on the PEMS-BAY dataset. A detailed ablation study demonstrates the effects of window size and missing data on the performance of the method. Project page: https://ameskandari.github.io/sda-grin/
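The following PyTorch sketch illustrates the core idea of inferring a time-varying graph from attention weights over the series variables. The `DynamicGraph` module, its projection, and the dimensions are illustrative assumptions rather than SDA-GRIN's exact formulation.

```python
# Sketch: use multi-head attention over variables in one window to form a
# time-varying adjacency matrix for a downstream graph-recurrent imputer.
import torch
import torch.nn as nn

class DynamicGraph(nn.Module):
    def __init__(self, window, heads=4, d_model=32):
        super().__init__()
        self.proj = nn.Linear(window, d_model)  # embed each variable's window
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, n_vars, window) -- one window of the multivariate series
        h = self.proj(x)
        _, weights = self.attn(h, h, h, need_weights=True,
                               average_attn_weights=True)
        # (batch, n_vars, n_vars): adjacency consumed by the recurrent
        # message-passing imputation step
        return weights

adj = DynamicGraph(window=24)(torch.randn(8, 36, 24))
```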
Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health
Wen, Bo, Norel, Raquel, Liu, Julia, Stappenbeck, Thaddeus, Zulkernine, Farhana, Chen, Huamin
The rapid advancements in large language models (LLMs) have opened up new opportunities for transforming patient engagement in healthcare through conversational AI. This paper presents an overview of the current landscape of LLMs in healthcare, focusing specifically on their applications in analyzing and generating conversations for improved patient engagement. We showcase the power of LLMs in handling unstructured conversational data through four case studies: (1) analyzing mental health discussions on Reddit, (2) developing a personalized chatbot for cognitive engagement in seniors, (3) summarizing medical conversation datasets, and (4) designing an AI-powered patient engagement system. These case studies demonstrate how LLMs can effectively extract insights and summaries from unstructured dialogues and engage patients in guided, goal-oriented conversations. Leveraging LLMs for conversational analysis and generation opens new doors for patient-centered outcomes research. However, integrating LLMs into healthcare raises important ethical considerations regarding data privacy, bias, transparency, and regulatory compliance. We discuss best practices and guidelines for the responsible development and deployment of LLMs in healthcare settings. Realizing the full potential of LLMs in digital health will require close collaboration between the AI and healthcare communities to address technical challenges and to ensure the safety, efficacy, and equity of these powerful tools.
Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data
Chen, Yuhao, Wang, Zhimu, Wen, Bo, Zulkernine, Farhana
Unstructured text in medical notes and dialogues contains rich information. Recent advancements in Large Language Models (LLMs) have demonstrated superior performance in question answering and summarization tasks on unstructured text data, outperforming traditional text analysis approaches. However, there is a lack of scientific studies in the literature that methodically evaluate and report on the performance of different LLMs, specifically for domain-specific data such as medical chart notes. We propose an evaluation approach to analyze the performance of open-source LLMs such as Llama2 and Mistral for medical summarization tasks, using GPT-4 as an assessor. Our innovative approach to quantitative evaluation of LLMs can enable quality control, support the selection of effective LLMs for specific tasks, and advance knowledge discovery in digital health.
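A hypothetical sketch of the LLM-as-assessor setup follows: a judge prompt scores a candidate summary against the source note. The prompt wording and the `call_llm` placeholder are assumptions; the paper's actual rubric and client code may differ.

```python
# Sketch of using a judge model (e.g., GPT-4) to score medical summaries.
# `call_llm` is a placeholder for whichever LLM client library is used.
import json

JUDGE_PROMPT = """You are evaluating a medical summary.
Source note:
{note}

Candidate summary:
{summary}

Rate the summary from 1-5 for faithfulness and completeness.
Reply with JSON: {{"faithfulness": int, "completeness": int}}"""

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def judge(note: str, summary: str) -> dict:
    reply = call_llm(JUDGE_PROMPT.format(note=note, summary=summary))
    return json.loads(reply)
```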
A Preliminary Study on Pattern Reconstruction for Optimal Storage of Wearable Sensor Data
Mahfuz, Sazia, Zulkernine, Farhana
Efficient querying and retrieval of healthcare data poses a critical challenge today, with numerous connected devices continuously generating petabytes of images, text, and Internet of Things (IoT) sensor data. One approach to storing healthcare data efficiently is to extract the relevant and representative features and store only those features instead of the continuous streaming data. However, this raises the question of how much of the information content of the data can be retained, and whether pseudo-original data can be reconstructed when needed. By facilitating relevant and representative feature extraction, storage, and reconstruction of near-original patterns, we aim to address some of the challenges posed by the explosion of streaming data. We present a preliminary study in which we explored multiple autoencoders for concise feature extraction and reconstruction on human activity recognition (HAR) sensor data. Our Multi-Layer Perceptron (MLP) deep autoencoder achieved a storage reduction of 90.18%, compared to reductions of 11.18%, 49.99%, and 72.35% for the three other implemented autoencoders, namely the convolutional, Long Short-Term Memory (LSTM), and convolutional LSTM autoencoders, respectively. Encoded features from the autoencoders have smaller sizes and dimensions, which helps reduce the storage space. For higher-dimensional representations, storage reduction was lower but retention of relevant information was higher, which we validated by classification performed on the reconstructed data.
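A minimal PyTorch sketch of the store-the-code idea using an MLP autoencoder follows; the layer sizes and the feature dimensionality are illustrative assumptions, not the paper's configuration.

```python
# Sketch: train an MLP autoencoder on HAR feature vectors, store only the
# low-dimensional code, and reconstruct pseudo-original data on demand.
import torch
import torch.nn as nn

class MLPAutoencoder(nn.Module):
    def __init__(self, n_features=561, code_dim=32):  # sizes are illustrative
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, n_features),
        )
    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(16, 561)                     # a batch of HAR feature vectors
model = MLPAutoencoder()
loss = nn.functional.mse_loss(model(x), x)   # reconstruction objective
code = model.encoder(x)                      # store this instead of raw data
```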
ReViSe: Remote Vital Signs Measurement Using Smartphone Camera
Qiao, Donghao, Ayesha, Amtul Haq, Zulkernine, Farhana, Masroor, Raihan, Jaffar, Nauman
We propose an end-to-end framework to measure people's vital signs, including Heart Rate (HR), Heart Rate Variability (HRV), Oxygen Saturation (SpO2), and Blood Pressure (BP), based on the rPPG methodology, from video of a user's face captured with a smartphone camera. We extract face landmarks in real time with a deep learning-based neural network model. Multiple face patches, also called Regions-of-Interest (RoIs), are extracted using the predicted face landmarks. Several filters are applied to reduce noise in the cardiac signal, called the Blood Volume Pulse (BVP) signal, extracted from the RoIs. The measurements of HR, HRV, and SpO2 are validated on two public rPPG datasets, the TokyoTech rPPG and the Pulse Rate Detection (PURE) datasets, on which our models achieved the following Mean Absolute Errors (MAE): a) for HR, 1.73 beats per minute (bpm) and 3.95 bpm, respectively; b) for HRV, 18.55 ms and 25.03 ms, respectively; and c) for SpO2, an MAE of 1.64% on the PURE dataset. We validated our end-to-end rPPG framework, ReViSe, in a daily-living environment, thereby creating the Video-HR dataset, on which our HR estimation model achieved an MAE of 2.49 bpm. Since no publicly available rPPG dataset existed for BP measurement from face videos, we trained our deep learning-based BP estimation model on a dataset of fingertip-sensor signals and also created our own video dataset, Video-BP, on which our BP estimation model achieved an MAE of 6.7 mmHg for Systolic Blood Pressure (SBP) and 9.6 mmHg for Diastolic Blood Pressure (DBP). The ReViSe framework has been validated on videos recorded in daily-living environments, as opposed to the less noisy laboratory environments reported by most state-of-the-art techniques.
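To illustrate one step of such a pipeline, the sketch below estimates HR from a BVP signal by locating the dominant frequency in the heart-rate band. This is a common rPPG baseline under stated assumptions, not necessarily the exact estimator ReViSe uses.

```python
# Sketch: frequency-domain HR estimation from a BVP signal.
import numpy as np

def estimate_hr(bvp: np.ndarray, fps: float) -> float:
    bvp = bvp - bvp.mean()                        # remove the DC component
    freqs = np.fft.rfftfreq(len(bvp), d=1.0 / fps)
    power = np.abs(np.fft.rfft(bvp)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)        # plausible HR: 42-240 bpm
    return 60.0 * freqs[band][np.argmax(power[band])]

# e.g. a 10 s clip at 30 fps: per-frame mean RoI intensity as a toy BVP signal
hr_bpm = estimate_hr(np.random.randn(300), fps=30.0)
```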
Towards a Natural Language Query Processing System
Montgomery, Chantal, Isah, Haruna, Zulkernine, Farhana
Tackling the information retrieval gap between non-technical database end-users and those with knowledge of formal query languages has been an interesting area of data management and analytics research. Natural language interfaces for querying databases offer an opportunity to bridge the communication gap between end-users and systems that use formal query languages. Previous research efforts mainly focused on developing structured query interfaces to relational databases. However, the growth of unstructured big data such as text, images, and video has exposed the limitations of traditional structured query interfaces. While existing web search tools demonstrate the popularity and usability of natural language queries, they return complete documents and web pages instead of focused query responses and are not applicable to database systems. This paper reports our study on the design and development of a natural language query interface to a backend relational database. The novelty of the study lies in using a graph database as a middle layer to store the metadata needed to transform a natural language query into a structured query language (SQL) statement that can be executed on backend databases. We implemented and evaluated our approach using a restaurant dataset, and the translations of sample queries yielded a 90% accuracy rate.
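A toy sketch of the middle-layer idea follows: a metadata map from natural language terms to schema elements (a plain dict here, standing in for the graph database) drives the translation into SQL. The vocabulary and the keyword matching are deliberately simplistic assumptions, far simpler than the actual system.

```python
# Sketch: metadata-driven translation of a natural language question into SQL.
# METADATA maps NL terms to (table, column); "*" marks a table-level term.
METADATA = {
    "restaurants": ("restaurant", "*"),
    "name":        ("restaurant", "name"),
    "rating":      ("restaurant", "rating"),
    "city":        ("restaurant", "city"),
}

def translate(question: str) -> str:
    tokens = question.lower().replace("?", "").split()
    cols = [METADATA[t][1] for t in tokens
            if t in METADATA and METADATA[t][1] != "*"]
    table = next(METADATA[t][0] for t in tokens if t in METADATA)
    return f"SELECT {', '.join(cols) or '*'} FROM {table};"

print(translate("What is the name and rating of restaurants?"))
# SELECT name, rating FROM restaurant;
```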
Detecting Irregular Patterns in IoT Streaming Data for Fall Detection
Mahfuz, Sazia, Isah, Haruna, Zulkernine, Farhana, Nicholls, Peter
Detecting patterns in real-time streaming data has been an interesting and challenging data analytics problem. With the proliferation of a variety of sensor devices, real-time analytics of data from the Internet of Things (IoT) to learn regular and irregular patterns has become an important machine learning problem, enabling predictive analytics for automated notification and decision support. In this work, we address the problem of learning an irregular human activity pattern, the fall, from streaming IoT data from wearable sensors. We present a deep neural network model for detecting falls from accelerometer data, achieving 98.75% accuracy on an online physical activity monitoring dataset called "MobiAct", published by Vavoulas et al. The initial model was developed using IBM Watson Studio and later deployed on IBM Cloud with the streaming analytics service supported by IBM Streams for monitoring real-time IoT data. We also present the system architecture of the real-time fall detection framework that we intend to use with Mbientlab's wearable health monitoring sensors for real-time patient monitoring at retirement homes or rehabilitation clinics.
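The following PyTorch sketch shows a dense classifier over fixed-length accelerometer windows, in the spirit of the model described; the architecture, window length, and binary output are assumptions, not the paper's exact network.

```python
# Sketch: dense fall-vs-non-fall classifier on tri-axial accelerometer windows.
import torch
import torch.nn as nn

class FallDetector(nn.Module):
    def __init__(self, window=100):  # window length is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                     # (batch, window, 3) -> (batch, window*3)
            nn.Linear(window * 3, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 2),                 # fall vs. non-fall logits
        )
    def forward(self, x):
        return self.net(x)

windows = torch.randn(8, 100, 3)   # 8 windows of tri-axial accelerometer data
logits = FallDetector()(windows)
```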
A Voice Controlled E-Commerce Web Application
Kandhari, Mandeep Singh, Zulkernine, Farhana, Isah, Haruna
Automatic voice-controlled systems have changed the way humans interact with computers. Voice or speech recognition systems allow a user to make hands-free requests to the computer, which in turn processes the request and serves the user with appropriate responses. After years of research and development in machine learning and artificial intelligence, voice-controlled technologies have become more efficient and are widely applied in many domains to enable and improve human-to-human and human-to-computer interactions. State-of-the-art e-commerce applications, with the help of web technologies, offer interactive and user-friendly interfaces. However, there are some instances where people, especially those with visual disabilities, are not able to fully experience the serviceability of such applications. A voice-controlled system embedded in a web application can enhance the user experience and provide voice as a means to control the functionality of e-commerce websites. In this paper, we propose a taxonomy of speech recognition systems (SRS) and present a voice-controlled commodity purchase e-commerce application using IBM Watson speech-to-text to demonstrate its usability. The prototype can be extended to other application scenarios, such as government service kiosks, and can enable analytics of the converted text data for scenarios such as medical diagnosis at clinics.
The use of Virtual Reality in Enhancing Interdisciplinary Research and Education
Leung, Tiffany, Zulkernine, Farhana, Isah, Haruna
Virtual Reality (VR) is increasingly being recognized for its educational potential and as an effective way to convey new knowledge to people through interactive and collaborative activities. Affordable VR powered by mobile technologies is opening a new world of opportunities that can transform the ways in which we learn and engage with others. This paper reports our study on the application of VR in stimulating interdisciplinary communication and investigates the promises of VR in interdisciplinary education and research. The main contributions of this study are (i) a literature review of the theories of learning underlying the justification for using VR systems in education, (ii) a taxonomy of the various types and implementations of VR systems and their applications in supporting education and research, (iii) an evaluation of educational applications of VR from a broad range of disciplines, (iv) an investigation of how the learning process and learning outcomes are affected by VR systems, and (v) a comparative analysis of VR and traditional teaching methods in terms of quality of learning. This study seeks to inspire and inform interdisciplinary researchers and learners about the ways in which VR might support them, and to encourage VR software developers to push the limits of their craft.