Kollias, Dimitrios
Can Machine Learning Assist in Diagnosis of Primary Immune Thrombocytopenia? A feasibility study
Miah, Haroon, Kollias, Dimitrios, Pedone, Giacinto Luca, Provan, Drew, Chen, Frederick
Primary immune thrombocytopenia (ITP) is a rare autoimmune disease characterised by immune-mediated destruction of peripheral blood platelets in patients, leading to low platelet counts and bleeding. The diagnosis and effective management of ITP are challenging because there is no established test to confirm the disease and no biomarker with which one can predict the response to treatment and outcome. In this work we conduct a feasibility study to check whether machine learning can be applied effectively to the diagnosis of ITP using routine blood tests and demographic data in a non-acute outpatient setting. Various ML models, including Logistic Regression, Support Vector Machine, k-Nearest Neighbor, Decision Tree and Random Forest, were applied to data from the UK Adult ITP Registry and a general hematology clinic. Two different approaches were investigated: a demographic-unaware and a demographic-aware one. We conduct extensive experiments to evaluate the predictive performance of these models and approaches, as well as their bias. The results revealed that Decision Tree and Random Forest models were both superior and fair, achieving nearly perfect predictive and fairness scores, with platelet count identified as the most significant variable. Models not provided with demographic information performed better in terms of predictive accuracy but showed lower fairness scores, illustrating a trade-off between predictive performance and fairness.
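A minimal sketch of the demographic-unaware setting described above, using scikit-learn; the file name, feature columns and the "sex" attribute used for the fairness probe are hypothetical placeholders rather than the Registry's actual fields:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("itp_cohort.csv")                      # hypothetical file name
features = ["platelet_count", "haemoglobin", "wbc"]     # assumed routine blood-test columns

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["has_itp"], test_size=0.2, stratify=df["has_itp"], random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("overall accuracy:", accuracy_score(y_test, clf.predict(X_test)))
print(dict(zip(features, clf.feature_importances_)))    # variable importance (e.g. platelet count)

# Simple fairness probe: accuracy per demographic group,
# even though the model never saw demographics as inputs.
sex_test = df.loc[X_test.index, "sex"]                  # assumed demographic column
for group in sex_test.unique():
    mask = sex_test == group
    print(group, accuracy_score(y_test[mask], clf.predict(X_test[mask])))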
Bridging the Gap: Protocol Towards Fair and Consistent Affect Analysis
Hu, Guanyu, Papadopoulou, Eleni, Kollias, Dimitrios, Tzouveli, Paraskevi, Wei, Jie, Yang, Xinyu
The increasing integration of machine learning algorithms in daily life underscores the critical need for fairness and equity in their deployment. As these technologies play a pivotal role in decision-making, addressing biases across diverse subpopulation groups, including age, gender, and race, becomes paramount. Automatic affect analysis, at the intersection of physiology, psychology, and machine learning, has seen significant development. However, existing databases and methodologies lack uniformity, leading to biased evaluations. This work addresses these issues by analyzing six affective databases, annotating demographic attributes, and proposing a common protocol for database partitioning. Emphasis is placed on fairness in evaluations. Extensive experiments with baseline and state-of-the-art methods demonstrate the impact of these changes, revealing the inadequacy of prior assessments. The findings underscore the importance of considering demographic attributes in affect analysis research and provide a foundation for more equitable methodologies. Our annotations, code and pre-trained models are available at: https://github.com/dkollias/Fair-Consistent-Affect-Analysis
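As a loose illustration of the subgroup-wise evaluation in the spirit of the proposed protocol, the sketch below computes per-group macro F1 and a worst-to-best ratio; the arrays and group labels are toy values, not the released annotations:

import numpy as np
from sklearn.metrics import f1_score

def fairness_gap(y_true, y_pred, groups):
    """Return per-group macro F1 and the worst/best ratio across demographic groups."""
    scores = {g: f1_score(y_true[groups == g], y_pred[groups == g], average="macro")
              for g in np.unique(groups)}
    return scores, min(scores.values()) / max(scores.values())

y_true = np.array([0, 1, 2, 1, 0, 2, 1, 0])             # toy expression labels
y_pred = np.array([0, 1, 2, 0, 0, 2, 1, 1])             # toy predictions
groups = np.array(["male", "male", "male", "male",
                   "female", "female", "female", "female"])
per_group, ratio = fairness_gap(y_true, y_pred, groups)
print(per_group, "worst/best F1 ratio:", ratio)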
Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation
Karampinis, Vasileios, Arsenos, Anastasios, Filippopoulos, Orfeas, Petrongonas, Evangelos, Skliros, Christos, Kollias, Dimitrios, Kollias, Stefanos, Voulodimos, Athanasios
In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Efficiently detecting non-cooperative aerial vehicles and accurately estimating collisions are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance of a detected aerial object in real time using only the input of a monocular camera. In order to train our deep learning components for the object detection, tracking and depth estimation tasks, we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information to both the tracking module for monitoring obstacle movement and the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset, which is, to the best of our knowledge, the largest air-to-air airborne object dataset.
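The depth module is described as a separate lightweight encoder-decoder that maps a monocular frame to a dense depth map; a toy PyTorch sketch of such an image-to-image network is given below, with layer sizes that are illustrative assumptions rather than the paper's architecture:

import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                         # downsample the RGB frame
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                         # upsample to a dense depth map
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))                  # one depth value per pixel

frame = torch.randn(1, 3, 256, 256)                           # a single monocular camera frame
print(TinyDepthNet()(frame).shape)                            # torch.Size([1, 1, 256, 256])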
CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention
Senadeera, Damith Chamalke, Yang, Xiaoyun, Kollias, Dimitrios, Slabaugh, Gregory
In this paper we introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance. As surveillance systems become more prevalent due to technological advances and decreasing costs, the challenge of efficiently monitoring vast amounts of video data has intensified. CUE-Net addresses this challenge by combining spatial Cropping with an enhanced version of the UniformerV2 architecture, integrating convolutional and self-attention mechanisms alongside a novel Modified Efficient Additive Attention mechanism (which reduces the quadratic time complexity of self-attention) to effectively and efficiently identify violent activities.

To respond to the challenge of efficient, automated violence detection from video, effective computer vision methods are required. Deep learning techniques such as Convolutional Neural Networks (CNNs) and, more recently, Transformer-based architectures have shown great promise in solving computer vision related automated violence detection [21, 22, 31]. The success of violence detection is highly dependent on the objects and people present in the captured videos [22, 31]. Detection is difficult when the relevant features of the violent incidents are not captured properly, for example when the people involved in the violent incident are far away and occupy only a small part of the frame, as seen in one of the example videos from
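For intuition, the sketch below shows an additive-attention-style block whose cost is linear, rather than quadratic, in the number of tokens; it is only in the spirit of the Modified Efficient Additive Attention named above and is not the paper's exact formulation:

import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.w_a = nn.Parameter(torch.randn(dim))        # learned scoring vector
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                # x: (batch, tokens, dim)
        q, k = self.to_q(x), self.to_k(x)
        scores = (q @ self.w_a) / q.shape[-1] ** 0.5     # one score per token
        alpha = scores.softmax(dim=-1).unsqueeze(-1)
        global_q = (alpha * q).sum(dim=1, keepdim=True)  # pooled global query
        return self.proj(global_q * k) + q               # linear in the token count

tokens = torch.randn(2, 196, 64)
print(AdditiveAttention(64)(tokens).shape)               # torch.Size([2, 196, 64])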
FaceRNET: a Facial Expression Intensity Estimation Network
Kollias, Dimitrios, Psaroudakis, Andreas, Arsenos, Anastasios, Theofilou, Paraskevi
This paper presents our approach for Facial Expression Intensity Estimation from videos. It includes two components: i) a representation extractor network that extracts various emotion descriptors (valence-arousal, action units and basic expressions) from each video frame; ii) an RNN that captures temporal information in the data, followed by a mask layer which enables handling varying input video lengths through dynamic routing. This approach has been tested on the Hume-Reaction dataset, yielding excellent results.
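A small sketch of the second component (an RNN over per-frame descriptors with handling of variable-length videos) is shown below; the feature dimension, the choice of a GRU and the seven-way output are illustrative assumptions:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class TemporalHead(nn.Module):
    def __init__(self, feat_dim=22, hidden=64, n_outputs=7):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_outputs)          # e.g. seven reaction intensities

    def forward(self, feats, lengths):                   # feats: (batch, max_frames, feat_dim)
        packed = pack_padded_sequence(feats, lengths, batch_first=True, enforce_sorted=False)
        packed_out, _ = self.rnn(packed)
        out, _ = pad_packed_sequence(packed_out, batch_first=True)
        last = out[torch.arange(out.size(0)), lengths - 1]   # last valid timestep per video
        return self.out(last)

feats = torch.randn(4, 30, 22)                           # 4 videos padded to 30 frames
lengths = torch.tensor([30, 18, 25, 9])                  # true number of frames per video
print(TemporalHead()(feats, lengths).shape)              # torch.Size([4, 7])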
BTDNet: a Multi-Modal Approach for Brain Tumor Radiogenomic Classification
Kollias, Dimitrios, Vendal, Karanjot, Gadhavi, Priyanka, Russom, Solomon
Brain tumors pose significant health challenges worldwide, with glioblastoma being one of the most aggressive forms. Accurate determination of the O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status is crucial for personalized treatment strategies. However, traditional methods are labor-intensive and time-consuming. This paper proposes a novel multi-modal approach, BTDNet, leveraging multi-parametric MRI scans, including FLAIR, T1w, T1wCE, and T2 3D volumes, to predict MGMT promoter methylation status. BTDNet addresses two main challenges: the variable volume lengths (i.e., each volume consists of a different number of slices) and the volume-level annotations (i.e., the whole 3D volume is annotated rather than the individual slices of which it consists). BTDNet consists of four components: i) a data augmentation component (that performs geometric transformations, convex combinations of data pairs and test-time data augmentation); ii) a 3D analysis component (that performs global analysis through a CNN-RNN); iii) a routing component (that contains a mask layer handling variable input feature lengths); and iv) a modality fusion component (that effectively enhances data representation, reduces ambiguities and mitigates data scarcity). The proposed method outperforms the state-of-the-art methods in the RSNA-ASNR-MICCAI BraTS 2021 Challenge by large margins, offering a promising avenue for enhancing brain tumor diagnosis and treatment.
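To make the modality-fusion idea concrete, the sketch below fuses one feature vector per MRI modality (FLAIR, T1w, T1wCE, T2) before a binary MGMT prediction; the feature extractors, sizes and plain concatenation are placeholders, not BTDNet's actual components:

import torch
import torch.nn as nn

class ModalityFusionHead(nn.Module):
    def __init__(self, feat_dim=128, n_modalities=4):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(feat_dim * n_modalities, 128), nn.ReLU(),
            nn.Linear(128, 1),                          # MGMT methylated vs. unmethylated
        )

    def forward(self, modality_feats):                  # list of (batch, feat_dim) tensors
        return self.fuse(torch.cat(modality_feats, dim=-1))

flair, t1w, t1wce, t2 = (torch.randn(2, 128) for _ in range(4))   # per-modality embeddings
logits = ModalityFusionHead()([flair, t1w, t1wce, t2])
print(logits.shape)                                     # torch.Size([2, 1])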
ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges
Kollias, Dimitrios, Tzirakis, Panagiotis, Baird, Alice, Cowen, Alan, Zafeiriou, Stefanos
The fifth Affective Behavior Analysis in-the-wild (ABAW) Competition is part of the respective ABAW Workshop which will be held in conjunction with the IEEE Computer Vision and Pattern Recognition Conference (CVPR), 2023. The 5th ABAW Competition is a continuation of the Competitions held at the ECCV 2022, IEEE CVPR 2022, ICCV 2021, IEEE FG 2020 and CVPR 2017 Conferences, and is dedicated to automatically analyzing affect. For this year's Competition, we feature two corpora: i) an extended version of the Aff-Wild2 database and ii) the Hume-Reaction dataset. The former database is an audiovisual one of around 600 videos and around 3M frames, and is annotated with respect to: a) two continuous affect dimensions -valence (how positive/negative a person is) and arousal (how active/passive a person is)-; b) basic expressions (e.g. happiness, sadness, neutral state); and c) atomic facial muscle actions (i.e., action units). The latter dataset is an audiovisual one in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities. Thus the 5th ABAW Competition encompasses four Challenges: i) uni-task Valence-Arousal Estimation, ii) uni-task Expression Classification, iii) uni-task Action Unit Detection, and iv) Emotional Reaction Intensity Estimation. In this paper, we present these Challenges along with their corpora, outline the evaluation metrics, and present the baseline systems and their obtained performance.
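The valence-arousal track of the ABAW Competitions has typically been scored with the Concordance Correlation Coefficient (CCC); a small NumPy implementation with toy predictions is sketched below for reference:

import numpy as np

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient between two 1-D arrays."""
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

valence_true = np.array([0.1, 0.4, -0.3, 0.8, 0.0])     # toy ground-truth values in [-1, 1]
valence_pred = np.array([0.2, 0.5, -0.1, 0.6, 0.1])     # toy predictions
print("CCC:", ccc(valence_true, valence_pred))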
A Deep Neural Architecture for Harmonizing 3-D Input Data Analysis and Decision Making in Medical Imaging
Kollias, Dimitrios, Arsenos, Anastasios, Kollias, Stefanos
Such applications are, for example, 3-D chest CT scan analysis for pneumonia, COVID-19, or lung cancer diagnosis [1], [2]; 3-D magnetic resonance image (MRI) analysis for Parkinson's, or Alzheimer's disease prediction [3], [4]; 3-D subject movement analysis for action recognition & Parkinson's detection [5]; analysis of audiovisual data showing a subject's behaviour for affect recognition [6]; and anomaly detection in nuclear power plants [7]. Dealing with a single annotation per volumetric input and harmonizing the variable input length constitutes a significant problem when training Deep Neural Networks (DNNs) to perform the respective prediction or classification task. Furthermore, in each of the above application fields, public or private datasets are produced in different environments and contexts and are used to train deep learning systems to successfully perform the respective tasks. Extensive research is currently being conducted on using data-driven knowledge, extracted from a single dataset or from multiple datasets, so as to deal with other datasets. Transfer learning, domain adaptation, meta-learning, domain generalization, and continual or lifelong learning are specific topics of this research, based on different conditions related to the considered datasets [8].
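A minimal sketch of the harmonization problem stated above (volumes with different numbers of slices but a single label per volume) is given below; per-slice features are averaged only over the valid, non-padded slices, and the per-slice extractor is a placeholder rather than the paper's architecture:

import torch
import torch.nn as nn

class VolumeClassifier(nn.Module):
    def __init__(self, slice_feat=64, n_classes=3):
        super().__init__()
        self.slice_net = nn.Sequential(                      # placeholder per-slice extractor
            nn.Flatten(start_dim=2), nn.LazyLinear(slice_feat), nn.ReLU())
        self.head = nn.Linear(slice_feat, n_classes)

    def forward(self, volumes, lengths):                     # volumes: (batch, max_slices, H, W)
        feats = self.slice_net(volumes)                      # (batch, max_slices, slice_feat)
        mask = (torch.arange(volumes.size(1))[None, :] < lengths[:, None]).float()
        pooled = (feats * mask.unsqueeze(-1)).sum(1) / lengths[:, None]   # mean over valid slices
        return self.head(pooled)                             # one prediction per volume

volumes = torch.randn(2, 40, 32, 32)                         # two volumes padded to 40 slices
lengths = torch.tensor([40, 23])                             # true slice counts
print(VolumeClassifier()(volumes, lengths).shape)            # torch.Size([2, 3])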
Affect Analysis in-the-wild: Valence-Arousal, Expressions, Action Units and a Unified Framework
Kollias, Dimitrios, Zafeiriou, Stefanos
Affect recognition based on subjects' facial expressions has been a topic of major research in the attempt to generate machines that can understand the way subjects feel, act and react. In the past, due to the unavailability of large amounts of data captured in real-life situations, research has mainly focused on controlled environments. However, recently, social media and platforms have been widely used. Moreover, deep learning has emerged as a means to solve visual analysis and recognition problems. This paper exploits these advances and presents significant contributions for affect analysis and recognition in-the-wild. Affect analysis and recognition can be seen as a dual knowledge generation problem, involving: i) creation of new, large and rich in-the-wild databases and ii) design and training of novel deep neural architectures that are able to analyse affect over these databases and to successfully generalise their performance on other datasets. The paper focuses on large in-the-wild databases, i.e., Aff-Wild and Aff-Wild2 and presents the design of two classes of deep neural networks trained with these databases. The first class refers to uni-task affect recognition, focusing on prediction of the valence and arousal dimensional variables. The second class refers to estimation of all main behavior tasks, i.e. valence-arousal prediction; categorical emotion classification in seven basic facial expressions; facial Action Unit detection. A novel multi-task and holistic framework is presented which is able to jointly learn and effectively generalize and perform affect recognition over all existing in-the-wild databases. Large experimental studies illustrate the achieved performance improvement over the existing state-of-the-art in affect recognition.

This paper presents recent developments and research directions in affective behavior analysis in-the-wild, which is a major targeted characteristic of human computer interaction systems in real life applications. Such systems, machines and robots, should be able to automatically sense and interpret facial and audio-visual signals relevant to emotions, appraisals and intentions; thus, being able to interact in a 'human-centered' and engaging manner with people, as their digital assistants in the home, work, operational or industrial environment. Through human affect recognition, the reactions of the machine, or robot, will be consistent with people's expectations and emotions; their verbal and non-verbal interactions will be positively received by humans. Moreover, this interaction should not be dependent on the respective context, nor the human's age, sex, ethnicity, educational level, profession, or social position. As a consequence, the development of intelligent systems able to analyze human behavior in-the-wild can contribute to generation of trust, understanding and closeness between humans and machines in real life environments.
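A compact sketch of the multi-task idea (one shared backbone with valence-arousal, expression and action-unit heads whose losses are summed) follows; the backbone, head sizes and plain sum of losses are illustrative assumptions, not the paper's exact design:

import torch
import torch.nn as nn

class MultiTaskAffect(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.va_head = nn.Linear(feat_dim, 2)            # valence, arousal in [-1, 1]
        self.expr_head = nn.Linear(feat_dim, 7)          # seven basic expressions
        self.au_head = nn.Linear(feat_dim, 12)           # binary action units

    def forward(self, x):
        f = self.backbone(x)
        return torch.tanh(self.va_head(f)), self.expr_head(f), self.au_head(f)

model = MultiTaskAffect()
images = torch.randn(8, 3, 112, 112)                     # toy batch of face crops
va, expr_logits, au_logits = model(images)
loss = (nn.functional.mse_loss(va, torch.zeros_like(va))                              # VA regression
        + nn.functional.cross_entropy(expr_logits, torch.randint(0, 7, (8,)))         # expressions
        + nn.functional.binary_cross_entropy_with_logits(
              au_logits, torch.randint(0, 2, (8, 12)).float()))                       # action units
loss.backward()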
Image Generation and Recognition (Emotions)
Carlsson, Hanne, Kollias, Dimitrios
Generative Adversarial Networks (GANs) were proposed in 2014 by Goodfellow et al., and have since been extended into multiple computer vision applications. This report provides a thorough survey of recent GAN research, outlining the various architectures and applications, as well as methods for training GANs and dealing with latent space. This is followed by a discussion of potential areas for future GAN research, including: evaluating GANs, better understanding GANs, and techniques for training GANs. The second part of this report outlines the compilation of a dataset of images `in the wild' representing each of the 7 basic human emotions, and analyses experiments done when training a StarGAN on this dataset combined with the FER2013 dataset.