xception model
Extending Information Bottleneck Attribution to Video Sequences
Solopova, Veronika, Schmidt, Lucas, Kolossa, Dorothea
We introduce VIBA, a novel approach for explainable video classification by adapting Information Bottlenecks for Attribution (IBA) to video sequences. While most traditional explainability methods are designed for image models, our IBA framework addresses the need for explainability in temporal models used for video analysis. To demonstrate its effectiveness, we apply VIBA to video deepfake detection, testing it on two architectures: the Xception model for spatial features and a VGG11-based model for capturing motion dynamics through optical flow. Using a custom dataset that reflects recent deepfake generation techniques, we adapt IBA to create relevance and optical flow maps, visually highlighting manipulated regions and motion inconsistencies. Our results show that VIBA generates temporally and spatially consistent explanations, which align closely with human annotations, thus providing interpretability for video classification and particularly for deepfake detection.
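For orientation, the image-domain IBA formulation that this abstract builds on can be written roughly as below. This is a sketch of the original per-image bottleneck only; how VIBA extends it across frames or to optical flow inputs is not stated in the abstract, so nothing here should be read as the authors' video formulation. The symbols are assumptions for illustration: R is an intermediate feature map, lambda(X) is a mask with values in [0, 1], mu_R and sigma_R are the empirical mean and standard deviation of the features, and beta trades classification loss against information flow.

```latex
\begin{align}
Z &= \lambda(X)\, R + \bigl(1 - \lambda(X)\bigr)\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(\mu_R, \sigma_R^2), \\
\mathcal{L}_I &= \mathbb{E}_R\!\left[ D_{\mathrm{KL}}\bigl( P(Z \mid R) \,\|\, Q(Z) \bigr) \right],
  \qquad Q(Z) = \mathcal{N}(\mu_R, \sigma_R^2), \\
\mathcal{L} &= \mathcal{L}_{CE} + \beta\, \mathcal{L}_I .
\end{align}
```

Where the mask stays close to 1, the features pass through almost unchanged and the KL term is large; those locations are the ones the relevance map highlights.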
Comparative Analysis of Machine Learning Approaches for Bone Age Assessment: A Comprehensive Study on Three Distinct Models
R., Nandavardhan, R., Somanathan, Suresh, Vikram, P, Savaridassan
Radiologists and doctors use X-ray images of the non-dominant hands of children and infants to assess the possibility of genetic conditions and growth abnormalities. They do so by comparing the extent of skeletal growth visible in the X-ray with the chronological age of the subject. Conventionally, the assessment is performed with the Greulich-Pyle (GP) or Tanner-Whitehouse (TW) approach. These approaches require a high level of expertise and are prone to observer bias. Several machine learning models have therefore been developed to automate the assessment of the X-rays and to increase its accuracy and efficiency. These models differ in accuracy and efficiency, which makes it unclear which one best suits a given set of needs and available resources. Methods: In this study, we analyzed the 3 most widely used models for automated bone age prediction: the Xception model, the VGG model, and a CNN model. The models were trained on the preprocessed dataset, and the accuracy of each was measured as the mean absolute error (MAE) in months. The models were then compared on this basis. Results: The 3 models (Xception, VGG, and CNN) have been tested for accuracy and other relevant factors.
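The MAE metric mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code; the function name and the sample ages are made up for the example.

```python
def mean_absolute_error_months(predicted, actual):
    """Return the mean absolute error between predicted and true bone ages, in months."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical ground-truth and predicted bone ages (months), for illustration only.
true_ages = [120.0, 96.0, 150.0, 60.0]
predicted_ages = [126.0, 90.0, 141.0, 63.0]

mae = mean_absolute_error_months(predicted_ages, true_ages)
print(f"MAE: {mae:.1f} months")  # absolute errors are 6, 6, 9, 3, so this prints 6.0
```

Because the error is expressed in months, a lower MAE translates directly into a tighter bone age estimate, which is what makes the metric convenient for comparing the three models.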
arcjetCV: an open-source software to analyze material ablation
Quintart, Alexandre, Haw, Magnus, Semeraro, Federico
arcjetCV is an open-source Python software designed to automate time-resolved measurements of heatshield material recession and recession rates from arcjet test video footage. This new automated and accessible capability greatly exceeds previous manual extraction methods, enabling rapid and detailed characterization of material recession for any sample with a profile video. arcjetCV automates the video segmentation process using machine learning models, including a one-dimensional (1D) Convolutional Neural Network (CNN) to infer the time-window of interest, a two-dimensional (2D) CNN for image and edge segmentation, and a Local Outlier Factor (LOF) for outlier filtering. A graphical user interface (GUI) simplifies the user experience and an application programming interface (API) allows users to call the core functions from scripts, enabling video batch processing. arcjetCV's capability to measure time-resolved recession in turn enables characterization of non-linear processes (shrinkage, swelling, melt flows, etc.), contributing to higher fidelity validation and improved modeling of heatshield material performance. The source code associated with this article can be found at https://github.com/magnus-haw/arcjetCV.
Leveraging Deep Learning and Xception Architecture for High-Accuracy MRI Classification in Alzheimer Diagnosis
Li, Shaojie, Qu, Haichen, Dong, Xinqi, Dang, Bo, Zang, Hengyi, Gong, Yulu
This work explores the application of deep learning in medical diagnostics, where Magnetic Resonance Imaging (MRI) provides a unique perspective for observing and diagnosing complex neurodegenerative diseases such as Alzheimer's disease (AD). With advancements in deep learning, particularly in Convolutional Neural Networks (CNNs) and the Xception network architecture, we are now able to analyze and classify vast amounts of MRI data with unprecedented accuracy. This progress not only enhances our understanding of structural changes in the brain but also opens up new avenues for monitoring disease progression through non-invasive means, and it potentially allows for precise diagnosis in the early stages of the disease. This study aims to classify MRI images using deep learning models to identify different stages of Alzheimer's disease through a series of innovative data processing and model construction steps. Our experimental results show that the deep learning framework based on the Xception model achieved a 99.6% accuracy rate in the multi-class MRI image classification task, demonstrating its potential value in assistive diagnosis. Future research will focus on expanding the dataset, improving model interpretability, and clinical validation to further promote the application of deep learning in the medical field, with the hope of bringing earlier diagnosis and more personalized treatment plans to patients with Alzheimer's disease.
Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback
L, Adarsh N, P, Arun V, L, Aravindh N
Research on generative models to produce human-aligned / human-preferred outputs has seen significant recent contributions. Between text and image-generative models, we narrowed our focus to text-based generative models, particularly to produce captions for images that align with human preferences. In this research, we explored a potential method to amplify the performance of the Deep Neural Network Model to generate captions that are preferred by humans. This was achieved by integrating Supervised Learning and Reinforcement Learning with Human Feedback (RLHF) using the Flickr8k dataset. Also, a novel loss function that is capable of optimizing the model based on human feedback is introduced. In this paper, we provide a concise sketch of our approach and results, hoping to contribute to the ongoing advances in the field of human-aligned generative AI models.
Automated COVID-19 CT Image Classification using Multi-head Channel Attention in Deep CNN
Ghosh, Susmita, Chatterjee, Abhiroop
The rapid spread of COVID-19 has necessitated efficient and accurate diagnostic methods. Computed Tomography (CT) scan images have emerged as a valuable tool for detecting the disease. In this article, we present a novel deep learning approach for automated COVID-19 CT scan classification. We propose a modified Xception model that incorporates a newly designed channel attention mechanism and weighted global average pooling to enhance feature extraction, thereby improving classification accuracy. The channel attention module selectively focuses on informative regions within each channel, enabling the model to learn discriminative features for COVID-19 detection. Experiments on a widely used COVID-19 CT scan dataset demonstrate an accuracy of 96.99% and show its superiority to other state-of-the-art techniques. This research can contribute to ongoing efforts to use artificial intelligence to combat current and future pandemics and can offer promising and timely solutions for efficient medical image analysis tasks.
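To make the idea of channel attention concrete, here is a generic squeeze-and-excitation style sketch in plain Python. It is not the module proposed in the paper: the abstract does not specify the multi-head design or the weighted pooling, so this example uses ordinary global average pooling and a single hypothetical dense layer with made-up weights.

```python
import math


def global_average_pool(feature_map):
    """Squeeze step: average each channel's H x W grid down to one number."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def channel_attention(feature_map, weights, biases):
    """Rescale each channel by a sigmoid gate computed from the pooled descriptors.

    feature_map: list of C channels, each an H x W list of lists.
    weights:     C x C matrix of a hypothetical learned dense layer.
    biases:      length-C bias vector of the same layer.
    """
    pooled = global_average_pool(feature_map)
    gates = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-logit)))
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates


# Toy input: 2 channels of size 2 x 2. The weights are made up, not trained.
features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0, mean 2.5
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1, mean 0.0
]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]

scaled, gates = channel_attention(features, weights, biases)
```

With these toy weights the gate for channel 0 is larger than the gate for channel 1, so the more active channel is preserved more strongly. In a trained network the dense layer would be learned jointly with the rest of the model.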
Cdiscount's Image Classification Challenge
While the company already sells everything from TVs to trampolines, the list of products is still rapidly growing. This is up from 10 million products only 2 years ago. Ensuring that so many products are well classified is a challenging task. As these methods now seem close to their maximum potential, Cdiscount.com ... In this challenge, we are required to build a model that automatically classifies the products based on their images.
Deep convolutional surrogates and degrees of freedom in thermal design
Keramati, Hadi, Hamdullahpur, Feridun
We present surrogate models for heat transfer and pressure drop prediction of complex fin geometries generated using composite Bezier curves. The thermal design process includes iterative high-fidelity simulation, which is complex, computationally expensive, and time-consuming. With advances in machine learning algorithms and Graphics Processing Units (GPUs), we can use the parallel processing architecture of GPUs, rather than relying solely on CPUs, to accelerate thermo-fluid simulation. In this study, Convolutional Neural Networks (CNNs) are used to predict the results of Computational Fluid Dynamics (CFD) directly from topologies saved as images. Cases with a single fin and with multiple morphable fins are studied. A comparison of the Xception network and a regular CNN is presented for the single-fin design. Results show high prediction accuracy for the single-fin design, particularly with the Xception network. Increasing design freedom to multiple fins increases the prediction error. This error, however, remains within three percent for pressure drop and heat transfer estimation, which is valuable for design purposes.
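As a rough illustration of how a fin outline could be built from composite Bezier curves, the sketch below evaluates each segment with de Casteljau's algorithm and concatenates the samples. The control points and function names are hypothetical; the paper's actual geometry parameterization is not given in the abstract.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] by repeated linear interpolation."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


def sample_composite(segments, samples_per_segment=20):
    """Sample a composite curve made of Bezier segments joined end to end."""
    outline = []
    for segment in segments:
        for i in range(samples_per_segment):
            outline.append(de_casteljau(segment, i / samples_per_segment))
    outline.append(tuple(segments[-1][-1]))  # finish with the final end point
    return outline


# Two hypothetical cubic segments sharing the point (2, 0); not a geometry from the paper.
segments = [
    [(0.0, 0.0), (0.5, 1.0), (1.5, 1.0), (2.0, 0.0)],
    [(2.0, 0.0), (2.5, -1.0), (3.5, -1.0), (4.0, 0.0)],
]
outline = sample_composite(segments)
```

A sampled outline like this could then be rasterized into an image, which is the kind of input the abstract describes feeding to the CNN surrogate.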
An Xceptional way of looking at CNN models
Since the dawn of BatchNormalization, data scientists both young and old have scoured the layers of neural networks searching for ways to improve their models' ability to tell a nose from a chin. Or at least, this has been the focus of Convolutional Neural Networks (CNNs) when image detection is performed on the human face. Using headshot images of people, can we, as a society, create a model that can accurately identify the different parts of our face the same way our brains do when we get up in the morning and look in a mirror? The answer to that question is constantly evolving, but here we explore a number of different CNN architectures in an attempt to provide our own. So what is a CNN and how does it work?
Deep learning isn't hard anymore
This had the effect of bottlenecking deep learning, limiting it to the few projects that met those conditions. Over the last couple of years, however, things have changed. The driver behind this growth is transfer learning. Transfer learning, broadly, is the idea that the knowledge accumulated in a model trained for a specific task--say, identifying flowers in a photo--can be transferred to another model to assist in making predictions for a different, related task--like identifying melanomas on someone's skin. Note: If you want a more technical dive into transfer learning, Sebastian Ruder has written a fantastic primer.
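Here is a toy sketch of the simplest flavor of transfer learning, feature extraction: the "pretrained" part is frozen and only a small new head is trained on the new task. Everything in it is an assumption for illustration. The frozen weights stand in for a real pretrained backbone, the dataset is made up, and a real project would use a deep learning framework rather than plain Python.

```python
import math

# Stand-in for a pretrained backbone: fixed weights that are never updated.
# (Hypothetical values; a real backbone would be a network trained on a large dataset.)
FROZEN_WEIGHTS = ((0.8, -0.3), (-0.5, 0.9))


def frozen_features(x):
    """Map a raw input to features using the frozen 'pretrained' layer."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]


def predict(head, x):
    """Logistic-regression head applied on top of the frozen features."""
    feats = frozen_features(x) + [1.0]  # append a constant input for the bias
    z = sum(h * f for h, f in zip(head, feats))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(head, data):
    """Average binary cross-entropy of the head on the given data."""
    return -sum(
        y * math.log(predict(head, x)) + (1 - y) * math.log(1 - predict(head, x))
        for x, y in data
    ) / len(data)


def train_head(data, steps=200, lr=0.5):
    """Fit only the new head by gradient descent; the backbone stays untouched."""
    head = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            feats = frozen_features(x) + [1.0]
            error = predict(head, x) - y
            for i, f in enumerate(feats):
                grad[i] += error * f / len(data)
        head = [h - lr * g for h, g in zip(head, grad)]
    return head


# Tiny made-up dataset for the new task: label 1 when the first input dominates.
new_task = [((1.0, 0.0), 1), ((0.9, 0.2), 1), ((0.0, 1.0), 0), ((0.1, 0.8), 0)]

initial_loss = log_loss([0.0, 0.0, 0.0], new_task)
head = train_head(new_task)
final_loss = log_loss(head, new_task)
```

Only three numbers are trained here, which is the point: because the features are reused, the new task needs far less data and compute than training the whole model from scratch.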