Calgary
A systematic study of race and sex bias in CNN-based cardiac MR segmentation
Lee, Tiarna, Puyol-Anton, Esther, Ruijsink, Bram, Shi, Miaojing, King, Andrew P.
In computer vision there has been significant research interest in assessing potential demographic bias in deep learning models. One of the main causes of such bias is imbalance in the training data. In medical imaging, where the potential impact of bias is arguably much greater, there has been less interest. In medical imaging pipelines, segmentation of structures of interest plays an important role in estimating clinical biomarkers that are subsequently used to inform patient management. Convolutional neural networks (CNNs) are starting to be used to automate this process. We present the first systematic study of the impact of training set imbalance on race and sex bias in CNN-based segmentation. We focus on segmentation of the structures of the heart from short axis cine cardiac magnetic resonance images, and train multiple CNN segmentation models with different levels of race/sex imbalance. We find no significant bias in the sex experiment but significant bias in two separate race experiments, highlighting the need to consider adequate representation of different demographic groups in health datasets.
Canada: twelve new AI projects and $50M investment for SCALE AI - Actu IA
Canadian supercluster based in Montreal, SCALE AI acts as an investment and innovation hub to accelerate the adoption and rapid integration of AI in Canada. This Monday, August 22, it unveiled twelve new projects aimed at optimizing production and transportation through AI. With the goal of addressing critical challenges currently facing supply chains, including the impact of the pandemic, labor shortages, and environmental requirements, it will provide $50 million in unprecedented financial support. Funded by the federal and Quebec governments, SCALE AI brings together the retail, manufacturing, transportation, infrastructure and information and communications technology (ICT) sectors to build smart supply chains. The supercluster has nearly 500 industrial partners, research institutions and other AI players with whom it develops programs to support investment projects by companies implementing concrete AI applications.
MALICE: Manipulation Attacks on Learned Image ComprEssion
Liu, Kang, Wu, Di, Wang, Yiru, Feng, Dan, Tan, Benjamin, Garg, Siddharth
Deep learning techniques have shown promising results in image compression, with competitive bitrate and image reconstruction quality from the compressed latent. However, while image compression has progressed towards a higher peak signal-to-noise ratio (PSNR) and fewer bits per pixel (bpp), the robustness of these systems to adversarial images has never been examined. In this work, we investigate, for the first time, the robustness of image compression systems in which an imperceptible perturbation of the input image can cause a significant increase in the bitrate of its compressed latent. To characterize the robustness of state-of-the-art learned image compression, we mount white-box and black-box attacks. Our white-box attack applies the fast gradient sign method to the entropy estimate of the bitstream, which serves as the bitrate approximation. For the black-box attack, we propose DCT-Net, a substitute model that simulates JPEG compression with a simple architecture and lightweight training and enables fast adversarial transferability. Our results on six image compression models, each with six different bitrate qualities (thirty-six models in total), show that they are surprisingly fragile: the white-box attack achieves up to a 56.326x bpp change and the black-box attack 1.947x. To improve robustness, we propose a novel compression architecture, factorAtn, which incorporates attention modules and a basic factorized entropy model, resulting in a promising trade-off between rate-distortion performance and robustness to adversarial attacks that surpasses existing learned image compressors.
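The white-box idea, a single signed-gradient step that pushes the estimated bitrate up, can be sketched on a toy model. Everything below is an illustrative assumption rather than the paper's setup: a fixed linear map stands in for the learned encoder, `log2(1 + y^2)` stands in for the entropy estimate, and all function names are hypothetical.

```python
import math

def latents(weights, pixels):
    # Toy "encoder": a fixed linear map from pixels to latents (assumption).
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]

def rate_proxy(weights, pixels):
    # Toy differentiable bitrate estimate; stands in for a learned entropy model.
    return sum(math.log2(1.0 + y * y) for y in latents(weights, pixels))

def rate_gradient(weights, pixels):
    # Analytic gradient of rate_proxy with respect to each pixel.
    ys = latents(weights, pixels)
    coeffs = [2.0 * y / ((1.0 + y * y) * math.log(2.0)) for y in ys]
    return [sum(c * row[i] for c, row in zip(coeffs, weights))
            for i in range(len(pixels))]

def sign(value):
    return (value > 0) - (value < 0)

def fgsm_bitrate_attack(weights, pixels, epsilon):
    # One FGSM step that ascends the rate estimate, clipped to the [0, 1] pixel range.
    grad = rate_gradient(weights, pixels)
    return [min(1.0, max(0.0, p + epsilon * sign(g)))
            for p, g in zip(pixels, grad)]

weights = [[1.0, -1.0], [0.5, 0.5]]
clean = [0.6, 0.2]
adversarial = fgsm_bitrate_attack(weights, clean, epsilon=0.05)
print(rate_proxy(weights, clean), rate_proxy(weights, adversarial))
```

Each pixel moves by at most `epsilon`, so the perturbation stays small while the rate estimate rises. A real attack would take the gradient through the compression network's entropy model with automatic differentiation instead of a hand-derived formula.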
A Comparative Study of Speaker Role Identification in Air Traffic Communication Using Deep Learning Approaches
Guo, Dongyue, Zhang, Jianwei, Yang, Bo, Lin, Yi
Automatic spoken instruction understanding (SIU) of controller-pilot conversations in air traffic control (ATC) requires not only recognizing the words and semantics of the speech but also determining the role of the speaker. However, few of the published works on automatic understanding systems in air traffic communication focus on speaker role identification (SRI). In this paper, we formulate the SRI task of controller-pilot communication as a binary classification problem. Furthermore, text-based, speech-based, and speech-and-text-based multi-modal methods are proposed to achieve a comprehensive comparison on the SRI task. To ablate the impacts of the comparative approaches, various advanced neural network architectures are applied to optimize the implementation of the text-based and speech-based methods. Most importantly, a multi-modal speaker role identification network (MMSRINet) is designed to achieve the SRI task by considering both speech and textual modality features. To aggregate modality features, a modal fusion module is proposed to fuse and squeeze acoustic and textual representations with a modal attention mechanism and a self-attention pooling layer, respectively. Finally, the comparative approaches are validated on the ATCSpeech corpus collected from a real-world ATC environment. The experimental results demonstrate that all the comparative approaches work for the SRI task, and that the proposed MMSRINet shows competitive performance and robustness compared with the other methods on both seen and unseen data, achieving 98.56% and 98.08% accuracy, respectively.
The Unified Mathematical Framework for IMU Preintegration in Inertial-Aided Navigation System
Luo, Yarong, Liu, Yang, Guo, Chi, Liu, Jingnan
This paper proposes a unified mathematical framework for inertial measurement unit (IMU) preintegration in inertial-aided navigation systems in different frames under different motion conditions. The navigation state is precisely discretized into three parts: local increment, global state, and global increment. The global increment can be calculated in different frames, such as the local geodetic navigation frame and the earth-centered-earth-fixed frame. The local increment, which is referred to as the IMU preintegration, can be calculated under different assumptions according to the motion of the agent and the grade of the IMU. Thus, the framework is more accurate and more convenient for online state estimation of inertial-integrated navigation systems in different environments. Furthermore, covariance propagation based on left perturbation is proposed for the first time, which is independent of the inputs of the gyroscope and accelerometer. Finally, we show the monotonicity of the uncertainty for the determinant optimality criterion and the Rényi entropy optimality criterion.
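For orientation, the local increment can be illustrated with the conventional bias-free, Euler-discretized preintegration recursion. This is a minimal sketch under simplifying assumptions (no sensor bias or noise, no Earth rotation, gravity handled outside the increment), not the unified framework the paper proposes; all function names are hypothetical.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def so3_exp(phi):
    # Rodrigues' formula: rotation matrix for the rotation vector phi.
    x, y, z = phi
    theta = math.sqrt(x * x + y * y + z * z)
    skew = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if theta < 1e-12:
        # First-order approximation for very small angles.
        return [[identity[i][j] + skew[i][j] for j in range(3)] for i in range(3)]
    skew_sq = mat_mul(skew, skew)
    a = math.sin(theta) / theta
    b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[identity[i][j] + a * skew[i][j] + b * skew_sq[i][j]
             for j in range(3)] for i in range(3)]

def preintegrate(gyro_samples, accel_samples, dt):
    # Accumulate rotation, velocity and position increments in the body frame
    # of the first sample, independently of the global state.
    d_rot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    d_vel = [0.0, 0.0, 0.0]
    d_pos = [0.0, 0.0, 0.0]
    for omega, accel in zip(gyro_samples, accel_samples):
        rotated = mat_vec(d_rot, accel)
        # Position uses the velocity increment from before this step.
        d_pos = [p + v * dt + 0.5 * r * dt * dt
                 for p, v, r in zip(d_pos, d_vel, rotated)]
        d_vel = [v + r * dt for v, r in zip(d_vel, rotated)]
        d_rot = mat_mul(d_rot, so3_exp([w * dt for w in omega]))
    return d_rot, d_vel, d_pos

# Usage: 1 s of constant 1 m/s^2 forward acceleration with no rotation.
steps, dt = 100, 0.01
rot, vel, pos = preintegrate([[0.0, 0.0, 0.0]] * steps, [[1.0, 0.0, 0.0]] * steps, dt)
print(vel, pos)
```

Because the increments depend only on the raw IMU samples, they can be computed once between two keyframes and reused when the global state estimate changes, which is the practical motivation for preintegration.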
RealityTalk: Real-Time Speech-Driven Augmented Presentation for AR Live Storytelling
Liao, Jian, Karim, Adnan, Jadon, Shivesh, Kazi, Rubaiat Habib, Suzuki, Ryo
We present RealityTalk, a system that augments real-time live presentations with speech-driven interactive virtual elements. Augmented presentations leverage embedded visuals and animation for engaging and expressive storytelling. However, existing tools for live presentations often lack interactivity and improvisation, while creating such effects in video editing tools requires significant time and expertise. RealityTalk enables users to create live augmented presentations with real-time speech-driven interactions. The user can interactively prompt, move, and manipulate graphical elements through real-time speech and supporting modalities. Based on our analysis of 177 existing video-edited augmented presentations, we propose a novel set of interaction techniques and incorporate them into RealityTalk. We evaluate our tool from a presenter's perspective to demonstrate the effectiveness of our system.
A Discriminative Hierarchical PLDA-based Model for Spoken Language Recognition
Ferrer, Luciana, Castan, Diego, McLaren, Mitchell, Lawson, Aaron
Spoken language recognition (SLR) refers to the automatic process used to determine the language present in a speech sample. SLR is an important task in its own right, for example, as a tool to analyze or categorize large amounts of multi-lingual data. Further, it is also an essential tool for selecting downstream applications in a workflow, for example, to choose appropriate speech recognition or machine translation models. SLR systems are usually composed of two stages: one where an embedding representing the audio sample is extracted, and a second one which computes the final scores for each language. In this work, we approach the SLR task as a detection problem and implement the second stage as a probabilistic linear discriminant analysis (PLDA) model. We show that discriminative training of the PLDA parameters gives large gains with respect to the usual generative training. Further, we propose a novel hierarchical approach where two PLDA models are trained, one to generate scores for clusters of highly related languages and a second one to generate scores conditional on each cluster. The final language detection scores are computed as a combination of these two sets of scores. The complete model is trained discriminatively to optimize a cross-entropy objective. We show that this hierarchical approach consistently outperforms the non-hierarchical one for detection of highly related languages, in many cases by large margins. We train our systems on a collection of datasets including over 100 languages, and test them in both matched and mismatched conditions, showing that the gains are robust to condition mismatch.
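The abstract does not spell out how the two sets of scores are combined. One plausible reading, shown here purely as an assumption, is the chain rule on log-posteriors: log P(language) = log P(cluster) + log P(language | cluster). The cluster names, language codes, and score values below are made up for illustration, and the function names are hypothetical.

```python
import math

def log_softmax(scores):
    # Convert raw scores into normalized log-posteriors (numerically stable).
    peak = max(scores.values())
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores.values()))
    return {key: s - log_norm for key, s in scores.items()}

def combine_hierarchical(cluster_scores, within_cluster_scores):
    # Chain rule: log P(lang) = log P(cluster) + log P(lang | cluster).
    cluster_logp = log_softmax(cluster_scores)
    combined = {}
    for cluster, lang_scores in within_cluster_scores.items():
        for lang, logp in log_softmax(lang_scores).items():
            combined[lang] = cluster_logp[cluster] + logp
    return combined

# Illustrative scores only; not taken from the paper.
cluster_scores = {"iberian": 2.0, "east_slavic": 0.0}
within_cluster_scores = {
    "iberian": {"spa": 1.0, "por": 0.0},
    "east_slavic": {"rus": 0.5, "ukr": 0.5},
}
language_logp = combine_hierarchical(cluster_scores, within_cluster_scores)
best_language = max(language_logp, key=language_logp.get)
print(best_language, language_logp)
```

Under this reading the first model only has to separate broad clusters, while the second concentrates on telling closely related languages apart, and the combined log-posteriors still sum to one over all languages.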
SudhaLive as AISudha
I am going to speak with you using an AI voice because I have a sore throat. I built a text-to-speech voice synthesizer as my college project during my Computer Science Engineering undergraduate studies decades ago. I am using Google WaveNet for text to speech. This is #SudhaLive, my weekly livestream where I share opportunities in the AI space -- jobs, fellowships, courses and my analysis of one topic from the AI world. Let's hear a female voice from India now.
Selective Self-Assembly using Re-Programmable Magnetic Pixels
Nisser, Martin, Makaram, Yashaswini, Faruqi, Faraz, Suzuki, Ryo, Mueller, Stefanie
This paper introduces a method to generate highly selective encodings that can be magnetically "programmed" onto physical modules to enable them to self-assemble in chosen configurations. We generate these encodings based on Hadamard matrices, and show how to design the faces of modules to be maximally attractive to their intended mate, while remaining maximally agnostic to other faces. We derive guarantees on these bounds, and verify their attraction and agnosticism experimentally. Using cubic modules whose faces have been covered in soft magnetic material, we show how inexpensive, passive modules with planar faces can be used to selectively self-assemble into target shapes without geometric guides. We show that these modules can be easily re-programmed for new target shapes using a CNC-based magnetic plotter, and demonstrate self-assembly of 8 cubes in a water tank.
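The selectivity rests on the orthogonality of Hadamard rows, which a short sketch can make concrete. The assumptions here are not from the paper: each face carries one row of +1/-1 magnetic pixels, the net interaction between two faces is approximated by the dot product of their pixel patterns, and the Sylvester construction is used (the paper may construct its matrices differently). Function names are hypothetical.

```python
def sylvester_hadamard(order):
    # Build an order x order Hadamard matrix; order must be a power of two.
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    matrix = [[1]]
    while len(matrix) < order:
        # Sylvester step: H_2n = [[H, H], [H, -H]].
        matrix = ([row + row for row in matrix]
                  + [row + [-value for value in row] for row in matrix])
    return matrix

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

hadamard = sylvester_hadamard(8)
# Pairwise interaction table under the dot-product approximation.
gram = [[dot(row_a, row_b) for row_b in hadamard] for row_a in hadamard]
for line in gram:
    print(line)
```

In this simplified model a face paired with its own pattern (or its sign-flipped mate, if opposite poles are taken to attract) interacts with the maximum magnitude of 8, while any pairing of two different rows sums to zero, which mirrors the "maximally attractive" and "maximally agnostic" properties described above.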
Leveraging Smartphone Sensors for Detecting Abnormal Gait for Smart Wearable Mobile Technologies
Tasjid, Md Shahriar, Marouf, Ahmed Al
Walking is one of the most common modes of terrestrial locomotion for humans and is essential for most daily activities. The pattern in which a person walks is known as gait. Gait analysis is used in sports and healthcare. Gait can be analyzed in different ways, for example using video captured by surveillance cameras or depth cameras in a lab environment. It can also be recognized with wearable sensors, e.g., accelerometers, force sensors, gyroscopes, flexible goniometers, magnetoresistive sensors, electromagnetic tracking systems, and electromyography (EMG). Analysis with these sensors requires lab conditions, or users must wear the sensors. To detect abnormality in human gait, these sensors must be incorporated separately. Detecting an abnormal gait can reveal information about a person's health condition. Understanding regular versus abnormal gait may give insights into the health condition of the subject using smart wearable technologies. Therefore, in this paper, we propose a way to analyze abnormal human gait through smartphone sensors. Smart devices such as smartphones and smartwatches are used by most people nowadays, so we can track their gait using the sensors of these intelligent wearable devices.