A.I. Breakthrough Could Disrupt the $11 Trillion Medical Sector

#artificialintelligence

A massive disruption now appears imminent in one of the world's largest – and most important – industries. In much the same way that Amazon disrupted the retail business – and PayPal disrupted the payments industry – one under-the-radar health technology company now seeks to transform the $11.85 trillion global health industry. It aims to move healthcare away from brick-and-mortar, traditional medicine and toward an AI-driven tool that offers unprecedented speed, efficiency, and accuracy... Investors still have a brief window to get in on this transformational investment opportunity while it still flies beneath Wall Street's radar. But as you'll soon discover, this company's technology is so powerful that it could become a valuable addition to hundreds of millions of households worldwide. Whether most patients, providers, or large healthcare companies realize it or not, the healthcare industry is already in the early stages of significant change. That's because patients now desire access to more information – and better information – in the blink of an eye. In a recent survey of U.S. health consumers, 71% reported facing major frustrations in their experience with healthcare providers. Concerns ranged from difficulties scheduling appointments to impersonal visits.


AES Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses

arXiv.org Artificial Intelligence

Deep-learning based Automatic Essay Scoring (AES) systems are being actively used by states and language testing agencies alike to evaluate millions of candidates for life-changing decisions ranging from college applications to visa approvals. However, little research has been done to understand and interpret the black-box nature of deep-learning based scoring algorithms. Previous studies indicate that scoring models can be easily fooled. In this paper, we explore the reason behind their surprising adversarial brittleness. We utilize recent advances in interpretability to find the extent to which features such as coherence, content, vocabulary, and relevance are important for automated scoring mechanisms. We use this to investigate the oversensitivity (i.e., large change in output score with a little change in input essay content) and overstability (i.e., little change in output scores with large changes in input essay content) of AES. Our results indicate that autoscoring models, despite being trained as "end-to-end" models with rich contextual embeddings such as BERT, behave like bag-of-words models. A few words determine the essay score without requiring any context, making the model largely overstable. This is in stark contrast to recent probing studies on pre-trained representation learning models, which show that rich linguistic features such as parts-of-speech and morphology are encoded by them. Further, we also find that the models have learnt dataset biases, making them oversensitive. To deal with these issues, we propose detection-based protection models that can detect samples causing oversensitivity and overstability with high accuracy. We find that our proposed models are able to detect unusual attribution patterns and flag adversarial samples successfully.
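
The two failure modes defined above can be illustrated with a toy probe. This is only a sketch under stated assumptions: `toy_score`, `KEYWORDS`, `changed_fraction`, and `probe` are hypothetical names, and the keyword-counting scorer is a deliberately crude bag-of-words stand-in, not the models or the attribution method studied in the paper.

```python
# Hypothetical toy scorer (an assumption for illustration, NOT the paper's model):
# it counts hits against a small keyword set, i.e. a pure bag-of-words scorer.
KEYWORDS = {"climate", "policy", "evidence", "therefore"}


def toy_score(essay: str) -> int:
    """Score an essay by the number of tokens that are keywords."""
    return sum(1 for token in essay.lower().split() if token in KEYWORDS)


def changed_fraction(original: str, perturbed: str) -> float:
    """Fraction of token positions that differ between the two essays."""
    a, b = original.lower().split(), perturbed.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - same / longest


def probe(original: str, perturbed: str) -> tuple:
    """Return (how much the input changed, how much the score changed)."""
    delta = abs(toy_score(original) - toy_score(perturbed))
    return changed_fraction(original, perturbed), delta


essay = "evidence shows climate policy works therefore we act"

# Overstability: every token position changes (word order reversed),
# yet the bag-of-words score does not move.
shuffled = "act we therefore works policy climate shows evidence"
overstable = probe(essay, shuffled)

# Oversensitivity: one token changes, and the score moves.
one_word = "evidence shows climate policy works thus we act"
oversensitive = probe(essay, one_word)
```

Under these assumptions, a large input change paired with a zero score change signals overstability, while a small input change paired with a nonzero score change signals oversensitivity.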


A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification

arXiv.org Machine Learning

Pool-based active learning (AL) aims to optimize the annotation process (i.e., labeling) as the acquisition of annotations is often time-consuming and therefore expensive. For this purpose, an AL strategy queries annotations intelligently from annotators to train a high-performance classification model at a low annotation cost. Traditional AL strategies operate in an idealized framework. They assume a single, omniscient annotator who never gets tired and charges uniformly regardless of query difficulty. However, in real-world applications, we often face human annotators, e.g., crowd or in-house workers, who make annotation mistakes and can be reluctant to respond if tired or faced with complex queries. Recently, a wide range of novel AL strategies has been proposed to address these issues. They differ in at least one of the following three central aspects from traditional AL: (1) They explicitly consider (multiple) human annotators whose performances can be affected by various factors, such as missing expertise. (2) They generalize the interaction with human annotators by considering different query and annotation types, such as asking an annotator for feedback on an inferred classification rule. (3) They take more complex cost schemes regarding annotations and misclassifications into account. This survey provides an overview of these AL strategies and refers to them as real-world AL. Therefore, we introduce a general real-world AL strategy as part of a learning cycle and use its elements, e.g., the query and annotator selection algorithm, to categorize about 60 real-world AL strategies. Finally, we outline possible directions for future research in the field of AL.
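
The query and annotator selection step mentioned above can be sketched as follows. All names (`uncertainty`, `select_query`, `select_annotator`) and all numbers are hypothetical, and the accuracy-per-cost rule is only one simple choice among the many strategies the survey categorizes.

```python
def uncertainty(p_positive: float) -> float:
    """Binary uncertainty: 1.0 at p = 0.5, falling to 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p_positive - 1.0)


def select_query(pool_probs: dict) -> str:
    """Pick the unlabeled instance the current model is least sure about."""
    return max(pool_probs, key=lambda name: uncertainty(pool_probs[name]))


def select_annotator(annotators: dict) -> str:
    """Pick the annotator with the best estimated accuracy per unit cost.

    `annotators` maps a name to (estimated accuracy, cost per label).
    """
    return max(annotators, key=lambda n: annotators[n][0] / annotators[n][1])


# Hypothetical model outputs for three pool instances (assumed values).
pool_probs = {"x1": 0.95, "x2": 0.52, "x3": 0.10}

# Hypothetical annotators: an expensive expert and a cheap, error-prone worker.
annotators = {"expert": (0.98, 5.0), "crowd_worker": (0.80, 1.0)}

query = select_query(pool_probs)
annotator = select_annotator(annotators)
```

In a full learning cycle, the returned label would be added to the training set, the classifier retrained, and the annotator's accuracy estimate updated before the next query.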


What BERT Based Language Models Learn in Spoken Transcripts: An Empirical Study

arXiv.org Artificial Intelligence

Language Models (LMs) have been ubiquitously leveraged in various tasks including spoken language understanding (SLU). Spoken language requires careful understanding of speaker interactions, dialog states and speech-induced multimodal behaviors to generate a meaningful representation of the conversation. In this work, we propose to dissect SLU into three representative properties: conversational (disfluency, pause, overtalk), channel (speaker-type, turn-tasks) and ASR (insertion, deletion, substitution). We probe BERT based language models (BERT, RoBERTa) trained on spoken transcripts to investigate their ability to understand multifarious properties in the absence of any speech cues. Empirical results indicate that the LM is surprisingly good at capturing conversational properties such as pause prediction and overtalk detection from lexical tokens. On the downside, the LM scores low on turn-tasks and ASR error prediction. Additionally, pre-training the LM on spoken transcripts restrains its linguistic understanding. Finally, we establish the efficacy and transferability of the mentioned properties on two benchmark datasets: Switchboard Dialog Act and Disfluency datasets.
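
The three ASR error types named above (insertion, deletion, substitution) are conventionally counted by aligning a reference transcript with the recognizer output using word-level edit distance. The sketch below shows that standard alignment; `asr_error_counts` and the example sentences are assumptions for illustration and do not reproduce the paper's probing setup.

```python
def asr_error_counts(reference: str, hypothesis: str) -> dict:
    """Count substitutions, deletions, and insertions via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(diagonal, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Walk back from the end of both sequences to label each edit.
    counts = {"substitution": 0, "deletion": 0, "insertion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + mismatch:
            if mismatch:
                counts["substitution"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletion"] += 1  # reference word missing from hypothesis
            i -= 1
        else:
            counts["insertion"] += 1  # extra word in hypothesis
            j -= 1
    return counts


# Hypothetical transcripts: "book" is misrecognized and "today" is inserted.
errors = asr_error_counts("please book a flight", "please look a flight today")
```

Dividing the total number of edits by the reference length gives the familiar word error rate.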


Dynamic and Systematic Survey of Deep Learning Approaches for Driving Behavior Analysis

arXiv.org Artificial Intelligence

Improper driving results in fatalities, damage, increased energy consumption, and vehicle depreciation. Analyzing driving behaviour could help to reduce or avoid these issues. By identifying types of driving and mapping them to their consequences, we can obtain a model to prevent those consequences. To this end, we create a dynamic survey paper that reviews and presents driving behaviour survey data for future researchers. By analyzing 58 articles, we attempt to classify standard methods and provide a framework in which future articles can be examined and studied in different dashboards and kept up to date with trends.


Algebraic Semantics of Generalized RIFs

arXiv.org Artificial Intelligence

A number of numeric measures like rough inclusion functions (RIFs) are used in general rough sets and soft computing. But these are often intrusive by definition, and amount to making unjustified assumptions about the data. The contamination problem is also about recognizing the domains of discourses involved in this, specifying errors and reducing data intrusion relative to them. In this research, weak quasi rough inclusion functions (wqRIFs) are generalized to general granular operator spaces with scope for limiting contamination. New algebraic operations are defined over collections of such functions, and are studied by the present author. It is shown by her that the algebras formed by the generalized wqRIFs are ordered hemirings with additional operators. By contrast, the generalized rough inclusion functions lack similar structure. This potentially contributes to improving the selection (possibly automatic) of such functions, training methods, and reducing contamination (and data intrusion) in applications. The underlying framework and associated concepts are explained in some detail, as they are relatively new.


BeamTransformer: Microphone Array-based Overlapping Speech Detection

arXiv.org Artificial Intelligence

We propose BeamTransformer, an efficient architecture that leverages the beamformer's edge in spatial filtering and the transformer's capability in context sequence modeling. BeamTransformer seeks to optimize the modeling of sequential relationships among signals from different spatial directions. Overlapping speech detection is one of the tasks where such optimization is favorable. In this paper we apply BeamTransformer to detect overlapping segments. Compared to a single-channel approach, BeamTransformer excels at learning to identify the relationship among different beam sequences and is hence able to make predictions not only from the acoustic signals but also from the localization of the source. The results indicate that a successful incorporation of microphone array signals can lead to remarkable gains. Moreover, BeamTransformer goes one step further, as speech from overlapped speakers has been internally separated into different beams.


OdoNet: Untethered Speed Aiding for Vehicle Navigation Without Hardware Wheeled Odometer

arXiv.org Artificial Intelligence

An odometer has been proven to significantly improve the accuracy of Global Navigation Satellite System / Inertial Navigation System (GNSS/INS) integrated vehicle navigation in GNSS-challenged environments. However, the odometer is inaccessible in many applications, especially for aftermarket devices. To apply forward speed aiding without a hardware wheeled odometer, we propose OdoNet, an untethered one-dimensional Convolutional Neural Network (CNN)-based pseudo-odometer model learning from a single Inertial Measurement Unit (IMU), which can act as an alternative to the wheeled odometer. Dedicated experiments have been conducted to verify the feasibility and robustness of OdoNet. The results indicate that the IMU individuality, the vehicle loads, and the road conditions have little impact on the robustness and precision of OdoNet, while the IMU biases and the mounting angles may notably degrade it. Thus, a data-cleaning procedure is added to effectively mitigate the impacts of the IMU biases and the mounting angles. Compared to the process using only the non-holonomic constraint (NHC), employing the pseudo-odometer reduces the positioning error by around 68%, while the reduction is around 74% for the hardware wheeled odometer. In conclusion, the proposed OdoNet can be employed as an untethered pseudo-odometer for vehicle navigation, which can efficiently improve the accuracy and reliability of positioning in GNSS-denied environments. The GNSS/INS integrated navigation system can provide full navigation parameters, including position, velocity, and attitude, and thus has been widely used in land vehicles. Due to lower cost and lower power consumption, low-grade MEMS (Micro-Electro-Mechanical System) IMUs have been widely applied to vehicle navigation.
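
The core operation of a one-dimensional CNN over an IMU window can be sketched in a few lines. This is only an illustration of the building block: `conv1d_valid`, the sample values, and the averaging kernel are assumptions, whereas a trained network such as OdoNet would learn many kernels across several layers and IMU channels.

```python
def conv1d_valid(signal: list, kernel: list) -> list:
    """Slide the kernel over the signal without padding ("valid" mode).

    As in most CNN libraries, the kernel is not flipped, so this is
    technically cross-correlation.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical window of forward-acceleration samples from one IMU axis.
accel = [0.0, 1.0, 2.0, 3.0, 4.0]

# Hypothetical two-tap averaging kernel; a trained model would learn its weights.
kernel = [0.5, 0.5]

features = conv1d_valid(accel, kernel)
```

A pseudo-odometer along these lines would stack such layers and regress a forward speed from the resulting features, which could then serve as a speed measurement in place of the wheeled odometer.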


An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures

arXiv.org Artificial Intelligence

With the rapid development of neural network architectures and speech processing models, singing voice synthesis with neural networks is becoming the cutting-edge technique of digital music production. In this work, in order to explore how to improve the quality and efficiency of singing voice synthesis, we use encoder-decoder neural models and a number of vocoders to achieve singing voice synthesis. We conduct experiments to demonstrate that the models can be trained using voice data with pitch information, lyrics and beat information, and that the trained models can produce smooth, clear and natural singing voice that is close to real human voice. As the models work in an end-to-end manner, they allow users who are not domain experts to directly produce singing voice by arranging pitches, lyrics and beats.


On the Exploitability of Audio Machine Learning Pipelines to Surreptitious Adversarial Examples

arXiv.org Artificial Intelligence

Machine learning (ML) models are known to be vulnerable to adversarial examples. Applications of ML to voice biometrics authentication are no exception. Yet, the implications of audio adversarial examples on these real-world systems remain poorly understood given that most research targets limited defenders who can only listen to the audio samples. Conflating detectability of an attack with human perceptibility, research has focused on methods that aim to produce imperceptible adversarial examples which humans cannot distinguish from the corresponding benign samples. We argue that this perspective is coarse for two reasons: (1) imperceptibility is impossible to verify, as it would require an experimental process that encompasses variations in listener training, equipment, volume, ear sensitivity, types of background noise, etc.; and (2) it disregards pipeline-based detection clues that realistic defenders leverage. This results in adversarial examples that are ineffective in the presence of knowledgeable defenders. Thus, an adversary only needs an audio sample to be plausible to a human. We thus introduce surreptitious adversarial examples, a new class of attacks that evades both human and pipeline controls. In the white-box setting, we instantiate this class with a joint, multi-stage optimization attack. Using an Amazon Mechanical Turk user study, we show that this attack produces audio samples that are more surreptitious than previous attacks that aim solely for imperceptibility. Lastly, we show that surreptitious adversarial examples are challenging to develop in the black-box setting.