Goto

Collaborating Authors

 Calgary


ALMOST: Adversarial Learning to Mitigate Oracle-less ML Attacks via Synthesis Tuning

arXiv.org Artificial Intelligence

Oracle-less machine learning (ML) attacks have broken various logic locking schemes. Regular synthesis, which is tailored for area-power-delay optimization, yields netlists where key-gate localities are vulnerable to learning. Thus, we call for security-aware logic synthesis. We propose ALMOST, a framework for adversarial learning to mitigate oracle-less ML attacks via synthesis tuning. ALMOST uses a simulated-annealing-based synthesis recipe generator, employing adversarially trained models that can predict state-of-the-art attacks' accuracies over wide ranges of recipes and key-gate localities. Experiments on ISCAS benchmarks confirm the attacks' accuracies drops to around 50\% for ALMOST-synthesized circuits, all while not undermining design optimization.


Design and Fabrication of a Fiber Bragg Grating Shape Sensor for Shape Reconstruction of a Continuum Manipulator

arXiv.org Artificial Intelligence

Continuum dexterous manipulators (CDMs) are suitable for performing tasks in a constrained environment due to their high dexterity and maneuverability. Despite the inherent advantages of CDMs in minimally invasive surgery, real-time control of CDMs' shape during non-constant curvature bending is still challenging. This study presents a novel approach for the design and fabrication of a large deflection fiber Bragg grating (FBG) shape sensor embedded within the lumens inside the walls of a CDM with a large instrument channel. The shape sensor consisted of two fibers, each with three FBG nodes. A shape-sensing model was introduced to reconstruct the centerline of the CDM based on FBG wavelengths. Different experiments, including shape sensor tests and CDM shape reconstruction tests, were conducted to assess the overall accuracy of the shape sensing. The FBG sensor evaluation results revealed the linear curvature-wavelength relationship with the large curvature detection of 0.045 mm at a 90 degrees bending angle and a sensitivity of up to 5.50 nm/mm in each bending direction. The CDM's shape reconstruction experiments in a free environment demonstrated the shape tracking accuracy of 0.216+-0.126 mm for positive/negative deflections. Also, the CDM shape reconstruction error for three cases of bending with obstacles were observed to be 0.436+-0.370 mm for the proximal case, 0.485+-0.418 mm for the middle case, and 0.312+-0.261 mm for the distal case. This study indicates the adequate performance of the FBG sensor and the effectiveness of the model for tracking the shape of the large-deflection CDM with nonconstant-curvature bending for minimally-invasive orthopaedic applications.


Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers

arXiv.org Artificial Intelligence

The success of supervised deep learning methods is largely due to their ability to learn relevant features from raw data. Deep Neural Networks (DNNs) trained on large-scale datasets are capable of capturing a diverse set of features, and learning a representation that can generalize onto unseen tasks and datasets that are from the same domain. Hence, these models can be used as powerful feature extractors, in combination with shallower models as classifiers, for smaller tasks and datasets where the amount of training data is insufficient for learning an end-to-end model from scratch. During the past years, Convolutional Neural Networks (CNNs) have largely been the method of choice for audio processing. However, recently attention-based transformer models have demonstrated great potential in supervised settings, outperforming CNNs. In this work, we investigate the use of audio transformers trained on large-scale datasets to learn general-purpose representations. We study how the different setups in these audio transformers affect the quality of their embeddings. We experiment with the models' time resolution, extracted embedding level, and receptive fields in order to see how they affect performance on a variety of tasks and datasets, following the HEAR 2021 NeurIPS challenge evaluation setup. Our results show that representations extracted by audio transformers outperform CNN representations. Furthermore, we will show that transformers trained on Audioset can be extremely effective representation extractors for a wide range of downstream tasks.


End-to-End Speech Recognition: A Survey

arXiv.org Artificial Intelligence

Within components (models, knowledge sources) of an ASR system the classical approach, deep learning has been introduced before coming to a decision. This is in line with Bayes' to acoustic and language modeling. In acoustic modeling, decision rule, which exactly requires a single global decision deep learning replaced Gaussian mixture distributions (hybrid integrating all available knowledge sources. HMM [3], [4]) or augmented the acoustic feature set c) Joint Training: In terms of model training, E2E suggests (nonlinear disciminant/tandem approach [5], [6]). In language estimating all parameters of all components of a model modeling, deep learning replaced count-based approaches [7], jointly using a single objective function that is consistent with [8], [9]. However, when introducing deep learning, the classical the task at hand, which in case of ASR means minimizing the ASR architecture was not yet touched. Classical stateof-the-art expected word error rate. ASR systems today are composed of many separate d) Training Data: Joint training of an integrated model components and knowledge sources, especially speech signal implies using a single kind of training data, which in case preprocessing, methods for robustness w.r.t.


SF2Former: Amyotrophic Lateral Sclerosis Identification From Multi-center MRI Data Using Spatial and Frequency Fusion Transformer

arXiv.org Artificial Intelligence

Amyotrophic Lateral Sclerosis (ALS) is a complex neurodegenerative disorder involving motor neuron degeneration. Significant research has begun to establish brain magnetic resonance imaging (MRI) as a potential biomarker to diagnose and monitor the state of the disease. Deep learning has turned into a prominent class of machine learning programs in computer vision and has been successfully employed to solve diverse medical image analysis tasks. However, deep learning-based methods applied to neuroimaging have not achieved superior performance in ALS patients classification from healthy controls due to having insignificant structural changes correlated with pathological features. Therefore, the critical challenge in deep models is to determine useful discriminative features with limited training data. By exploiting the long-range relationship of image features, this study introduces a framework named SF2Former that leverages vision transformer architecture's power to distinguish the ALS subjects from the control group. To further improve the network's performance, spatial and frequency domain information are combined because MRI scans are captured in the frequency domain before being converted to the spatial domain. The proposed framework is trained with a set of consecutive coronal 2D slices, which uses the pre-trained weights on ImageNet by leveraging transfer learning. Finally, a majority voting scheme has been employed to those coronal slices of a particular subject to produce the final classification decision. Our proposed architecture has been thoroughly assessed with multi-modal neuroimaging data using two well-organized versions of the Canadian ALS Neuroimaging Consortium (CALSNIC) multi-center datasets. The experimental results demonstrate the superiority of our proposed strategy in terms of classification accuracy compared with several popular deep learning-based techniques.


Memory-efficient model-based deep learning with convergence and robustness guarantees

arXiv.org Artificial Intelligence

Computational imaging has been revolutionized by compressed sensing algorithms, which offer guaranteed uniqueness, convergence, and stability properties. Model-based deep learning methods that combine imaging physics with learned regularization priors have emerged as more powerful alternatives for image recovery. The main focus of this paper is to introduce a memory efficient model-based algorithm with similar theoretical guarantees as CS methods. The proposed iterative algorithm alternates between a gradient descent involving the score function and a conjugate gradient algorithm to encourage data consistency. The score function is modeled as a monotone convolutional neural network. Our analysis shows that the monotone constraint is necessary and sufficient to enforce the uniqueness of the fixed point in arbitrary inverse problems. In addition, it also guarantees the convergence to a fixed point, which is robust to input perturbations. We introduce two implementations of the proposed MOL framework, which differ in the way the monotone property is imposed. The first approach enforces a strict monotone constraint, while the second one relies on an approximation. The guarantees are not valid for the second approach in the strict sense. However, our empirical studies show that the convergence and robustness of both approaches are comparable, while the less constrained approximate implementation offers better performance. The proposed deep equilibrium formulation is significantly more memory efficient than unrolled methods, which allows us to apply it to 3D or 2D+time problems that current unrolled algorithms cannot handle.


Autoencoders as Pattern Filters

arXiv.org Artificial Intelligence

An autoencoder is a feed-forward Deep Neural Network (DNN) consisting of an encoder and a decoder, trained to reproduce its input at the output layer (Figure 1) [1], [2]. Autoencoders are used in a large variety of applications: dimensionality reduction, feature extraction, image denoising, imputing missing data etc. Depending on the dimensionality of the hidden layer we can distinguish two types of autoencoders: Undercomplete: the hidden layer has a lower dimension than the input/output layers. Overcomplete: the hidden layer has a higher dimension than the input/output layers. In general, the dimensionality of the hidden layer should be different than the dimensionality of the input/output layers in order to avoid learning an identity data transformation. Undercomplete autoencoders are typically used in unsupervised learning tasks, such as: dimensionality reduction, feature learning, and generative models. The encoder is generally used to learn a lower dimensional latent representation of the input samples, performing an efficient compression through non-linear transformations. In the same time, the decoder learns how to reconstruct the input samples from this latent compressed representation.


Automatic Classification of Symmetry of Hemithoraces in Canine and Feline Radiographs

arXiv.org Artificial Intelligence

Purpose: Thoracic radiographs are commonly used to evaluate patients with confirmed or suspected thoracic pathology. Proper patient positioning is more challenging in canine and feline radiography than in humans due to less patient cooperation and body shape variation. Improper patient positioning during radiograph acquisition has the potential to lead to a misdiagnosis. Asymmetrical hemithoraces are one of the indications of obliquity for which we propose an automatic classification method. Approach: We propose a hemithoraces segmentation method based on Convolutional Neural Networks (CNNs) and active contours. We utilized the U-Net model to segment the ribs and spine and then utilized active contours to find left and right hemithoraces. We then extracted features from the left and right hemithoraces to train an ensemble classifier which includes Support Vector Machine, Gradient Boosting and Multi-Layer Perceptron. Five-fold cross-validation was used, thorax segmentation was evaluated by Intersection over Union (IoU), and symmetry classification was evaluated using Precision, Recall, Area under Curve and F1 score. Results: Classification of symmetry for 900 radiographs reported an F1 score of 82.8% . To test the robustness of the proposed thorax segmentation method to underexposure and overexposure, we synthetically corrupted properly exposed radiographs and evaluated results using IoU. The results showed that the models IoU for underexposure and overexposure dropped by 2.1% and 1.2%, respectively. Conclusions: Our results indicate that the proposed thorax segmentation method is robust to poor exposure radiographs. The proposed thorax segmentation method can be applied to human radiography with minimal changes.


Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting

arXiv.org Artificial Intelligence

In the context of keyword spotting (KWS), the replacement of handcrafted speech features by learnable features has not yielded superior KWS performance. In this study, we demonstrate that filterbank learning outperforms handcrafted speech features for KWS whenever the number of filterbank channels is severely decreased. Reducing the number of channels might yield certain KWS performance drop, but also a substantial energy consumption reduction, which is key when deploying common always-on KWS on low-resource devices. Experimental results on a noisy version of the Google Speech Commands Dataset show that filterbank learning adapts to noise characteristics to provide a higher degree of robustness to noise, especially when dropout is integrated. Thus, switching from typically used 40-channel log-Mel features to 8-channel learned features leads to a relative KWS accuracy loss of only 3.5% while simultaneously achieving a 6.3x energy consumption reduction.


Teachable Reality: Prototyping Tangible Augmented Reality with Everyday Objects by Leveraging Interactive Machine Teaching

arXiv.org Artificial Intelligence

This paper introduces Teachable Reality, an augmented reality (AR) prototyping tool for creating interactive tangible AR applications with arbitrary everyday objects. Teachable Reality leverages vision-based interactive machine teaching (e.g., Teachable Machine), which captures real-world interactions for AR prototyping. It identifies the user-defined tangible and gestural interactions using an on-demand computer vision model. Based on this, the user can easily create functional AR prototypes without programming, enabled by a trigger-action authoring interface. Therefore, our approach allows the flexibility, customizability, and generalizability of tangible AR applications that can address the limitation of current marker-based approaches. We explore the design space and demonstrate various AR prototypes, which include tangible and deformable interfaces, context-aware assistants, and body-driven AR applications. The results of our user study and expert interviews confirm that our approach can lower the barrier to creating functional AR prototypes while also allowing flexible and general-purpose prototyping experiences.