Scalable Plug-and-Play ADMM with Convergence Guarantees

arXiv.org Machine Learning

Plug-and-play priors (PnP) is a broadly applicable methodology for solving inverse problems by exploiting statistical priors specified as denoisers. Recent work has reported the state-of-the-art performance of PnP algorithms using pre-trained deep neural nets as denoisers in a number of imaging applications. However, current PnP algorithms are impractical in large-scale settings due to their heavy computational and memory requirements. This work addresses this issue by proposing an incremental variant of the widely used PnP-ADMM algorithm, making it scalable to large-scale datasets. We theoretically analyze the convergence of the algorithm under a set of explicit assumptions, extending recent theoretical results in the area. Additionally, we show the effectiveness of our algorithm with nonsmooth data-fidelity terms and deep neural net priors, its fast convergence compared to existing PnP algorithms, and its scalability in terms of speed and memory.
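
For orientation, the basic (batch) PnP-ADMM iteration that the abstract builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the incremental variant proposed in the paper: the data-fidelity term is assumed to be the quadratic 0.5 * ||Ax - y||^2, the denoiser is a simple soft-thresholding stand-in for a pre-trained deep network, and the names `pnp_admm`, `soft_threshold`, `gamma`, and `num_iters` are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Toy denoiser (assumption): elementwise soft-thresholding at level tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def pnp_admm(A, y, denoiser, gamma=1.0, num_iters=100):
    """Batch PnP-ADMM sketch for the fidelity term 0.5 * ||A x - y||^2.

    Each iteration performs:
      x <- prox of the data-fidelity term evaluated at (z - s)
      z <- denoiser(x + s)   (the denoiser replaces the prior's prox)
      s <- s + x - z         (scaled dual update)
    """
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    s = np.zeros(n)
    # For a quadratic fidelity term the prox is a linear solve, so the
    # system matrix and right-hand-side constant can be formed once.
    H = A.T @ A + np.eye(n) / gamma
    Aty = A.T @ y
    for _ in range(num_iters):
        x = np.linalg.solve(H, Aty + (z - s) / gamma)
        z = denoiser(x + s)
        s = s + x - z
    return x
```

With `A` set to the identity and `soft_threshold` as the denoiser, the scheme reduces to ordinary ADMM for an L1-regularized least-squares problem, so the iterates approach the soft-thresholded measurements. Every iteration here touches the full measurement operator, which is the cost the abstract says an incremental variant is meant to avoid.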


Japan's smart cities: Technological dreams or 'Big Brother' nightmares?

The Japan Times

Osaka – Late last month, the Diet passed a revised bill paving the way for so-called "super cities" or "smart cities." Supporters tout them as high-tech marvels where artificial intelligence and big data are to be used to provide more efficient and cost-effective solutions to social problems, especially in areas faced with aging and declining populations and a reduced tax base. Opponents warn that data leaks could lead to privacy violations and even a surveillance state. What was the purpose of the recently passed bill? In order to realize the creation of smart cities in various parts of the country, any number of basic regulations involving multiple ministries needs to be changed. The May 27 revision to a national strategic special zone law included measures the government can now take to do that more quickly and under more specific guidelines.


GACELA -- A generative adversarial context encoder for long audio inpainting

arXiv.org Machine Learning

We introduce GACELA, a generative adversarial network (GAN) designed to restore missing musical audio data with a duration ranging from hundreds of milliseconds to a few seconds, i.e., to perform long-gap audio inpainting. While previous work either addressed shorter gaps or relied on exemplars by copying available information from other signal parts, GACELA addresses the inpainting of long gaps in two aspects. First, it considers various time scales of audio information by relying on five parallel discriminators with increasing resolution of receptive fields. Second, it is conditioned not only on the available information surrounding the gap, i.e., the context, but also on the latent variable of the conditional GAN. This addresses the inherent multi-modality of audio inpainting at such long gaps and provides the option of user-defined inpainting. GACELA was tested in listening tests on music signals of varying complexity and gap durations ranging from 375 ms to 1500 ms. While our subjects were often able to detect the inpaintings, the severity of the artifacts decreased from unacceptable to mildly disturbing. GACELA represents a framework capable of integrating future improvements such as processing of more auditory-related features or more explicit musical features.


Scaling your AI-powered Battlesnake with distributed reinforcement learning in Amazon SageMaker | Amazon Web Services

#artificialintelligence

Battlesnake is an AI competition in which you build AI-powered snakes. Battlesnake's rules are similar to the traditional snakes game. Your goal is to be the last surviving snake when competing against other snakes. Developers of all levels build snakes using techniques ranging from unique heuristic-based strategies to state-of-the-art deep reinforcement learning (RL) algorithms. You can use the SageMaker Battlesnake Starter Pack to build your own snake and compete in the Battlesnake arena.


Algebraic Approach to Directed Rough Sets

arXiv.org Artificial Intelligence

In the relational approach to general rough sets, ideas of directed relations are supplemented with additional conditions for multiple algebraic approaches in this research paper. The relations are also specialized to representations of general parthood that are upper-directed, reflexive and antisymmetric for a better behaved groupoidal semantics over the set of roughly equivalent objects by the first author. Another distinct algebraic semantics over the set of approximations, and a new knowledge interpretation are also invented in this research by her. Because of minimal conditions imposed on the relations, neighborhood granulations are used in the construction of all approximations (granular and pointwise). Necessary and sufficient conditions for the lattice of local upper approximations to be completely distributive are proved by the second author. These results are related to formal concept analysis. Applications to student centered learning and decision making are also outlined.


Computer model predicts how drugs affect heart rhythm

#artificialintelligence

UC Davis Health researchers have developed a computer model to screen drugs for unintended cardiac side effects, especially arrhythmia risk. Published in Circulation Research, the study was led by Colleen E. Clancy, professor of physiology and membrane biology, and Igor Vorobyov, assistant professor of physiology and membrane biology. Clancy is a recognized leader in using high-performance computing to understand electrical changes in the heart. "One main reason for a drug being removed from the market is potentially life-threatening arrhythmias," Clancy said.


Deep Reinforcement Learning (DRL): Another Perspective for Unsupervised Wireless Localization

arXiv.org Machine Learning

Location is key to spatialize internet-of-things (IoT) data. However, it is challenging to use low-cost IoT devices for robust unsupervised localization (i.e., localization without training data that have known location labels). Thus, this paper proposes a deep reinforcement learning (DRL) based unsupervised wireless-localization method. The main contributions are as follows. (1) This paper proposes an approach to model a continuous wireless-localization process as a Markov decision process (MDP) and process it within a DRL framework. (2) To alleviate the challenge of obtaining rewards when using unlabeled data (e.g., daily-life crowdsourced data), this paper presents a reward-setting mechanism, which extracts robust landmark data from unlabeled wireless received signal strengths (RSS). (3) To ease requirements for model re-training when using DRL for localization, this paper uses RSS measurements together with agent location to construct DRL inputs. The proposed method was tested by using field testing data from multiple Bluetooth 5 smart ear tags in a pasture. Meanwhile, the experimental verification process reflected the advantages and challenges for using DRL in wireless localization.


Bio-Inspired Modality Fusion for Active Speaker Detection

arXiv.org Machine Learning

Human beings have developed fantastic abilities to integrate information from various sensory sources, exploring their inherent complementarity. Perceptual capabilities are therefore heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations.


Deep Transform and Metric Learning Network: Wedding Deep Dictionary Learning and Neural Networks

arXiv.org Machine Learning

On account of their many successes in inference tasks and denoising applications, Dictionary Learning (DL) and its related sparse optimization problems have garnered a lot of research interest. While most solutions have focused on single-layer dictionaries, the recently proposed, improved Deep DL (DDL) methods have also fallen short on a number of issues. We propose herein a novel DDL approach where each DL layer can be formulated as a combination of one linear layer and a Recurrent Neural Network (RNN). The RNN is shown to flexibly account for the layer-associated and learned metric. Our proposed work unveils new insights into Neural Networks and DDL and provides a new, efficient and competitive approach to jointly learn a deep transform and a metric for inference applications. Extensive experiments are carried out to demonstrate that the proposed method can outperform not only existing DDL but also state-of-the-art generic CNNs.


Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances

arXiv.org Machine Learning

Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions according to the results obtained for early NIST SRE (Speaker Recognition Evaluation) datasets. From the practical point of view, taking into account the increased interest in virtual assistants (such as Amazon Alexa, Google Home, Apple Siri, etc.), speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks. This paper presents approaches aimed at achieving two goals: a) improve the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reduce the system quality degradation for short utterances. For these purposes, we considered deep neural network architectures based on TDNN (Time Delay Neural Network) and ResNet (Residual Neural Network) blocks. We experimented with state-of-the-art embedding extractors and their training procedures. The obtained results confirm that ResNet architectures outperform the standard x-vector approach in terms of speaker verification quality for both long-duration and short-duration utterances. We also investigate the impact of the speech activity detector, different scoring models, adaptation and score normalization techniques. The experimental results are presented for publicly available data and verification protocols for the VoxCeleb1, VoxCeleb2, and VOiCES datasets.