If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
The problem of phase retrieval has been intriguing researchers for decades due to its appearance in a wide range of applications. The task of a phase retrieval algorithm is typically to recover a signal from linear phase-less measurements. In this paper, we approach the problem by proposing a hybrid model-based data-driven deep architecture, referred to as the Unfolded Phase Retrieval (UPR), that shows potential in improving the performance of the state-of-the-art phase retrieval algorithms. Specifically, the proposed method benefits from versatility and interpretability of well established model-based algorithms, while simultaneously benefiting from the expressive power of deep neural networks. Our numerical results illustrate the effectiveness of such hybrid deep architectures and showcase the untapped potential of data-aided methodologies to enhance the existing phase retrieval algorithms.
Many machine learning problems can be interpreted as learning for matching two types of objects (e.g., images and captions, users and products, queries and documents). The matching level of two objects is usually measured as the inner product in a certain feature space, while the modeling effort focuses on mapping of objects from the original space to the feature space. This schema, although proven successful on a range of matching tasks, is insufficient for capturing the rich structure in the matching process of more complicated objects. In this paper, we propose a new deep architecture to more effectively model the complicated matching relations between two objects from heterogeneous domains. More specifically, we apply this model to matching tasks in natural language, e.g., finding sensible responses for a tweet, or relevant answers to a given question.
We propose a new deep architecture for topic modeling, based on Poisson Factor Analysis (PFA) modules. The model is composed of a Poisson distribution to model observed vectors of counts, as well as a deep hierarchy of hidden binary units. Rather than using logistic functions to characterize the probability that a latent binary unit is on, we employ a Bernoulli-Poisson link, which allows PFA modules to be used repeatedly in the deep architecture. We also describe an approach to build discriminative topic models, by adapting PFA modules. We derive efficient inference via MCMC and stochastic variational methods, that scale with the number of non-zeros in the data and binary units, yielding significant efficiency, relative to models based on logistic links.
We develop a fully discriminative learning approach for supervised Latent Dirichlet Allocation (LDA) model using Back Propagation (i.e., BP-sLDA), which maximizes the posterior probability of the prediction variable given the input document. Different from traditional variational learning or Gibbs sampling approaches, the proposed learning method applies (i) the mirror descent algorithm for maximum a posterior inference and (ii) back propagation over a deep architecture together with stochastic gradient/mirror descent for model parameter estimation, leading to scalable and end-to-end discriminative learning of the model. As a byproduct, we also apply this technique to develop a new learning method for the traditional unsupervised LDA model (i.e., BP-LDA). Experimental results on three real-world regression and classification tasks show that the proposed methods significantly outperform the previous supervised topic models, neural networks, and is on par with deep neural networks. Papers published at the Neural Information Processing Systems Conference.
We describe Swapout, a new stochastic training method, that outperforms ResNets of identical network structure yielding impressive results on CIFAR-10 and CIFAR-100. When viewed as a regularization method swapout not only inhibits co-adaptation of units in a layer, similar to dropout, but also across network layers. We conjecture that swapout achieves strong regularization by implicitly tying the parameters across layers. When viewed as an ensemble training method, it samples a much richer set of architectures than existing methods such as dropout or stochastic depth. We propose a parameterization that reveals connections to exiting architectures and suggests a much richer set of architectures to be explored.
Emotion recognition from speech is one of the key steps towards emotional intelligence in advanced human-machine interaction. Identifying emotions in human speech requires learning features that are robust and discriminative across diverse domains that differ in terms of language, spontaneity of speech, recording conditions, and types of emotions. This corresponds to a learning scenario in which the joint distributions of features and labels may change substantially across domains. In this paper, we propose a deep architecture that jointly exploits a convolutional network for extracting domain-shared features and a long short-term memory network for classifying emotions using domain-specific features. We use transferable features to enable model adaptation from multiple source domains, given the sparseness of speech emotion data and the fact that target domains are short of labeled data. A comprehensive cross-corpora experiment with diverse speech emotion domains reveals that transferable features provide gains ranging from 4.3% to 18.4% in speech emotion recognition. We evaluate several domain adaptation approaches, and we perform an ablation study to understand which source domains add the most to the overall recognition effectiveness for a given target domain.
We present a new approach to integrating deep learning with knowledge-based systems that we believe shows promise. Our approach seeks to emulate reasoning structure, which can be inspected part-way through, rather than simply learning reasoner answers, which is typical in many of the black-box systems currently in use. We demonstrate that this idea is feasible by training a long short-term memory (LSTM) artificial neural network to learn EL+ reasoning patterns with two different data sets. We also show that this trained system is resistant to noise by corrupting a percentage of the test data and comparing the reasoner's and LSTM's predictions on corrupt data with correct answers.
Parameterized mathematical models play a central role in understanding and design of complex information systems. However, they often cannot take into account the intricate interactions innate to such systems. On the contrary, purely data-driven approaches do not need explicit mathematical models for data generation and have a wider applicability at the cost of interpretability. In this paper, we consider the design of a one-bit compressive variational autoencoder, and propose a novel hybrid model-based and data-driven methodology that allows us not only to design the sensing matrix and the quantization thresholds for one-bit data acquisition, but also allows for learning the latent-parameters of iterative optimization algorithms specifically designed for the problem of one-bit sparse signal recovery. In addition, the proposed method has the ability to adaptively learn the proper quantization thresholds, paving the way for amplitude recovery in one-bit compressive sensing. Our results demonstrate a significant improvement compared to state-of-the-art model-based algorithms.
Capitalizing on the need for addressing the existing challenges associated with gesture recognition via sparse multichannel surface Electromyography (sEMG) signals, the paper proposes a novel deep learning model, referred to as the XceptionTime architecture. The proposed innovative XceptionTime is designed by integration of depthwise separable convolutions, adaptive average pooling, and a novel non-linear normalization technique. At the heart of the proposed architecture is several XceptionTime modules concatenated in series fashion designed to capture both temporal and spatial information-bearing contents of the sparse multichannel sEMG signals without the need for data augmentation and/or manual design of feature extraction. In addition, through integration of adaptive average pooling, Conv1D, and the non-linear normalization approach, XceptionTime is less prone to overfitting, more robust to temporal translation of the input, and more importantly is independent from the input window size. Finally, by utilizing the depthwise separable convolutions, the XceptionTime network has far fewer parameters resulting in a less complex network. The performance of XceptionTime is tested on a sub Ninapro dataset, DB1, and the results showed a superior performance in comparison to any existing counterparts. In this regard, 5:71% accuracy improvement, on a window size 200ms, is reported in this paper, for the first time.
ABSTRACT With growing consumer adoption of online grocery shopping through platforms such as Amazon Fresh, Instacart, and Walmart Grocery, there is a pressing business need to provide relevant recommendations throughout the customer journey. In this paper, we introduce a production within-basket grocery recommendation system, RTT2V ec, which generates real-time personalized product recommendations to supplement the user's current grocery basket. We conduct extensive offline evaluation of our system and demonstrate a 9.4% uplift in prediction metrics over baseline state-of-the-art within-basket recommendation models. We also propose an approximate inference technique 11.6x times faster than exact inference approaches. In production, our system has resulted in an increase in average basket size, improved product discovery, and enabled faster user checkout.