Country
Fourier Spectrum Discrepancies in Deep Network Generated Images
Dzanic, Tarik, Witherden, Freddie
Advancements in deep generative models such as generative adversarial networks and variational autoencoders have resulted in the ability to generate realistic images that are visually indistinguishable from real images. In this paper, we present an analysis of the high-frequency Fourier modes of real and deep network generated images and the effects of resolution and image compression on these modes. Using this, we propose a detection method based on the frequency spectrum of the images which is able to achieve an accuracy of up to 99.2\% in classifying real, Style-GAN generated, and VQ-VAE2 generated images on a dataset of 2000 images with less than 10\% training data. Furthermore, we suggest a method for modifying the high-frequency attributes of deep network generated images to mimic real images.
The Canonical Distortion Measure for Vector Quantization and Function Approximation
To measure the quality of a set of vector quantization points a means of measuring the distance between a random point and its quantization is required. Common metrics such as the {\em Hamming} and {\em Euclidean} metrics, while mathematically simple, are inappropriate for comparing natural signals such as speech or images. In this paper it is shown how an {\em environment} of functions on an input space $X$ induces a {\em canonical distortion measure} (CDM) on X. The depiction 'canonical" is justified because it is shown that optimizing the reconstruction error of X with respect to the CDM gives rise to optimal piecewise constant approximations of the functions in the environment. The CDM is calculated in closed form for several different function classes. An algorithm for training neural networks to implement the CDM is presented along with some encouraging experimental results.
Progressive Feature Polishing Network for Salient Object Detection
Wang, Bo, Chen, Quan, Zhou, Min, Zhang, Zhiqiang, Jin, Xiaogang, Gai, Kun
Feature matters for salient object detection. Existing methods mainly focus on designing a sophisticated structure to incorporate multi-level features and filter out cluttered features. We present Progressive Feature Polishing Network (PFPN), a simple yet effective framework to progressively polish the multi-level features to be more accurate and representative. By employing multiple Feature Polishing Modules (FPMs) in a recurrent manner, our approach is able to detect salient objects with fine details without any post-processing. A FPM parallelly updates the features of each level by directly incorporating all higher level context information. Moreover, it can keep the dimensions and hierarchical structures of the feature maps, which makes it flexible to be integrated with any CNN-based models. Empirical experiments show that our results are monotonically getting better with increasing number of FPMs. Without bells and whistles, PFPN outperforms the state-of-the-art methods significantly on five benchmark datasets under various evaluation metrics.
Multiple Patients Behavior Detection in Real-time using mmWave Radar and Deep CNNs
Jin, Feng, Zhang, Renyuan, Sengupta, Arindam, Cao, Siyang, Hariri, Salim, Agarwal, Nimit K., Agarwal, Sumit K.
To address potential gaps noted in patient monitoring in the hospital, a novel patient behavior detection system using mmWave radar and deep convolution neural network (CNN), which supports the simultaneous recognition of multiple patients' behaviors in real-time, is proposed. In this study, we use an mmWave radar to track multiple patients and detect the scattering point cloud of each one. For each patient, the Doppler pattern of the point cloud over a time period is collected as the behavior signature. A three-layer CNN model is created to classify the behavior for each patient. The tracking and point clouds detection algorithm was also implemented on an mmWave radar hardware platform with an embedded graphics processing unit (GPU) board to collect Doppler pattern and run the CNN model. A training dataset of six types of behavior were collected, over a long duration, to train the model using Adam optimizer with an objective to minimize cross-entropy loss function. Lastly, the system was tested for real-time operation and obtained a very good inference accuracy when predicting each patient's behavior in a two-patient scenario.
A Neural Network Based on the Johnson $S_\mathrm{U}$ Translation System and Related Application to Electromyogram Classification
Hayashi, Hideaki, Shibanoki, Taro, Tsuji, Toshio
Electromyogram (EMG) classification is a key technique in EMG-based control systems. The existing EMG classification methods do not consider the characteristics of EMG features that the distribution has skewness and kurtosis, causing drawbacks such as the requirement of hyperparameter tuning. In this paper, we propose a neural network based on the Johnson $S_\mathrm{U}$ translation system that is capable of representing distributions with skewness and kurtosis. The Johnson system is a normalizing translation that transforms non-normal data to a normal distribution, thereby enabling the representation of a wide range of distributions. In this study, a discriminative model based on the multivariate Johnson $S_\mathrm{U}$ translation system is transformed into a linear combination of coefficients and input vectors using log-linearization. This is then incorporated into a neural network structure, thereby allowing the calculation of the posterior probability of the input vectors for each class and the determination of model parameters as weight coefficients of the network. The uniqueness of convergence of the network learning is theoretically guaranteed. In the experiments, the suitability of the proposed network for distributions including skewness and kurtosis is evaluated using artificially generated data. Its applicability for real biological data is also evaluated via an EMG classification experiment. The results show that the proposed network achieves high classification performance without the need for hyperparameter optimization.
Face shape classification using Inception v3
In this paper, we present experimental results obtained from retraining the last layer of the Inception v3 model in classifying images of human faces into one of five basic face shapes. The accuracy of the retrained Inception v3 model was compared with that of the following classification methods that uses facial landmark distance ratios and angles as features: linear discriminant analysis (LDA), support vector machines with linear kernel (SVM-LIN), support vector machines with radial basis function kernel (SVM-RBF), artificial neural networks or multilayer perceptron (MLP), and k-nearest neighbors (KNN). All classifiers were trained and tested using a total of 500 images of female celebrities with known face shapes collected from the Internet. Results show that training accuracy and overall accuracy ranges from 98.0% to 100% and from 84.4% to 84.8% for Inception v3 and from 50.6% to 73.0% and from 36.4% to 64.6% for the other classifiers depending on the training set size used. This result shows that the retrained Inception v3 model was able to fit the training data well and outperform the other classifiers without the need to handpick specific features to include in model training. Future work should consider expanding the labeled dataset, preferably one that can also be freely distributed to the research community, so that proper model cross-validation can be performed. As far as we know, this is the first in the literature to use convolutional neural networks in face-shape classification. The scripts are available at https://github.com/adonistio/inception-face-shape-classifier.
The Similarity-Consensus Regularized Multi-view Learning for Dimension Reduction
Meng, Xiangzhu, Wang, Huibing, Feng, Lin
During the last decades, learning a low-dimensional space with discriminative information for dimension reduction (DR) has gained a surge of interest. However, it's not accessible for these DR methods to achieve satisfactory performance when facing the features from multiple views. In multi-view learning problems, one instance can be represented by multiple heterogeneous features, which are highly related but sometimes look different from each other. In addition, correlations between features from multiple views always vary greatly, which challenges the capability of multi-view learning methods. Consequently, constructing a multi-view learning framework with generalization and scalability, which could take advantage of multi-view information as much as possible, is extremely necessary but challenging. To implement the above target, this paper proposes a novel multi-view learning framework based on similarity consensus, which makes full use of correlations among multi-view features while considering the scalability and robustness of the framework. It aims to straightforwardly extend those existing DR methods into multi-view learning domain by preserving the similarity between different views to capture the low-dimensional embedding. Two schemes based on pairwise-consensus and centroid-consensus are separately proposed to force multiple views to learn from each other and then an iterative alternating strategy is developed to obtain the optimal solution. The proposed method is evaluated on 5 benchmark datasets and comprehensive experiments show that our proposed multi-view framework can yield comparable and promising performance with previous approaches proposed in recent literatures.
Performance evaluation of deep neural networks for forecasting time-series with multiple structural breaks and high volatility
Jain, Shikhar, Kaushik, Rohit, Jain, Siddhant, Dash, Tirtharaj
The problem of automatic forecasting of time-series data has been a long-standing challenge for the machine learning and forecasting community. The problem is relatively simple when the series is stationary. However, the majority of the real-world time-series problems have non-stationary characteristics making the understanding of the trend and seasonality very complex. Further, it is assumed that the future response is dependent on the past data and, therefore, can be modeled using a function approximator. Our interest in this paper is to study the applicability of the popular deep neural networks (DNN) comprehensively as function approximators for non-stationary time-series forecasting. We employ the following DNN models: Multi-layer Perceptron (MLP), Convolutional Neural Network (CNN), and RNN with Long-Short Term Memory (LSTM-RNN) and RNN with Gated-Recurrent Unit (GRU-RNN). These powerful DNN methods have been evaluated over popular Indian financial stocks data comprising of five stocks from National Stock Exchange Nifty-50 (NSE-Nifty50), and five stocks from Bombay Stock Exchange 30 (BSE-30). Further, the performance evaluation of these DNNs in terms of their predictive power has been done using two fashions: (1) single-step forecasting, (2) multi-step forecasting. Our extensive simulation experiments on these ten datasets report that the performance of these DNNs for single-step forecasting is pretty convincing as the predictions are found to follow the truely observed values closely. However, we also find that all these DNN models perform miserably in the case of multi-step time-series forecasting, based on the datasets used by us. Consequently, we observe that none of these DNN models are reliable for multi-step time-series forecasting.
Deep learning methods in speaker recognition: a review
Sztahó, Dávid, Szaszák, György, Beke, András
This paper summarizes the applied deep learning practices in the field of speaker recognition, both verification and identification. Speaker recognition has been a widely used field topic of speech technolog y. Many research works have been carried out and little progress has been achieved in the past 5 - 6 years. However, as deep learning techniques do advance in most machine learning fields, the former state - of - the - art methods are getting replaced by them in s peaker recognition too. It seems that DL becomes the now state - of - the - art solution for both speaker verification and identification. The standard x - vectors, additional to i - vectors, are used as baseline in most of the novel works. The increasing amount of gathered data opens up the territory to DL, where they are the most effective.
Sequential Recommendation with Relation-Aware Kernelized Self-Attention
Ji, Mingi, Joo, Weonyoung, Song, Kyungwoo, Kim, Yoon-Yeong, Moon, Il-Chul
Recent studies identified that sequential Recommendation is improved by the attention mechanism. By following this development, we propose Relation-Aware Kernelized Self-Attention (RKSA) adopting a self-attention mechanism of the Transformer with augmentation of a probabilistic model. The original self-attention of Transformer is a deterministic measure without relation-awareness. Therefore, we introduce a latent space to the self-attention, and the latent space models the recommendation context from relation as a multivariate skew-normal distribution with a kernelized covariance matrix from co-occurrences, item characteristics, and user information. This work merges the self-attention of the Transformer and the sequential recommendation by adding a probabilistic model of the recommendation task specifics. We experimented RKSA over the benchmark datasets, and RKSA shows significant improvements compared to the recent baseline models. Also, RKSA were able to produce a latent space model that answers the reasons for recommendation.