Helfrich, Kyle, Ye, Qiang

The underlying dynamical system carries temporal information from one time step to the next and captures potential dependencies among the terms of a sequence. Like other deep neural networks, the weights of an RNN are learned by gradient descent. For the input at one time step to affect the output at a later time step, gradients must back-propagate through every intermediate step. Since a sequence can be quite long, RNNs are prone to vanishing or exploding gradients, as described in (Bengio, Frasconi, and Simard 1993) and (Pascanu, Mikolov, and Bengio 2013). One consequence of this well-known problem is that the network has difficulty modeling input-output dependencies over a large number of time steps. Many architectures have been designed to mitigate this problem. The most popular RNN architectures, such as LSTMs (Hochreiter and Schmidhuber 1997) and GRUs (Cho et al. 2014), incorporate a gating mechanism to explicitly retain or discard information.
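
For concreteness, a minimal numpy sketch of the effect (illustrative only; the matrix and dimensions are arbitrary): back-propagation repeatedly multiplies the gradient by the recurrent Jacobian, so a spectral radius below or above 1 makes it vanish or explode over many steps.

    import numpy as np

    rng = np.random.default_rng(0)
    T, n = 100, 32                       # time steps and hidden size (arbitrary)

    for rho in (0.9, 1.1):
        # Toy recurrent Jacobian rescaled to spectral radius rho.
        W = rng.standard_normal((n, n))
        W *= rho / np.abs(np.linalg.eigvals(W)).max()
        grad = np.ones(n)                # stand-in for the gradient at the last step
        for _ in range(T):
            grad = W.T @ grad            # one linear step of back-propagation
        print(f"rho = {rho}: gradient norm after {T} steps = {np.linalg.norm(grad):.2e}")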

Guo, Peichang, Ye, Qiang

Convolutional neural networks are an important class of deep learning models. Keeping the singular values of each layer's Jacobian bounded around $1$ during training helps avoid the exploding/vanishing gradient problem and improves the generalizability of a neural network. We propose a new penalty function for a convolutional kernel that keeps the singular values of the corresponding transformation matrix bounded around $1$, and we show how to carry out gradient-type methods with this penalty. The penalty is imposed on the transformation matrix corresponding to a kernel, not directly on the kernel, which differs from existing work. This provides a new regularization method for the weights of convolutional layers, and other penalty functions on a kernel can be devised along the same lines in the future.
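
The abstract does not specify the penalty's exact form; one natural instantiation, sketched below in numpy under the assumption of a single-channel circular convolution (so the transformation matrix can be formed explicitly), penalizes the squared deviation of each singular value from 1.

    import numpy as np

    def conv_matrix(kernel, n):
        # Transformation matrix of a single-channel 2D circular convolution on
        # n x n inputs, built by applying the convolution to each basis image.
        k = kernel.shape[0]
        M = np.zeros((n * n, n * n))
        for j in range(n * n):
            e = np.zeros((n, n))
            e[j // n, j % n] = 1.0
            out = np.zeros((n, n))
            for a in range(k):
                for b in range(k):
                    out += kernel[a, b] * np.roll(np.roll(e, a, axis=0), b, axis=1)
            M[:, j] = out.ravel()
        return M

    kernel = 0.3 * np.random.default_rng(1).standard_normal((3, 3))
    M = conv_matrix(kernel, n=8)
    sigma = np.linalg.svd(M, compute_uv=False)
    penalty = np.sum((sigma - 1.0) ** 2)   # push every singular value toward 1
    print(sigma.min(), sigma.max(), penalty)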

Maduranga, Kehelwala D. G., Helfrich, Kyle E., Ye, Qiang

Recurrent neural networks (RNNs) have been successfully used on a wide range of sequential data problems. A well-known difficulty in using RNNs is the vanishing or exploding gradient problem. Recently, several RNN architectures have tried to mitigate this issue by maintaining an orthogonal or unitary recurrent weight matrix. One such architecture is the scaled Cayley orthogonal recurrent neural network (scoRNN), which parameterizes the orthogonal recurrent weight matrix through a scaled Cayley transform. This parametrization contains a diagonal scaling matrix with entries of +1 or -1 that cannot be optimized by gradient descent. Thus the scaling matrix is fixed before training, and a hyperparameter is introduced to tune it for each particular task. In this paper, we develop a unitary RNN architecture based on a complex scaled Cayley transform. Unlike the real orthogonal case, the transformation uses a diagonal scaling matrix whose entries lie on the complex unit circle; these entries can be optimized by gradient descent, so tuning a hyperparameter is no longer required. We also provide an analysis of a potential issue with the modReLU activation function, which is used in our work and in several other unitary RNNs. In the experiments conducted, the scaled Cayley unitary recurrent neural network (scuRNN) achieves comparable or better results than scoRNN and other unitary RNNs without fixing the scaling matrix.
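
A minimal numpy sketch of the complex scaled Cayley construction (names and dimensions are illustrative): W = (I + A)^{-1}(I - A)D with A skew-Hermitian and D diagonal with unit-modulus entries, which makes W unitary by construction.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    I = np.eye(n)

    # Skew-Hermitian parameter matrix: A^H = -A.
    B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    A = (B - B.conj().T) / 2

    # Diagonal scaling with entries on the complex unit circle; the angles
    # theta are ordinary real parameters, trainable by gradient descent.
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    D = np.diag(np.exp(1j * theta))

    W = np.linalg.solve(I + A, I - A) @ D   # scaled Cayley transform
    print(np.allclose(W.conj().T @ W, I))   # True: W is unitary by construction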

Helfrich, Kyle, Willmott, Devin, Ye, Qiang

Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Unitary Recurrent Neural Networks (uRNNs) have recently been used to address this issue and, in some cases, exceed the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler, novel update scheme to maintain orthogonal recurrent weight matrices without using complex-valued matrices. This is done by parametrizing with a skew-symmetric matrix using the Cayley transform. Such a parametrization cannot represent matrices with -1 eigenvalues, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of +1 and -1 entries. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs.
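
A minimal numpy sketch of the real construction (illustrative names): W = (I + A)^{-1}(I - A)D with A skew-symmetric and D a fixed diagonal of +1 and -1 entries.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    I = np.eye(n)

    # Skew-symmetric parameter matrix: A^T = -A.
    B = rng.standard_normal((n, n))
    A = (B - B.T) / 2

    # Diagonal scaling matrix of +1 and -1 entries (fixed before training).
    D = np.diag([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

    W = np.linalg.solve(I + A, I - A) @ D   # scaled Cayley transform
    print(np.allclose(W.T @ W, I))          # True: W is orthogonal by construction

    # With A = 0 and D = -I this reaches W = -I, which the unscaled Cayley
    # transform (D = I) cannot represent: it never has -1 as an eigenvalue.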

Ye, Qiang (University of Kentucky) | Zhi, Weifeng (University of Kentucky)

We consider an alignment algorithm for reconstructing global coordinates from local coordinates constructed for sections of manifolds. We show that, under certain conditions, the alignment algorithm can successfully recover global coordinates even when local neighborhoods have different dimensions. Our results generalize an earlier analysis to allow alignment of sections of different dimensions. We also apply our result to a semisupervised learning problem.
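
The following numpy sketch shows an LTSA-style alignment step of this general flavor; it is an illustration under simplifying assumptions (affine local-to-global maps within each section, recovery via a null-space eigenproblem), not necessarily the exact algorithm analyzed here.

    import numpy as np

    def align(sections, N, d=1):
        # sections: list of (idx, theta) pairs, where idx are the global indices
        # of a section and theta (d_i x |idx|) are its local coordinates; the
        # local dimensions d_i may differ across sections.
        Phi = np.zeros((N, N))
        for idx, theta in sections:
            k = len(idx)
            G = np.vstack([np.ones((1, k)), theta])
            Q, _ = np.linalg.qr(G.T)                 # basis of rowspace of G
            Phi[np.ix_(idx, idx)] += np.eye(k) - Q @ Q.T
        # Global coordinates (up to an affine map): eigenvectors of Phi for the
        # smallest eigenvalues, skipping the constant vector.
        _, vecs = np.linalg.eigh(Phi)
        return vecs[:, 1:d + 1]

    # Toy usage: a 1-D manifold seen through two overlapping sections whose
    # local coordinates are different affine images of the global coordinate t.
    t = np.linspace(0.0, 1.0, 10)
    secs = [(np.arange(0, 7), t[:7][None, :]),
            (np.arange(4, 10), (2.0 * t[4:] + 1.0)[None, :])]
    print(align(secs, N=10).ravel())   # recovers t up to an affine map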
