The convolution operation helps learn context much faster than LSTMs. The encoder can be parallelized using convolutions; however, I am confused about parallelizing the decoder. During training, when the target translation is known, we can feed the decoder the output sentence shifted one position to the right as its input. During inference, though, we have to run the decoder n times to produce the n words of the output sentence, using the word predicted at the current time step as the decoder input at the next time step. A decoder with LSTM/RNN layers would have a higher per-layer execution cost, whereas a convolutional decoder can compute each layer in parallel; but at inference time the LSTM decoder still runs through the sequence once, compared to the n full passes of the convolutional decoder.
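The asymmetry above can be made concrete with a toy sketch. The causal "decoder layer" below is a stand-in (not any real ConvS2S/Transformer implementation): during training the whole shifted target is available, so one parallel pass suffices; during inference the same layer must be re-run once per output position over the growing prefix, yet both procedures yield identical values.

```python
def causal_conv_decoder(tokens, kernel):
    """One causal convolutional 'layer' (kernel width 2, left-padded):
    the output at position t depends only on tokens[:t+1]."""
    padded = [0.0] + list(tokens)
    return [kernel[0] * padded[t] + kernel[1] * padded[t + 1]
            for t in range(len(tokens))]

kernel = [0.5, 1.0]            # toy weights, purely illustrative
target = [1.0, 2.0, 3.0, 4.0]  # stand-in for the shifted target sentence

# Training: the full shifted target is known -> one parallel pass.
train_out = causal_conv_decoder(target, kernel)

# Inference: n separate passes, each over the prefix produced so far
# (here the prefix reuses `target` in place of predicted tokens).
infer_out = []
for t in range(len(target)):
    prefix = target[:t + 1]
    infer_out.append(causal_conv_decoder(prefix, kernel)[-1])

assert infer_out == train_out  # same outputs, n times the passes
```

The assert shows the convolutional decoder is not less accurate at inference, only less parallel: it repeats work over the prefix at every step.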
Is there any truth to the notion that college instructors who implement active learning receive lower teaching evaluations? Henderson et al. present data from college physics instructors who attended a new-faculty workshop and attempted to incorporate active learning into their introductory course. Contrary to common belief, 48% of these instructors reported an increase in student evaluations, 32% reported no change, and only 20% reported a decrease in their evaluations. The authors acknowledge the limitations of the study, including the nature of self-reported data as well as changes in student evaluations over time, yet provide the overall recommendation that instructors (and institutions) should not let perceived anxiety over negative student evaluations be a reason to avoid implementing evidence-based teaching practices.
As machine translation improves, it becomes increasingly difficult to develop evaluation metrics that capture the differences between systems relative to human translations. In contrast to current efforts that leverage more linguistic information to characterize translation quality, this paper pursues the combination of language-independent features as a robust approach to MT evaluation. To compete with the finer-grained modeling offered by linguistic features, the proposed method augments word-level metrics with a letter-based calculation. An empirical study is then conducted on WMT data, training the metric with a ranking SVM. The results reveal that integrating existing language-independent metrics yields sufficiently strong performance across a variety of languages. Time-split data validation is promising as a better training setting, though the greedy strategy also works well.
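A minimal sketch of what "augmenting a word-level metric with a letter-based calculation" can look like (this is an illustration in the spirit of chrF-style scoring, not the paper's exact metric; `alpha` and the component metrics are assumptions):

```python
from collections import Counter

def char_ngrams(text, n=2):
    """Multiset of character n-grams, ignoring spaces."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def char_fscore(hyp, ref, n=2):
    """Letter-level n-gram F1 between hypothesis and reference."""
    h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
    overlap = sum((h & r).values())
    if not h or not r or overlap == 0:
        return 0.0
    prec, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * prec * rec / (prec + rec)

def word_match(hyp, ref):
    """Simple word-level overlap precision."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    return sum((h & r).values()) / max(len(hyp.split()), 1)

def combined_score(hyp, ref, alpha=0.5):
    # Word-level metric augmented with the letter-based calculation;
    # the interpolation weight alpha is a hypothetical choice.
    return alpha * word_match(hyp, ref) + (1 - alpha) * char_fscore(hyp, ref)
```

The letter-based term rewards near-miss word forms (e.g. morphological variants) that a pure word-level overlap misses, which is the language-independent counterpart to linguistically informed matching.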
Neural programming involves training neural networks to learn programs from data. Previous works have failed to achieve good generalization performance, especially on programs with high complexity or on large domains. This is because they mostly rely either on black-box function evaluations that do not capture the structure of the program, or on detailed execution traces that are expensive to obtain, and hence the training data has poor coverage of the domain under consideration. We present a novel framework that utilizes black-box function evaluations in conjunction with symbolic expressions that encode relationships between the given functions. We employ tree LSTMs to incorporate the structure of the symbolic expression trees, and use a tree encoding for the numbers appearing in the function evaluation data, based on their decimal representation. We present an evaluation benchmark for this task to demonstrate that our proposed model combines symbolic reasoning and function evaluation in a fruitful manner, obtaining high accuracies in our experiments. Our framework generalizes significantly better to expressions of higher depth and is able to fill partial equations with valid completions.
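To illustrate what a decimal tree encoding of a number can mean, here is a toy sketch: the digits of a number are arranged as a left-branching binary tree, which a tree-structured encoder (such as the tree LSTM above) would fold bottom-up. The fold below is a plain arithmetic stand-in for a learned cell, chosen so the check is verifiable; the paper's actual encoding details are not reproduced here.

```python
def digit_tree(n):
    """Left-branching tree over the decimal digits of n:
    314 -> ((3, 1), 4)."""
    digits = [int(d) for d in str(n)]
    tree = digits[0]
    for d in digits[1:]:
        tree = (tree, d)
    return tree

def fold(tree, leaf, combine):
    """Bottom-up fold over the tree, standing in for a tree-LSTM cell."""
    if isinstance(tree, tuple):
        left, right = tree
        return combine(fold(left, leaf, combine),
                       fold(right, leaf, combine))
    return leaf(tree)

# Sanity check: folding with 'shift and add' recovers the number,
# showing the tree faithfully preserves the decimal structure.
value = fold(digit_tree(314), leaf=lambda d: d,
             combine=lambda a, b: 10 * a + b)
assert value == 314
```

A learned cell would replace the leaf and combine functions with parameterized maps into vector space, but the tree shape over digits is the same.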
The performance of deep neural networks crucially depends on good hyperparameter configurations, and Bayesian optimization is a powerful framework for optimizing the hyperparameters of DNNs. These methods need sufficient evaluation data to approximate and minimize the validation error as a function of the hyperparameters. However, the expensive evaluation cost of DNNs yields very little evaluation data within a limited time budget, which greatly reduces the efficiency of Bayesian optimization. Moreover, previous research has focused on using only the complete evaluation data to conduct Bayesian optimization, ignoring the intermediate evaluation data generated by early stopping methods. To alleviate the insufficient-evaluation-data problem, we propose a fast hyperparameter optimization method, HOIST, that utilizes both the complete and intermediate evaluation data to accelerate the hyperparameter optimization of DNNs. Specifically, we train multiple basic surrogates to gather information from the mixed evaluation data, and then combine these basic surrogates using weighted bagging to provide an accurate ensemble surrogate. Our empirical studies show that HOIST outperforms state-of-the-art approaches on a wide range of DNNs, including feed-forward neural networks, convolutional neural networks, recurrent neural networks, and variational autoencoders.
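The ensemble step can be sketched as follows. This is a minimal illustration, not HOIST's actual weighting rule: the base surrogates and the fixed weights below are hypothetical stand-ins (in practice each surrogate would be a model fitted to complete or early-stopped runs, and the weights would be learned from agreement with observed data).

```python
def ensemble_predict(surrogates, weights, x):
    """Weighted combination of base surrogates' predictions of the
    validation error at hyperparameter setting x."""
    total = sum(weights)
    return sum(w * s(x) for s, w in zip(surrogates, weights)) / total

# Toy base surrogates: one built from complete evaluations, two from
# intermediate (early-stopped) evaluations at different budgets.
complete  = lambda x: (x - 3.0) ** 2 + 1.0
partial_a = lambda x: (x - 2.5) ** 2 + 1.5
partial_b = lambda x: (x - 3.5) ** 2 + 1.2

# Hypothetical weights favoring the complete-data surrogate.
pred = ensemble_predict([complete, partial_a, partial_b],
                        weights=[0.6, 0.2, 0.2], x=3.0)
```

The point of the mixture is that intermediate evaluations, although noisier, are far cheaper to collect, so blending them with the few complete evaluations gives the Bayesian optimizer a usable surrogate much earlier.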