Although both papers apply Transformer networks to time series forecasting and achieve SOTA results, the two papers investigate different problems of time series forecasting.
As for causal convolution, both papers have their own motivations (robustness vs. quick response) for using it, and both utilize Transformer networks to outperform existing works. We plan to add an ablation study to illustrate this in the new version. Rolling window: it is used in [3], [6], and [17], our main baselines. Note that the loss function, split procedure, and some other details are the same as in [3] for fair comparison. The window selection procedure and other details are elaborated in Appendix A.2.
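For illustration, here is a minimal sketch of a causal 1-D convolution of the kind commonly used in Transformer-based forecasters; the module name and tensor sizes are our own illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1-D convolution that only looks at past time steps.

    Left-pads the input by (kernel_size - 1) * dilation so the output
    at time t depends only on inputs at times <= t (a sketch, assuming
    a PyTorch-style (batch, channels, time) layout).
    """
    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels,
                              kernel_size, dilation=dilation)

    def forward(self, x):
        x = nn.functional.pad(x, (self.pad, 0))  # pad on the left only
        return self.conv(x)

# Example: features computed causally before feeding self-attention.
x = torch.randn(8, 64, 96)          # batch of 8, 64 channels, 96 steps
conv = CausalConv1d(64, 64, kernel_size=3)
print(conv(x).shape)                # torch.Size([8, 64, 96])
```

Because the padding is applied only on the left, the output length matches the input length while no future time step leaks into the prediction at time t.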
Response for " Generalized Block-Diagonal Structure Pursuit: Learning Soft Latent Task 1 Assignment against Negative Transfer " 2 ID3136
Response for "Generalized Block-Diagonal Structure Pursuit: Learning Soft Latent T ask We thank all the reviewers for their valuable comments. We have fixed the typos pointed out by the reviewers. Is the framework limited only to linear models? Thm.3, the generalization ability will be promising if the loss is small (not necessarily only the optimal value) and the In this sense, a local critical point would be a good candidate solution. Are the constraints in the Obj included in the class H (L, S, null S, U)?
We thank all the reviewers for the valuable comments and suggestions. Besides, we indeed use dropout as in NoisyStudent (the paper you mentioned) to help generalization. We also combine SemiNAS with other NAS algorithms (e.g., Regularized Evolution), and we will add such experiments in the new version. We run SemiNAS (RE), consuming 2000 architecture-accuracy pairs, to compare with RE under the same number of queries, and it achieves 94.03% accuracy on CIFAR-10. There exist some differences: each model is run 3 times and the 3 results are collected to reduce the variance.
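To illustrate how a learned accuracy predictor can be combined with Regularized Evolution, here is a minimal sketch; `predictor`, `mutate`, `evaluate`, and `init_pop` are hypothetical interfaces standing in for the surrounding NAS framework, not SemiNAS's actual API:

```python
import random

def evolution_with_predictor(predictor, mutate, evaluate, init_pop,
                             cycles=2000, sample_size=10,
                             candidates_per_cycle=20):
    """Sketch: spend each expensive evaluation (query) on a child that a
    learned accuracy predictor ranks highly, instead of a random mutation.

    Assumes len(init_pop) >= sample_size; all callables are hypothetical.
    """
    population = [(arch, evaluate(arch)) for arch in init_pop]
    history = list(population)
    for _ in range(cycles):
        # Tournament selection, as in Regularized Evolution.
        parent = max(random.sample(population, sample_size),
                     key=lambda p: p[1])[0]
        # Pre-screen several mutations with the cheap predictor.
        children = [mutate(parent) for _ in range(candidates_per_cycle)]
        child = max(children, key=predictor)
        acc = evaluate(child)            # one true query per cycle
        population.append((child, acc))
        history.append((child, acc))
        population.pop(0)                # age-based removal: oldest dies
    return max(history, key=lambda p: p[1])
```

The point of the sketch is the query budget: the predictor filters many candidate mutations per cycle, while only one true evaluation is consumed, which is why methods of this kind can be compared with plain RE under the same number of queries.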
We will add the related discussions and further experiment results in the new version, should our paper be accepted.
We thank all reviewers for the insightful feedback. Below we address all questions raised in the reviews. More intuition can be added in Section 3: COT could greatly benefit sequential learning. To support our intuition, we provide two arguments in Appendix A.3. For the justification, please see our response to Reviewer 2. We also compare against WaveGAN (trained with the WGAN-GP loss) and COT-GAN without the mixing trick. We respectfully disagree with the reviewer on this comment.
We thank all the reviewers for the detailed and thoughtful comments. Our method is related to HMM-based works [1, 2, 3], all of which proposed methods to estimate alignments from unsegmented data. We have not thoroughly explored improving the duration predictor and simply follow the same design as prior work. We design the grouped 1x1 convolutions to be able to mix channels. For example, to generate a speech of 5.8 seconds, a parallel model needs only a single forward pass rather than frame-by-frame generation; therefore, adopting parallel TTS models significantly improves the sampling speed of end-to-end systems. In Section 5.3, we showed that varying the temperature can change the pitch of the generated speech. We will add a reference about Viterbi training.
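To illustrate what a grouped 1x1 convolution does and why extra mixing is needed, here is a minimal PyTorch-style sketch; the channel sizes and the channel-shuffle step are our own illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

channels, groups = 64, 4  # hypothetical sizes

# A grouped 1x1 convolution splits the channels into `groups` blocks and
# applies a small (channels/groups x channels/groups) matrix within each
# block, keeping parameters and inversion cost low.
conv = nn.Conv1d(channels, channels, kernel_size=1,
                 groups=groups, bias=False)

x = torch.randn(8, channels, 100)   # (batch, channels, time)
y = conv(x)

# On its own, a grouped 1x1 conv never mixes channels across groups;
# permuting the channel order between steps restores cross-group mixing.
perm = torch.randperm(channels)
y_mixed = y[:, perm, :]
print(y_mixed.shape)                # torch.Size([8, 64, 100])
```

The design trade-off sketched here is that grouping keeps each 1x1 transform cheap to apply and invert, while interleaving channels between steps ensures information still flows across all channels.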