Reviews: On Multiplicative Integration with Recurrent Neural Networks

Neural Information Processing Systems 

My biggest concern about this work is the lack of novelty. Despite the claimed differences, the proposed method is a special case of what was proposed in [10]. I doubt that the slightly different parameterization (removing one factor-hidden matrix and introducing more bias terms) makes much difference. I strongly suspect that the improved performance is due to better optimization (HF has proven to be very brittle). I also found the argument that gating makes gradients flow better to be weak, because there is no guarantee that this will happen.
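
To make the special-case concern concrete, here is a minimal sketch. It assumes that the method in [10] uses a factored multiplicative term of the form W_hf diag(W_fx x) W_fh h, and that the proposed method uses the elementwise product (W x) * (U h). Both forms, and every name below, are assumptions made for illustration and are not taken from either paper. Under those assumptions, fixing the factor-hidden matrix W_hf to the identity makes the two terms coincide.

```python
import random

def matvec(M, v):
    # Plain matrix-vector product on nested lists.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def factored_term(x, h, W_fx, W_fh, W_hf):
    # Assumed factored form: W_hf @ diag(W_fx x) @ (W_fh h).
    # Multiplying by diag(a) is the same as an elementwise product with a.
    f = [a * b for a, b in zip(matvec(W_fx, x), matvec(W_fh, h))]
    return matvec(W_hf, f)

def mi_term(x, h, W, U):
    # Assumed multiplicative-integration term: (W x) * (U h), elementwise.
    return [a * b for a, b in zip(matvec(W, x), matvec(U, h))]

random.seed(0)
n_in, n_hid = 3, 4  # hypothetical sizes
x = [random.gauss(0, 1) for _ in range(n_in)]
h = [random.gauss(0, 1) for _ in range(n_hid)]
W = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hid)]
U = [[random.gauss(0, 1) for _ in range(n_hid)] for _ in range(n_hid)]
identity = [[1.0 if i == j else 0.0 for j in range(n_hid)] for i in range(n_hid)]

# With W_hf = I, the factored term reduces to the elementwise product.
a = factored_term(x, h, W, U, identity)
b = mi_term(x, h, W, U)
max_diff = max(abs(p - q) for p, q in zip(a, b))
```

The check only covers the multiplicative term; the extra bias terms mentioned above are left out, so the sketch does not settle whether they change the optimization behaviour.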