Reviews: Multimodal Residual Learning for Visual QA
Neural Information Processing Systems
The authors successfully built upon two effective ideas, deep residual learning and element-wise multiplication for implicit attention, to create a solution for general multimodal tasks. Experiments were carefully run to select an optimal architecture and hyperparameters for the targeted Visual QA task. The results appear superb compared with previous studies using various deep learning techniques. It would be helpful if the authors could present additional comparisons with existing techniques in terms of model parameter size, as well as the amount of data required for learning. It would also be interesting to separately assess the value of residual learning and implicit attention on the Visual QA task, to help understand which aspect is more critical.
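The review credits the model to two ideas: residual (shortcut) learning and element-wise multiplication of question and image features as implicit attention. Below is a minimal pure-Python sketch of one such block. It is not the authors' implementation and rests on assumptions: tanh nonlinearities, an identity shortcut (which requires the question and joint dimensions to match), and hypothetical names (`block`, `joint_residual`, `fuse`, `use_shortcut`). The two flags show how the ablation suggested above could be set up: dropping the shortcut removes residual learning, and swapping multiplication for addition removes the multiplicative interaction.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def tanh(x: Vector) -> Vector:
    return [math.tanh(xi) for xi in x]


def joint_residual(q: Vector, v: Vector, w_q: Matrix, w_v1: Matrix,
                   w_v2: Matrix, fuse: str = "mul") -> Vector:
    """Hypothetical residual function F(q, v) mixing question and image features.

    fuse="mul": element-wise product (the implicit-attention idea).
    fuse="add": element-wise sum, an assumed ablation baseline.
    """
    q_branch = tanh(matvec(w_q, q))                      # one layer on the question
    v_branch = tanh(matvec(w_v2, tanh(matvec(w_v1, v))))  # two layers on the image
    if fuse == "mul":
        return [a * b for a, b in zip(q_branch, v_branch)]
    if fuse == "add":
        return [a + b for a, b in zip(q_branch, v_branch)]
    raise ValueError(f"unknown fuse mode: {fuse}")


def block(q: Vector, v: Vector, weights: Tuple[Matrix, Matrix, Matrix],
          use_shortcut: bool = True, fuse: str = "mul") -> Vector:
    """H = q + F(q, v) with the identity shortcut, or F(q, v) alone without it."""
    f = joint_residual(q, v, *weights, fuse=fuse)
    if use_shortcut:
        return [qi + fi for qi, fi in zip(q, f)]
    return f


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_q, d_v, d_h = 4, 6, 4  # toy sizes; d_h == d_q so the identity shortcut fits
    q = [rng.uniform(-1, 1) for _ in range(d_q)]
    v = [rng.uniform(-1, 1) for _ in range(d_v)]
    weights = (random_matrix(d_h, d_q, rng),
               random_matrix(d_h, d_v, rng),
               random_matrix(d_h, d_h, rng))
    # Enumerate the four ablation variants: with/without shortcut, mul/add fusion.
    for use_shortcut in (True, False):
        for fuse in ("mul", "add"):
            h = block(q, v, weights, use_shortcut, fuse)
            print(use_shortcut, fuse, [round(x, 3) for x in h])
```

Training and evaluating the four variants on the same data would separate the contribution of each idea. Counting the entries in `weights` would likewise give the kind of parameter-size figure the review asks for.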
Jan-20-2025, 16:25:39 GMT