Review for NeurIPS paper: TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation
–Neural Information Processing Systems
Weaknesses:
W1 The submission claims that existing approaches only capture spatial appearance (line 42), but the baseline it is compared with, [2], is actually based on RNNs, which have the potential to capture motion information across a sequence of frames.
W2 While the work acknowledges the challenges of motion blur and fine-grained gesture details (line 40), it does not address them in the proposed approach.
W3 The quantitative gains in terms of BLEU (9.58 to 13.41) and ROUGE (31.80 to 34.96) scores are not outstanding.
W4 The results that [2] obtains by exploiting the glosses available in the dataset are better than the ones in this submission. Given that the contributions of the work address the visual representation, the submission does not explain why the proposed techniques are not also assessed with the Sign-to-Gloss-to-Text setup considered in [2].
Jan-26-2025, 11:31:33 GMT