Goto

Collaborating Authors

 Education



Scaling Sign Language Translation

Neural Information Processing Systems

Sign language translation (SL T) addresses the problem of translating information from a sign language in video to a spoken language in text. Existing studies, while showing progress, are often limited to narrow domains and/or few sign languages and struggle with open-domain tasks. In this paper, we push forward the frontier of SL T by scaling pretraining data, model size, and number of translation directions. We perform large-scale SL T pretraining on different data including 1) noisy multilingual Y ouTube SL T data, 2) parallel text corpora, and 3) SL T data augmented by translating video captions to other languages with off-the-shelf machine translation models. We unify different pretraining tasks with task-specific prompts under the encoder-decoder architecture, and initialize the SL T model with pretrained (m/By)T5 models across model sizes. SL T pretraining results on How2Sign and FLEURS-ASL#0 (ASL to 42 spoken languages) demonstrate the significance of data/model scaling and cross-lingual cross-modal transfer, as well as the feasibility of zero-shot SL T. We finetune the pretrained SL T models on 5 downstream open-domain SL T benchmarks covering 5 sign languages. Experiments show substantial quality improvements over the vanilla baselines, surpassing the previous state-of-the-art (SOT A) by wide margins.


Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers

Neural Information Processing Systems

By employing object identifiers, we transform diverse 3D scene-language tasks into a unified question-answering format, facilitating joint training without the need for additional task-specific heads.



A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning

Neural Information Processing Systems

We demonstrate that plasticity loss is pervasive under domain shift in this regime, and that a number of methods developed to resolve it in other settings fail, sometimes even performing worse than applying no intervention at all. In contrast, we find that a class of "regenerative" methods are able to consistently mitigate plasticity loss in a variety of contexts, including in gridworld tasks and



Oracle-EfficientDifferentiallyPrivateLearningwith PublicData

Neural Information Processing Systems

Due to statistical lower bounds on the learnability of many function classes under privacy constraints, there has been recent interest in leveraging public data to improve the performance of private learning algorithms.



Auslan-Daily: Australian Sign Language Translation for Daily Communication and News

Neural Information Processing Systems

Considering different geographic regions generally have their own native sign languages, it is valuable to establish corresponding SL T datasets to support related communication and research. Auslan, as a sign language specific to Australia, still lacks a dedicated large-scale dataset for SL T.