
Auslan-Daily: Australian Sign Language Translation for Daily Communication and News

Neural Information Processing Systems

Since different geographic regions generally have their own native sign languages, it is valuable to establish corresponding sign language translation (SLT) datasets to support related communication and research. Auslan, the sign language specific to Australia, still lacks a dedicated large-scale SLT dataset.


YouTube was down for thousands of users in the US

Engadget

It was also down in some other countries. YouTube is experiencing an outage across the United States, with users in other countries, including Canada, India, the Philippines, Australia and Russia, also having problems accessing the website. The issue seems to have started at around 8 PM Eastern time and reached 338,000 reports on Downdetector before starting to taper down. More users reported having issues accessing the app, but I personally lost access to the web homepage first. As of 9:22 PM, users on Reddit were still reporting being unable to access YouTube.




A Appendix

Neural Information Processing Systems

The complete list may be seen in Table 8. A few general notes about these strings: based on their recommendations, we handled zh and zh_Latn with the special filters described below. For some URLs, the corpora were in languages different from the LangID predictions; this is mainly due to mis-rendered PDFs and may have practical applications for denoising, or for decoding such garbled PDFs.



Language Model Tokenizers Introduce Unfairness Between Languages

Neural Information Processing Systems

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences of up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
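The length disparity can be glimpsed even without a real tokenizer: byte-level BPE vocabularies are dominated by Latin-script merges, so UTF-8 byte length is a crude lower-bound proxy for token count on non-Latin scripts. The sketch below uses that proxy; the Hindi and Burmese phrases are approximate translations chosen for illustration, and a real GPT-style tokenizer would typically show even larger gaps than the raw byte ratios.

```python
# Sketch: UTF-8 byte length as a crude proxy for byte-level tokenizer length.
# Latin letters cost 1 byte each; Devanagari and Burmese characters cost 3,
# so the same sentence occupies several times more bytes (and, for byte-level
# BPE with few non-Latin merges, several times more tokens).
samples = {
    "English": "How are you today?",
    "Hindi": "आज आप कैसे हैं?",        # approximate translation (assumption)
    "Burmese": "ဒီနေ့ နေကောင်းလား",     # approximate translation (assumption)
}

base = len(samples["English"].encode("utf-8"))
for lang, text in samples.items():
    n = len(text.encode("utf-8"))
    print(f"{lang}: {n} bytes ({n / base:.1f}x English)")
```

The ratios understate the real effect: a multilingual tokenizer may still split a 3-byte character into multiple tokens when its merges were learned mostly from English text.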