
Collaborating Authors

 Niger


New dinosaur discovered in Sahara desert was a horned 'hell heron'

Popular Science

'Spinosaurus mirabilis' sported a 20-inch-tall 'scimitar' on its head. Spinosaurus mirabilis stands along river's edge over its prey some 95 million years ago. A scimitar-shaped head crest and interdigitating teeth characterize this wading giant, one of the last-surviving spinosaurid species. Paleontologists still know comparatively little about fin-backed dinosaurs.




VastTrack: Vast Category Visual Object Tracking

Neural Information Processing Systems

VastTrack has several attractive properties: (1) Vast Object Category. In particular, it covers targets from 2,115 categories, significantly surpassing the object classes of existing popular benchmarks (e.g., GOT-10k with 563 classes and LaSOT with 70 categories). By providing such a vast range of object classes, we expect to learn more general object tracking.



MassSpecGym: A benchmark for the discovery and identification of molecules

Roman Bushuiev

Neural Information Processing Systems

Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym, the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data.




Language Model Tokenizers Introduce Unfairness Between Languages

Neural Information Processing Systems

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
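
As a rough illustration of how tokenization length can differ across languages, the sketch below treats every UTF-8 byte as one token, i.e. a byte-level tokenizer with no merges. This is an assumption made for the example, not one of the tokenizers the paper evaluates. The helper names (`byte_token_length`, `length_ratio`) and the sample phrases are hypothetical and chosen only for illustration.

```python
# Illustrative only: UTF-8 bytes stand in for tokens (a byte-level tokenizer
# with no merges). This is NOT one of the tokenizers evaluated in the paper.

def byte_token_length(text: str) -> int:
    """Number of tokens if every UTF-8 byte counts as one token."""
    return len(text.encode("utf-8"))


def length_ratio(text: str, reference: str) -> float:
    """How many times longer `text` tokenizes than `reference`."""
    return byte_token_length(text) / byte_token_length(reference)


# Hypothetical parallel phrases ("thank you"), chosen for illustration.
samples = {
    "English": "Thank you",
    "Russian": "Спасибо",
    "Hindi": "धन्यवाद",
}

# Token-length ratio of each phrase relative to the English version.
ratios = {
    language: length_ratio(text, samples["English"])
    for language, text in samples.items()
}

for language, ratio in ratios.items():
    tokens = byte_token_length(samples[language])
    print(f"{language}: {tokens} tokens, {ratio:.2f}x English")
```

Under this byte-level assumption, scripts whose characters need more bytes per character produce longer token sequences for the same meaning. Learned subword tokenizers behave differently in detail, so the ratios printed here should not be read as the paper's measurements.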