
 Niger


New dinosaur discovered in Sahara desert was a horned 'hell heron'

Popular Science

'Spinosaurus mirabilis' sported a 20-inch-tall 'scimitar' on its head. Spinosaurus mirabilis stands along a river's edge over its prey some 95 million years ago. A scimitar-shaped head crest and interdigitating teeth characterize this wading giant, one of the last-surviving spinosaurid species. Paleontologists still know comparatively little about fin-backed dinosaurs.


VastTrack: Vast Category Visual Object Tracking

Neural Information Processing Systems

VastTrack has several attractive properties: (1) Vast Object Category. In particular, it covers targets from 2,115 categories, significantly surpassing the object classes of existing popular benchmarks (e.g., GOT-10k with 563 classes and LaSOT with 70 categories). By providing such a vast range of object classes, we expect to learn more general object tracking.


MassSpecGym: A benchmark for the discovery and identification of molecules Roman Bushuiev

Neural Information Processing Systems

Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym - the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data.




Language Model Tokenizers Introduce Unfairness Between Languages

Neural Information Processing Systems

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
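The length disparity described above can be illustrated with a minimal sketch. It assumes a stand-in tokenizer that emits one token per UTF-8 byte, which is not one of the tokenizers studied in the paper; the helper names (`byte_tokenize`, `length_ratio`) and the sample sentences are hypothetical and chosen only for illustration.

```python
from typing import Callable

def byte_tokenize(text: str) -> list[int]:
    """Stand-in tokenizer (assumption): one token per UTF-8 byte."""
    return list(text.encode("utf-8"))

def length_ratio(
    text: str,
    reference: str,
    tokenize: Callable[[str], list[int]] = byte_tokenize,
) -> float:
    """Token count of `text` divided by the token count of `reference`."""
    return len(tokenize(text)) / len(tokenize(reference))

# The same short greeting in several languages (illustrative sample only).
parallel = {
    "English": "Good morning",
    "German": "Guten Morgen",
    "Russian": "Доброе утро",
    "Hindi": "सुप्रभात",
}

# Ratio of each translation's token count to the English one.
ratios = {
    language: length_ratio(sentence, parallel["English"])
    for language, sentence in parallel.items()
}

for language, ratio in ratios.items():
    print(f"{language}: {ratio:.2f}x the English token count")
```

Any real tokenizer can be swapped in through the `tokenize` parameter; the ratios will differ, but the comparison itself stays the same.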



Are drones, AI making it harder to fight armed groups in the Sahel?

Al Jazeera

The brazen attack on the international airport and nearby military airbase in Niamey, Niger's capital, came overnight between January 28 and 29. Balls of orange fire flew across the sky as the Nigerien army attempted to respond while residents ducked for cover and whispered prayers, as shown in videos on social media. ISIL (ISIS) in Sahel Province, or ISSP - a Niger-based outfit earlier known as the ISIL affiliate in the Greater Sahara or ISGS - has since claimed responsibility and says it killed several soldiers, although the Nigerien army disputes this. Many of its fighters had breached military drone hangars using RPGs and mortars, and managed to damage several aircraft and one civilian aeroplane, according to videos from the group.


Fast Best-of-N Decoding via Speculative Rejection Hanshi Sun

Neural Information Processing Systems

The safe and effective deployment of Large Language Models (LLMs) involves a critical step called alignment, which ensures that the model's responses are in accordance with human preferences. Prevalent alignment techniques, such as DPO, PPO and their variants, align LLMs by changing the pre-trained model weights during a phase called post-training. While predominant, these post-training methods add substantial complexity before LLMs can be deployed. Inference-time alignment methods avoid the complex post-training step and instead bias the generation towards responses that are aligned with human preferences. The best-known inference-time alignment method, called Best-of-N, is as effective as the state-of-the-art post-training procedures. Unfortunately, Best-of-N requires vastly more resources at inference time than standard decoding strategies, which makes it computationally not viable.
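As a rough illustration of why Best-of-N is expensive, here is a minimal sketch of the plain Best-of-N baseline: draw N candidate responses, score each with a reward function, and keep the highest-scoring one. It is not the speculative rejection method the title refers to. The generator, the reward function, and all names below (`best_of_n`, `make_toy_generator`, `toy_reward`) are hypothetical stand-ins for a real language model and reward model.

```python
import itertools
from typing import Callable

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int,
) -> str:
    """Generate n candidates and return the one with the highest reward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Every candidate is a full generation, so cost grows linearly with n.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda response: reward(prompt, response))

def make_toy_generator() -> Callable[[str], str]:
    """Hypothetical generator: cycles through canned responses."""
    canned = itertools.cycle([
        "I don't know.",
        "Paris.",
        "The capital of France is Paris.",
    ])
    return lambda prompt: next(canned)

def toy_reward(prompt: str, response: str) -> float:
    """Hypothetical reward: number of response words that echo the prompt."""
    prompt_words = set(prompt.lower().rstrip("?").split())
    return float(sum(
        word.strip(".").lower() in prompt_words for word in response.split()
    ))

question = "What is the capital of France?"
answer = best_of_n(question, make_toy_generator(), toy_reward, n=3)
print(answer)
```

With a real model, each call to `generate` is a complete decoding pass, which is the inference-time cost the abstract describes.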