Goto

Collaborating Authors

 Ali, Arslan


Cosmos World Foundation Model Platform for Physical AI

arXiv.org Artificial Intelligence

Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make our platform open-source and our models open-weight with permissive licenses available via https://github.com/NVIDIA/Cosmos.


Rapid Detection of Aircrafts in Satellite Imagery based on Deep Neural Networks

arXiv.org Artificial Intelligence

Object detection is one of the fundamental objectives in Applied Computer Vision. In some of the applications, object detection becomes very challenging such as in the case of satellite image processing. Satellite image processing has remained the focus of researchers in domains of Precision Agriculture, Climate Change, Disaster Management, etc. Therefore, object detection in satellite imagery is one of the most researched problems in this domain. This paper focuses on aircraft detection. in satellite imagery using deep learning techniques. In this paper, we used YOLO deep learning framework for aircraft detection. This method uses satellite images collected by different sources as learning for the model to perform detection. Object detection in satellite images is mostly complex because objects have many variations, types, poses, sizes, complex and dense background. YOLO has some limitations for small size objects (less than$\sim$32 pixels per object), therefore we upsample the prediction grid to reduce the coarseness of the model and to accurately detect the densely clustered objects. The improved model shows good accuracy and performance on different unknown images having small, rotating, and dense objects to meet the requirements in real-time.