Dollár, Piotr
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Singh, Mannat, Duval, Quentin, Alwala, Kalyan Vasudev, Fan, Haoqi, Aggarwal, Vaibhav, Adcock, Aaron, Joulin, Armand, Dollár, Piotr, Feichtenhofer, Christoph, Girshick, Ross, Girdhar, Rohit, Misra, Ishan
This paper revisits the standard pretrain-then-finetune paradigm used in computer vision for visual recognition tasks. Typically, state-of-the-art foundation models are pretrained using large-scale (weakly) supervised datasets with billions of images. We introduce an additional pre-pretraining stage that is simple and uses the self-supervised MAE technique to initialize the model. While MAE has previously only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well. Thus, our MAE-based pre-pretraining scales with both model and data size, making it applicable for training foundation models. Pre-pretraining consistently improves both model convergence and downstream transfer performance across a range of model scales (millions to billions of parameters) and dataset sizes (millions to billions of images). We measure the effectiveness of pre-pretraining on 10 different visual recognition tasks spanning image classification, video recognition, object detection, low-shot classification, and zero-shot recognition. Our largest model achieves new state-of-the-art results on iNaturalist-18 (91.7%), ImageNet-ReaL (91.1%), 1-shot ImageNet-1k (63.6%), and zero-shot transfer on Food-101 (96.2%). Our study reveals that model initialization plays a significant role, even for web-scale pretraining with billions of images, and our models are available publicly.
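The recipe the abstract describes, self-supervised MAE pre-pretraining followed by label-based pretraining, can be illustrated with a minimal sketch. This is not the paper's released code: the tiny Transformer, patch geometry, random stand-in data, and single optimization step are illustrative assumptions, and a single linear head stands in for the billion-scale weakly supervised stage.

```python
# Illustrative sketch only (assumed toy model, not the paper's code): stage 1 runs
# MAE-style masked-patch reconstruction; stage 2 reuses the encoder weights to
# initialize (weakly) supervised training.
import torch
import torch.nn as nn

PATCH, DIM, N_PATCHES = 16, 128, 196            # 224x224 image -> 14x14 = 196 patches

class PatchEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(3 * PATCH * PATCH, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, patches):                 # (B, N_visible, 3*P*P)
        return self.blocks(self.embed(patches))

class MAEDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(DIM, 3 * PATCH * PATCH)

def mae_step(encoder, decoder, patches, mask_ratio=0.75):
    """One MAE-style step: encode only visible patches, reconstruct the masked ones."""
    B, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    keep = torch.rand(B, N).argsort(dim=1)[:, :n_keep]      # indices of visible patches
    visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, D))
    latent = encoder(visible)                               # (B, n_keep, DIM)

    # Rebuild the full sequence with mask tokens at dropped positions, then decode.
    full = decoder.mask_token.expand(B, N, DIM).clone()
    full.scatter_(1, keep.unsqueeze(-1).expand(-1, -1, DIM), latent)
    pred = decoder.head(decoder.blocks(full))               # (B, N, 3*P*P)

    # Reconstruction loss is computed on the masked patches only.
    masked = torch.ones(B, N, dtype=torch.bool)
    masked[torch.arange(B).unsqueeze(1), keep] = False
    return ((pred - patches) ** 2)[masked].mean()

# Stage 1: MAE pre-pretraining (a single toy step on random stand-in data).
enc, dec = PatchEncoder(), MAEDecoder()
patches = torch.randn(2, N_PATCHES, 3 * PATCH * PATCH)
mae_step(enc, dec, patches).backward()

# Stage 2: the pre-pretrained encoder initializes (weakly) supervised pretraining.
head = nn.Linear(DIM, 1000)
logits = head(enc(patches).mean(dim=1))                     # continue training with labels
```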
Segment Anything
Kirillov, Alexander, Mintun, Eric, Ravi, Nikhila, Mao, Hanzi, Rolland, Chloe, Gustafson, Laura, Xiao, Tete, Whitehead, Spencer, Berg, Alexander C., Lo, Wan-Yen, Dollár, Piotr, Girshick, Ross
We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy-respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.
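Because the model is promptable and publicly released, a brief usage sketch is possible: prompting SAM with a single foreground point via the released `segment_anything` package. The checkpoint filename, image path, and point coordinate below are placeholders.

```python
# Usage sketch: promptable zero-shot segmentation with the released SAM.
# Assumes `pip install segment-anything` and a downloaded ViT-H checkpoint;
# the image path and point coordinate are placeholders.
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("example.jpg").convert("RGB"))   # HxWx3 uint8 RGB
predictor.set_image(image)

# Prompt with one foreground point; SAM returns candidate masks with quality scores.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),    # (x, y) pixel coordinate of the prompt
    point_labels=np.array([1]),             # 1 = foreground, 0 = background
    multimask_output=True,
)
best_mask = masks[scores.argmax()]          # boolean HxW mask for the top candidate
```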
Learning Features by Watching Objects Move
Pathak, Deepak, Girshick, Ross, Dollár, Piotr, Darrell, Trevor, Hariharan, Bharath
This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation. Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as 'pseudo ground truth' to train a convolutional network to segment objects from a single frame. Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly designed 'pretext' tasks studied in the literature. Indeed, our extensive experiments show that this is the case. When used for transfer learning on object detection, our representation significantly outperforms previous unsupervised approaches across multiple settings, especially when training data for the target task is scarce.
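The training signal described above can be sketched in a few lines: motion-derived segments from video serve as per-pixel pseudo ground truth for a single-frame segmentation ConvNet. The toy network, random stand-in frames and masks, and single SGD step are assumptions for illustration, not the paper's setup.

```python
# Illustrative sketch (toy network and random stand-in data, not the paper's code):
# train a single-frame ConvNet to predict motion-derived 'pseudo ground truth' masks.
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(                         # tiny fully convolutional segmenter
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 1),                     # per-pixel foreground logit
)
opt = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

frames = torch.randn(4, 3, 128, 128)                           # single RGB frames
pseudo_masks = (torch.rand(4, 1, 128, 128) > 0.5).float()      # motion-based segments

opt.zero_grad()
loss = F.binary_cross_entropy_with_logits(net(frames), pseudo_masks)
loss.backward()
opt.step()
# The learned features can then initialize a detector backbone for transfer learning.
```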
Learning to Traverse Image Manifolds
Dollár, Piotr, Rabaud, Vincent, Belongie, Serge J.