Fleet, David J.
SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting
Sabour, Sara, Goli, Lily, Kopanas, George, Matthews, Mark, Lagun, Dmitry, Guibas, Leonidas, Jacobson, Alec, Fleet, David J., Tagliasacchi, Andrea
3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds that make it suitable for real-time applications. However, current methods require highly controlled environments (no moving people or wind-blown elements, and consistent lighting) to meet the inter-view consistency assumption of 3DGS. This makes reconstruction of real-world captures problematic. We present SpotlessSplats, an approach that leverages pre-trained, general-purpose features coupled with robust optimization to effectively ignore transient distractors. Our method achieves state-of-the-art reconstruction quality, both visually and quantitatively, on casual captures.
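To make the idea of robust optimization concrete, here is a minimal, hypothetical sketch of a trimmed photometric loss that ignores high-residual pixels (the kind of mechanism the paper builds on with semantic features); the quantile, function name, and tensor shapes are illustrative assumptions, not the paper's implementation.

```python
import torch

def trimmed_photometric_loss(rendered, target, trim_quantile=0.8):
    """Minimal sketch of robust optimization against distractors: pixels
    whose residuals exceed a per-image quantile are treated as outliers
    (e.g. transient distractors) and dropped from the loss. The quantile
    is an illustrative hyper-parameter, not a value from the paper."""
    residual = (rendered - target).abs().mean(dim=-1)      # (H, W); inputs are (H, W, 3)
    threshold = torch.quantile(residual.detach().flatten(), trim_quantile)
    inlier = (residual.detach() <= threshold).float()      # 1 = keep, 0 = ignore
    return (residual * inlier).sum() / inlier.sum().clamp(min=1.0)
```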
Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference
Shekarforoush, Shayan, Lindell, David B., Brubaker, Marcus A., Fleet, David J.
Cryo-Electron Microscopy (cryo-EM) is an increasingly popular experimental technique for estimating the 3D structure of macromolecular complexes such as proteins based on 2D images. These images are notoriously noisy, and the pose of the structure in each image is unknown a priori. Ab-initio 3D reconstruction from 2D images entails estimating the pose in addition to the structure. In this work, we propose a new approach to this problem. We first adopt a multi-head architecture as a pose encoder to infer multiple plausible poses per image in an amortized fashion. This approach mitigates the high uncertainty in pose estimation by encouraging exploration of pose space early in reconstruction. Once uncertainty is reduced, we refine poses in an auto-decoding fashion. In particular, we initialize with the most likely pose and iteratively update it for individual images using stochastic gradient descent (SGD). Through evaluation on synthetic datasets, we demonstrate that our method is able to handle multi-modal pose distributions during the amortized inference stage, while the later, more flexible stage of direct pose optimization yields faster and more accurate convergence of poses compared to baselines. Finally, on experimental data, we show that our approach is faster than the state-of-the-art cryoAI and achieves higher-resolution reconstruction.
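As a rough illustration of semi-amortized inference, the sketch below pairs a multi-head pose encoder (the amortized stage) with per-image SGD refinement (the auto-decoding stage). The architecture, the axis-angle pose parameterization, and the hypothetical render_fn are assumptions for illustration, not the paper's design.

```python
import torch
import torch.nn as nn

class MultiHeadPoseEncoder(nn.Module):
    """K heads, each proposing a candidate pose (here axis-angle, 3 numbers)
    for an input image, so multiple plausible poses are kept in play."""
    def __init__(self, num_heads=4, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 64, feat_dim), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, 3) for _ in range(num_heads)])

    def forward(self, images):                 # images: (B, 64, 64)
        feats = self.backbone(images)
        return torch.stack([h(feats) for h in self.heads], dim=1)  # (B, K, 3)

def refine_pose(initial_pose, image, render_fn, steps=50, lr=1e-2):
    """Semi-amortized step: start from the most likely amortized pose and
    refine it per image with SGD. render_fn(pose) stands in for a
    differentiable projection of the current 3D structure."""
    pose = initial_pose.clone().requires_grad_(True)
    opt = torch.optim.SGD([pose], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((render_fn(pose) - image) ** 2).mean()
        loss.backward()
        opt.step()
    return pose.detach()
```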
Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models
Vasconcelos, Cristina N., Rashwan, Abdullah, Waters, Austin, Walker, Trevor, Xu, Keyang, Yan, Jimmy, Qian, Rui, Luo, Shixin, Parekh, Zarana, Bunner, Andrew, Fei, Hongliang, Garg, Roopal, Guo, Mandy, Kajic, Ivana, Li, Yeqing, Nandwani, Henna, Pont-Tuset, Jordi, Onoe, Yasumasa, Rosston, Sarah, Wang, Su, Zhou, Wenlei, Swersky, Kevin, Fleet, David J., Baldridge, Jason M., Wang, Oliver
We address the long-standing problem of how to learn effective pixel-based image diffusion models at scale, introducing a remarkably simple greedy growing method for stable training of large-scale, high-resolution models without the need for cascaded super-resolution components. The key insight stems from careful pre-training of core components, namely those responsible for text-to-image alignment vs. high-resolution rendering. We first demonstrate the benefits of scaling a Shallow UNet, with no down- or up-sampling encoder/decoder. Scaling its deep core layers is shown to improve alignment, object structure, and composition. Building on this core model, we propose a greedy algorithm that grows the architecture into high-resolution end-to-end models, while preserving the integrity of the pre-trained representation, stabilizing training, and reducing the need for large high-resolution datasets. This enables a single-stage model capable of generating high-resolution images without a super-resolution cascade. Our key results rely on public datasets and show that we are able to train non-cascaded models up to 8B parameters with no further regularization schemes. Vermeer, our full pipeline model trained with internal datasets to produce 1024x1024 images without cascades, is preferred by human evaluators over SDXL, 44.0% vs. 21.4%.
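A toy sketch of the growing step, under the assumption that a pre-trained fixed-resolution core is wrapped with newly initialized down- and up-sampling stages; layer sizes, module names, and the checkpoint path are invented for illustration.

```python
import torch
import torch.nn as nn

class ShallowCore(nn.Module):
    """Stand-in for the pre-trained core that operates at a fixed low
    resolution with no down/up-sampling."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class GrownModel(nn.Module):
    """Illustrative growing step: wrap the pre-trained core with new
    down- and up-sampling stages so one model handles a higher
    resolution end to end while the core's weights are preserved."""
    def __init__(self, core, ch=64):
        super().__init__()
        self.down = nn.Conv2d(3, ch, 4, stride=2, padding=1)          # new encoder stage
        self.core = core                                              # pre-trained weights
        self.up = nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1)   # new decoder stage

    def forward(self, x):
        return self.up(self.core(self.down(x)))

core = ShallowCore()
# core.load_state_dict(torch.load("core_pretrained.pt"))  # hypothetical checkpoint
model = GrownModel(core)
out = model(torch.randn(1, 3, 128, 128))                   # output at the grown resolution
```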
A Personalized Video-Based Hand Taxonomy: Application for Individuals with Spinal Cord Injury
Dousty, Mehdy, Fleet, David J., Zariffa, José
Hand function is critical for our interactions and quality of life. Spinal cord injuries (SCI) can impair hand function, reducing independence. A comprehensive evaluation of function in home and community settings requires a hand grasp taxonomy for individuals with impaired hand function. Developing such a taxonomy is challenging due to unrepresented grasp types in standard taxonomies, uneven data distribution across injury levels, and limited data. This study aims to automatically identify the dominant distinct hand grasps in egocentric video using semantic clustering. Egocentric video recordings collected in the homes of 19 individuals with cervical SCI were used to cluster grasping actions with semantic significance. A deep learning model integrating posture and appearance data was employed to create a personalized hand taxonomy. Quantitative analysis reveals a cluster purity of 67.6% ± 24.2% with 18.0% ± 21.8% redundancy. Qualitative assessment revealed meaningful clusters in video content. This methodology provides a flexible and effective strategy to analyze hand function in the wild. It offers researchers and clinicians an efficient tool for evaluating hand function, aiding sensitive assessments and tailored intervention plans.
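For reference, cluster purity as reported above can be computed with the standard majority-label definition; the snippet below is a generic metric implementation, not code from the study.

```python
import numpy as np

def cluster_purity(cluster_ids, true_labels):
    """Standard cluster-purity metric: each cluster is credited with its
    most frequent ground-truth label, and purity is the fraction of
    samples so explained."""
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    correct = 0
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        correct += np.bincount(members).max()   # size of the majority label
    return correct / len(true_labels)

# e.g. 3 clusters over 6 grasp instances with 2 annotated grasp types
print(cluster_purity([0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1]))  # -> 0.833...
```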
A Generalist Framework for Panoptic Segmentation of Images and Videos
Chen, Ting, Li, Lala, Saxena, Saurabh, Hinton, Geoffrey, Fleet, David J.
Panoptic segmentation assigns semantic and instance ID labels to every pixel of an image. As permutations of instance IDs are also valid solutions, the task requires learning a high-dimensional one-to-many mapping. As a result, state-of-the-art approaches use customized architectures and task-specific loss functions. We formulate panoptic segmentation as a discrete data generation problem, without relying on inductive biases of the task. A diffusion model is proposed to model panoptic masks, with a simple architecture and generic loss function. By simply adding past predictions as a conditioning signal, our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically. With extensive experiments, we demonstrate that our simple approach performs competitively with state-of-the-art specialist methods in similar settings.
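One common way to apply continuous diffusion to discrete panoptic IDs is an "analog bits" encoding, sketched below; whether this matches the paper's exact encoding is an assumption, and the bit width is illustrative.

```python
import torch

def ids_to_analog_bits(ids, num_bits=16):
    """Represent discrete IDs as analog bits so a continuous diffusion
    model can operate on them: each integer becomes a vector of
    {-1, +1} values, one per bit."""
    bits = (ids.unsqueeze(-1) >> torch.arange(num_bits)) & 1   # (H, W, num_bits)
    return bits.float() * 2.0 - 1.0

def analog_bits_to_ids(bits):
    """Inverse: threshold the (possibly denoised, real-valued) bits and
    reassemble the integer IDs."""
    hard = (bits > 0).long()
    weights = 2 ** torch.arange(bits.shape[-1])
    return (hard * weights).sum(dim=-1)

ids = torch.randint(0, 1000, (4, 4))
assert torch.equal(analog_bits_to_ids(ids_to_analog_bits(ids)), ids)
```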
Synthetic Data from Diffusion Models Improves ImageNet Classification
Azizi, Shekoofeh, Kornblith, Simon, Saharia, Chitwan, Norouzi, Mohammad, Fleet, David J.
Deep generative models are becoming increasingly powerful, now generating diverse, high-fidelity, photo-realistic samples given text prompts. Have they reached the point where models of natural images can be used for generative data augmentation, helping to improve challenging discriminative tasks? We show that large-scale text-to-image diffusion models can be fine-tuned to produce class-conditional models with SOTA FID (1.76 at 256x256 resolution) and Inception Score (239 at 256x256). The model also yields a new SOTA in Classification Accuracy Scores (64.96 for 256x256 generative samples, improving to 69.24 for 1024x1024 samples). Augmenting the ImageNet training set with samples from the resulting models yields significant improvements in ImageNet classification accuracy over strong ResNet and Vision Transformer baselines.
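In practice, generative data augmentation of this kind amounts to mixing real and model-generated images in one training set; the sketch below shows one plausible setup, with placeholder paths and no attempt to reproduce the paper's sampling protocol or mixing ratio.

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

tf = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor()])

# Real ImageNet plus a folder of class-conditional diffusion samples;
# class sub-folder names must match so label indices line up.
real = datasets.ImageFolder("/data/imagenet/train", transform=tf)
synthetic = datasets.ImageFolder("/data/imagenet_synthetic", transform=tf)

train_loader = DataLoader(ConcatDataset([real, synthetic]),
                          batch_size=256, shuffle=True, num_workers=8)
```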
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Wang, Su, Saharia, Chitwan, Montgomery, Ceslee, Pont-Tuset, Jordi, Noy, Shai, Pellegrini, Stefano, Onoe, Yasumasa, Laszlo, Sarah, Fleet, David J., Soricut, Radu, Baldridge, Jason, Norouzi, Mohammad, Anderson, Peter, Chan, William
Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts while consistent with input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high-resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images, exploring objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
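The object-masking idea can be sketched with an off-the-shelf detector: propose a high-confidence box and mask that region for inpainting training. The detector, threshold, and selection rule below are illustrative assumptions, not Imagen Editor's pipeline.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def object_inpainting_mask(image, score_thresh=0.7):
    """image: float tensor (3, H, W) in [0, 1]; returns an (H, W) 0/1 mask
    covering the highest-scoring detected object, so the model learns to
    inpaint whole objects rather than random patches."""
    with torch.no_grad():
        preds = detector([image])[0]
    mask = torch.zeros(image.shape[1:], dtype=torch.float32)
    keep = preds["scores"] > score_thresh
    if keep.any():
        x1, y1, x2, y2 = preds["boxes"][keep][0].round().long()
        mask[y1:y2, x1:x2] = 1.0
    return mask
```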
RobustNeRF: Ignoring Distractors with Robust Losses
Sabour, Sara, Vora, Suhani, Duckworth, Daniel, Krasin, Ivan, Fleet, David J., Tagliasacchi, Andrea
Neural radiance fields (NeRF) excel at synthesizing new views given multi-view, calibrated images of a static scene. When scenes include distractors that are not persistent during image capture (moving objects, lighting variations, shadows), artifacts appear as view-dependent effects or 'floaters'. To cope with distractors, we advocate a form of robust estimation for NeRF training, modeling distractors in training data as outliers of an optimization problem. Our method successfully removes outliers from a scene and improves upon our baselines on synthetic and real-world scenes. Our technique is simple to incorporate into modern NeRF frameworks, with few hyper-parameters. It does not assume a priori knowledge of the types of distractors, focusing instead on the optimization problem rather than pre-processing or modeling transient objects. More results are available at https://robustnerf.github.io/public.
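In the spirit of the robust losses described above, the following hypothetical sketch classifies pixels as inliers against a median residual scale and smooths the decision over patches so whole distractor regions, not isolated pixels, are down-weighted; the quantile and patch size are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def robust_inlier_weights(residual, quantile=0.5, patch=8):
    """residual: (H, W) per-pixel photometric error, detached from the
    graph. Returns (H, W) 0/1 weights marking likely-static pixels."""
    scale = torch.quantile(residual.flatten(), quantile)       # robust scale estimate
    inlier = (residual <= scale).float()[None, None]           # (1, 1, H, W)
    # a patch votes as a whole: keep it if most of its pixels look static
    votes = F.avg_pool2d(inlier, patch, stride=patch)
    smoothed = F.interpolate(votes, size=residual.shape, mode="nearest")
    return (smoothed > 0.5).float()[0, 0]

# usage: loss = (robust_inlier_weights(residual.detach()) * residual).mean()
```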
Pix2seq: A Language Modeling Framework for Object Detection
Chen, Ting, Saxena, Saurabh, Li, Lala, Fleet, David J., Hinton, Geoffrey
This paper presents Pix2Seq, a simple and generic framework for object detection. Unlike existing approaches that explicitly integrate prior knowledge about the task, we simply cast object detection as a language modeling task conditioned on the observed pixel inputs. Object descriptions (e.g., bounding boxes and class labels) are expressed as sequences of discrete tokens, and we train a neural net to perceive the image and generate the desired sequence. Our approach is based mainly on the intuition that if a neural net knows where and what the objects are, we just need to teach it how to read them out. Beyond the use of task-specific data augmentations, our approach makes minimal assumptions about the task, yet it achieves competitive results on the challenging COCO dataset compared to highly specialized and well-optimized detection algorithms.
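The token representation is easy to picture: quantize each box coordinate into a discrete bin and append a class token. The sketch below follows that general recipe; the bin count and vocabulary layout are illustrative assumptions.

```python
def box_to_tokens(box, label, image_size, num_bins=1000):
    """Pix2Seq-style tokenization sketch: continuous box coordinates are
    quantized into discrete bins and followed by a class token, so one
    detection becomes a short token sequence."""
    ymin, xmin, ymax, xmax = box
    h, w = image_size
    quantize = lambda v, size: min(int(v / size * num_bins), num_bins - 1)
    coord_tokens = [quantize(ymin, h), quantize(xmin, w),
                    quantize(ymax, h), quantize(xmax, w)]
    class_token = num_bins + label     # class tokens live after the coordinate bins
    return coord_tokens + [class_token]

# a 640x480 image with a 'person' (label 0) box
print(box_to_tokens((120.0, 64.0, 360.0, 256.0), 0, (480, 640)))
# -> [250, 100, 750, 400, 1000]
```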
Cascaded Diffusion Models for High Fidelity Image Generation
Ho, Jonathan, Saharia, Chitwan, Chan, William, Fleet, David J., Norouzi, Mohammad, Salimans, Tim
We show that cascaded diffusion models are capable of generating high-fidelity images on the class-conditional ImageNet generation challenge, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowest resolution, followed by one or more super-resolution diffusion models that successively upsample the image and add higher-resolution details. We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of data augmentation for the lower-resolution conditioning inputs to the super-resolution models.
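Conditioning augmentation can be sketched as corrupting the low-resolution conditioning input with a random amount of Gaussian noise and telling the super-resolution model how much was added; the variance-preserving mix and the model signature in the usage note are assumptions, not the paper's exact schedule.

```python
import torch

def augment_conditioning(low_res, aug_level=None):
    """Corrupt the low-res conditioning image with a random noise level
    and return that level alongside it, so the super-resolution model can
    learn to compensate for artifacts in its conditioning input."""
    if aug_level is None:
        aug_level = torch.rand(low_res.shape[0], device=low_res.device)  # one level per example
    g = (1.0 - aug_level).view(-1, 1, 1, 1)                              # fraction of signal kept
    noisy = g.sqrt() * low_res + (1.0 - g).sqrt() * torch.randn_like(low_res)
    return noisy, aug_level

# usage: noisy_cond, level = augment_conditioning(low_res_batch)
# out = superres_model(x_t, t, cond=noisy_cond, aug_level=level)   # hypothetical signature
```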