AITopics | Esteves, Carlos

Spectral Image Tokenizer

Esteves, Carlos, Suhail, Mohammed, Makadia, Ameesh

arXiv.org Artificial Intelligence, Dec-12-2024

Image tokenizers map images to sequences of discrete tokens, and are a crucial component of autoregressive transformer-based image generation. The tokens are typically associated with spatial locations in the input image, arranged in raster scan order, which is not ideal for autoregressive modeling. In this paper, we propose to tokenize the image spectrum instead, obtained from a discrete wavelet transform (DWT), such that the sequence of tokens represents the image in a coarse-to-fine fashion. Our tokenizer brings several advantages: 1) it leverages that natural images are more compressible at high frequencies, 2) it can take and reconstruct images of different resolutions without retraining, 3) it improves the conditioning for next-token prediction -- instead of conditioning on a partial line-by-line reconstruction of the image, it takes a coarse reconstruction of the full image, 4) it enables partial decoding where the first few generated tokens can reconstruct a coarse version of the image, 5) it enables autoregressive models to be used for image upsampling. We evaluate the tokenizer reconstruction metrics as well as multiscale image generation, text-guided image upsampling and editing.
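The coarse-to-fine ordering described in the abstract can be illustrated with a small sketch. This is not the paper's tokenizer: it assumes a plain Haar wavelet, works on a list-of-lists grayscale image whose side is divisible by 2 at every level, and omits quantization and any learned components. The function names (`haar_step`, `decompose`, `reconstruct`) are hypothetical and exist only for this illustration.

```python
# Illustrative sketch only: a plain Haar DWT, not the paper's tokenizer.

def haar_step(image):
    """One 2D Haar level: return (approx, (horiz, vert, diag)) at half resolution."""
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, len(image), 2):
        ra, rh, rv, rd = [], [], [], []
        for j in range(0, len(image[0]), 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            ra.append((a + b + c + d) / 4.0)  # local average (coarse content)
            rh.append((a - b + c - d) / 4.0)  # left-right detail
            rv.append((a + b - c - d) / 4.0)  # top-bottom detail
            rd.append((a - b - c + d) / 4.0)  # diagonal detail
        approx.append(ra)
        horiz.append(rh)
        vert.append(rv)
        diag.append(rd)
    return approx, (horiz, vert, diag)


def inverse_step(approx, detail):
    """Invert haar_step: double the resolution from an approximation and its details."""
    horiz, vert, diag = detail
    n, m = len(approx), len(approx[0])
    out = [[0.0] * (2 * m) for _ in range(2 * n)]
    for i in range(n):
        for j in range(m):
            A, H, V, D = approx[i][j], horiz[i][j], vert[i][j], diag[i][j]
            out[2 * i][2 * j] = A + H + V + D
            out[2 * i][2 * j + 1] = A - H + V - D
            out[2 * i + 1][2 * j] = A + H - V - D
            out[2 * i + 1][2 * j + 1] = A - H - V + D
    return out


def decompose(image, levels):
    """Return the coarsest approximation and the detail bands ordered coarse to fine."""
    details = []
    approx = image
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    details.reverse()  # coarsest scale first, finest last
    return approx, details


def reconstruct(approx, details, keep):
    """Rebuild the image using only the first `keep` detail scales; the rest are zeroed."""
    image = approx
    for k, d in enumerate(details):
        if k >= keep:
            zeros = [[0.0] * len(image[0]) for _ in range(len(image))]
            d = (zeros, zeros, zeros)
        image = inverse_step(image, d)
    return image
```

Calling `reconstruct(approx, details, keep)` with a small `keep` yields a blurry full-size image, and increasing `keep` adds finer detail. This mirrors the partial-decoding idea in the abstract, where a prefix of the sequence already describes a coarse version of the whole image.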

large language model, machine learning, resolution

2412.09607

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning to Transform for Generalizable Instance-wise Invariance

Singhal, Utkarsh, Esteves, Carlos, Makadia, Ameesh, Yu, Stella X.

arXiv.org Artificial Intelligence, Jan-15-2024

Computer vision research has long aimed to build systems that are robust to spatial transformations found in natural data. Traditionally, this is done using data augmentation or hard-coding invariances into the architecture. However, too much or too little invariance can hurt, and the correct amount is unknown a priori and dependent on the instance. Ideally, the appropriate invariance would be learned from data and inferred at test time. We treat invariance as a prediction problem. Given any image, we use a normalizing flow to predict a distribution over transformations and average the predictions over them. Since this distribution only depends on the instance, we can align instances before classifying them and generalize invariance across classes. The same distribution can also be used to adapt to out-of-distribution poses. This normalizing flow is trained end-to-end and can learn a much larger range of transformations than Augerino and InstaAug. When used as data augmentation, our method shows accuracy and robustness gains on CIFAR10, CIFAR10-LT, and TinyImageNet.
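The step of averaging predictions over a distribution of transformations can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's method: the distribution is a hand-specified list of (quarter-turn count, probability) pairs rather than the output of a normalizing flow, the transformations are limited to 90-degree rotations, and `toy_classifier` is a hypothetical stand-in for a trained model.

```python
# Illustrative sketch only: the transformation distribution is hand-specified here,
# whereas the abstract describes predicting it per instance with a normalizing flow.

def rotate90(image, k):
    """Rotate a square list-of-lists image by k quarter turns counter-clockwise."""
    for _ in range(k % 4):
        image = [list(row) for row in zip(*image)][::-1]
    return image


def toy_classifier(image):
    """Hypothetical two-class model whose output depends only on the top-left pixel."""
    p = float(image[0][0])
    return [p, 1.0 - p]


def expected_prediction(image, classifier, weighted_rotations):
    """Average class probabilities over (quarter_turns, probability) pairs."""
    total = None
    for k, prob in weighted_rotations:
        probs = classifier(rotate90(image, k))
        if total is None:
            total = [0.0] * len(probs)
        for c, p in enumerate(probs):
            total[c] += prob * p
    return total
```

With a uniform distribution over all four rotations, the averaged output is the same for an image and any rotated copy of it, even though `toy_classifier` itself is not rotation invariant. A distribution concentrated on a single rotation reduces to the plain classifier, which corresponds to applying no invariance at all.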

artificial intelligence, invariance, machine learning

2309.16672

Country: North America > United States > New York (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Vision (0.89)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Single Mesh Diffusion Models with Field Latents for Texture Generation

Mitchel, Thomas W., Esteves, Carlos, Makadia, Ameesh

arXiv.org Artificial Intelligence, Dec-14-2023

We introduce a framework for intrinsic latent diffusion models operating directly on the surfaces of 3D shapes, with the goal of synthesizing high-quality textures. Our approach is underpinned by two contributions: field latents, a latent representation encoding textures as discrete vector fields on the mesh vertices, and field latent diffusion models, which learn to denoise a diffusion process in the learned latent space on the surface. We consider a single-textured-mesh paradigm, where our models are trained to generate variations of a given texture on a mesh. We show the synthesized textures are of superior fidelity compared to those from existing single-textured-mesh generative models. Our models can also be adapted for user-controlled editing tasks such as inpainting and label-guided generation. The efficacy of our approach is due in part to the equivariance of our proposed framework under isometries, allowing our models to seamlessly reproduce details across locally similar regions and opening the door to a notion of generative texture transfer.

artificial intelligence, machine learning, texture

2312.0925

Country: North America > United States > Louisiana (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Scaling Spherical CNNs

Esteves, Carlos, Slotine, Jean-Jacques, Makadia, Ameesh

arXiv.org Artificial Intelligence, Jun-8-2023

Spherical CNNs generalize CNNs to functions on the sphere, by using spherical convolutions as the main linear operation. The most accurate and efficient way to compute spherical convolutions is in the spectral domain (via the convolution theorem), which is still costlier than the usual planar convolutions. For this reason, applications of spherical CNNs have so far been limited to small problems that can be approached with low model capacity. In this work, we show how spherical CNNs can be scaled for much larger problems. To achieve this, we make critical improvements including novel variants of common model components, an implementation of core operations to exploit hardware accelerator characteristics, and application-specific input representations that exploit the properties of our model. Experiments show our larger spherical CNNs reach state-of-the-art on several targets of the QM9 molecular benchmark, which was previously dominated by equivariant graph neural networks, and achieve competitive performance on multiple weather forecasting tasks. Our code is available at https://github.com/google-research/spherical-cnn.

artificial intelligence, machine learning, spherical cnn

2306.0542

Country:

North America > United States > Maryland (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

ASIC: Aligning Sparse in-the-wild Image Collections

Gupta, Kamal, Jampani, Varun, Esteves, Carlos, Shrivastava, Abhinav, Makadia, Ameesh, Snavely, Noah, Kar, Abhishek

arXiv.org Artificial Intelligence, Mar-28-2023

The above is also true for an image of a "never-before-seen" object (as opposed to a common object category such as cars) where humans demonstrate surprisingly robust generalization despite lacking object- or category-specific priors [6]. These correspondences in turn inform downstream inferences about the object such as shape, affordances, and more. In this work, we tackle this problem of "low-shot dense correspondence", i.e. given only a small in-the-wild image collection (10-30 images) of an object or object category, we recover dense and consistent correspondences across the entire collection.

[...] works assume either ground-truth keypoint annotations or a large dataset of images of a single object category. However, neither of the above assumptions holds true for the long tail of the objects present in the world. We present a self-supervised technique that directly optimizes on a sparse collection of images of a particular object/object category to obtain consistent dense correspondences across the collection. We use pairwise nearest neighbors obtained from deep features of a pre-trained vision transformer (ViT) model as noisy and sparse keypoint matches and make them dense [...]
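The idea of using pairwise nearest neighbors between feature vectors as sparse, noisy matches can be sketched as below. This is a minimal illustration, not the paper's pipeline: the features are small hand-written vectors rather than ViT features, cosine similarity is an assumed choice of metric, and the mutual-consistency filter in `mutual_matches` is an assumption added for the example. All function names are hypothetical.

```python
# Illustrative sketch only: toy feature vectors stand in for ViT features,
# and the mutual nearest-neighbor filter is an assumption of this example.
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(features_a, features_b):
    """For each vector in features_a, the index of its most similar vector in features_b."""
    return [
        max(range(len(features_b)), key=lambda j: cosine(f, features_b[j]))
        for f in features_a
    ]


def mutual_matches(features_a, features_b):
    """Keep (i, j) only when i and j are each other's nearest neighbor."""
    a_to_b = nearest(features_a, features_b)
    b_to_a = nearest(features_b, features_a)
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

Matches produced this way cover only some locations and can be wrong, which is consistent with the abstract's description of them as noisy and sparse keypoint matches that a later stage must densify.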

artificial intelligence, machine learning, object-oriented architecture

2303.16201

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.75)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.67)
