
Collaborating Authors: Ravishankar, Rahul


An Empirical Study of Autoregressive Pre-training from Videos

arXiv.org Artificial Intelligence

In a paper published in 1951, Shannon, having just published the foundational papers of information theory, proposed a "guessing game" of next word prediction to estimate the entropy of English (Shannon, 1951). Nearly 70 years later, training a high-capacity transformer network (Vaswani et al., 2017) on this task provided the generative pre-training backbone for Large Language Models (Radford et al., 2018; Devlin et al., 2019; Radford et al., 2019; Brown et al., 2020). Less well known is the fact that in 1954, Fred Attneave (Attneave, 1954) proposed an analog of Shannon's task for images. To quote: "We may divide the picture into arbitrarily small elements which we "transmit" to a subject (S) in a cumulative sequence, having them guess at the color of each successive element until they are correct. This method of analysis resembles the scanning process used in television and facsimile systems and accomplishes the like purpose of transforming two spatial dimensions into a single sequence in time".
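Attneave's guessing game can be made concrete in a few lines. The sketch below (illustrative only, not the paper's method; the `predict_previous` heuristic is an assumption chosen for simplicity) raster-scans a 2D image into a 1D sequence and counts how often a predictor misses the next element; fewer misses indicate lower apparent entropy of the source.

```python
import numpy as np

def raster_scan(image):
    """Flatten a 2D image into a 1D sequence (the 'scanning' order Attneave describes)."""
    return image.reshape(-1)

def guessing_game(sequence, predict):
    """Count the wrong guesses a predictor makes on each successive element.

    `predict` maps the prefix seen so far to a guess for the next element.
    """
    errors = 0
    for i, actual in enumerate(sequence):
        if predict(sequence[:i]) != actual:
            errors += 1
    return errors

# A naive predictor exploiting local smoothness: guess the most recent element.
predict_previous = lambda prefix: prefix[-1] if len(prefix) else 0

image = np.array([[0, 0, 1],
                  [0, 0, 1],
                  [1, 1, 1]])
seq = raster_scan(image)
errors = guessing_game(seq, predict_previous)  # misses only at color transitions
```

On this tiny image the predictor errs only where the raster scan crosses a color boundary, which is exactly the redundancy argument Attneave made about natural images.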


Scaling Properties of Diffusion Models for Perceptual Tasks

arXiv.org Artificial Intelligence

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve performance competitive with state-of-the-art methods using significantly less data and compute. Diffusion models have emerged as powerful techniques for generating images and videos, while showing excellent scaling behaviors. In this paper, we present a unified framework to perform a variety of perceptual tasks -- depth estimation, optical flow estimation, and amodal segmentation -- with a single diffusion model, as illustrated in Figure 1. Previous works such as Marigold (Ke et al., 2024), FlowDiffuser (Luo et al., 2024), and pix2gestalt (Ozguroglu et al., 2024) demonstrate the potential of repurposing image diffusion models for various inverse vision tasks individually.
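The image-to-image framing can be sketched as one simplified DDPM-style training step: the target map (depth, flow, or a segmentation mask) is noised to a level t, concatenated channel-wise with the conditioning RGB image, and a network is trained to predict the added noise. This is a minimal illustration under assumed conventions (cosine schedule, epsilon-prediction loss, channel concatenation for conditioning), not the paper's exact recipe; `denoise_fn` stands in for the model.

```python
import numpy as np

def diffusion_step_loss(rgb, target, t, noise, denoise_fn):
    """Epsilon-prediction loss for one image-to-image diffusion step.

    rgb:    conditioning image, shape (H, W, 3)
    target: perception map to generate (e.g. depth), shape (H, W, C)
    t:      noise level in [0, 1]
    """
    alpha_bar = np.cos(t * np.pi / 2) ** 2            # simple cosine schedule
    noisy = np.sqrt(alpha_bar) * target + np.sqrt(1.0 - alpha_bar) * noise
    model_in = np.concatenate([rgb, noisy], axis=-1)  # condition via channel concat
    pred_noise = denoise_fn(model_in, t)
    return float(np.mean((pred_noise - noise) ** 2))  # MSE on the added noise

# Dummy example: a "model" that always predicts zero noise.
rgb = np.zeros((4, 4, 3))
depth = np.zeros((4, 4, 1))              # stand-in for a ground-truth depth map
noise = np.ones_like(depth)
zero_model = lambda x, t: np.zeros_like(depth)
loss = diffusion_step_loss(rgb, depth, t=0.5, noise=noise, denoise_fn=zero_model)
```

Because the three tasks differ only in the target channels, a single such model can serve depth, flow, and amodal masks, which is what makes the unified framing attractive.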