Collaborating Authors: Raj, Amit


ICE-G: Image Conditional Editing of 3D Gaussian Splats

arXiv.org Artificial Intelligence

Recently, many techniques have emerged to create high-quality 3D assets and scenes. When it comes to editing these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. Our technique first segments the edit image, and then matches semantically corresponding regions across chosen segmented dataset views using DINO features. A color or texture change from a particular region of the edit image can then be applied to other views automatically in a semantically sensible manner. These edited views act as an updated dataset to further train and re-style the 3D scene, so the end result is an edited 3D model. Our framework enables a wide variety of editing tasks, such as manual local edits, correspondence-based style transfer from any example image, and a combination of different styles from multiple example images. We use Gaussian Splats as our primary 3D representation due to their speed and ease of local editing, but our technique works for other representations such as NeRFs as well. We show through multiple examples that our method produces higher-quality results while offering fine-grained control over editing. Project page: ice-gaussian.github.io
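
A minimal sketch of the region-matching step, assuming per-region descriptors are obtained by mean-pooling dense DINO patch features over each segment and matched by cosine similarity; the function names are illustrative, and random tensors stand in for real DINO features so the snippet runs on its own.

```python
import torch
import torch.nn.functional as F

def region_descriptors(feat_map, seg_mask, num_regions):
    """Mean-pool a dense feature map over each segmentation region.

    feat_map:  (C, H, W) patch features, e.g. from a DINO ViT.
    seg_mask:  (H, W) integer region labels in [0, num_regions).
    Returns:   (num_regions, C), one L2-normalized descriptor per region.
    """
    C = feat_map.shape[0]
    desc = torch.zeros(num_regions, C)
    for r in range(num_regions):
        mask = seg_mask == r
        if mask.any():
            desc[r] = feat_map[:, mask].mean(dim=1)
    return F.normalize(desc, dim=1)

def match_regions(edit_desc, view_desc):
    """Assign each region in a dataset view to its most similar
    edit-image region by cosine similarity."""
    sim = view_desc @ edit_desc.T          # (R_view, R_edit)
    return sim.argmax(dim=1)               # index of matched edit region

# Toy usage with random stand-ins for real DINO features (ViT-S/16 dim 384).
feat_edit = torch.randn(384, 32, 32)
feat_view = torch.randn(384, 32, 32)
seg_edit = torch.randint(0, 5, (32, 32))
seg_view = torch.randint(0, 4, (32, 32))
d_edit = region_descriptors(feat_edit, seg_edit, 5)
d_view = region_descriptors(feat_view, seg_view, 4)
print(match_regions(d_edit, d_view))       # e.g. tensor([2, 0, 4, 1])
```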


AI Algorithm for Predicting and Optimizing Trajectory of UAV Swarm

arXiv.org Artificial Intelligence

This paper explores the application of Artificial Intelligence (AI) techniques to generating the trajectories of fleets of Unmanned Aerial Vehicles (UAVs). The two main challenges addressed are accurately predicting the paths of UAVs and efficiently avoiding collisions between them. Firstly, the paper systematically applies a diverse set of activation functions to a Feedforward Neural Network (FFNN) with a single hidden layer, which improves the accuracy of the predicted paths compared to previous work. Secondly, we introduce a novel activation function, AdaptoSwelliGauss, a fusion of the Swish and Elliott activations integrated with a scaled and shifted Gaussian component. Swish facilitates smooth transitions, Elliott captures abrupt trajectory changes, and the scaled and shifted Gaussian enhances robustness against noise. This combination is specifically designed to capture the complexities of UAV trajectory prediction, and yields substantially better accuracy than existing activation functions. Thirdly, we propose a novel Integrated Collision Detection, Avoidance, and Batching (ICDAB) strategy that merges two complementary UAV collision-avoidance techniques: changing UAV trajectories and altering their starting times, also referred to as batching. This integration overcomes the disadvantages of both: it reduces the number of trajectory manipulations, avoiding the overly convoluted paths of the first technique, and yields smaller batch sizes, reducing the overall takeoff time of the second.
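
The abstract does not give the exact form of AdaptoSwelliGauss, so the following PyTorch sketch is one plausible additive reading: Swish plus Elliott plus a learnable scaled and shifted Gaussian. The composition and parameterization are assumptions, not the paper's definition.

```python
import torch
import torch.nn as nn

class AdaptoSwelliGaussSketch(nn.Module):
    """Illustrative stand-in for the AdaptoSwelliGauss activation.

    Assumption: the three components described in the abstract are
    combined additively, with a learnable Gaussian scale/shift/width.
    """
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(1.0))  # Gaussian scale
        self.mu = nn.Parameter(torch.tensor(0.0))     # Gaussian shift
        self.sigma = nn.Parameter(torch.tensor(1.0))  # Gaussian width

    def forward(self, x):
        swish = x * torch.sigmoid(x)           # smooth transitions
        elliott = x / (1.0 + x.abs())          # abrupt trajectory changes
        gauss = self.alpha * torch.exp(-((x - self.mu) ** 2)
                                       / (2 * self.sigma ** 2))
        return swish + elliott + gauss

# Drop-in usage inside a single-hidden-layer FFNN, as in the paper's setup;
# the input/output sizes here are arbitrary.
net = nn.Sequential(nn.Linear(6, 64), AdaptoSwelliGaussSketch(), nn.Linear(64, 3))
print(net(torch.randn(8, 6)).shape)            # torch.Size([8, 3])
```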


DiffusionLight: Light Probes for Free by Painting a Chrome Ball

arXiv.org Artificial Intelligence

We present a simple yet effective technique to estimate lighting from a single input image. Current techniques rely heavily on HDR panorama datasets to train neural networks that regress a limited field-of-view input to a full environment map. However, these approaches often struggle in real-world, uncontrolled settings due to the limited diversity and size of their datasets. To address this problem, we leverage diffusion models trained on billions of standard images to render a chrome ball into the input image. Despite its simplicity, this task remains challenging: the diffusion models often insert incorrect or inconsistent objects and cannot readily generate images in HDR format. Our research uncovers a surprising relationship between the appearance of chrome balls and the initial diffusion noise map, which we utilize to consistently generate high-quality chrome balls. We further fine-tune an LDR diffusion model (Stable Diffusion XL) with LoRA, enabling it to perform exposure bracketing for HDR light estimation. Our method produces convincing light estimates across diverse settings and demonstrates superior generalization to in-the-wild scenarios.
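
As a rough illustration of exposure bracketing for HDR estimation, the following is a generic weighted bracket merge under an assumed gamma camera response; it is not the paper's exact procedure, and all names are illustrative.

```python
import numpy as np

def merge_exposure_brackets(ldr_images, evs, gamma=2.2):
    """Merge LDR renderings taken at different exposure values (EV)
    into a single HDR image.

    ldr_images: list of (H, W, 3) float arrays in [0, 1].
    evs:        exposure value of each image; radiance ~ pixel / 2**ev.
    A simple hat weight downweights under- and over-exposed pixels.
    """
    num = np.zeros_like(ldr_images[0])
    den = np.zeros_like(ldr_images[0])
    for img, ev in zip(ldr_images, evs):
        lin = img ** gamma                 # undo display gamma (assumed response)
        w = 1.0 - np.abs(2.0 * img - 1.0)  # hat weight, peak at mid-gray
        num += w * lin / (2.0 ** ev)
        den += w
    return num / np.maximum(den, 1e-6)

# Toy usage: three brackets of the same synthetic scene, one stop apart.
scene = np.random.rand(64, 64, 3) * 4.0                  # "true" radiance
brackets = [np.clip((scene * 2.0 ** ev) ** (1 / 2.2), 0, 1)
            for ev in (0.0, -1.0, -2.0)]
hdr = merge_exposure_brackets(brackets, [0.0, -1.0, -2.0])
```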


Which way is `right'?: Uncovering limitations of Vision-and-Language Navigation model

arXiv.org Artificial Intelligence

The challenging task of Vision-and-Language Navigation (VLN) requires embodied agents to follow natural language instructions to reach a goal location or object (e.g. `walk down the hallway and turn left at the piano'). For agents to complete this task successfully, they must be able to ground objects referenced in the instruction (e.g. `piano') in the visual scene, as well as ground directional phrases (e.g. `turn left') into actions. In this work we ask the following question: to what degree do spatial and directional language cues inform the navigation model's decisions? We propose a series of simple masking experiments to inspect the model's reliance on different parts of the instruction. Surprisingly, we uncover that certain top-performing models rely only on the noun tokens of the instructions. We propose two training methods to alleviate this concerning limitation.
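
A minimal sketch of one such masking probe, assuming masking is done by part of speech (here with spaCy's tagger); the paper's exact masking scheme and mask token may differ.

```python
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def mask_instruction(text, keep_pos=("NOUN", "PROPN"), mask_token="[MASK]"):
    """Replace every token whose part of speech is not in `keep_pos`.

    Keeping only nouns probes whether a VLN agent actually uses
    directional phrases, or just object mentions.
    """
    doc = nlp(text)
    return " ".join(t.text if t.pos_ in keep_pos else mask_token for t in doc)

instr = "walk down the hallway and turn left at the piano"
print(mask_instruction(instr))
# e.g. [MASK] [MASK] [MASK] hallway [MASK] [MASK] [MASK] [MASK] [MASK] piano

# The complementary probe: mask the nouns, keep directional language.
print(mask_instruction(instr, keep_pos=("VERB", "ADP", "ADV", "DET", "CCONJ")))
```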


DreamBooth3D: Subject-Driven Text-to-3D Generation

arXiv.org Artificial Intelligence

We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject. Our approach combines recent advances in personalizing text-to-image models (DreamBooth) with text-to-3D generation (DreamFusion). We find that naively combining these methods fails to yield satisfactory subject-specific 3D assets due to personalized text-to-image models overfitting to the input viewpoints of the subject. We overcome this through a 3-stage optimization strategy where we jointly leverage the 3D consistency of neural radiance fields together with the personalization capability of text-to-image models. Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject.


Kernel Mean Matching for Content Addressability of GANs

arXiv.org Machine Learning

We propose a novel procedure that adds "content-addressability" to any given unconditional implicit model, e.g., a generative adversarial network (GAN). The procedure allows users to control the generative process by specifying a set of desired examples, of arbitrary size, from which similar samples are then generated. The proposed approach, based on kernel mean matching, is applicable to any generative model that transforms latent vectors to samples, and does not require retraining the model. Experiments on various high-dimensional image generation problems (CelebA-HQ; LSUN bedroom, bridge, and tower) show that our approach generates images consistent with the input set while retaining the image quality of the original model. To our knowledge, this is the first work that attempts to construct, at test time, a content-addressable generative model from a trained marginal model.
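
The core idea lends itself to a compact sketch: freeze the trained generator and optimize only a batch of latent vectors so that the Maximum Mean Discrepancy (MMD) between generated samples and the user-given set is minimized. In the paper this matching is done on image features; here a tiny linear-tanh generator and raw 2-D samples stand in so the loop runs on its own.

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs."""
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Squared Maximum Mean Discrepancy between samples x and y."""
    return (gaussian_kernel(x, x, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean())

# Toy stand-in for a trained generator G: latent (dim 8) -> sample (dim 2).
torch.manual_seed(0)
W = torch.randn(8, 2)
G = lambda z: torch.tanh(z @ W)

targets = torch.tensor([[0.5, 0.5], [0.6, 0.4], [0.5, 0.6]])  # user's input set
z = torch.randn(16, 8, requires_grad=True)   # only the latents are optimized
opt = torch.optim.Adam([z], lr=0.05)
for step in range(200):
    opt.zero_grad()
    loss = mmd2(G(z), targets)               # match kernel mean embeddings
    loss.backward()
    opt.step()
print(loss.item())  # small: G(z) now resembles the input set
```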


Deep Forward and Inverse Perceptual Models for Tracking and Prediction

arXiv.org Artificial Intelligence

We consider the problems of learning forward models, which map state to high-dimensional images, and inverse models, which map high-dimensional images to state, in robotics. Specifically, we present a perceptual model that generates video frames from state with deep networks, and provide a framework for its use in tracking and prediction tasks. We show that our proposed model greatly outperforms standard deconvolutional methods and GANs for image generation, producing clear, photo-realistic images. We also develop a convolutional neural network model for state estimation and compare its estimated robot trajectories against those of an Extended Kalman Filter. We validate all models on a real robotic system.
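
A minimal sketch of an inverse (image-to-state) model of the kind described, with an illustrative architecture; the layer sizes, the 7-dimensional state, and the regression loss are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class StateEstimator(nn.Module):
    """Minimal CNN mapping an image to a low-dimensional robot state
    (e.g., joint angles). Architecture details are illustrative."""
    def __init__(self, state_dim=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(64 * 4 * 4, state_dim)

    def forward(self, img):                    # img: (B, 3, H, W)
        return self.head(self.conv(img).flatten(1))

# Train with a simple regression loss against ground-truth states;
# an EKF would provide the baseline trajectory estimates for comparison.
model = StateEstimator()
img, state = torch.randn(8, 3, 64, 64), torch.randn(8, 7)
loss = nn.functional.mse_loss(model(img), state)
print(loss.item())
```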