Goto

Collaborating Authors

 block



POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality

Wilson, Joey, Almeida, Marcelino, Mahajan, Sachit, Labrie, Martin, Ghaffari, Maani, Ghasemalizadeh, Omid, Sun, Min, Kuo, Cheng-Hao, Sen, Arnab

arXiv.org Artificial Intelligence

In this paper, we present a novel algorithm for quantifying uncertainty and information gained within 3D Gaussian Splatting (3D-GS) through P-Optimality. While 3D-GS has proven to be a useful world model with high-quality rasterizations, it does not natively quantify uncertainty. Quantifying uncertainty in parameters of 3D-GS is necessary to understand the information gained from acquiring new images as in active perception, or identify redundant images which can be removed from memory due to resource constraints in online 3D-GS SLAM. We propose to quantify uncertainty and information gain in 3D-GS by reformulating the problem through the lens of optimal experimental design, which is a classical solution to measuring information gain. By restructuring information quantification of 3D-GS through optimal experimental design, we arrive at multiple solutions, of which T-Optimality and D-Optimality perform the best quantitatively and qualitatively as measured on two popular datasets. Additionally, we propose a block diagonal approximation of the 3D-GS uncertainty, which provides a measure of correlation for computing more accurate information gain, at the expense of a greater computation cost.


MultiPDENet: PDE-embedded Learning with Multi-time-stepping for Accelerated Flow Simulation

Wang, Qi, Mi, Yuan, Wang, Haoyun, Zhang, Yi, Chengze, Ruizhi, Liu, Hongsheng, Wen, Ji-Rong, Sun, Hao

arXiv.org Artificial Intelligence

Solving partial differential equations (PDEs) by numerical methods meet computational cost challenge for getting the accurate solution since fine grids and small time steps are required. Machine learning can accelerate this process, but struggle with weak generalizability, interpretability, and data dependency, as well as suffer in long-term prediction. To this end, we propose a PDE-embedded network with multiscale time stepping (MultiPDENet), which fuses the scheme of numerical methods and machine learning, for accelerated simulation of flows. In particular, we design a convolutional filter based on the structure of finite difference stencils with a small number of parameters to optimize, which estimates the equivalent form of spatial derivative on a coarse grid to minimize the equation's residual. A Physics Block with a 4th-order Runge-Kutta integrator at the fine time scale is established that embeds the structure of PDEs to guide the prediction. To alleviate the curse of temporal error accumulation in long-term prediction, we introduce a multiscale time integration approach, where a neural network is used to correct the prediction error at a coarse time scale. Experiments across various PDE systems, including the Navier-Stokes equations, demonstrate that MultiPDENet can accurately predict long-term spatiotemporal dynamics, even given small and incomplete training data, e.g., spatiotemporally down-sampled datasets. MultiPDENet achieves the state-of-the-art performance compared with other neural baseline models, also with clear speedup compared to classical numerical methods.


DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control

Cui, Zichen Jeff, Pan, Hengkai, Iyer, Aadhithya, Haldar, Siddhant, Pinto, Lerrel

arXiv.org Artificial Intelligence

Imitation learning has proven to be a powerful tool for training complex visuomotor policies. However, current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations. A key reason for this poor data efficiency is that visual representations are predominantly either pretrained on out-of-domain data or trained directly through a behavior cloning objective. In this work, we present DynaMo, a new in-domain, self-supervised method for learning visual representations. Given a set of expert demonstrations, we jointly learn a latent inverse dynamics model and a forward dynamics model over a sequence of image embeddings, predicting the next frame in latent space, without augmentations, contrastive sampling, or access to ground truth actions. Importantly, DynaMo does not require any out-of-domain data such as Internet datasets or cross-embodied datasets. On a suite of six simulated and real environments, we show that representations learned with DynaMo significantly improve downstream imitation learning performance over prior self-supervised learning objectives, and pretrained representations. Gains from using DynaMo hold across policy classes such as Behavior Transformer, Diffusion Policy, MLP, and nearest neighbors. Finally, we ablate over key components of DynaMo and measure its impact on downstream policy performance. Robot videos are best viewed at https://dynamo-ssl.github.io


Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Ahn, Donghoon, Cho, Hyoungwon, Min, Jaewon, Jang, Wooseok, Kim, Jungwoo, Kim, SeonHwa, Park, Hyun Hee, Jin, Kyong Hwan, Kim, Seungryong

arXiv.org Artificial Intelligence

Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, achieving this without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It involves generating intermediate samples with degraded structure by substituting selected self-attention maps in diffusion U-Net with an identity matrix, by considering the self-attention mechanisms' ability to capture structural information, and guiding the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.


Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How

Arango, Sebastian Pineda, Ferreira, Fabio, Kadra, Arlind, Hutter, Frank, Grabocka, Josif

arXiv.org Artificial Intelligence

With the ever-increasing number of pretrained models, machine learning practitioners are continuously faced with which pretrained model to use, and how to finetune it for a new dataset. In this paper, we propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it. Our method transfers knowledge about the performance of many pretrained models with multiple hyperparameter configurations on a series of datasets. To this aim, we evaluated over 20k hyperparameter configurations for finetuning 24 pretrained image classification models on 87 datasets to generate a large-scale meta-dataset. We meta-learn a multi-fidelity performance predictor on the learning curves of this meta-dataset and use it for fast hyperparameter optimization on new datasets. We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset together with its optimal hyperparameters.


Yap

AAAI Conferences

Path planning is a critical part of modern computer games; rare is the game where nothing moves and path planning is unneeded. A* is the workhorse for most path planning applications. Block A* is a state-of-the-art algorithm that is always faster than A* in experiments using game maps. Unlike other methods that improve upon A*'s performance, Block A* is never worse than A* nor require any knowledge of the map. In our experiments, Block A* is ideal for games with randomly generated maps, large maps, or games with a highly dynamic multi-agent environment. Furthermore, in the domain of grid-based any-angle path planning, we show that Block A* is an order of magnitude faster than the previous best any-angle path planning algorithm, Theta*. We empirically show our results using maps from Dragon Age: Origins and Starcraft. Finally, we introduce populated game maps'' as a new test bed that is a better approximation of real game conditions than the standard test beds of this field. The main contributions of this paper is a more rigorous set of experiments for Block A*, and introducing a new test bed (populated game maps) that is a more accurate representation of actual game conditions than the standard test beds.


Lifelong Learning Metrics

New, Alexander, Baker, Megan, Nguyen, Eric, Vallabha, Gautam

arXiv.org Artificial Intelligence

The DARPA Lifelong Learning Machines (L2M) program seeks to yield advances in artificial intelligence (AI) systems so that they are capable of learning (and improving) continuously, leveraging data on one task to improve performance on another, and doing so in a computationally sustainable way. Performers on this program developed systems capable of performing a diverse range of functions, including autonomous driving, real-time strategy, and drone simulation. These systems featured a diverse range of characteristics (e.g., task structure, lifetime duration), and an immediate challenge faced by the program's testing and evaluation team was measuring system performance across these different settings. This document, developed in close collaboration with DARPA and the program performers, outlines a formalism for constructing and characterizing the performance of agents performing lifelong learning scenarios. In Section 2, we introduce the general form of a lifelong learning scenario.


vehicle-detection by JunshengFu

#artificialintelligence

Anaconda is used for managing my dependencies. You can download the weight from here and save it to the weights folder. The code for this step is contained in the function named extract_features and codes from line 464 to 552 in svm_pipeline.py. If the SVM classifier exist, load it directly. Otherwise, I started by reading in all the vehicle and non-vehicle images, around 8000 images in each category.


Robot Planning

AI Magazine

Drew McDermott Research on planning for robots is in such a state of flux that there is disagreement about what planning is and whether it is necessary. We can take planning to be the optimization and debugging of a robot's program by reasoning about possible courses of execution. It is necessary to the extent that fragments of robot programs are combined at run time. There are several strands of research in the field; I survey six: (1) attempts to avoid planning; (2) the design of flexible plan notations; (3) theories of time-constrained planning; (4) planning by projecting and repairing faulty plans; (5) motion planning; and (6) the learning of optimal behaviors from reinforcements. More research is needed on formal semantics for robot plans.