AITopics | scene model

scene model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Magic3D: High-Resolution Text-to-3D Content Creation

Lin, Chen-Hsuan, Gao, Jun, Tang, Luming, Takikawa, Towaki, Zeng, Xiaohui, Huang, Xun, Kreis, Karsten, Fidler, Sanja, Liu, Ming-Yu, Lin, Tsung-Yi

arXiv.org Artificial Intelligence, Mar-25-2023

DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model for optimizing Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image-space supervision on NeRF, which together lead to low-quality 3D models and long processing times. In this paper, we address these limitations with a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior, accelerated with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we then optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion (reportedly 1.5 hours on average), while also achieving higher resolution. User studies show that 61.7% of raters prefer our approach over DreamFusion. Together with image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues for various creative applications.
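The abstract credits a sparse 3D hash grid with accelerating the coarse stage. As a rough illustration of how such a grid indexes space, the sketch below assumes an Instant-NGP-style spatial hash (XOR of integer vertex coordinates multiplied by per-axis primes, modulo the table size) at a single resolution level. The names `hash_vertex` and `corner_slots` are hypothetical, and this is not Magic3D's code.

```python
import math

# Per-axis primes of the Instant-NGP spatial hash (an assumption here; the
# abstract only says "sparse 3D hash grid").
PRIMES = (1, 2654435761, 805459861)


def hash_vertex(ix: int, iy: int, iz: int, table_size: int) -> int:
    """Map a non-negative integer grid vertex to a slot in a feature table.

    With table_size a power of two, reducing the unbounded Python product
    modulo table_size gives the same slot that uint32 arithmetic would.
    """
    h = 0
    for coord, prime in zip((ix, iy, iz), PRIMES):
        h ^= coord * prime
    return h % table_size


def corner_slots(point, resolution: int, table_size: int):
    """Return the table slots of the 8 grid corners around `point`.

    `point` is assumed to lie in the unit cube [0, 1]^3, and `resolution`
    is the number of grid cells per axis at this level.
    """
    base = [math.floor(c * resolution) for c in point]
    slots = []
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                slots.append(
                    hash_vertex(base[0] + dx, base[1] + dy, base[2] + dz, table_size)
                )
    return slots
```

In a full multiresolution encoding, the features stored at these slots would be trilinearly interpolated, and the results from several levels concatenated before a small MLP; hash collisions are tolerated rather than resolved.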

artificial intelligence, machine learning, optimization problem

arXiv.org Artificial Intelligence

2211.1044

Country:

Europe > United Kingdom > England (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.67)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Text-To-4D Dynamic Scene Generation

Singer, Uriel, Sheynin, Shelly, Polyak, Adam, Ashual, Oron, Makarov, Iurii, Kokkinos, Filippos, Goyal, Naman, Vedaldi, Andrea, Parikh, Devi, Johnson, Justin, Taigman, Yaniv

arXiv.org Artificial Intelligence, Jan-26-2023

We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions. Our approach uses a 4D dynamic Neural Radiance Field (NeRF), which is optimized for scene appearance, density, and motion consistency by querying a Text-to-Video (T2V) diffusion-based model. The dynamic video output generated from the provided text can be viewed from any camera location and angle, and can be composited into any 3D environment.

Generative models have seen tremendous recent progress, and can now generate realistic images from natural language prompts (Ramesh et al., 2022; Gafni et al., 2022; Rombach et al., 2022; Saharia et al., 2022; Yu et al., 2022; Sheynin et al., 2022). This success has been extended beyond 2D images both temporally to synthesize videos (Singer et al., 2022; Ho et al., 2022) and spatially to produce 3D shapes (Poole et al., 2022; Lin et al., 2022; Nichol et al., 2022b). However, these two categories of generative models …
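Viewing a dynamic NeRF "from any camera location and angle" means rendering each frame by compositing field samples along camera rays at a chosen time. The abstract does not describe MAV3D's renderer, so the sketch below assumes standard NeRF-style alpha compositing applied to a field that also takes a time input. The `field` callable is a hypothetical stand-in for the trained network.

```python
import math
from typing import Callable, Tuple

Color = Tuple[float, float, float]
# Hypothetical 4D field: (x, y, z, time) -> (density, rgb)
Field = Callable[[float, float, float, float], Tuple[float, Color]]


def render_ray(field: Field, origin, direction, time: float,
               near: float, far: float, n_samples: int = 64) -> Color:
    """Alpha-composite samples of a time-conditioned radiance field along one ray.

    `direction` is assumed to be unit length, so the step size along the ray
    equals the spacing of the sample distances.
    """
    delta = (far - near) / n_samples
    transmittance = 1.0  # fraction of light not yet absorbed in front of the sample
    rgb = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        s = near + (i + 0.5) * delta  # midpoint of the i-th segment
        x, y, z = (o + s * d for o, d in zip(origin, direction))
        sigma, color = field(x, y, z, time)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            rgb[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return (rgb[0], rgb[1], rgb[2])
```

Rendering the same ray at different `time` values is what turns a static scene representation into a video; in MAV3D, per the abstract, frames rendered this way are what the T2V diffusion model is queried on during optimization.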

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2301.1128

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Object-level 3D Semantic Mapping using a Network of Smart Edge Sensors

Hau, Julian, Bultmann, Simon, Behnke, Sven

arXiv.org Artificial Intelligence, Nov-21-2022

Autonomous robots that interact with their environment require a detailed semantic scene model. Volumetric semantic maps are frequently used for this purpose, and scene understanding can be further improved by including object-level information in the map. In this work, we extend a multi-view 3D semantic mapping system, consisting of a network of distributed smart edge sensors, with object-level information to enable downstream tasks that need object-level input. Objects are represented in the map either by their 3D mesh model or, when no detailed 3D model is available, by an object-centric volumetric sub-map that can model arbitrary object geometry. We propose a keypoint-based approach that estimates object poses via PnP and refines them via ICP alignment of the 3D object model with the observed point cloud segments. Object instances are tracked to integrate observations over time and to remain robust against temporary occlusions. We evaluate our method on the public Behave dataset, where it achieves pose estimation accuracy within a few centimeters, and in real-world experiments with the sensor network in a challenging lab environment, where multiple chairs and a table are tracked through the scene online and in real time, even under heavy occlusion.
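The refinement step pairs an initial pose (from keypoints and PnP) with ICP, which alternates nearest-neighbour matching against the observed points and a closed-form rigid fit. The sketch below is a deliberately simplified 2D version under stated assumptions: brute-force matching, a least-squares rotation from `atan2`, and an initial pose supplied directly instead of computed by PnP. All function names are hypothetical, and the paper's system operates on 3D point cloud segments.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def transform(theta: float, t: Point, pts: List[Point]) -> List[Point]:
    """Rotate points by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in pts]


def nearest(p: Point, cloud: List[Point]) -> Point:
    """Brute-force nearest neighbour (fine for a sketch, slow for real clouds)."""
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_2d(src: List[Point], dst: List[Point]) -> Tuple[float, Point]:
    """Least-squares rotation angle and translation mapping src onto dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))


def icp_2d(model: List[Point], observed: List[Point],
           init_theta: float = 0.0, init_t: Point = (0.0, 0.0),
           iters: int = 20) -> Tuple[float, Point]:
    """Refine an initial pose of `model` so that it aligns with `observed`."""
    theta, t = init_theta, init_t
    for _ in range(iters):
        moved = transform(theta, t, model)
        matches = [nearest(p, observed) for p in moved]
        d_theta, d_t = best_rigid_2d(moved, matches)
        # Compose the increment with the current pose: R_d (R x + t) + t_d
        c, s = math.cos(d_theta), math.sin(d_theta)
        t = (c * t[0] - s * t[1] + d_t[0], s * t[0] + c * t[1] + d_t[1])
        theta += d_theta
    return theta, t
```

ICP only converges to the correct pose when the initial guess is close enough for the nearest-neighbour matches to be mostly right, which is why a coarse keypoint-based estimate comes first.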

artificial intelligence, machine learning, point cloud segment

arXiv.org Artificial Intelligence

2211.11354

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
Europe > Italy > Campania > Naples (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Unsupervised Object Learning via Common Fate

Tangemann, Matthias, Schneider, Steffen, von Kügelgen, Julius, Locatello, Francesco, Gehler, Peter, Brox, Thomas, Kümmerer, Matthias, Bethge, Matthias, Schölkopf, Bernhard

arXiv.org Machine Learning, Oct-13-2021

In human vision, the Principle of Common Fate from Gestalt psychology (Wertheimer, 2012) has been shown to play an important role in object learning (Spelke, 1990). It posits that elements moving together tend to be perceived as one, a perceptual bias that may have evolved to help recognize camouflaged predators (Troscianko et al., 2009). In our work, we show that this principle can also be used successfully in machine vision as part of a multi-stage object learning approach (Figure 1). First, we use unsupervised motion segmentation to obtain a candidate segmentation of a video frame. Second, we train generative object and background models on this segmentation. While the regions obtained by motion segmentation are caused by objects moving in 3D, only their visible parts can be segmented. To learn the actual objects (i.e., the causes), a crucial task for the object model is therefore learning to generalize beyond the occlusions present in its input data. To measure success, we provide a dataset that includes object ground truth. In the last stage, we show that the learned object and background models can be combined into a flexible scene model that allows sampling manipulated novel scenes. Thus, in contrast to existing object-centric models trained end-to-end, our work decomposes object learning into evaluable subproblems and tests the potential of exploiting object motion to build scalable object-centric models that allow causally meaningful interventions in generation.
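To make "elements moving together are perceived as one" concrete, the toy sketch below groups pixels of a given optical-flow field: 4-connected neighbours whose flow vectors differ by at most `tau` receive the same label. The abstract does not specify the paper's unsupervised motion segmentation method, so this thresholded flood fill is an assumed stand-in, and `common_fate_segments` and `tau` are hypothetical names.

```python
from collections import deque
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # flow[row][col] = (dx, dy)


def common_fate_segments(flow: Flow, tau: float = 0.5) -> List[List[int]]:
    """Label pixels so that 4-connected neighbours with similar motion share a segment."""
    h, w = len(flow), len(flow[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for r0 in range(h):
        for c0 in range(w):
            if labels[r0][c0] != -1:
                continue  # already part of an earlier segment
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill over similar-motion neighbours
                r, c = queue.popleft()
                fx, fy = flow[r][c]
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < h and 0 <= nc < w and labels[nr][nc] == -1:
                        gx, gy = flow[nr][nc]
                        if (fx - gx) ** 2 + (fy - gy) ** 2 <= tau ** 2:
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
            next_label += 1
    return labels
```

Because similarity is checked only between neighbours, a smoothly varying flow field can chain into a single segment; and, as the abstract notes, any such segmentation covers only the visible parts of an object, which is why the object model must learn to generalize past occlusions.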

dataset, object model, segmentation

arXiv.org Machine Learning

2110.06562

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.67)

Aligned Scene Modeling of a Robot's Vista Space — An Evaluation

Swadzba, Agnes (Bielefeld University) | Wachsmuth, Sven (Bielefeld University)

AAAI Conferences, Aug-8-2011

One kind of meaningful structure in indoor rooms is the supporting structure, such as tables and cupboards. A robot needs to know these structures to interact naturally with humans and the environment. Because bottom-up detection of such structures is a challenging problem, we propose to estimate potential supporting structures from a spatial description such as "a bowl on the table". Since language and cognition schematize space in the same way, it is possible to estimate the spatial representation underlying a scene description. To do so, we introduce the aligned modeling approach, which consists of (1) rules that transform a sequence of object relations into a set of trees and (2) a methodology for grounding this abstract representation of the scene layout in the current perception, using detectors for small movable objects and an extraction of planar surfaces. An analysis of 30 descriptions shows that our approach is robust to a variety of description strategies and to object detection errors.
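The abstract describes rules that turn a sequence of object relations into a set of trees but does not list them. As a hypothetical illustration only, the sketch below assumes a single rule: a relation such as ("bowl", "on", "table") makes the ground object the parent of the figure, so tree roots become candidate supporting structures. The rule set, `SUPPORT_PREPOSITIONS`, and `support_trees` are all assumptions, and grounding the trees in perception (object detectors and planar surfaces) is not modeled.

```python
from typing import Dict, List, Tuple

Relation = Tuple[str, str, str]  # (figure, preposition, ground), e.g. ("bowl", "on", "table")

# Hypothetical rule set: prepositions implying that the ground supports the figure.
SUPPORT_PREPOSITIONS = {"on", "on top of"}


def support_trees(relations: List[Relation]) -> Dict[str, dict]:
    """Turn a sequence of spatial relations into nested support trees.

    Roots are candidate supporting structures: objects that support something
    but are not themselves described as supported. Assumes no support cycles.
    """
    children: Dict[str, List[str]] = {}
    supported = set()
    order: List[str] = []  # first-mention order keeps the output deterministic
    for figure, prep, ground in relations:
        for obj in (figure, ground):
            if obj not in order:
                order.append(obj)
        if prep in SUPPORT_PREPOSITIONS:
            children.setdefault(ground, []).append(figure)
            supported.add(figure)

    def build(node: str) -> dict:
        return {child: build(child) for child in children.get(node, [])}

    return {obj: build(obj) for obj in order if obj in children and obj not in supported}
```

Non-support relations such as "next to" are ignored here; a fuller rule set would also need to handle them when placing objects relative to one another.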

artificial intelligence, relation, spatial reasoning

AAAI Conferences

Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence

Country: Europe > Germany (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)

Large Margin Learning of Upstream Scene Understanding Models

Zhu, Jun, Li, Li-jia, Fei-fei, Li, Xing, Eric P.

Neural Information Processing Systems, Dec-31-2010

Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.
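The procedure above repeatedly solves a loss-augmented SVM. The sketch below shows only that ingredient for a linear multiclass classifier with a 0/1 task loss, plus one online subgradient step. It assumes fixed feature vectors in place of the inferred topic representations, omits the variational EM coupling with topic discovery, and uses hypothetical function names throughout.

```python
from typing import List, Sequence, Tuple


def scores(W: List[List[float]], x: Sequence[float]) -> List[float]:
    """Linear class scores w_y . x for every class y."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for w in W]


def loss_augmented_hinge(W, x, y_true: int, margin: float = 1.0) -> Tuple[float, int]:
    """Return (hinge loss, most violating label).

    Loss-augmented inference picks argmax_y [Delta(y, y_true) + w_y . x],
    with Delta a 0/1 task loss scaled by `margin`.
    """
    s = scores(W, x)
    augmented = [s[y] + (0.0 if y == y_true else margin) for y in range(len(W))]
    y_hat = max(range(len(W)), key=lambda y: augmented[y])
    return augmented[y_hat] - s[y_true], y_hat  # >= 0, since y_true contributes 0


def sgd_step(W, x, y_true: int, lr: float = 0.1, margin: float = 1.0) -> float:
    """One online subgradient step on the loss-augmented hinge (updates W in place)."""
    loss, y_hat = loss_augmented_hinge(W, x, y_true, margin)
    if y_hat != y_true:
        for j, xj in enumerate(x):
            W[y_true][j] += lr * xj  # pull the true class score up
            W[y_hat][j] -= lr * xj   # push the most violating class down
    return loss
```

A zero loss means the true class already beats every other class by at least `margin`. In the paper's joint formulation, this margin term is balanced against the likelihood of the topic model rather than optimized alone.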

artificial intelligence, machine learning, scene model

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)
