Probabilistic Directed Distance Fields for Ray-Based Shape Representations
Aumentado-Armstrong, Tristan, Tsogkas, Stavros, Dickinson, Sven, Jepson, Allan
In modern computer vision, the optimal representation of 3D shape continues to be task-dependent. One fundamental operation applied to such representations is differentiable rendering, as it enables inverse graphics approaches in learning frameworks. Standard explicit shape representations (voxels, point clouds, or meshes) are often easily rendered, but can suffer from limited geometric fidelity, among other issues. On the other hand, implicit representations (occupancy, distance, or radiance fields) preserve greater fidelity, but suffer from complex or inefficient rendering processes, limiting scalability. In this work, we devise Directed Distance Fields (DDFs), a novel neural shape representation that builds upon classical distance fields. The fundamental operation in a DDF maps an oriented point (position and direction) to surface visibility and depth. This enables efficient differentiable rendering, obtaining depth with a single forward pass per pixel, as well as differential geometric quantity extraction (e.g., surface normals), with only additional backward passes. Using probabilistic DDFs (PDDFs), we show how to model inherent discontinuities in the underlying field. We then apply DDFs to several applications, including single-shape fitting, generative modelling, and single-image 3D reconstruction, showcasing strong performance with simple architectural components via the versatility of our representation. Finally, since the dimensionality of DDFs permits view-dependent geometric artifacts, we conduct a theoretical investigation of the constraints necessary for view consistency. We find a small set of field properties that are sufficient to guarantee a DDF is consistent, without knowing, for instance, which shape the field is expressing.
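A minimal sketch, assuming a toy architecture, of what the fundamental DDF operation described above might look like: a network mapping an oriented point (position, direction) to visibility and depth, rendered with one forward pass per ray, with a spatial gradient obtained by an extra backward pass. The class name, layer sizes, and camera setup are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: a toy Directed Distance Field query and single-pass
# depth render. Architecture, sizes, and camera setup are assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn

class ToyDDF(nn.Module):
    """Maps an oriented point (position p, unit direction d) to
    a visibility probability and a depth along d to the surface."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # [visibility logit, depth]
        )

    def forward(self, p, d):
        out = self.net(torch.cat([p, d], dim=-1))
        visibility = torch.sigmoid(out[..., :1])
        depth = torch.relu(out[..., 1:])  # non-negative distance along the ray
        return visibility, depth

ddf = ToyDDF()

# Single forward pass per pixel: one (position, direction) query per ray.
rays_o = torch.zeros(4, 3)                                  # camera centre for 4 pixels
rays_d = nn.functional.normalize(torch.randn(4, 3), dim=-1) # ray directions
rays_o.requires_grad_(True)
vis, depth = ddf(rays_o, rays_d)

# One additional backward pass yields the spatial gradient of depth,
# from which differential quantities such as surface normals can be derived.
grad = torch.autograd.grad(depth.sum(), rays_o, create_graph=True)[0]
surface_pts = rays_o + depth * rays_d
print(vis.shape, depth.shape, grad.shape, surface_pts.shape)
```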
Fast-Grasp'D: Dexterous Multi-finger Grasp Generation Through Differentiable Simulation
Turpin, Dylan, Zhong, Tao, Zhang, Shutong, Zhu, Guanglei, Liu, Jingzhou, Singh, Ritvik, Heiden, Eric, Macklin, Miles, Tsogkas, Stavros, Dickinson, Sven, Garg, Animesh
Multi-finger grasping relies on high-quality training data, which is hard to obtain: human data is hard to transfer and synthetic data relies on simplifying assumptions that reduce grasp quality. By making grasp simulation differentiable, and contact dynamics amenable to gradient-based optimization, we accelerate the search for high-quality grasps with fewer limiting assumptions. We present Grasp'D-1M: a large-scale dataset for multi-finger robotic grasping, synthesized with Fast-Grasp'D, a novel differentiable grasping simulator. Grasp'D-1M contains one million training examples for three robotic hands (three-, four-, and five-fingered), each with multimodal visual inputs (RGB+depth+segmentation, available in mono and stereo). Grasp synthesis with Fast-Grasp'D is 10x faster than GraspIt! and 20x faster than the prior Grasp'D differentiable simulator. Generated grasps are more stable and contact-rich than GraspIt! grasps, regardless of the distance threshold used for contact generation. We validate the usefulness of our dataset by retraining an existing vision-based grasping pipeline on Grasp'D-1M, and showing a dramatic increase in model performance, predicting grasps with 30% more contact, a 33% higher epsilon metric, and 35% lower simulated displacement. Additional details at https://dexgrasp.github.io.
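A minimal sketch of the kind of gradient-based grasp refinement the abstract describes, with a differentiable sphere SDF standing in for the paper's differentiable contact simulation. The losses, hand model (bare fingertip positions), and all names are assumptions for exposition, not Fast-Grasp'D itself.

```python
# Illustrative sketch only: gradient-based refinement of fingertip positions
# against a differentiable object geometry (a sphere SDF stands in for the
# differentiable contact simulation). Losses and names are assumptions.
import torch

def sphere_sdf(points, centre, radius=0.05):
    """Signed distance from each point to a sphere surface."""
    return torch.linalg.norm(points - centre, dim=-1) - radius

object_centre = torch.tensor([0.0, 0.0, 0.0])
# Five fingertip positions, initialised around the object.
fingertips = (torch.randn(5, 3) * 0.1).requires_grad_(True)
optimizer = torch.optim.Adam([fingertips], lr=1e-2)

for step in range(200):
    optimizer.zero_grad()
    sdf = sphere_sdf(fingertips, object_centre)
    contact_loss = (sdf ** 2).mean()               # pull fingertips onto the surface
    spread_loss = -torch.pdist(fingertips).mean()  # encourage contact-rich, spread-out grasps
    loss = contact_loss + 0.1 * spread_loss
    loss.backward()
    optimizer.step()

print("final per-finger distance to surface:", sphere_sdf(fingertips, object_centre))
```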
Disentangling Geometric Deformation Spaces in Generative Latent Shape Models
Aumentado-Armstrong, Tristan, Tsogkas, Stavros, Dickinson, Sven, Jepson, Allan
A complete representation of 3D objects requires characterizing the space of deformations in an interpretable manner, from articulations of a single instance to changes in shape across categories. In this work, we improve on a prior generative model of geometric disentanglement for 3D shapes, wherein the space of object geometry is factorized into rigid orientation, non-rigid pose, and intrinsic shape. The resulting model can be trained from raw 3D shapes, without correspondences, labels, or even rigid alignment, using a combination of classical spectral geometry and probabilistic disentanglement of a structured latent representation space. Our improvements include more sophisticated handling of rotational invariance and the use of a diffeomorphic flow network to bridge latent and spectral space. The geometric structuring of the latent space imparts an interpretable characterization of the deformation space of an object. Furthermore, it enables tasks like pose transfer and pose-aware retrieval without requiring supervision. We evaluate our model on its generative modelling, representation learning, and disentanglement performance, showing improved rotation invariance and intrinsic-extrinsic factorization quality over the prior model.
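A minimal sketch of the latent factorization described above: a code split into rigid-orientation, non-rigid-pose, and intrinsic-shape factors, with pose transfer performed by swapping the pose factor between two shapes. The dimensions and the placeholder decoder are assumptions, not the paper's spectral or diffeomorphic-flow architecture.

```python
# Illustrative sketch only: a factorized latent code (rotation / pose / shape)
# and pose transfer by factor swapping. Dimensions and decoder are placeholders.
import torch
import torch.nn as nn

DIM_ROT, DIM_POSE, DIM_SHAPE = 4, 16, 32

class FactorizedDecoder(nn.Module):
    def __init__(self, n_points=1024):
        super().__init__()
        self.n_points = n_points
        self.net = nn.Sequential(
            nn.Linear(DIM_ROT + DIM_POSE + DIM_SHAPE, 256), nn.ReLU(),
            nn.Linear(256, n_points * 3),
        )

    def forward(self, z_rot, z_pose, z_shape):
        z = torch.cat([z_rot, z_pose, z_shape], dim=-1)
        return self.net(z).view(-1, self.n_points, 3)

decoder = FactorizedDecoder()
z_a = {k: torch.randn(1, d) for k, d in
       [("rot", DIM_ROT), ("pose", DIM_POSE), ("shape", DIM_SHAPE)]}
z_b = {k: torch.randn(1, d) for k, d in
       [("rot", DIM_ROT), ("pose", DIM_POSE), ("shape", DIM_SHAPE)]}

# Pose transfer: shape A's intrinsic identity rendered in shape B's articulation.
transferred = decoder(z_a["rot"], z_b["pose"], z_a["shape"])
print(transferred.shape)  # (1, 1024, 3)
```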
3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model
Fidler, Sanja, Dickinson, Sven, Urtasun, Raquel
This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [Felz.] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns, called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach outperforms the state-of-the-art in both 2D [Felz09] and 3D object detection [Hedau12].
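A minimal sketch of the discretized inference loop described above: slide and rotate a 3D cuboid over a grid of translations and yaw angles and keep the best-scoring hypothesis. The scoring function here is a random placeholder; in the model it would evaluate the appearance of the visible faces (the active aspect) in fronto-parallel coordinates.

```python
# Illustrative sketch only: inference as a discretized search that slides and
# rotates a 3D cuboid and keeps the best-scoring hypothesis. The scorer is a
# placeholder, not the paper's deformable face/part model.
import itertools
import math
import random

def score_hypothesis(cx, cy, cz, yaw):
    """Placeholder score for a cuboid placed at (cx, cy, cz) with heading yaw."""
    random.seed(hash((round(cx, 2), round(cy, 2), round(cz, 2), round(yaw, 2))))
    return random.random()

# Discretized search space: 3D translations and yaw rotations of the cuboid.
xs = [i * 0.5 for i in range(-4, 5)]
ys = [i * 0.5 for i in range(-2, 3)]
zs = [i * 0.5 for i in range(1, 11)]
yaws = [k * math.pi / 8 for k in range(16)]

best = max(
    ((score_hypothesis(x, y, z, yaw), (x, y, z, yaw))
     for x, y, z, yaw in itertools.product(xs, ys, zs, yaws)),
    key=lambda t: t[0],
)
print("best score %.3f at (x, y, z, yaw) =" % best[0], best[1])
```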
Video In Sentences Out
Barbu, Andrei, Bridge, Alexander, Burchill, Zachary, Coroian, Dan, Dickinson, Sven, Fidler, Sanja, Michaux, Aaron, Mussman, Sam, Narayanaswamy, Siddharth, Salvi, Dhaval, Schmidt, Lara, Shangguan, Jiangnan, Siskind, Jeffrey Mark, Waggoner, Jarrell, Wang, Song, Wei, Jinlian, Yin, Yifan, Zhang, Zhiqi
We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, track-to-role assignments, and changing body posture.
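A minimal sketch of the rendering step the abstract outlines, mapping recovered event information to sentence constituents (action to verb, participants to noun phrases, properties to adjectives, spatial relations to prepositional phrases, event characteristics to adverbs). The event dictionary and templates are made-up illustrations, not the system's grammar.

```python
# Illustrative sketch only: template-based rendering of recovered event roles
# into a sentence. The event structure below is a fabricated example.
event = {
    "verb": "picked up",
    "agent": {"noun": "person", "adjectives": ["tall"]},
    "patient": {"noun": "ball", "adjectives": ["red"]},
    "spatial_pp": "near the table",
    "adverb": "quickly",
}

def noun_phrase(role):
    """Build a noun phrase: determiner + adjectival modifiers + head noun."""
    return "the " + " ".join(role["adjectives"] + [role["noun"]])

def render_sentence(ev):
    parts = [noun_phrase(ev["agent"]), ev["adverb"], ev["verb"],
             noun_phrase(ev["patient"]), ev["spatial_pp"]]
    return " ".join(parts).capitalize() + "."

print(render_sentence(event))
# -> "The tall person quickly picked up the red ball near the table."
```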