AITopics | Object-Oriented Architecture

Collaborating Authors

Object-Oriented Architecture

News Overviews Instructional Materials AI-Alerts Classics

Software-based Automatic Differentiation is Flawed

Johnson, Daniel, Maxfield, Trevor, Jin, Yongxu, Fedkiw, Ronald

arXiv.org Artificial IntelligenceMay-5-2023

Various software efforts embrace the idea that object oriented programming enables a convenient implementation of the chain rule, facilitating so-called automatic differentiation via backpropagation. Such frameworks have no mechanism for simplifying the expressions (obtained via the chain rule) before evaluating them. As we illustrate below, the resulting errors tend to be unbounded.

artificial intelligence, machine learning, object-oriented architecture, (16 more...)

arXiv.org Artificial Intelligence

2305.03863

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.55)

Add feedback

Neural Radiance Field Codebooks

Wallingford, Matthew, Kusupati, Aditya, Fang, Alex, Ramanujan, Vivek, Kembhavi, Aniruddha, Mottaghi, Roozbeh, Farhadi, Ali

arXiv.org Artificial IntelligenceApr-30-2023

Compositional representations of the world are a promising step towards enabling high-level scene understanding and efficient transfer to downstream tasks. Learning such representations for complex scenes and tasks remains an open challenge. Towards this goal, we introduce Neural Radiance Field Codebooks (NRC), a scalable method for learning object-centric representations through novel view reconstruction. NRC learns to reconstruct scenes from novel views using a dictionary of object codes which are decoded through a volumetric renderer. This enables the discovery of reoccurring visual and geometric patterns across scenes which are transferable to downstream tasks. We show that NRC representations transfer well to object navigation in THOR, outperforming 2D and 3D representation learning methods by 3.1% success rate. We demonstrate that our approach is able to perform unsupervised segmentation for more complex synthetic (THOR) and real scenes (NYU Depth) better than prior methods (29% relative improvement). Finally, we show that NRC improves on the task of depth ordering by 5.5% accuracy in THOR. Parsing the world at the abstraction of objects is a key characteristic of human perception and reasoning (Rosch et al., 1976; Johnson et al., 2003). Such object-centric representations enable us to infer attributes such as geometry, affordances, and physical properties of objects solely from perception (Spelke, 1990). For example, upon perceiving a cup for the first time one can easily infer how to grasp it, know that it is designed for holding liquid, and estimate the force needed to lift it. Learning such models of the world without explicit supervision remains an open challenge. Unsupervised decomposition of the visual world into objects has been a long-standing challenge (Shi & Malik, 2000). More recent work focuses on reconstructing images from sparse encodings as an objective for learning object-centric representations (Burgess et al., 2019; Greff et al., 2019; Locatello et al., 2020; Lin et al., 2020; Monnier et al., 2021; Smirnov et al., 2021).

artificial intelligence, machine learning, object-oriented architecture, (16 more...)

arXiv.org Artificial Intelligence

2301.04101

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.47)

Add feedback

An Extensible Multimodal Multi-task Object Dataset with Materials

Standley, Trevor, Gao, Ruohan, Chen, Dawn, Wu, Jiajun, Savarese, Silvio

arXiv.org Artificial IntelligenceApr-29-2023

We present EMMa, an Extensible, Multimodal dataset of Amazon product listings that contains rich Material annotations. It contains more than 2.8 million objects, each with image(s), listing text, mass, price, product ratings, and position in Amazon's product-category taxonomy. Objects are annotated with one or more materials from this taxonomy. With the numerous attributes available for each object, we develop a Smart Labeling framework to quickly add new binary labels to all objects with very little manual labeling effort, making the dataset extensible. Each object attribute in our dataset can be included in either the model inputs or outputs, leading to combinatorial possibilities in task configurations. For example, we can train a model to predict the object category from the listing text, or the mass and price from the product listing image. EMMa offers a new benchmark for multi-task learning in computer vision and NLP, and allows practitioners to efficiently add new tasks and object attributes at scale. Perhaps the biggest problem faced by machine learning practitioners today is that of producing labeled datasets for their specific needs. Manually labeling large amounts of data is time-consuming and costly (Deng et al., 2009; Lin et al., 2014; Kuznetsova et al., 2020). Furthermore, it is often not possible to communicate how numerous ambiguous corner cases should be handled (e.g., is a hole puncher "sharp"?) to the human annotators we typically rely on to produce these labels. Could we solve this problem with the aid of machine learning? We hypothesized that we could accurately add new properties to every instance in a semi-automated fashion if given a rich dataset with substantial information about every instance. Consequently, we developed EMMa, a large, object-centric, multimodal, and multi-task dataset. We show that EMMa can be easily extended to contain any number of new object labels using a Smart Labeling technique we developed for large multi-task and multimodal datasets. Multi-task datasets contain labels for more than one attribute for each instance, whereas multimodal datasets contain data from more than one modality, such as images, text, audio, and tabular data. Derived from Amazon product listings, EMMa contains images, text, and a number of useful attributes, such as materials, mass, price, product category, and product ratings. Each attribute can be used as either a model input or a model output.

artificial intelligence, machine learning, object-oriented architecture, (18 more...)

arXiv.org Artificial Intelligence

2305.14352

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Health & Medicine (0.93)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Compositional 3D Human-Object Neural Animation

Hou, Zhi, Yu, Baosheng, Tao, Dacheng

arXiv.org Artificial IntelligenceApr-27-2023

Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics. Since existing methods mainly explore capturing HOIs, rendering HOI remains less investigated. In this paper, we address this challenge in HOI animation from a compositional perspective, i.e., animating novel HOIs including novel interaction, novel human and/or novel object driven by a novel pose sequence. Specifically, we adopt neural human-object deformation to model and render HOI dynamics based on implicit neural representations. To enable the interaction pose transferring among different persons and objects, we then devise a new compositional conditional neural radiance field (or CC-NeRF), which decomposes the interdependence between human and object using latent codes to enable compositionally animation control of novel HOIs. Experiments show that the proposed method can generalize well to various novel HOI animation settings. Our project page is https://zhihou7.github.io/CHONA/

artificial intelligence, machine learning, object-oriented architecture, (16 more...)

arXiv.org Artificial Intelligence

2304.1407

Country:

North America > United States > Tennessee > Davidson County > Nashville (0.05)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre:

Research Report (0.64)
Overview (0.47)

Technology:

Information Technology > Graphics > Animation (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.46)
(2 more...)

Add feedback

TAX-Pose: Task-Specific Cross-Pose Estimation for Robot Manipulation

Pan, Chuer, Okorn, Brian, Zhang, Harry, Eisner, Ben, Held, David

arXiv.org Artificial IntelligenceApr-20-2023

How do we imbue robots with the ability to efficiently manipulate unseen objects and transfer relevant skills based on demonstrations? End-to-end learning methods often fail to generalize to novel objects or unseen configurations. Instead, we focus on the task-specific pose relationship between relevant parts of interacting objects. We conjecture that this relationship is a generalizable notion of a manipulation task that can transfer to new objects in the same category; examples include the relationship between the pose of a pan relative to an oven or the pose of a mug relative to a mug rack. We call this task-specific pose relationship "cross-pose" and provide a mathematical definition of this concept. We propose a vision-based system that learns to estimate the cross-pose between two objects for a given manipulation task using learned cross-object correspondences. The estimated cross-pose is then used to guide a downstream motion planner to manipulate the objects into the desired pose relationship (placing a pan into the oven or the mug onto the mug rack). We demonstrate our method's capability to generalize to unseen objects, in some cases after training on only 10 demonstrations in the real world. Results show that our system achieves state-of-the-art performance in both simulated and real-world experiments across a number of tasks. Supplementary information and videos can be found at https://sites.google.com/view/tax-pose/home.

artificial intelligence, machine learning, object-oriented architecture, (20 more...)

arXiv.org Artificial Intelligence

2211.09325

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.46)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.41)

Add feedback

Reason from Context with Self-supervised Learning

Liu, Xiao, Sikarwar, Ankur, Kreiman, Gabriel, Shi, Zenglin, Zhang, Mengmi

arXiv.org Artificial IntelligenceApr-11-2023

Self-supervised learning (SSL) learns to capture discriminative visual features useful for knowledge transfers. To better accommodate the object-centric nature of current downstream tasks such as object recognition and detection, various methods have been proposed to suppress contextual biases or disentangle objects from contexts. Nevertheless, these methods may prove inadequate in situations where object identity needs to be reasoned from associated context, such as recognizing or inferring tiny or obscured objects. As an initial effort in the SSL literature, we investigate whether and how contextual associations can be enhanced for visual reasoning within SSL regimes, by (a) proposing a new Self-supervised method with external memories for Context Reasoning (SeCo), and (b) introducing two new downstream tasks, lift-the-flap and object priming, addressing the problems of "what" and "where" in context reasoning. In both tasks, SeCo outperformed all state-of-the-art (SOTA) SSL methods by a significant margin. Our network analysis revealed that the proposed external memory in SeCo learns to store prior contextual knowledge, facilitating target identity inference in the lift-the-flap task. Moreover, we conducted psychophysics experiments and introduced a Human benchmark in Object Priming dataset (HOP). Our results demonstrate that SeCo exhibits human-like behaviors.

external memory, machine learning, object-oriented architecture, (18 more...)

arXiv.org Artificial Intelligence

2211.12817

Country:

North America > United States (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(2 more...)

Add feedback

Head-tail Loss: A simple function for Oriented Object Detection and Anchor-free models

Gallés, Pau, Chen, Xi

arXiv.org Artificial IntelligenceApr-10-2023

This paper presents a new loss function for the prediction of oriented bounding boxes, named head-tail-loss. The loss function consists in minimizing the distance between the prediction and the annotation of two key points that are representing the annotation of the object. The first point is the center point and the second is the head of the object. However, for the second point, the minimum distance between the prediction and either the head or tail of the groundtruth is used. On this way, either prediction is valid (with the head pointing to the tail or the tail pointing to the head). At the end the importance is to detect the direction of the object but not its heading. The new loss function has been evaluated on the DOTA and HRSC2016 datasets and has shown potential for elongated objects such as ships and also for other types of objects with different shapes.

detection, machine learning, object-oriented architecture, (14 more...)

arXiv.org Artificial Intelligence

2304.04503

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.65)

Industry: Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.31)

Add feedback

An Object-Oriented Framework for the Simulation of Neural Nets

Neural Information Processing SystemsApr-6-2023, 19:11:25 GMT

The field of software simulators for neural networks has been ex(cid:173) panding very rapidly in the last years but their importance is still being underestimated. They must provide increasing levels of as(cid:173) sistance for the design, simulation and analysis of neural networks. With our object-oriented framework (SESAME) we intend to show that very high degrees of transparency, manageability and flexibil(cid:173) ity for complex experiments can be obtained. SESAME's basic de(cid:173) sign philosophy is inspired by the natural way in which researchers explain their computational models. Experiments are performed with networks of building blocks, which can be extended very eas(cid:173) ily.

neural net, object-oriented framework, simulation, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.81)

Add feedback

Neural Basis of Object-Centered Representations

Neural Information Processing SystemsApr-6-2023, 17:49:05 GMT

We present a neural model that can perform eye movements to a particular side of an object regardless of the position and orienta(cid:173) tion of the object in space, a generalization of a task which has been recently used by Olson and Gettner [4] to investigate the neu(cid:173) ral structure of object-centered representations. Our model uses an intermediate representation in which units have oculocentric recep(cid:173) tive fields- just like collicular neurons- whose gain is modulated by the side of the object to which the movement is directed, as well as the orientation of the object. We show that these gain modulations are consistent with Olson and Gettner's single cell recordings in the supplementary eye field. This demonstrates that it is possible to perform an object-centered task without a representation involv(cid:173) ing an object-centered map, viz., without neurons whose receptive fields are defined in object-centered coordinates. We also show that the same approach can account for object-centered neglect, a situ(cid:173) ation in which patients with a right parietal lesion neglect the left side of objects regardless of the orientation of the objects.

cid, neural basis, object-centered representation, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)

Add feedback

Contextual Modulation of Target Saliency

Neural Information Processing SystemsApr-6-2023, 16:38:25 GMT

The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes.

contextual modulation, detection, target saliency

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.75)

Add feedback