 Object-Oriented Architecture


Open-vocabulary Attribute Detection

arXiv.org Artificial Intelligence

Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enable open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's value by studying the attribute detection performance of several foundation models. Project page: https://ovad-benchmark.github.io


Bridging the Gap to Real-World Object-Centric Learning

arXiv.org Artificial Intelligence

Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. Our approach, DINOSAUR, significantly outperforms existing image-based object-centric learning models on simulated data and is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC. DINOSAUR is conceptually simple and shows competitive performance compared to more involved pipelines from the computer vision literature.


FewSOL: A Dataset for Few-Shot Object Learning in Robotic Environments

arXiv.org Artificial Intelligence

We introduce the Few-Shot Object Learning (FewSOL) dataset for object recognition with a few images per object. We captured 336 real-world objects with 9 RGB-D images per object from different views. Object segmentation masks, object poses and object attributes are provided. In addition, synthetic images generated using 330 3D object models are used to augment the dataset. We investigated (i) few-shot object classification and (ii) joint object segmentation and few-shot classification with state-of-the-art methods for few-shot learning and meta-learning using our dataset. The evaluation results show that there is still substantial room for improvement in few-shot object classification in robotic environments. Our dataset can be used to study a set of few-shot object recognition problems such as classification, detection and segmentation, shape reconstruction, pose estimation, keypoint correspondences and attribute recognition. The dataset and code are available at https://irvlutd.github.io/FewSOL.


Finding Things in the Unknown: Semantic Object-Centric Exploration with an MAV

arXiv.org Artificial Intelligence

Exploration of unknown space with an autonomous mobile robot is a well-studied problem. In this work we broaden the scope of exploration, moving beyond the purely geometric goal of uncovering as much free space as possible. We believe that for many practical applications, exploration should be contextualised with semantic and object-level understanding of the environment for task-specific exploration. Here, we study the task of both finding specific objects in unknown space and reconstructing them to a target level of detail. We therefore extend our environment reconstruction to consist not only of a background map, but also of object-level and semantically fused submaps. Importantly, we adapt our previous objective function of uncovering as much free space as possible in as little time as possible with two additional elements. First, we require a maximum observation distance to background surfaces to ensure target objects are not missed by image-based detectors because they are too small to be detected. Second, we require an even smaller maximum distance to the found objects in order to reconstruct them with the desired accuracy. We further created a Micro Aerial Vehicle (MAV) semantic exploration simulator based on Habitat in order to quantitatively demonstrate how our framework can be used to efficiently find specific objects as part of exploration. Finally, we showcase that this capability can be deployed in real-world scenes involving our drone equipped with an Intel RealSense D455 RGB-D camera.


How to use Normalizing Flows, Part 3 (Machine Learning)

#artificialintelligence

Abstract: This work introduces a new task, instance-incremental scene graph generation: given the point cloud of an empty room, represent it as a graph and automatically add novel instances. The final output is a graph denoting the object layout of the scene. The task is important because it helps guide the insertion of novel 3D objects into a real-world scene in vision-based applications such as augmented reality. It is also challenging because the complexity of real-world point clouds makes it difficult to learn object layout experience from the observation data (non-empty rooms with labeled semantics). We model this task as a conditional generation problem and propose a 3D autoregressive framework based on normalizing flows (3D-ANF) to address it.


Python Object Oriented Programming (OOPs)

#artificialintelligence

If you already know the Python basics, this course is the next step in your learning path to becoming a Python programmer. In Python, object-oriented programming (OOP, often written OOPs) is a programming paradigm that organizes code around objects and classes. It aims to model real-world entities in code using concepts such as inheritance, polymorphism, and encapsulation. The central concept of OOP is to bind data and the functions that operate on it into a single unit, so that other parts of the code reach the data through that unit's interface rather than directly. A class is a blueprint from which objects are created.
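The ideas in this description can be illustrated with a minimal sketch. The `Account` and `SavingsAccount` classes, their fields, and the sample values are hypothetical and chosen only for illustration; they are not taken from the course. Note that Python enforces privacy by convention (a leading underscore) rather than by the language itself.

```python
class Account:
    """A class bundles data (a balance) with the methods that operate on it."""

    def __init__(self, owner, balance=0):
        self.owner = owner
        # Leading underscore: private by convention only, not enforced by Python.
        self._balance = balance

    @property
    def balance(self):
        # Read-only view of the encapsulated data.
        return self._balance

    def deposit(self, amount):
        # State changes go through a method, so invalid input can be rejected.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def describe(self):
        return f"{self.owner}: {self._balance}"


class SavingsAccount(Account):
    """Inheritance: reuses Account and adds an interest rate."""

    def __init__(self, owner, balance=0, rate=0.02):
        super().__init__(owner, balance)
        self.rate = rate

    def describe(self):
        # Polymorphism: same method name, subclass-specific behaviour.
        return f"{self.owner}: {self._balance} (savings, rate {self.rate})"


# Objects are instances created from a class.
accounts = [Account("Ada", 100), SavingsAccount("Grace", 200)]
accounts[0].deposit(50)

# The same call works on both objects; each picks its own describe().
descriptions = [account.describe() for account in accounts]
for line in descriptions:
    print(line)
```

Here `deposit` is the only way the balance changes, which is the practical meaning of binding data and functions into one unit.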


How to code in Python (using paradigms) - DEV Community

#artificialintelligence

Programming paradigms are the different approaches to solving computational problems through programming. In this article, we will talk about programming paradigms, why they are an important part of programming, the different paradigms that can be applied in Python, and how to apply them. Before we delve into programming paradigms, it is crucial to understand what a paradigm is in its basic form, unrelated to computer science. Paradigms are essentially the models, guidelines, or patterns by which certain objectives are achieved; by analogy, they can be likened to how scaffolding serves as the basic structure for buildings. Programming paradigms are the different styles in which a program can be written in a given programming language (like Python, Java, JavaScript, etc.), that is, the different ways in which its code can be organised. In simple words, every programming language has particular ways (methodologies) in which its code can be structured and run, and these are called programming paradigms. Some programming languages support only one paradigm and are called single-paradigm languages, while others support multiple paradigms and are called multi-paradigm languages.
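As a rough illustration of Python being a multi-paradigm language, the sketch below solves one small task (summing the squares of the even numbers in a list) in three styles. The task, the function names, and the `EvenSquareSummer` class are assumptions made for this example and do not come from the article.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]


def sum_even_squares_procedural(values):
    """Procedural style: explicit steps that update a running total."""
    total = 0
    for value in values:
        if value % 2 == 0:
            total += value * value
    return total


def sum_even_squares_functional(values):
    """Functional style: compose filter and reduce, no mutable state."""
    evens = filter(lambda value: value % 2 == 0, values)
    return reduce(lambda acc, value: acc + value * value, evens, 0)


class EvenSquareSummer:
    """Object-oriented style: the data and the operation live in one object."""

    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(value * value for value in self.values if value % 2 == 0)


results = (
    sum_even_squares_procedural(numbers),
    sum_even_squares_functional(numbers),
    EvenSquareSummer(numbers).total(),
)
print(results)
```

All three versions compute the same value; the difference lies only in how the code is organised, which is the point of the paradigm distinction.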


Python Programming language Training Institute

#artificialintelligence

Python is a simple and powerful programming language. It has efficient high-level data structures and an effective approach to object-oriented programming. Its clear syntax and dynamic typing make it a strong choice for rapid application development and scripting on almost all platforms. It can be used for everything from software development and web development to data analytics and scientific applications. The language is used at leading organizations such as Google, Yahoo, NASA, and CERN, and tech giants such as Facebook, Google, and Amazon use Python for their business analysis.


A Large-Scale Multilingual Study of Visual Constraints on Linguistic Selection of Descriptions

arXiv.org Artificial Intelligence

We present a large, multilingual study into how vision constrains linguistic choice, covering four languages and five linguistic properties, such as verb transitivity or use of numerals. We propose a novel method that leverages existing corpora of images with captions written by native speakers, and apply it to nine corpora, comprising 600k images and 3M captions. We study the relation between visual input and linguistic choices by training classifiers to predict the probability of expressing a property from raw images, and find evidence supporting the claim that linguistic properties are constrained by visual context across languages. We complement this investigation with a corpus study, taking the test case of numerals. Specifically, we use existing annotations (number or type of objects) to investigate the effect of different visual conditions on the use of numeral expressions in captions, and show that similar patterns emerge across languages. Our methods and findings both confirm and extend existing research in the cognitive literature. We additionally discuss possible applications for language generation.


Joint stereo 3D object detection and implicit surface reconstruction

arXiv.org Artificial Intelligence

We present a new learning-based framework, S-3D-RCNN, that can recover accurate object orientation in SO(3) and simultaneously predict implicit shapes for outdoor rigid objects from stereo RGB images. In contrast to previous studies that map local appearance to observation angles, we explore a progressive approach by extracting meaningful Intermediate Geometrical Representations (IGRs) to estimate egocentric object orientation. This approach features a deep model that transforms perceived intensities to object part coordinates, which are mapped to a 3D representation encoding object orientation in the camera coordinate system. To enable implicit shape estimation, the IGRs are further extended to model the visible object surface with a point-based representation and to explicitly address the unseen surface hallucination problem. Extensive experiments validate the effectiveness of the proposed IGRs, and S-3D-RCNN achieves superior 3D scene understanding performance using existing and proposed new metrics on the KITTI benchmark. Code and pre-trained models will be available at this https URL.