object class
Export Reviews, Discussions, Author Feedback and Meta-Reviews
This paper addresses the problem of generating 3D object proposals given a stereo image pair from an autonomous driving vehicle. The paper proposes a set of features for a 3D cuboid over a point cloud and ground plane derived from the stereo image pair. The features include point cloud density, free space, object height prior, and object height relative to its surroundings. Note that the features are dependent on knowledge of the object class (other "objectness" proposal methods are agnostic to the object class). A structural SVM is trained to predict the "objectness" of the 3D cuboid proposal.
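As a rough illustration of what a point cloud density feature for a cuboid might look like, the sketch below computes the fraction of points that fall inside an axis-aligned box. The function name, the axis-aligned box, and the normalisation by the total point count are all assumptions made for this example; the reviewed paper's actual feature definition is not given in this summary.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def fraction_inside_cuboid(points: Iterable[Point], lo: Point, hi: Point) -> float:
    """Return the fraction of points lying inside the axis-aligned box [lo, hi].

    Hypothetical stand-in for a "point cloud density" feature; not the
    paper's formulation.
    """
    pts = list(points)
    if not pts:
        return 0.0
    inside = sum(
        1
        for p in pts
        if all(l <= c <= h for c, l, h in zip(p, lo, hi))
    )
    return inside / len(pts)


cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0), (0.5, 0.5, 2.0)]
density = fraction_inside_cuboid(cloud, lo=(0.0, 0.0, 0.0), hi=(1.0, 1.0, 1.0))
```

A candidate cuboid that encloses many points would score higher under such a feature than one placed over empty space.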
Methodology for a Statistical Analysis of Influencing Factors on 3D Object Detection Performance
Kuznietsov, Anton, Schweickard, Dirk, Peters, Steven
In autonomous driving, object detection is an essential task to perceive the environment by localizing and classifying objects. Most object detection algorithms rely on deep learning for their superior performance. However, their black box nature makes it challenging to ensure safety. In this paper, we propose a first-of-its-kind methodology for statistical analysis of the influence of various factors, related either to the objects to be detected or to the environment, on the detection performance of both LiDAR- and camera-based 3D object detectors. We perform a univariate analysis between each of the factors and the detection error in order to compare the strength of influence. To better identify potential sources of detection errors, we also analyze the performance as a function of the influencing factors and examine the interdependencies between the different influencing factors. Recognizing the factors that influence detection performance helps identify robustness issues in the trained object detector and supports the safety approval of object detection systems.
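To make the idea of a univariate analysis concrete, the sketch below ranks factors by the absolute Pearson correlation between each factor and a per-object detection error. The choice of Pearson correlation, the factor names, and the numbers are assumptions for illustration only; the abstract does not state which statistic the authors use.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant sequence carries no linear relationship
    return cov / math.sqrt(vx * vy)


def rank_factors(
    factors: Dict[str, Sequence[float]], errors: Sequence[float]
) -> List[Tuple[str, float]]:
    """Sort factors by the strength (absolute value) of their correlation."""
    scores = {name: pearson(values, errors) for name, values in factors.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical data: one value per detected object.
factors = {
    "distance": [10.0, 20.0, 30.0, 40.0],
    "occlusion": [0.1, 0.5, 0.2, 0.4],
}
errors = [0.1, 0.2, 0.3, 0.4]
ranked = rank_factors(factors, errors)
```

Each factor is scored against the error in isolation, which is what makes the analysis univariate; interdependencies between factors would need a separate step.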
Unveiling Objects with SOLA: An Annotation-Free Image Search on the Object Level for Automotive Data Sets
Rigoll, Philipp, Langner, Jacob, Sax, Eric
Huge image data sets are the foundation for the development of the perception of automated driving systems. A large number of images is necessary to train robust neural networks that can cope with diverse situations. A sufficiently large data set contains challenging situations and objects. For testing the resulting functions, it is necessary that these situations and objects can be found and extracted from the data set. While it is relatively easy to record a large amount of unlabeled data, it is far more difficult to find demanding situations and objects. However, during the development of perception systems, it must be possible to access challenging data without having to perform lengthy and time-consuming annotations. A developer must therefore be able to search dynamically for specific situations and objects in a data set. Thus, we designed a method which is based on state-of-the-art neural networks to search for objects with certain properties within an image. For ease of use, the query of this search is described using natural language. To determine the time savings and performance gains, we evaluated our method qualitatively and quantitatively on automotive data sets.
Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
We describe an unsupervised method for learning a probabilistic grammar of an object from a set of training examples. Our approach is invariant to the scale and rotation of the objects. We illustrate our approach using thirteen objects from the Caltech 101 database. In addition, we learn the model of a hybrid object class where we do not know the specific object or its position, scale or pose. This is illustrated by learning a hybrid class consisting of faces, motorbikes, and airplanes.
ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory
Jain, Ajinkya, Lioutikov, Rudolf, Niekum, Scott
Robots in human environments will need to interact with a wide variety of articulated objects such as cabinets, drawers, and dishwashers while assisting humans in performing day-to-day tasks. Existing methods either require objects to be textured or need to know the articulation model category a priori for estimating the model parameters for an articulated object. We propose ScrewNet, a novel approach that estimates an object's articulation model directly from depth images without requiring a priori knowledge of the articulation model category. ScrewNet uses screw theory to unify the representation of different articulation types and perform category-independent articulation model estimation. We evaluate our approach on two benchmarking datasets and compare its performance with a current state-of-the-art method. Results demonstrate that ScrewNet can successfully estimate the articulation models and their parameters for novel objects across articulation model categories with better on average accuracy than the prior state-of-the-art method.
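The unifying role of screw theory can be illustrated with a generic screw displacement: a rotation by an angle about an axis combined with a translation along that same axis. The sketch below applies such a displacement to a 3D point using Rodrigues' rotation formula. It is a textbook construction written for this listing, not ScrewNet's network or code, and the function and parameter names are assumptions.

```python
import math
from typing import Sequence, Tuple


def screw_displace(
    point: Sequence[float],
    axis_dir: Sequence[float],
    axis_point: Sequence[float],
    theta: float,
    d: float,
) -> Tuple[float, float, float]:
    """Rotate `point` by `theta` about the axis and translate it by `d` along it.

    The axis passes through `axis_point` with direction `axis_dir`.
    """
    norm = math.sqrt(sum(c * c for c in axis_dir))
    if norm == 0:
        raise ValueError("axis direction must be non-zero")
    l = [c / norm for c in axis_dir]
    v = [a - b for a, b in zip(point, axis_point)]
    # Rodrigues' formula: v*cos + (l x v)*sin + l*(l . v)*(1 - cos)
    cross = [
        l[1] * v[2] - l[2] * v[1],
        l[2] * v[0] - l[0] * v[2],
        l[0] * v[1] - l[1] * v[0],
    ]
    dot = sum(a * b for a, b in zip(l, v))
    c, s = math.cos(theta), math.sin(theta)
    rotated = [v[i] * c + cross[i] * s + l[i] * dot * (1 - c) for i in range(3)]
    return tuple(rotated[i] + axis_point[i] + d * l[i] for i in range(3))


# Revolute joint (e.g. a cabinet door): pure rotation, d = 0.
door = screw_displace((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), math.pi / 2, 0.0)
# Prismatic joint (e.g. a drawer): pure translation, theta = 0.
drawer = screw_displace((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.0, 0.3)
```

Because both joint types are special cases of the same four quantities (axis direction, axis position, angle, translation), an estimator can output these quantities without first deciding which category of joint it is looking at.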
Nothing but NumPy: Understanding & Creating Binary Classification Neural Networks with…
Nothing but NumPy is a continuation of my neural network series. To view the previous blog in this series or for a refresher on neural networks you may click here. This post continues from Understanding and Creating Neural Networks with Computational Graphs from Scratch. It's easy to feel lost when you have twenty browser tabs open trying to understand a complex concept and most of the writeups you come across regurgitate the same shallow explanations. In this second installment of Nothing but NumPy, I'll again strive to give the reader a deeper understanding of neural networks as we delve deeper into a specific kind of neural network called a "Binary Classification Neural Network".
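For readers unfamiliar with the term, a binary classification network typically ends in a sigmoid output that is scored with binary cross-entropy. The sketch below shows those two pieces only. It is written with the Python standard library so that it stays self-contained, whereas the post itself builds its networks with NumPy; the function names are assumptions and none of this is taken from the post's code.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(
    y_true: Sequence[float], y_prob: Sequence[float], eps: float = 1e-12
) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Hypothetical raw network outputs (logits) for two examples.
probs = [sigmoid(z) for z in (0.0, 0.0)]
loss = binary_cross_entropy([1.0, 0.0], probs)
```

With both predictions at 0.5 the loss equals ln 2, the value expected from a classifier that has learned nothing yet.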
Visualizing and Understanding Generative Adversarial Networks (Extended Abstract)
Bau, David, Zhu, Jun-Yan, Strobelt, Hendrik, Zhou, Bolei, Tenenbaum, Joshua B., Freeman, William T., Torralba, Antonio
The ability of generative adversarial networks to render nearly photorealistic images leads us to ask: What does a GAN know? For example, when a GAN generates a door on a building but not in a tree (Figure 1a), we wish to understand whether such structure emerges as pure pixel patterns without explicit representation, or if the GAN contains internal variables that correspond to human-perceived objects such as doors, buildings, and trees. And when a GAN generates an unrealistic image (Figure 1f), we want to know if the mistake is caused by specific variables in the network. We present a method for visualizing and understanding GANs at different levels of abstraction, from each neuron, to each object, to the relationship between different objects. Beginning with a Progressive GAN (Karras et al., 2018) trained to generate scenes (Figure 1b), we first identify a group of interpretable units that are related to semantic classes (Figure 1a, Figure 2). These units' featuremaps closely match the semantic segmentation of a particular object class (e.g., doors). Then, we directly intervene within the network to identify sets of units that cause a type of object to disappear (Figure 1c) or appear (Figure 1d). Finally, we study contextual relationships by observing where we can insert the object concepts in new images and how this intervention interacts with other objects in the image (Figure 1d, Figure 8). This framework enables several applications: comparing internal representations across different layers, GAN variants, and datasets (Figure 2); debugging and improving GANs by locating and ablating artifact-causing units (Figure 1e,f,g); understanding contextual relationships between objects in natural scenes (Figure 8, Figure 9); and manipulating images with interactive object-level control (video).
Taskonomy: Disentangling Task Transfer Learning
Zamir, Amir, Sax, Alexander, Shen, William, Guibas, Leonidas, Malik, Jitendra, Savarese, Silvio
Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable value; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, e.g., to seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity. We propose a fully computational approach for modeling the structure of the space of visual tasks. This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g. nontrivial emerged relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure including a solver that users can employ to devise efficient supervision policies for their use cases.
Unsupervised Learning in Hybrid Cognitive Architectures
Vinokurov, Yury (Carnegie Mellon University) | Lebiere, Christian (Carnegie Mellon University) | Wyatte, Dean (University of Colorado, Boulder) | Herd, Seth (University of Colorado, Boulder) | O'Reilly, Randall (University of Colorado, Boulder)
We present a model of unsupervised learning in the hybrid SAL (Synthesis of ACT-R and Leabra) architecture. This model follows the hypothesis that higher evaluative cognitive mechanisms can serve to provide training signals for perceptual learning. This addresses the problem that supervised learning seems necessary for strong perceptual performance, but explicit feedback is rare in the real world and difficult to provide for artificial learning systems. The hybrid model couples the perceptual strengths of Leabra with ACT-R's cognitive mechanisms, specifically its declarative memory, to evolve its own symbolic representations of objects encountered in the world. This is accomplished by presenting the objects to the Leabra visual system and committing the resulting representation to ACT-R's declarative memory. Subsequent presentations are either recalled as instances of a previous object category, in which case the positive association with the representation is rehearsed by Leabra, or they cause ACT-R to generate new category labels, which are also subject to the same rehearsal. The rehearsals drive the network's representations to convergence for a given category; at the same time, rehearsals on the ACT-R side reinforce the chunks that encode the associations between representation and label. In this way, the hybrid model bootstraps itself into learning new categories and their associated features; this framework provides a potential approach to solving the symbol grounding problem. We outline the operations of the hybrid model, evaluate its performance on the CU3D-100 (cu3d.colorado.edu) image set, and discuss further potential improvements to the model, including the integration of motor functions as a way of providing an internal feedback signal to augment and guide a purely bottom-up unsupervised system.