 Object-Oriented Architecture


Mix and Reason: Reasoning over Semantic Topology with Data Mixing for Domain Generalization

arXiv.org Artificial Intelligence

Domain generalization (DG) enables generalizing a learning machine from multiple seen source domains to an unseen target one. The general objective of DG methods is to learn semantic representations that are independent of domain labels, which is theoretically sound but empirically challenged due to the complex mixture of common and domain-specific factors. Although disentangling the representations into two disjoint parts has been gaining momentum in DG, the strong presumption over the data limits its efficacy in many real-world scenarios. In this paper, we propose Mix and Reason (MiRe), a new DG framework that learns semantic representations by enforcing the structural invariance of semantic topology. MiRe consists of two key components, namely, Category-aware Data Mixing (CDM) and Adaptive Semantic Topology Refinement (ASTR). CDM mixes two images from different domains by means of activation maps generated by two complementary classification losses, making the classifier focus on the representations of semantic objects. ASTR introduces relation graphs to represent semantic topology, which is progressively refined via the interactions between local feature aggregation and global cross-domain relational reasoning. Experiments on multiple DG benchmarks validate the effectiveness and robustness of the proposed MiRe.
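
To make the idea of mixing two images under the guidance of an activation map more concrete, here is a minimal sketch. It is not the CDM procedure from the paper: the thresholding step, the function names `binarize` and `mix_with_mask`, and the toy 2x2 "images" are all assumptions chosen for illustration, and the activation map is hard-coded rather than produced by a classifier.

```python
def binarize(activation, threshold=0.5):
    """Turn an activation map into a 0/1 mask (hypothetical thresholding rule)."""
    return [[1.0 if v >= threshold else 0.0 for v in row] for row in activation]


def mix_with_mask(img_a, img_b, mask):
    """Blend two same-sized grayscale images pixel by pixel.

    Mask values lie in [0, 1]: 1 keeps the pixel from img_a, 0 takes it from img_b.
    """
    return [
        [m * a + (1.0 - m) * b for a, b, m in zip(row_a, row_b, row_m)]
        for row_a, row_b, row_m in zip(img_a, img_b, mask)
    ]


# Toy data: img_a plays the role of the image whose object region is kept,
# img_b supplies the remaining pixels from another domain.
img_a = [[10.0, 10.0], [10.0, 10.0]]
img_b = [[0.0, 0.0], [0.0, 0.0]]
activation = [[0.9, 0.2], [0.6, 0.1]]  # assumed, not computed by a model

mask = binarize(activation)
mixed = mix_with_mask(img_a, img_b, mask)
print(mixed)
```

In this sketch the highly activated left column is copied from `img_a` and the rest comes from `img_b`; a soft (non-binarized) mask would work with the same blending function.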


Object-Category Aware Reinforcement Learning

arXiv.org Artificial Intelligence

Object-oriented reinforcement learning (OORL) is a promising way to improve the sample efficiency and generalization ability over standard RL. Recent works that try to solve OORL tasks without additional feature engineering mainly focus on learning the object representations and then solving tasks via reasoning based on these object representations. However, none of these works tries to explicitly model the inherent similarity between different object instances of the same category. Objects of the same category should share similar functionalities; therefore, the category is the most critical property of an object. Following this insight, we propose a novel framework named Object-Category Aware Reinforcement Learning (OCARL), which utilizes the category information of objects to facilitate both perception and reasoning. OCARL consists of three parts: (1) Category-Aware Unsupervised Object Discovery (UOD), which discovers the objects as well as their corresponding categories; (2) Object-Category Aware Perception, which encodes the category information and is also robust to the incompleteness of (1) at the same time; (3) Object-Centric Modular Reasoning, which adopts multiple independent and object-category-specific networks when reasoning based on objects. Our experiments show that OCARL can improve both the sample efficiency and generalization in the OORL domain.


REMS: Middleware for Robotics Education and Development

arXiv.org Artificial Intelligence

This paper introduces REMS, a robotics middleware and control framework that is designed to introduce the Zen of Python to robotics and to improve the robotics education and development flow. Although existing middleware can provide hardware abstraction and modularity, setting up environments and learning middleware-specific syntax and procedures are less viable in education. They can curb opportunities to understand robotics concepts, theories, and algorithms. Robotics is a field of integration; students and developers from various backgrounds will be involved in programming. Establishing a Pythonic, object-oriented robotics framework in a natural way not only enhances modular and abstracted programming for better readability, reusability, and simplicity, but also builds useful and practical coding skills in general. REMS is intended to be a valuable medium for robotics education, not just a tool, and a platform that scales from one robot to multi-agent systems across hardware, simulation, and analytical model implementations.


LOCL: Learning Object-Attribute Composition using Localization

arXiv.org Artificial Intelligence

Human visual reasoning allows us to leverage prior visual experience to recognize previously unseen Object-Attribute (O-A) relationships. Predicting such complex relationships of novel O-A compositions, referred to as Composition Zero Shot Learning (CZSL) [17, 19, 21, 22, 25, 28, 33, 36], is an active area of research. There has been significant progress on CZSL methods in recent years; however, as our experiments demonstrate, their performance degrades in natural cluttered scenes, as illustrated in Figure 1. The main reason in these cases is interference from other potentially confusing elements. For example, in Figure 1(B.1), the SOTA methods are not able to detect the object of interest given its size relative to the image; and while the bird is the object of interest in Figure 1(B.2), the surrounding context dominated by the green leaves results in an incorrect association of the color attribute with the object. The poor performance of the SOTA methods can be attributed to the dominant confounding elements, which impede the right O-A composition prediction. This in turn is due to the bias towards O-A compositions seen during training. Generalization to more realistic cases, as seen in Figure 1(B), is crucial for the widespread use of CZSL.


GenSDF: Two-Stage Learning of Generalizable Signed Distance Functions

arXiv.org Artificial Intelligence

We investigate the generalization capabilities of neural signed distance functions (SDFs) for learning 3D object representations for unseen and unlabeled point clouds. Existing methods can fit SDFs to a handful of object classes and boast fine detail or fast inference speeds, but do not generalize well to unseen shapes. We introduce a two-stage semi-supervised meta-learning approach that transfers shape priors from labeled to unlabeled data to reconstruct unseen object categories. The first stage uses an episodic training scheme to simulate training on unlabeled data and meta-learns initial shape priors. The second stage then introduces unlabeled data with disjoint classes in a semi-supervised scheme to diversify these priors and achieve generalization. We assess our method on both synthetic data and real collected point clouds. Experimental results and analysis validate that our approach outperforms existing neural SDF methods and is capable of robust zero-shot inference on 100+ unseen classes. Code can be found at https://github.com/princeton-computational-imaging/gensdf.
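
For readers unfamiliar with the term, a signed distance function maps a 3D point to its distance from a surface, negative inside and positive outside. The sketch below uses the closed-form SDF of a sphere purely as an illustration; it is not GenSDF's learned network, and the function name `sphere_sdf` and its default parameters are assumptions.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to a sphere: < 0 inside, 0 on the surface, > 0 outside."""
    return math.dist(point, center) - radius


# Query a few points against the unit sphere centred at the origin.
for p in [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]:
    print(p, sphere_sdf(p))
```

A neural SDF replaces this analytic formula with a network that is trained to predict the same kind of value for shapes that have no closed form.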


Computer Vision - Richard Szeliski

#artificialintelligence

As humans, we perceive the three-dimensional structure of the world around us with apparent ease. Think of how vivid the three-dimensional percept is when you look at a vase of flowers sitting on the table next to you. You can tell the shape and translucency of each petal through the subtle patterns of light and shading that play across its surface and effortlessly segment each flower from the background of the scene (Figure 1.1). Looking at a framed group portrait, you can easily count (and name) all of the people in the picture and even guess at their emotions from their facial appearance. Perceptual psychologists have spent decades trying to understand how the visual system works and, even though they can devise optical illusions to tease apart some of its principles (Figure 1.3), a complete solution to this puzzle remains elusive (Marr 1982; Palmer 1999; Livingstone 2008).


[100%OFF] Entry-Level, Associate & Professional Python Programming

#artificialintelligence

Are you ready to take the PCEP – Certified Entry-Level Python Programmer exam? The first two exams are in the form of practice tests and consist of 200 questions that may appear during the Certified Entry-Level Python Programmer exam. Where necessary, explanations are added to the questions. This course allows you to confirm your proficiency and gives you the confidence you need to earn the PCEP – Certified Entry-Level Python Programmer certification. The PCEP – Certified Entry-Level Python Programmer certification shows that the individual is familiar with universal computer programming concepts like data types, containers, functions, conditions, and loops, as well as Python programming language syntax, semantics, and the runtime environment.


View-Invariant Localization using Semantic Objects in Changing Environments

arXiv.org Artificial Intelligence

This paper proposes a novel framework for real-time localization and egomotion tracking of a vehicle in a reference map. The core idea is to map the semantic objects observed by the vehicle and register them to their corresponding objects in the reference map. While several recent works have leveraged semantic information for cross-view localization, the main contribution of this work is a view-invariant formulation that makes the approach directly applicable to any viewpoint configuration for which objects are detectable. Another distinctive feature is robustness to changes in the environment/objects due to a data association scheme suited for extreme outlier regimes (e.g., 90% association outliers). To demonstrate our framework, we consider an example of localizing a ground vehicle in a reference object map using only cars as objects. While only a stereo camera is used for the ground vehicle, we consider reference maps constructed a priori from ground viewpoints using stereo cameras and Lidar scans, and georeferenced aerial images captured at a different date to demonstrate the framework's robustness to different modalities, viewpoints, and environment changes. Evaluations on the KITTI dataset show that over a 3.7 km trajectory, localization occurs in 36 sec and is followed by real-time egomotion tracking with an average position error of 8.5 m in a Lidar reference map, and on an aerial object map where 77% of objects are outliers, localization is achieved in 71 sec with an average position error of 7.9 m.


23-year-old rapper Kee Riches fatally shot in Compton over weekend

Los Angeles Times

Kee Riches, a 23-year-old L.A. rapper, was shot and killed in Compton on Saturday night. Riches, whose real name is Kian Nellum, was shot along with another man -- 29-year-old Robert Leflore Jr. -- around 9:40 p.m. on the 1500 block of S. Chester Avenue in Compton, according to the L.A. County Sheriff's Department and L.A. County Medical Examiner-Coroner records. Tributes poured in across the artist's social media accounts upon word of his death. The "2 Live" and "Westside Lady" rapper was known in the area for his love of his community and drive to build it up, much like slain rapper Nipsey Hussle, who was gunned down in 2019. Riches previously told L.A. Taco that the Crenshaw hero, whom he described as "the embodiment of a street soldier, a real hustler," left a similar impact on his own life.


Closing the Loop: Graph Networks to Unify Semantic Objects and Visual Features for Multi-object Scenes

arXiv.org Artificial Intelligence

In Simultaneous Localization and Mapping (SLAM), Loop Closure Detection (LCD) is essential to minimize drift when recognizing previously visited places. Visual Bag-of-Words (vBoW) has been an LCD algorithm of choice for many state-of-the-art SLAM systems. It uses a set of visual features to provide robust place recognition but fails to perceive the semantics or spatial relationship between feature points. Previous work has mainly focused on addressing these issues by combining vBoW with semantic and spatial information from objects in the scene. However, these approaches are unable to exploit spatial information of local visual features and lack a structure that unifies semantic objects and visual features, therefore limiting the symbiosis between the two components. This paper proposes SymbioLCD2, which creates a unified graph structure to integrate semantic objects and visual features symbiotically. Our novel graph-based LCD system utilizes the unified graph structure by applying a Weisfeiler-Lehman graph kernel with temporal constraints to robustly predict loop closure candidates. Evaluation of the proposed system shows that having a unified graph structure incorporating semantic objects and visual features improves LCD prediction accuracy, illustrating that the proposed graph structure provides a strong symbiosis between these two complementary components. It also outperforms other Machine Learning algorithms, such as SVM, Decision Tree, Random Forest, Neural Network, and GNN-based Graph Matching Networks. Furthermore, it has shown good performance in detecting loop closure candidates earlier than state-of-the-art SLAM systems, demonstrating that extended semantic and spatial awareness from the unified graph structure significantly impacts LCD performance.
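
As a rough illustration of what a Weisfeiler-Lehman graph kernel computes, the sketch below implements a plain WL subtree kernel on small labeled graphs. It is not SymbioLCD2's implementation and it omits the temporal constraints mentioned above; the function names `wl_features` and `wl_kernel`, the string-based label signatures, and the toy graphs are assumptions made for this example.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations=1):
    """Count node labels over WL refinement rounds.

    adjacency: dict mapping node -> list of neighbours.
    labels: dict mapping node -> initial label (a string without '(', ')' or ',').
    """
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = sorted(current[n] for n in neighbours)
            refined[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = refined
        counts.update(current.values())
    return counts


def wl_kernel(features_a, features_b):
    """Dot product of two label-count vectors; larger means more shared substructure."""
    return sum(count * features_b[label] for label, count in features_a.items())


# Two toy graphs with identical node labels: a 3-node path and a triangle.
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
same_label = {0: "x", 1: "x", 2: "x"}

f_path = wl_features(path, same_label)
f_triangle = wl_features(triangle, same_label)
print(wl_kernel(f_path, f_triangle))
```

With a single refinement round the two graphs already differ: the path has two end nodes with one neighbour each, while every triangle node has two neighbours, so the cross-graph kernel value is lower than either graph's value against itself.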