Goto

Collaborating Authors

 Object-Oriented Architecture


Tips for making the most of 64-bit architectures in langage design, libraries or garbage collection

arXiv.org Artificial Intelligence

The 64-bit architectures that have become standard today offer unprecedented low-level programming possibilities. For the first time in the history of computing, the size of address registers far exceeded the physical capacity of their bus.After a brief reminder of the possibilities offered by the small size of addresses compared to the available 64 bits,we develop three concrete examples of how the vacant bits of these registers can be used.Among these examples, two of them concern the implementation of a library for a new statically typed programming language.Firstly, the implementation of multi-precision integers, with the aim of improving performance in terms of both calculation speed and RAM savings.The second example focuses on the library's handling of UTF-8 character strings.Here, the idea is to make indexing easier by ignoring the physical size of each UTF-8 characters.Finally, the third example is a possible enhancement of garbage collectors, in particular the mark \& sweep for the object marking phase.


Model of models -- Part 1

arXiv.org Artificial Intelligence

This paper proposes a new cognitive model, acting as the main component of an AGI agent. The model is introduced in its mature intelligence state, and as an extension of previous models, DENN, and especially AKREM, by including operational models (frames/classes) and will. This model's core assumption is that cognition is about operating on accumulated knowledge, with the guidance of an appropriate will. Also, we assume that the actions, part of knowledge, are learning to be aligned with will, during the evolution phase that precedes the mature intelligence state. In addition, this model is mainly based on the duality principle in every known intelligent aspect, such as exhibiting both top-down and bottom-up model learning, generalization verse specialization, and more. Furthermore, a holistic approach is advocated for AGI designing, and cognition under constraints or efficiency is proposed, in the form of reusability and simplicity. Finally, reaching this mature state is described via a cognitive evolution from infancy to adulthood, utilizing a consolidation principle. The final product of this cognitive model is a dynamic operational memory of models and instances. Lastly, some examples and preliminary ideas for the evolution phase to reach the mature state are presented.


Unifying Foundation Models with Quadrotor Control for Visual Tracking Beyond Object Categories

arXiv.org Artificial Intelligence

Visual control enables quadrotors to adaptively navigate using real-time sensory data, bridging perception with action. Yet, challenges persist, including generalization across scenarios, maintaining reliability, and ensuring real-time responsiveness. This paper introduces a perception framework grounded in foundation models for universal object detection and tracking, moving beyond specific training categories. Integral to our approach is a multi-layered tracker integrated with the foundation detector, ensuring continuous target visibility, even when faced with motion blur, abrupt light shifts, and occlusions. Complementing this, we introduce a model-free controller tailored for resilient quadrotor visual tracking. Our system operates efficiently on limited hardware, relying solely on an onboard camera and an inertial measurement unit. Through extensive validation in diverse challenging indoor and outdoor environments, we demonstrate our system's effectiveness and adaptability. In conclusion, our research represents a step forward in quadrotor visual tracking, moving from task-specific methods to more versatile and adaptable operations.


Learn Python for less than $25 through October 15th

PCWorld

Python is the world's most popular programming language not just because it's relatively easy to learn, but because it's used in so many applications and it's highly scalable. If you've ever wanted to learn to code, starting with Python is a great idea. And it's an even better idea now because during Deal Days, you can get The Premium Python Programming Certification Bundle for $23.97 (reg. This bundle includes ten courses from top instructors like Joe Rahl (4.6/5-star instructor rating) and Edouard Renard (4.6/5-star rating). Starting with the absolute basics of Python, you'll learn basic coding principles, explore Object-Oriented Programming (OOP), and much more as you slowly level up your skills.


Soda: An Object-Oriented Functional Language for Specifying Human-Centered Problems

arXiv.org Artificial Intelligence

We present Soda (Symbolic Objective Descriptive Analysis), a language that helps to treat qualities and quantities in a natural way and greatly simplifies the task of checking their correctness. We present key properties for the language motivated by the design of a descriptive language to encode complex requirements on computer systems, and we explain how these key properties must be addressed to model these requirements with simple definitions. We give an overview of a tool that helps to describe problems in an easy way that we consider more transparent and less error-prone.


Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset

arXiv.org Artificial Intelligence

We introduce the UT Campus Object Dataset (CODa), a mobile robot egocentric perception dataset collected on the University of Texas Austin Campus. Our dataset contains 8.5 hours of multimodal sensor data: synchronized 3D point clouds and stereo RGB video from a 128-channel 3D LiDAR and two 1.25MP RGB cameras at 10 fps; RGB-D videos from an additional 0.5MP sensor at 7 fps, and a 9-DOF IMU sensor at 40 Hz. We provide 58 minutes of ground-truth annotations containing 1.3 million 3D bounding boxes with instance IDs for 53 semantic classes, 5000 frames of 3D semantic annotations for urban terrain, and pseudo-ground truth localization. We repeatedly traverse identical geographic locations for a wide range of indoor and outdoor areas, weather conditions, and times of the day. Using CODa, we empirically demonstrate that: 1) 3D object detection performance in urban settings is significantly higher when trained using CODa compared to existing datasets even when employing state-of-the-art domain adaptation approaches, 2) sensor-specific fine-tuning improves 3D object detection accuracy and 3) pretraining on CODa improves cross-dataset 3D object detection performance in urban settings compared to pretraining on AV datasets. Using our dataset and annotations, we release benchmarks for 3D object detection and 3D semantic segmentation using established metrics. In the future, the CODa benchmark will include additional tasks like unsupervised object discovery and re-identification. We publicly release CODa on the Texas Data Repository, pre-trained models, dataset development package, and interactive dataset viewer on our website at https://amrl.cs.utexas.edu/coda. We expect CODa to be a valuable dataset for research in egocentric 3D perception and planning for autonomous navigation in urban environments.


ConSOR: A Context-Aware Semantic Object Rearrangement Framework for Partially Arranged Scenes

arXiv.org Artificial Intelligence

Object rearrangement is the problem of enabling a robot to identify the correct object placement in a complex environment. Prior work on object rearrangement has explored a diverse set of techniques for following user instructions to achieve some desired goal state. Logical predicates, images of the goal scene, and natural language descriptions have all been used to instruct a robot in how to arrange objects. In this work, we argue that burdening the user with specifying goal scenes is not necessary in partially-arranged environments, such as common household settings. Instead, we show that contextual cues from partially arranged scenes (i.e., the placement of some number of pre-arranged objects in the environment) provide sufficient context to enable robots to perform object rearrangement \textit{without any explicit user goal specification}. We introduce ConSOR, a Context-aware Semantic Object Rearrangement framework that utilizes contextual cues from a partially arranged initial state of the environment to complete the arrangement of new objects, without explicit goal specification from the user. We demonstrate that ConSOR strongly outperforms two baselines in generalizing to novel object arrangements and unseen object categories. The code and data can be found at https://github.com/kartikvrama/consor.


LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement

arXiv.org Artificial Intelligence

We introduce a novel approach to the executable semantic object rearrangement problem. In this challenge, a robot seeks to create an actionable plan that rearranges objects within a scene according to a pattern dictated by a natural language description. Unlike existing methods such as StructFormer and StructDiffusion, which tackle the issue in two steps by first generating poses and then leveraging a task planner for action plan formulation, our method concurrently addresses pose generation and action planning. We achieve this integration using a Language-Guided Monte-Carlo Tree Search (LGMCTS). Quantitative evaluations are provided on two simulation datasets, and complemented by qualitative tests with a real robot.


Preferential Multi-Target Search in Indoor Environments using Semantic SLAM

arXiv.org Artificial Intelligence

In recent years, the demand for service robots capable of executing tasks beyond autonomous navigation has grown. In the future, service robots will be expected to perform complex tasks like 'Set table for dinner'. High-level tasks like these, require, among other capabilities, the ability to retrieve multiple targets. This paper delves into the challenge of locating multiple targets in an environment, termed 'Find my Objects.' We present a novel heuristic designed to facilitate robots in conducting a preferential search for multiple targets in indoor spaces. Our approach involves a Semantic SLAM framework that combines semantic object recognition with geometric data to generate a multi-layered map. We fuse the semantic maps with probabilistic priors for efficient inferencing. Recognizing the challenges introduced by obstacles that might obscure a navigation goal and render standard point-to-point navigation strategies less viable, our methodology offers resilience to such factors. Importantly, our method is adaptable to various object detectors, RGB-D SLAM techniques, and local navigation planners. We demonstrate the 'Find my Objects' task in real-world indoor environments, yielding quantitative results that attest to the effectiveness of our methodology. This strategy can be applied in scenarios where service robots need to locate, grasp, and transport objects, taking into account user preferences. For a brief summary, please refer to our video: https://tinyurl.com/PrefTargetSearch


Object-oriented mapping in dynamic environments

arXiv.org Artificial Intelligence

Grid maps, especially occupancy grid maps, are ubiquitous in many mobile robot applications. To simplify the process of learning the map, grid maps subdivide the world into a grid of cells, whose occupancies are independently estimated using only measurements in the perceptual field of the particular cell. However, the world consists of objects that span multiple cells, which means that measurements falling onto a cell provide evidence on the occupancy of other cells belonging to the same object. This correlation is not captured by current models. In this work, we present a way to generalize the update of grid maps relaxing the assumption of independence by modeling the relationship between the measurements and the occupancy of each cell as a set of latent variables, and jointly estimating those variables and the posterior of the map. Additionally, we propose a method to estimate the latent variables by clustering based on semantic labels and an extension to the Normal Distributions Transfer Occupancy Map (NDT-OM) to facilitate the proposed map update method. We perform comprehensive experiments of map creation and localization with real world data sets, and show that the proposed method creates better maps in highly dynamic environments compared to state-of-the-art methods. Finally, we demonstrate the ability of the proposed method to remove occluded objects from the map in a lifelong map update scenario.