Comparative study of 3D object detection frameworks based on LiDAR data and sensor fusion techniques

arXiv.org Artificial Intelligence

Precisely estimating and understanding the vehicle's surroundings is the basic and crucial first step for an autonomous vehicle. The perception system plays a significant role in providing an accurate interpretation of a vehicle's environment in real time. Generally, the perception system involves various subsystems such as localization, static and dynamic obstacle detection and avoidance, mapping, and others. To perceive the environment, these vehicles are equipped with various exteroceptive sensors (both passive and active), in particular cameras, radars, LiDARs, and others. These systems use deep learning techniques that transform the huge amount of sensor data into semantic information on which the object detection and localization tasks are performed. Many driving tasks require the location and depth of a particular object in order to produce accurate results. 3D object detection methods, by utilizing the additional pose data from sensors such as LiDARs and stereo cameras, provide information on the size and location of the object. Based on recent research, 3D object detection frameworks that perform detection and localization using LiDAR data and sensor fusion techniques show significant performance improvements. This work presents a comparative study of the effect of using LiDAR data in object detection frameworks and of the performance improvement obtained with sensor fusion techniques. It also discusses various state-of-the-art methods in both cases, performs an experimental analysis, and provides future research directions.


Mobile Augmented Reality: User Interfaces, Frameworks, and Intelligence

arXiv.org Artificial Intelligence

Mobile Augmented Reality (MAR) integrates computer-generated virtual objects with physical environments for mobile devices. MAR systems enable users to interact with MAR devices, such as smartphones and head-worn wearables, and perform seamless transitions from the physical world to a mixed world with digital entities. These MAR systems support user experiences by using MAR devices to provide universal accessibility to digital content. Over the past 20 years, a number of MAR systems have been developed; however, the study and design of MAR frameworks have not yet been systematically reviewed from the perspective of user-centric design. This article presents the first effort to survey existing MAR frameworks (count: 37) and further discusses the latest studies on MAR through a top-down approach: 1) MAR applications; 2) MAR visualisation techniques adaptive to user mobility and contexts; 3) systematic evaluation of MAR frameworks, including supported platforms and corresponding features such as tracking, feature extraction, and sensing capabilities; and 4) underlying machine learning approaches supporting intelligent operations within MAR systems. Finally, we summarise the development of emerging research fields and the current state of the art, and discuss the important open challenges and possible theoretical and technical directions. This survey aims to benefit both researchers and MAR system developers.


Visual-Inertial-Semantic Scene Representation for 3-D Object Detection

arXiv.org Artificial Intelligence

We describe a system to detect objects in three-dimensional space using video and inertial sensors (accelerometer and gyrometer), ubiquitous in modern mobile platforms from phones to drones. Inertials afford the ability to impose class-specific scale priors for objects, and provide a global orientation reference. A minimal sufficient representation, the posterior of semantic (identity) and syntactic (pose) attributes of objects in space, can be decomposed into a geometric term, which can be maintained by a localization-and-mapping filter, and a likelihood function, which can be approximated by a discriminatively-trained convolutional neural network. The resulting system can process the video stream causally in real time, and provides a representation of objects in the scene that is persistent: Confidence in the presence of objects grows with evidence, and objects previously seen are kept in memory even when temporarily occluded, with their return into view automatically predicted to prime re-detection.


Indoor Semantic Scene Understanding using Multi-modality Fusion

arXiv.org Artificial Intelligence

Seamless Human-Robot Interaction is the ultimate goal of developing service robotic systems. To achieve this, robotic agents have to understand their surroundings to better complete a given task. Semantic scene understanding allows a robotic agent to extract semantic knowledge about the objects in the environment. In this work, we present a semantic scene understanding pipeline that fuses 2D and 3D detection branches to generate a semantic map of the environment. The 2D mask proposals from state-of-the-art 2D detectors are inverse-projected to 3D space and combined with 3D detections from point segmentation networks. Unlike previous works that were evaluated on collected datasets, we test our pipeline on BenchBot, an active photo-realistic robotic environment. Our novelty includes rectification of 3D proposals using projected 2D detections and modality fusion based on object size. This work was done as part of the Robotic Vision Scene Understanding Challenge (RVSU). The performance evaluation demonstrates that our pipeline improves on baseline methods without a significant computational bottleneck.
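
The inverse projection of 2D mask proposals into 3D mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a pinhole camera model with known intrinsics (`fx`, `fy`, `cx`, `cy`) and a depth image aligned with the mask, and every function and parameter name below is hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_pixel(u: float, v: float, depth_m: float,
                      fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumes a pinhole model: u = fx * X / Z + cx and v = fy * Y / Z + cy.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def backproject_mask(mask: Sequence[Sequence[int]],
                     depth: Sequence[Sequence[float]],
                     fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Back-project every mask pixel that has a valid (positive) depth."""
    points: List[Point3D] = []
    for v, row in enumerate(mask):          # v indexes image rows
        for u, inside in enumerate(row):    # u indexes image columns
            d = depth[v][u]
            if inside and d > 0:            # skip background and missing depth
                points.append(backproject_pixel(u, v, d, fx, fy, cx, cy))
    return points


def centroid(points: Sequence[Point3D]) -> Point3D:
    """Mean of the back-projected points, a crude 3D location for the object."""
    if not points:
        raise ValueError("no valid points to average")
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


if __name__ == "__main__":
    # Toy 2x2 mask and depth image; intrinsics chosen only for illustration.
    mask = [[1, 1], [0, 1]]
    depth = [[2.0, 2.0], [2.0, 0.0]]  # the last pixel has no depth reading
    pts = backproject_mask(mask, depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    print(pts, centroid(pts))
```

The resulting points are in the camera frame; placing them in a semantic map would additionally require the camera pose, which this sketch leaves out.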


TMBuD: A dataset for urban scene building detection

arXiv.org Artificial Intelligence

Computer Vision (CV) aims to create computational models that can mimic the human visual system. From an engineering point of view, CV aims to build autonomous systems that can perform some of the tasks that the human visual system is able to accomplish [1]. Reconstructing and understanding urban scenarios is an area of research with several applications nowadays: the entertainment industry, computer gaming, movie making, digital mapping for mobile devices, digital mapping for car navigation, urban planning, and driving. Understanding urban scenarios has become much more important with the evolution of Augmented Reality (AR). AR is successfully exploited in many domains nowadays, one of them being culture and tourism, an area in which the authors of this paper have carried out multiple research projects [2], [3], [4].