 data redundancy


MS-Mapping: Multi-session LiDAR Mapping with Wasserstein-based Keyframe Selection

arXiv.org Artificial Intelligence

Large-scale multi-session LiDAR mapping plays a crucial role in various applications but faces significant challenges in data redundancy and pose graph scalability. This paper presents MS-Mapping, a novel multi-session LiDAR mapping system that combines an incremental mapping scheme with support for various LiDAR-based odometry, enabling high-precision and consistent map assembly in large-scale environments. Our approach introduces a real-time keyframe selection method based on the Wasserstein distance, which effectively reduces data redundancy and pose graph complexity. We formulate the LiDAR point cloud keyframe selection problem using a similarity method based on Gaussian mixture models (GMM) and tackle the real-time challenge by employing an incremental voxel update method. Extensive experiments on large-scale campus scenes and over 12.8 km of public and self-collected datasets demonstrate the efficiency, accuracy, and consistency of our map assembly approach. To facilitate further research and development in the community, we make our code (https://github.com/JokerJohn/MS-Mapping) and datasets publicly available.
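To make the idea of a Wasserstein-based similarity test concrete, here is a minimal sketch and not the MS-Mapping implementation. It assumes each frame is summarized by a single Gaussian with a diagonal covariance, for which the squared 2-Wasserstein distance has a simple closed form; the paper works with Gaussian mixture models and incremental voxel updates, which this sketch does not attempt. The function names and the threshold rule are hypothetical.

```python
import math


def gaussian_w2_squared(mean_a, var_a, mean_b, var_b):
    """Squared 2-Wasserstein distance between two Gaussians.

    Assumption: both covariances are diagonal, given as per-axis variances.
    In that case W2^2 = ||mean_a - mean_b||^2 + sum_i (sqrt(var_a_i) - sqrt(var_b_i))^2.
    """
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mean_a, mean_b))
    cov_term = sum(
        (math.sqrt(va) - math.sqrt(vb)) ** 2 for va, vb in zip(var_a, var_b)
    )
    return mean_term + cov_term


def is_keyframe(candidate, reference, threshold):
    """Hypothetical rule: keep the candidate frame only if it differs enough
    from the reference. Each frame is a (mean, variances) pair."""
    return gaussian_w2_squared(*candidate, *reference) > threshold
```

Under these assumptions, a frame whose summary is nearly identical to the reference yields a distance close to zero and would be treated as redundant, while a frame that shifts or reshapes the distribution exceeds the threshold and would be kept.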


A Probabilistic Model for Data Redundancy in the Feature Domain

arXiv.org Artificial Intelligence

In this paper, we use a probabilistic model to estimate the number of uncorrelated features in a large dataset. Our model allows for both pairwise feature correlation (collinearity) and interdependency of multiple features (multicollinearity), and we use the probabilistic method to obtain upper and lower bounds of the same order for the size of a feature set that exhibits low collinearity and low multicollinearity. We also prove an auxiliary result regarding mutually good constrained sets that is of independent interest.
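As an illustration of what a low-collinearity feature set means in practice, the sketch below greedily keeps features whose absolute Pearson correlation with every already kept feature stays under a cutoff. This is an assumed, simplified heuristic and not the probabilistic model or the bounds from the paper; it checks pairwise collinearity only and ignores multicollinearity. The function names and the cutoff parameter are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def select_low_collinearity(features, max_abs_corr):
    """Greedy pass over feature columns: keep a column only if its absolute
    correlation with every previously kept column is at most max_abs_corr.
    Returns the indices of the kept columns."""
    kept = []
    for idx, column in enumerate(features):
        if all(abs(pearson(column, features[k])) <= max_abs_corr for k in kept):
            kept.append(idx)
    return kept
```

For example, given the columns `[1, 2, 3, 4]`, `[2, 4, 6, 8]`, and `[1, -1, -1, 1]` with a cutoff of 0.9, the second column is dropped because it is a scaled copy of the first, while the third is kept because it is uncorrelated with the first.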


Survey: Exploiting Data Redundancy for Optimization of Deep Learning

arXiv.org Artificial Intelligence

Data redundancy is ubiquitous in the inputs and intermediate results of Deep Neural Networks (DNN). It offers many significant opportunities for improving DNN performance and efficiency and has been explored in a large body of work. These studies are scattered across many venues and several years. The targets they focus on range from images to videos and texts, and the techniques they use to detect and exploit data redundancy also vary in many aspects. There is not yet a systematic examination and summary of these efforts, making it difficult for researchers to get a comprehensive view of the prior work, the state of the art, differences and shared principles, and the areas and directions yet to be explored. This article tries to fill the void. It surveys hundreds of recent papers on the topic, introduces a novel taxonomy to put the various techniques into a single categorization framework, offers a comprehensive description of the main methods used for exploiting data redundancy in improving multiple kinds of DNNs on data, and points out a set of research opportunities for future exploration.


Data Preparation and Raw Data in Machine Learning - KDnuggets

#artificialintelligence

With the massive amounts of data available, analytics and machine learning (ML) are increasingly used to extract critical information that can be used for many different things. For example, it is possible to automatically identify credit card fraud by studying how users interact with an eCommerce website. But this only works if all the relevant data has been prepared in a way that makes it amenable to machine learning algorithms. In this article, I will describe the data preparation techniques for machine learning. Raw data refers to the unprocessed, original data that is collected by computers.


The Day Big Data Died

#artificialintelligence

Game of Thrones, touchscreen devices, and superhero movies: those things have little in common apart from the fact that they went through a meteoric rise in the 2010s. Still, if you believe what AI experts have to say, they pale in comparison to the largest tech phenomenon of the decade: Big Data. The 'old' generation of data scientists still remember the days they were telling their manager (usually, someone with no expertise in machine learning) that gathering data would take months. And, honestly, it really did. For example, in the early days of the digital era, continuously sharing data on social media was still uncommon behavior, and it would take months for companies to collect what can be collected in just days (if not hours) today.

