 Information Fusion


Alteryx Masterclass For Data Analytics, ETL And Reporting

#artificialintelligence

A verifiable certificate of completion is presented to all students who undertake this Alteryx course. Why should you choose this course? It is a complete Alteryx tutorial that can be completed within a weekend. Data analysis and analytics process automation are among the most sought-after skills for data analysis roles across companies. The Alteryx Designer Core certification demonstrates one of the most desired skills in the market.


Alliant's Launch With Eyeota as Exclusive Data Exchange Partner

#artificialintelligence

Alliant, The Audience Company, announced a partnership with Eyeota, the leading data partner to global enterprises, giving Eyeota access to a brand-new set of Alliant's industry leading Brand Propensity audiences. Through the partnership Eyeota will be the exclusive data exchange partner for new audiences covering the gaming and insurance categories. These insurance and gaming audiences are part of a larger set of 350 new Brand Propensity audiences that Alliant is launching simultaneously. These new audiences are unique in that they are modeled from credit card and bank card transactional data from over 50 financial institutions, ensuring that the insights are based on real-world purchasing behavior.


MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

arXiv.org Artificial Intelligence

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning. Simply applying methods proposed in different research areas can improve the state-of-the-art performance on 9/15 datasets. Therefore, MultiBench presents a milestone in unifying disjoint efforts in multimodal research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all the while ensuring ease of use, accessibility, and reproducibility. MultiBench, our standardized code, and leaderboards are publicly available, will be regularly updated, and welcomes inputs from the community.


Efficient exact computation of the conjunctive and disjunctive decompositions of D-S Theory for information fusion: Translation and extension

arXiv.org Artificial Intelligence

Dempster-Shafer Theory (DST) generalizes Bayesian probability theory, offering useful additional information, but suffers from a high computational burden. A lot of work has been done to reduce the complexity of computations used in information fusion with Dempster's rule. Yet, little research has been conducted to reduce the complexity of computations for the conjunctive and disjunctive decompositions of evidence, which are at the core of other important methods of information fusion. In this paper, we propose a method designed to exploit the actual evidence (information) contained in these decompositions in order to compute them. It is based on a new notion that we call focal point, derived from the notion of focal set. With it, we are able to reduce these computations up to a linear complexity in the number of focal sets in some cases. In a broader perspective, our formulas have the potential to be tractable when the size of the frame of discernment exceeds a few dozen possible states, contrary to the existing literature. This article extends (and translates) our work published at the French conference GRETSI in 2019.
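As a point of reference for the fusion step this abstract mentions, here is a minimal sketch of the textbook form of Dempster's rule of combination. It is not the focal-point method proposed in the paper; it is the naive version whose cost grows with the product of the numbers of focal sets. The function name `dempster_combine`, the three-state frame {a, b, c}, and the mass values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a focal set (frozenset) to its mass.
    This is the naive pairwise version, not the paper's focal-point method.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence: the product of masses goes to the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal sets: the product of masses counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    # Renormalize so the remaining masses sum to 1.
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}


# Hypothetical evidence over the frame {a, b, c}.
m1 = {frozenset("a"): 0.6, frozenset("abc"): 0.4}
m2 = {frozenset("ab"): 0.7, frozenset("abc"): 0.3}
fused = dempster_combine(m1, m2)
for focal_set, mass in sorted(fused.items(), key=lambda kv: len(kv[0])):
    print(sorted(focal_set), round(mass, 3))
```

Because every pair of focal sets is visited, this version is quadratic in the number of focal sets, which is the kind of cost the abstract says has already received a lot of attention for Dempster's rule.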


#335: Autonomous Aircraft by Xwing, with Maxime Gariel

Robohub

Abate talks to Maxime Gariel, CTO of Xwing, about the autonomous flight technology they are developing. At Xwing, they retrofit traditional aircraft to include multiple sensors such as cameras, lidar, and radar. Using sensor fusion algorithms, they create an exceptionally accurate model of the environment. This model of the environment and advanced path planning and control algorithms allow the plane to autonomously navigate in the airport, take off, fly to a destination, and land, all without a person on board. Maxime Gariel is the CTO of Xwing, a San Francisco-based startup whose mission is to dramatically increase human mobility using fully autonomous aerial vehicles.


Pentaho for ETL & Data Integration Masterclass 2021- PDI 9.0

#artificialintelligence

The ETL (extract, transform, load) process is the most popular method of collecting data from multiple sources and loading it into a centralized data warehouse. ETL is an essential component of data warehousing and analytics. Pentaho has phenomenal ETL, data analysis, metadata management and reporting capabilities. Pentaho is faster than other ETL tools (including Talend). Pentaho has a user-friendly GUI which is easier and takes less time to learn.


Multi-Modality Task Cascade for 3D Object Detection

arXiv.org Artificial Intelligence

Point clouds and RGB images are naturally complementary modalities for 3D visual understanding - the former provides sparse but accurate locations of points on objects, while the latter contains dense color and texture information. Despite this potential for close sensor fusion, many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data. This separated training scheme results in potentially sub-optimal performance and prevents 3D tasks from being used to benefit 2D tasks that are often useful on their own. To provide a more integrated approach, we propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions, which are then used to further refine the 3D boxes. We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance. Moreover, to prevent the 3D module from over-relying on the overfitted 2D predictions, we propose a dual-head 2D segmentation training and inference scheme, allowing the 2nd 3D module to learn to interpret imperfect 2D segmentation predictions. Evaluating our model on the challenging SUN RGB-D dataset, we improve upon state-of-the-art results of both single modality and fusion networks by a large margin (+3.8 mAP@0.5). Code will be released at https://github.com/Divadi/MTC_RCNN.


How Data Scientists Can Troubleshoot ETL Issues Like a Data Engineer

#artificialintelligence

In the example ETL pipeline below, three data files are transformed, loaded into a staging table, and finally aggregated into a final table. A common cause of ETL failures is missing data files for the latest day's run. If the data comes from an external source, check with the provider and confirm whether the files are running late. If the data is internal, such as application events or company website activity, confirm with the responsible team whether there were issues that could have caused delayed or missing data. Once you get the missing data, your ETL issue is resolved.
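The missing-file check described above can be sketched in a few lines. This is an illustrative assumption rather than the article's own tooling: the helper `missing_files`, the `<prefix>_<YYYYMMDD>.csv` naming scheme, and the feed names `orders`, `users`, and `events` are all hypothetical.

```python
import tempfile
from datetime import date
from pathlib import Path


def missing_files(data_dir, run_date, prefixes):
    """Return the expected file names that are absent for the given run date.

    Assumes a hypothetical naming scheme of <prefix>_<YYYYMMDD>.csv.
    """
    stamp = run_date.strftime("%Y%m%d")
    expected = [f"{prefix}_{stamp}.csv" for prefix in prefixes]
    return [name for name in expected if not (Path(data_dir) / name).exists()]


# Demo in a temporary directory: only one of three expected files has arrived.
with tempfile.TemporaryDirectory() as data_dir:
    (Path(data_dir) / "orders_20210701.csv").write_text("id\n1\n")
    absent = missing_files(data_dir, date(2021, 7, 1), ["orders", "users", "events"])
    print("Missing for this run:", absent)
```

Running a check like this before the transform step makes a late or absent file visible up front, so it is clear which provider or internal team to contact.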


Humans as Path-Finders for Safe Navigation

arXiv.org Artificial Intelligence

One of the most important barriers to the widespread use of mobile robots in unstructured and human-populated work environments is the ability to plan a safe path. In this paper, we propose to delegate this activity to a human operator who walks in front of the robot, marking with her/his footsteps the path to be followed. The implementation of this approach requires a high degree of robustness in locating the specific person to be followed (the leader). We propose a three-phase approach to fulfil this goal: 1. identification and tracking of the person in the image space, 2. sensor fusion between camera data and laser sensors, 3. point interpolation with continuous curvature curves. The approach is described in the paper and extensively validated with experimental results.


Learn the best practices for building data pipelines

#artificialintelligence

According to Oracle, feature extraction is an attribute reduction process, which results in a much smaller and richer set of attributes. Depending on the requirements, identifying and extracting informative and compact data sets (for an ML model) may need structured data like numbers and dates or unstructured data like categorical features and raw text. If the data volume is large, the feature extraction can be handled separately, and the generated features can be stored in the storage layer. The format of the stored features is ready for direct consumption by the ML training process in the next phase. The feature extraction can be done for a wide range of applications like simple ETL process, model prediction pipeline, or retraining the model based on new data to improve the model accuracy.