Goto

Collaborating Authors

 Information Fusion


IMF: Interactive Multimodal Fusion Model for Link Prediction

arXiv.org Artificial Intelligence

Link prediction aims to identify potential missing triples in knowledge graphs. To get better results, some recent studies have introduced multimodal information to link prediction. However, these methods utilize multimodal information separately and neglect the complicated interaction between different modalities. In this paper, we aim at better modeling the inter-modality information and thus introduce a novel Interactive Multimodal Fusion (IMF) model to integrate knowledge from different modalities. To this end, we propose a two-stage multimodal fusion framework to preserve modality-specific knowledge as well as take advantage of the complementarity between different modalities. Instead of directly projecting different modalities into a unified space, our multimodal fusion module limits the representations of different modalities independent while leverages bilinear pooling for fusion and incorporates contrastive learning as additional constraints. Furthermore, the decision fusion module delivers the learned weighted average over the predictions of all modalities to better incorporate the complementarity of different modalities. Our approach has been demonstrated to be effective through empirical evaluations on several real-world datasets. The implementation code is available online at https://github.com/HestiaSky/IMF-Pytorch.


Data Integrations Engineer at Riskified - New York

#artificialintelligence

Riskified empowers merchants and shoppers to realize the full potential of eCommerce by making it safe, accessible, and frictionless. Our global team helps the world's most-innovative eCommerce merchants eliminate risk and uncertainty from their business. Merchants integrate Riskified's machine learning platform to create trusted customer relationships, driving higher sales while reducing costs. Riskified has reviewed hundreds of millions of transactions and approved billions of dollars of revenue for global brands and fast-growing businesses across industries, including Wayfair, Wish, Peloton, Gucci, and many more. As of July 29th, 2021, Riskified has begun trading on NYSE under the ticker RSKD.


Pedestrain detection for low-light vision proposal

arXiv.org Artificial Intelligence

The demand for pedestrian detection has created a challenging problem for various visual tasks such as image fusion. As infrared images can capture thermal radiation information, image fusion between infrared and visible images could significantly improve target detection under environmental limitations. In our project, we would approach by pre-processing our dataset with image fusion technique, then using Vision Transformer (ViT) [5] model to detect pedestrians from the fused images. During the evaluation procedure, a comparison would be made between YOLOv5 and the revised ViT model's performance on our fused images.


TypeT5: Seq2seq Type Inference using Static Analysis

arXiv.org Artificial Intelligence

There has been growing interest in automatically predicting missing type annotations in programs written in Python and JavaScript. While prior methods have achieved impressive accuracy when predicting the most common types, they often perform poorly on rare or complex types. In this paper, we present a new type inference method that treats type prediction as a code infilling task by leveraging CodeT5, a state-of-the-art seq2seq pre-trained language model for code. Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model. We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context, allowing information exchange between related code elements. Our evaluation shows that the proposed approach, TypeT5, not only achieves a higher overall accuracy (particularly on rare and complex types) but also produces more coherent results with fewer type errors -- while enabling easy user intervention.


Visuo-Haptic Object Perception for Robots: An Overview

arXiv.org Artificial Intelligence

The object perception capabilities of humans are impressive, and this becomes even more evident when trying to develop solutions with a similar proficiency in autonomous robots. While there have been notable advancements in the technologies for artificial vision and touch, the effective integration of these two sensory modalities in robotic applications still needs to be improved, and several open challenges exist. Taking inspiration from how humans combine visual and haptic perception to perceive object properties and drive the execution of manual tasks, this article summarises the current state of the art of visuo-haptic object perception in robots. Firstly, the biological basis of human multimodal object perception is outlined. Then, the latest advances in sensing technologies and data collection strategies for robots are discussed. Next, an overview of the main computational techniques is presented, highlighting the main challenges of multimodal machine learning and presenting a few representative articles in the areas of robotic object recognition, peripersonal space representation and manipulation. Finally, informed by the latest advancements and open challenges, this article outlines promising new research directions.


Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening

arXiv.org Artificial Intelligence

Under the flourishing development in performance, current image-text retrieval methods suffer from $N$-related time complexity, which hinders their application in practice. Targeting at efficiency improvement, this paper presents a simple and effective keyword-guided pre-screening framework for the image-text retrieval. Specifically, we convert the image and text data into the keywords and perform the keyword matching across modalities to exclude a large number of irrelevant gallery samples prior to the retrieval network. For the keyword prediction, we transfer it into a multi-label classification problem and propose a multi-task learning scheme by appending the multi-label classifiers to the image-text retrieval network to achieve a lightweight and high-performance keyword prediction. For the keyword matching, we introduce the inverted index in the search engine and create a win-win situation on both time and space complexities for the pre-screening. Extensive experiments on two widely-used datasets, i.e., Flickr30K and MS-COCO, verify the effectiveness of the proposed framework. The proposed framework equipped with only two embedding layers achieves $O(1)$ querying time complexity, while improving the retrieval efficiency and keeping its performance, when applied prior to the common image-text retrieval methods. Our code will be released.


PIEKF-VIWO: Visual-Inertial-Wheel Odometry using Partial Invariant Extended Kalman Filter

arXiv.org Artificial Intelligence

Invariant Extended Kalman Filter (IEKF) has been successfully applied in Visual-inertial Odometry (VIO) as an advanced achievement of Kalman filter, showing great potential in sensor fusion. In this paper, we propose partial IEKF (PIEKF), which only incorporates rotation-velocity state into the Lie group structure and apply it for Visual-Inertial-Wheel Odometry (VIWO) to improve positioning accuracy and consistency. Specifically, we derive the rotation-velocity measurement model, which combines wheel measurements with kinematic constraints. The model circumvents the wheel odometer's 3D integration and covariance propagation, which is essential for filter consistency. And a plane constraint is also introduced to enhance the position accuracy. A dynamic outlier detection method is adopted, leveraging the velocity state output. Through the simulation and real-world test, we validate the effectiveness of our approach, which outperforms the standard Multi-State Constraint Kalman Filter (MSCKF) based VIWO in consistency and accuracy.


Data Integration Signup

#artificialintelligence

Get free data integration software for projects & organizations of any size. The Informatica platform has the data integration tools you need to get started & scale up.


DB/ETL Developer at dentsu international - Thane, India

#artificialintelligence

Merkle, a dentsu company, is a leading data-driven customer experience management enterprise specializing in delivering unique, personalized customer experiences across platforms and devices. Its expertise in data, technology, and analytics enables it to gain insights into consumer behavior, driving hyper-personalized marketing strategies. Merkle's consulting, creative, media, analytics, data, identity, CX/commerce, technology, and loyalty & promotions capabilities combine to drive improved marketing results and a competitive edge. With over 14,000 employees, Merkle is headquartered in Columbia, Maryland, with more than 50 additional offices worldwide. Its innovative solutions enable brands to build meaningful customer relationships and enhanced customer experience.


Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review

arXiv.org Artificial Intelligence

Cancer has relational information residing at varying scales, modalities, and resolutions of the acquired data, such as radiology, pathology, genomics, proteomics, and clinical records. Integrating diverse data types can improve the accuracy and reliability of cancer diagnosis and treatment. There can be disease-related information that is too subtle for humans or existing technological tools to discern visually. Traditional methods typically focus on partial or unimodal information about biological systems at individual scales and fail to encapsulate the complete spectrum of the heterogeneous nature of data. Deep neural networks have facilitated the development of sophisticated multimodal data fusion approaches that can extract and integrate relevant information from multiple sources. Recent deep learning frameworks such as Graph Neural Networks (GNNs) and Transformers have shown remarkable success in multimodal learning. This review article provides an in-depth analysis of the state-of-the-art in GNNs and Transformers for multimodal data fusion in oncology settings, highlighting notable research studies and their findings. We also discuss the foundations of multimodal learning, inherent challenges, and opportunities for integrative learning in oncology. By examining the current state and potential future developments of multimodal data integration in oncology, we aim to demonstrate the promising role that multimodal neural networks can play in cancer prevention, early detection, and treatment through informed oncology practices in personalized settings.