 Information Fusion


Knowledge Perceived Multi-modal Pretraining in E-commerce

arXiv.org Artificial Intelligence

In this paper, we address multi-modal pretraining of product data in the field of E-commerce. Current multi-modal pretraining methods proposed for image and text modalities lack robustness in the face of modality-missing and modality-noise, which are two pervasive problems of multi-modal product data in real E-commerce scenarios. To this end, we propose a novel method, K3M, which introduces a knowledge modality in multi-modal pretraining to correct noise in, and compensate for missing, image and text modalities. The modal-encoding layer extracts the features of each modality. The modal-interaction layer is capable of effectively modeling the interaction of multiple modalities, where an initial-interactive feature fusion model is designed to maintain the independence of image modality and text modality, and a structure aggregation module is designed to fuse the information of image, text, and knowledge modalities. We pretrain K3M with three pretraining tasks: masked object modeling (MOM), masked language modeling (MLM), and link prediction modeling (LPM). Experimental results on a real-world E-commerce dataset and a series of product-based downstream tasks demonstrate that K3M significantly outperforms the baseline and state-of-the-art methods when modality-noise or modality-missing exists.


Efficient Online Estimation of Causal Effects by Deciding What to Observe

arXiv.org Machine Learning

Researchers often face data fusion problems, where multiple data sources are available, each capturing a distinct subset of variables. While problem formulations typically take the data as given, in practice, data acquisition can be an ongoing process. In this paper, we aim to estimate any functional of a probabilistic model (e.g., a causal effect) as efficiently as possible, by deciding, at each time, which data source to query. We propose online moment selection (OMS), a framework in which structural assumptions are encoded as moment conditions. The optimal action at each step depends, in part, on the very moments that identify the functional of interest. Our algorithms balance exploration with choosing the best action as suggested by current estimates of the moments. We propose two selection strategies: (1) explore-then-commit (OMS-ETC) and (2) explore-then-greedy (OMS-ETG), proving that both achieve zero asymptotic regret as assessed by MSE. We instantiate our setup for average treatment effect estimation, where structural assumptions are given by a causal graph and data sources may include subsets of mediators, confounders, and instrumental variables.


Bootstrap a Modern Data Stack in 5 minutes with Terraform - KDnuggets

#artificialintelligence

Modern Data Stack (MDS) is a stack of technologies that makes a modern data warehouse perform 10–10,000x better than a legacy data warehouse. Ultimately, an MDS saves time, money, and effort. The four pillars of an MDS are a data connector, a cloud data warehouse, a data transformer, and a BI & data exploration tool. Easy integration is made possible with managed and open-source tools that pre-build hundreds of ready-to-use connectors. What used to take a team of data engineers to build and maintain regularly can now be replaced with a tool for simple use cases.


The best consultancy for business, with sales and marketing data insights too: we review

#artificialintelligence

With digital marketing, good, clean, and insightful data is a key pillar on which a business stands to drive growth and profits. Having clear and precise data-driven outcomes should be a priority for all marketers. When used in tandem with well-defined marketing and sales goals, and various marketing tools and techniques, companies will discover that their lead-to-sale conversion process can be far less cumbersome and more rewarding. Possessing clean data will help marketers identify detailed segments based on user attributes, past behaviours, interactions, and other necessary data points. Data can be leveraged for highly targeted campaigns which will drive marketing return on investment (ROI).


A guide to ETL Testing

#artificialintelligence

Even though the above diagram is a bit of a simplification, this is what most ETL workflows look like. To put it simply, ETL is an automated process that moves data from source systems to target systems through Extract, Transform and Load stages, without data loss and while maintaining data integrity. This is also usually referred to as data migration. The objective of ETL is to have clean, classified, enriched and curated data in one place (a data warehouse or data lake). Machine-learning models and analytic tools are run against this data to obtain useful information and predictions, on which business decisions can be based.
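The Extract, Transform and Load stages described above, together with the "no data loss, integrity maintained" requirement, can be illustrated with a small sketch. This is only a minimal example under stated assumptions, not the article's own test suite: the table names (source_orders, target_orders), the columns, and the two checks (row-count match and amount-total match) are hypothetical and chosen purely for illustration, using Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical source and target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source_orders (id INTEGER, name TEXT, amount REAL)")
    conn.execute("CREATE TABLE target_orders (id INTEGER, name TEXT, amount_cents INTEGER)")
    conn.executemany(
        "INSERT INTO source_orders VALUES (?, ?, ?)",
        [(1, "  alice smith ", 19.99), (2, "BOB JONES", 5.5), (3, "carol white", 120.0)],
    )
    conn.commit()
    return conn


def extract(conn):
    """Extract: read all rows from the source system."""
    return conn.execute("SELECT id, name, amount FROM source_orders").fetchall()


def transform(rows):
    """Transform: normalise names and convert amounts to integer cents."""
    return [(rid, name.strip().title(), round(amount * 100)) for rid, name, amount in rows]


def load(conn, rows):
    """Load: write the transformed rows into the target system."""
    conn.executemany("INSERT INTO target_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl_checks(conn):
    """Two simple ETL tests: no rows lost, and monetary totals preserved."""
    src_count = conn.execute("SELECT COUNT(*) FROM source_orders").fetchone()[0]
    tgt_count = conn.execute("SELECT COUNT(*) FROM target_orders").fetchone()[0]
    src_total = sum(round(a * 100) for (a,) in conn.execute("SELECT amount FROM source_orders"))
    tgt_total = conn.execute("SELECT SUM(amount_cents) FROM target_orders").fetchone()[0]
    return {
        "row_count_match": src_count == tgt_count,
        "total_match": src_total == tgt_total,
    }


if __name__ == "__main__":
    conn = build_demo_db()
    load(conn, transform(extract(conn)))
    print(run_etl_checks(conn))
```

Real ETL tests would typically add further checks (for example duplicate detection, null handling, and schema validation); the two shown here only cover the data-loss and data-integrity points mentioned in the text.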


Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions

arXiv.org Artificial Intelligence

Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems. Multimodal machine learning involves multiple aspects: representation, translation, alignment, fusion, and co-learning. In the current state of multimodal machine learning, the assumptions are that all modalities are present, aligned, and noiseless during training and testing time. However, in real-world tasks, it is typically observed that one or more modalities are missing, noisy, lacking annotated data, have unreliable labels, or are scarce in training, testing, or both. This challenge is addressed by a learning paradigm called multimodal co-learning. The modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality using transfer of knowledge between modalities, including their representations and predictive models. Since co-learning is an emerging area, there are no dedicated reviews explicitly focusing on all challenges addressed by co-learning. To that end, in this work, we provide a comprehensive survey on the emerging area of multimodal co-learning that has not been explored in its entirety yet. We review implementations that overcome one or more co-learning challenges without explicitly considering them as co-learning challenges. We present a comprehensive taxonomy of multimodal co-learning based on the challenges addressed by co-learning and associated implementations. The various techniques employed, including the latest ones, are reviewed along with some of the applications and datasets. Our final goal is to discuss challenges and perspectives along with the important ideas and directions for future work that we hope will be beneficial for the entire research community focusing on this exciting domain.


Ray 1.5 looks to simplify data exchanges, refactored architecture keeps jobs going

#artificialintelligence

Distributed execution framework Ray 1.5 is ready for downloading, providing devs in the machine learning space with a first look at new data …


Sapper - As simple as connecting the dots

#artificialintelligence

Sapper makes it easy to automate processes, experiences and solutions to help you adapt to today's digital age. We have consolidated the most common tasks of integrating applications, data, preparing data for analytics or interacting with Bots into one simple product. The Sapper solution is rooted in the team's deep expertise in the latest AI, Automation and Cloud technologies and strong partnerships. It's as simple as connecting the dots. Imagine, Develop & Automate your Automation, ai.


ETL Testing in a nutshell

#artificialintelligence

Even though the above diagram is a bit of a simplification, this is what most ETL workflows look like. To put it simply, ETL is an automated process that moves data from source systems to target systems through Extract, Transform and Load stages, without data loss and while maintaining data integrity. This is also usually referred to as data migration. The objective of ETL is to have clean, classified, enriched and curated data in one place (a data warehouse or data lake). Machine Learning models and analytic tools are run against this data to obtain useful information and predictions, on which business decisions can be based.


6 Key Components of a Successful Data Strategy

#artificialintelligence

I've already mentioned data catalogs as one strategic tool. By necessity, they're provisioned by IT and data management teams, who know how to work with the various features in data catalog software and how to set up and deploy them. We can make a useful distinction between tools provisioned in this way by IT and tools adopted by end users. Both have an important role to play in a data strategy, complementing rather than contradicting each other. Data management tools are almost always the domain of IT.