Data Integration


What's possible in a zero-ETL future?

ZDNet

Integrating data across an organization can give you a better picture of your customers, streamline your operations, and help teams make better, faster decisions. Often, organizations gather data from different sources, using a variety of tools and systems such as data ingestion services. Data is often stored in silos, which means it has to be moved into a data lake or data warehouse before analytics, artificial intelligence (AI), or machine learning (ML) workloads can be run. And before that data is ready for analysis, it needs to be combined, cleaned, and normalized--a process otherwise known as extract, transform, load (ETL)--which can be laborious and error-prone. At AWS, our goal is to make it easier for organizations to connect to all of their data, and to do it with the speed and agility our customers need.


DIAS–Earth Environment Data Integration and Analysis System

Communications of the ACM

Our group has been developing and operating a platform to acquire, archive, and manage various data related to the Earth's environment to make it available to researchers across a wide range of fields. Development of this system began in the 1980s to receive, archive, and distribute Asian satellite image data. Currently, the system covers a variety of data including weather, climate change, disaster prevention, biodiversity, health, and agriculture. Today, the Data Integration and Analysis System (DIAS) is a large-scale analysis platform with huge storage and more than 10,000 registered users (half of them in Japan and the other half primarily in Asia). As shown in the accompanying figure, users can easily use data collected by the common collection API through the common use API, and they can operate services at the application layer.


BI Developer at JFrog - Netanya/Tel Aviv, Israel

#artificialintelligence

At JFrog, we're reinventing DevOps to help the world's greatest companies innovate -- and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit and just all-around great people. Here, if you're willing to do more, your career can take off. And since software plays a central role in everyone's lives, you'll be part of an important mission. Thousands of customers, including the majority of the Fortune 100, trust JFrog to manage, accelerate, and secure their software delivery from code to production -- a concept we call "liquid software."


Reviews: Kalman Filter, Sensor Fusion, and Constrained Regression: Equivalences and Insights

Neural Information Processing Systems

Rebuttal acknowledged; thank you for the additional clarifications. Indeed, given a flat prior for $x_{t+1}$ (i.e., Gaussian with "infinite" variance), we have two independent observations, the influence of the past (prediction term) and the influence of the current measurement (filtering term), both of which have Gaussian likelihoods. So the posterior density of $x_{t+1}$ is proportional to a product of three Gaussian-shaped terms. The two different ways in which these terms can be folded into each other (using standard Gaussian conjugacy rules) lead to Thm 1. I believe that the linear-algebraic formulation the authors use just hides the fact that we are multiplying Gaussian PDFs in different ways.
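
The reviewer's point about folding the Gaussian terms together in two ways can be illustrated with a minimal scalar sketch. It is not taken from the paper or the review: the function names, the scalar observation model `z = h * x + noise`, and all numeric values are assumptions chosen for illustration. One function multiplies the prediction and measurement terms in precision (information) form, the other applies the familiar Kalman gain update, and both should return the same posterior mean and variance.

```python
import math


def fuse_by_precision(m_pred, v_pred, z, h, r):
    """Multiply the two Gaussian terms by adding their precisions.

    Prediction term: x ~ N(m_pred, v_pred).
    Measurement term: z ~ N(h * x, r), read as a function of x.
    The flat prior contributes zero precision, so it drops out.
    """
    precision = 1.0 / v_pred + h * h / r
    mean = (m_pred / v_pred + h * z / r) / precision
    return mean, 1.0 / precision


def fuse_by_gain(m_pred, v_pred, z, h, r):
    """Fold the measurement into the prediction with a Kalman gain."""
    gain = v_pred * h / (h * h * v_pred + r)
    mean = m_pred + gain * (z - h * m_pred)
    var = (1.0 - gain * h) * v_pred
    return mean, var


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise both code paths.
    args = (1.0, 2.0, 3.5, 1.5, 0.5)
    print("precision form:", fuse_by_precision(*args))
    print("gain form:     ", fuse_by_gain(*args))
```

In this scalar setting the agreement follows from expanding the gain expression over the common denominator `h * h * v_pred + r`; the matrix case in the paper is more general than this sketch.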


Reviews: Kalman Filter, Sensor Fusion, and Constrained Regression: Equivalences and Insights

Neural Information Processing Systems

The results of this paper could be of interest to NeurIPS. However, the author(s) should try to address most of the concerns raised by the reviewers, especially the request for a review of existing work on Kalman filtering with infinite variance.


Python for a Leading AI-Powered Content Creation Platform for Fashion

#artificialintelligence

Join a leading venture-backed technology company that has won numerous awards and has been featured by Women's Wear Daily, CBInsights, Fox, and VentureBeat, among others. Help iconic brands scale their editorial vision by utilizing the company's machine-learning platform, which uses AI to streamline and expand content creation for omnichannel brands and retailers. This platform offers shoppers visual guidance on product incorporation via "complete the look" suggestions on eCommerce, email, and in-store, driving sales and improving the customer experience. As a member of this talented team, you will have the opportunity to work with renowned and influential brands worldwide, such as Rogers, Adidas, Perry Ellis, and many more. What BEONers Love about this Project: "When I share any recommendation, library, or technology, the team always hears what I have to say, so my contributions are welcome and heard. We are a company that processes more than 100 million requests monthly, and within our technologies, we are involved with: Python, Machine Learning, ETL Processes, Cloud Services (GCP)."


Review for NeurIPS paper: BayReL: Bayesian Relational Learning for Multi-omics Data Integration

Neural Information Processing Systems

Summary and Contributions: In this paper, the authors propose a Bayesian representation learning framework that can infer links between heterogeneous graphs generated from multi-omics datasets. The main idea is to use the underlying relationship information within each dataset (or view) by modeling it as a graph. The method has 4 steps: (1) embed the nodes of each view-specific graph into the same latent space; (2) generate a multi-view adjacency tensor using the similarity scores for node embeddings across views; (3) infer prior latent variables from the node embeddings and multi-view graphs, and the posterior from the view-specific data; (4) perform variational inference to optimize model parameters and variational parameters. The paper attempts to solve an important problem of multi-omics data integration by learning relationships that can exist between different modalities by modeling them as multi-view link prediction. This work could be useful to the broader ML community.
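
Step (2) of the summary above can be sketched roughly as follows. This is not the authors' implementation: the review does not say which similarity score is used, so cosine similarity is an assumption here, as are the function names and the toy embeddings. For two views the "tensor" reduces to a single matrix whose entry `[i][j]` scores node `i` of the first view against node `j` of the second.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_view_similarity(view_a, view_b):
    """Score every node embedding in view_a against every one in view_b."""
    return [[cosine_similarity(u, v) for v in view_b] for u in view_a]


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings sharing one latent space.
    genes = [[1.0, 0.0], [0.0, 1.0]]
    metabolites = [[1.0, 0.0], [1.0, 1.0]]
    for row in cross_view_similarity(genes, metabolites):
        print(row)
```

In a link-prediction setting, scores like these would typically be passed through a further model rather than used directly; that later stage is outside what the review describes.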


Review for NeurIPS paper: BayReL: Bayesian Relational Learning for Multi-omics Data Integration

Neural Information Processing Systems

The paper proposes a Bayesian formulation for the integration of multi-omics datasets by combining within-view and between-view interactions. Although the paper is conceptually related to prior work, the reviewers appreciate the contributions made, which are both timely and relevant to the NeurIPS community. Overall, this is a solid submission, and the authors address the concerns raised convincingly in their rebuttal.