Information Fusion
Uberflip Deploys Matillion ETL for Snowflake
Matillion, the leading provider of data transformation for cloud data warehouses (CDWs), announced that Uberflip has deployed Matillion ETL for Snowflake. By implementing cloud-native ETL, Uberflip reduced data preparation time from five weeks to just one day, helping its product, marketing, and sales teams rapidly deliver better business value for customers. Matillion ETL delivered repeatable and scalable processes and models for data orchestration, and decreased required development time, freeing up valuable engineering resources. Uberflip is a leading content experience platform that enables marketers to create digital experiences with content for every stage of the buyer journey. To better serve internal teams and external customers, Uberflip's data scientists needed a solution that could help them rapidly extract, load, and transform data to scale analysis within the product, empower internal teams for self-service, and get real-time, accurate data from all sources.
The New Data Engineering Stack
Remember the time when the software development industry realized that a single person can take on multiple technologies glued tightly with each other and came up with the notion of a Full Stack Developer -- someone who does data modelling, writes backend code and also does frontend work? Something similar happened in the data industry with the birth of the Data Engineer almost half a decade ago. For many, the Full Stack Developer remains a mythical creature because of the never-ending list of technologies that cover frontend, backend and data. One of the reasons for that could be the fact that visualisation (business intelligence) has become a massive field in its own right. A Data Engineer is supposed to build systems to make data available, make it usable, move it from one place to another and so on.
Data Integration: The vital baking ingredient in your AI strategy - Journey to AI Blog
When people dream about becoming a baker or a pastry chef, they often think about the delicious pastries they’ll create, delighting their patrons with towering cakes wrapped in impossibly smooth fondant. But very rarely does anyone start off by thinking about the preparation involved in baking… Without being able to use freshly milled flour for baking, for example, you would actually never be able to eat a good piece of cake or a crusty loaf of bread. To produce those delicious pastries, a lot of preparation must happen before the actual baking process begins. The same parallel can be made between AI and Data Integration. Let me explain: The business challenge: As an example, let’s examine a regional U.S. retailer who recently decided to modernize its supply chain management, including supply chain availability, fulfillment, and online cart. To accomplish this objective, the retailer decided to implement the Onera Decision Engine in Google Cloud Platform (GCP). The Onera Decision Engine…
Appliance-Level Monitoring with Micro-Moment Smart Plugs
Alsalemi, Abdullah, Himeur, Yassine, Bensaali, Faycal, Amira, Abbes
The human population is struggling with energy-related issues that not only affect society and the development of the world, but also cause global warming. A variety of broad approaches have been developed by both industry and the research community. However, there is an ever-increasing need for comprehensive, end-to-end solutions aimed at transforming human behavior rather than device metrics and benchmarks. In this paper, a micro-moment-based smart plug system is proposed as part of a larger multi-appliance energy efficiency program. The smart plug includes two sub-units, a power consumption unit and an environmental monitoring unit, which collect the energy consumption of appliances and contextual information such as temperature, humidity, luminosity and room occupancy, respectively. The plug also provides home automation capability. With the accompanying mobile application, end-users can visualize energy consumption data along with ambient environmental information. Current implementation results show that the proposed system delivers cost-effective deployment while maintaining adequate computation and wireless performance.
Five steps to jumpstart your data integration journey - Journey to AI Blog
As coined by British mathematician Clive Humby, “data is the new oil.” Like oil, data is valuable but it must be refined in order to provide value. Organizations need to collect, organize, and analyze their data across multi-cloud, hybrid cloud, and data lakes. Yet traditional ETL tools support only a limited number of delivery styles and involve a significant amount of hand-coding. In turn, enterprises are increasingly looking for machine-learning-powered integration tools to synchronize data, improve employee productivity, and prepare data for analytics. To achieve this, we will examine five steps an analyst at a fictitious financial services company, named Raviga, will take to be successful in their data integration project using IBM DataStage on IBM Cloud Pak for Data. 1. Ingest the Data: Data must be ingested before it can be digested. First, the Raviga analyst needs access to the company's data sources before they can produce analytics. They need to access their data from multiple sources…
Why the Future of ETL Is Not ELT, But EL(T) - KDnuggets
How we store and manage data has completely changed over the last decade. We moved from an ETL world to an ELT world, with companies like Fivetran pushing the trend. However, we don't think it is going to stop there; ELT is a transition in our mind towards EL(T) (with EL decoupled from T). And to understand this, we need to discern the underlying reasons for this trend, as they might show what's in store for the future. This is what we will be doing in this article. Historically, the data pipeline process consisted of extracting, transforming, and loading data into a warehouse or a data lake.
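The difference between the two orderings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, and the `RAW_ROWS` data, table names, and function names are all hypothetical, not taken from the article or from any vendor's tooling.

```python
import sqlite3

# Hypothetical raw records standing in for rows pulled from a source system.
RAW_ROWS = [
    ("alice", "2021-01-03", "19.99"),
    ("bob", "2021-01-04", "5.00"),
    ("alice", "2021-01-05", "7.51"),
]


def etl(conn):
    """ETL: transform in application code, then load only the finished result."""
    totals = {}
    for customer, _day, amount in RAW_ROWS:
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    conn.execute("CREATE TABLE etl_totals (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO etl_totals VALUES (?, ?)", sorted(totals.items()))


def extract_load(conn):
    """EL: copy the raw rows into the warehouse unchanged."""
    conn.execute("CREATE TABLE raw_orders (customer TEXT, day TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ROWS)


def transform(conn):
    """(T): a separate, optional step that runs as SQL inside the warehouse."""
    conn.execute(
        "CREATE TABLE elt_totals AS "
        "SELECT customer, SUM(CAST(amount AS REAL)) AS total "
        "FROM raw_orders GROUP BY customer"
    )


def run_demo():
    conn = sqlite3.connect(":memory:")
    etl(conn)            # transform first, load second
    extract_load(conn)   # load first ...
    transform(conn)      # ... transform later, independently of the load
    query = "SELECT customer, total FROM {} ORDER BY customer"
    etl_rows = conn.execute(query.format("etl_totals")).fetchall()
    elt_rows = conn.execute(query.format("elt_totals")).fetchall()
    raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
    conn.close()
    return etl_rows, elt_rows, raw_count
```

Both paths produce the same per-customer totals, but only the EL path leaves the raw rows in the warehouse, so `transform` can be rewritten or rerun later without extracting the data again.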
Fusing Optical and SAR time series for LAI gap filling with multioutput Gaussian processes
Pipia, Luca, Muñoz-Marí, Jordi, Amin, Eatidal, Belda, Santiago, Camps-Valls, Gustau, Verrelst, Jochem
The availability of satellite optical information is often hampered by the natural presence of clouds, which can be problematic for many applications. Persistent clouds over agricultural fields can mask key stages of crop growth, leading to unreliable yield predictions. Synthetic Aperture Radar (SAR) provides all-weather imagery which can potentially overcome this limitation, but given its high and distinct sensitivity to different surface properties, the fusion of SAR and optical data still remains an open challenge. In this work, we propose the use of Multi-Output Gaussian Process (MOGP) regression, a machine learning technique that automatically learns the statistical relationships among multisensor time series, to detect vegetated areas over which combining SAR and optical imagery is beneficial. For this purpose, we use the Sentinel-1 Radar Vegetation Index (RVI) and Sentinel-2 Leaf Area Index (LAI) time series over a study area in the northwest of the Iberian Peninsula. Through a physical interpretation of MOGP trained models, we show its ability to provide estimations of LAI even over cloudy periods using the information shared with RVI, which guarantees that the solution always stays tied to real measurements. Results demonstrate the advantage of MOGP especially for long data gaps, where optical-based methods notoriously fail. The leave-one-image-out assessment technique applied to the whole vegetation cover shows that MOGP predictions improve on standard GP estimations over short-time gaps (R$^2$ of 74\% vs 68\%, RMSE of 0.4 vs 0.44 $[m^2m^{-2}]$) and especially over long-time gaps (R$^2$ of 33\% vs 12\%, RMSE of 0.5 vs 1.09 $[m^2m^{-2}]$).
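The gap-filling idea can be illustrated with a small NumPy sketch. It is not the authors' model: it assumes the simplest multi-output construction (an intrinsic coregionalization model, where the cross-output covariance is a matrix `B` times a shared RBF kernel), uses synthetic sine-wave series in place of real LAI and RVI data, and fixes all hyperparameters by hand instead of learning them. All function and variable names are hypothetical.

```python
import numpy as np


def rbf(a, b, length_scale):
    """Squared-exponential kernel between two 1-D arrays of time stamps."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def icm_predict(t_obs, out_obs, y_obs, t_new, out_new, B, length_scale, noise):
    """Posterior mean of a two-output GP with covariance B[i, j] * rbf(t, t').

    out_obs and out_new hold the integer output index (0 or 1) of each sample.
    """
    K = B[np.ix_(out_obs, out_obs)] * rbf(t_obs, t_obs, length_scale)
    K = K + noise * np.eye(len(t_obs))
    K_new = B[np.ix_(out_new, out_obs)] * rbf(t_new, t_obs, length_scale)
    return K_new @ np.linalg.solve(K, y_obs)


def gap_fill_demo():
    t = np.linspace(0.0, 10.0, 41)
    latent = np.sin(t)            # synthetic shared signal (assumption)
    lai_like = latent             # output 0: "optical" series with a gap
    rvi_like = 0.8 * latent       # output 1: "radar" series, always observed
    gap = (t >= 3.0) & (t <= 7.0)  # simulated cloudy period for output 0

    # Stack the observed samples of both outputs into one training set.
    n_lai = int((~gap).sum())
    t_obs = np.concatenate([t[~gap], t])
    out_obs = np.concatenate([np.zeros(n_lai, dtype=int), np.ones(len(t), dtype=int)])
    y_obs = np.concatenate([lai_like[~gap], rvi_like])

    # Hand-set coregionalization matrix: strongly correlated outputs (assumption).
    w = np.array([1.0, 0.8])
    B = np.outer(w, w) + 1e-3 * np.eye(2)

    t_new = t[gap]
    out_new = np.zeros(len(t_new), dtype=int)
    pred = icm_predict(t_obs, out_obs, y_obs, t_new, out_new, B, 1.0, 1e-3)
    return pred, lai_like[gap]
```

Calling `gap_fill_demo()` returns the predicted and true values of the first series inside the gap. Because the second series is observed throughout and `B` encodes a strong correlation, the prediction inside the gap is driven by actual measurements of the second series, which is the behaviour the abstract describes for RVI and LAI.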
5 Most essential skills to become a data scientist in 2021
Data science became one of the hottest emerging job roles in 2020. With the increase in demand for skilled professionals, more and more people have started taking data science courses. If you want to become a data scientist in 2021, you need to develop a set of skills. Here are the most essential skills to become a successful data scientist in the near future. The latest version, Python 3, has become the default choice of language for data science.
Pentaho Data Integration tool for ETL & Data warehousing
The ETL (extract, transform, load) process is the most popular method of collecting data from multiple sources and loading it into a centralized data warehouse. ETL is an essential component of data warehousing and analytics. Pentaho has phenomenal ETL, data analysis, metadata management and reporting capabilities. Pentaho is faster than other ETL tools (including Talend). Its GUI is easier and takes less time to learn.