information fusion

Artificial Intelligence (AI) and Machine Learning Oracle


The platform consists of tools for every step in the modern machine learning lifecycle--from developing machine-learning models to building intelligent applications to integrating machine learning outputs into business intelligence and visualization tools--so it's accessible to business teams. The platform features a portfolio of data management solutions that offer unmatched ability to store and process data at any scale--and data integration tools to ensure data in any format can be accessed for machine learning model building. The platform runs on top of Oracle Cloud Infrastructure, which is optimized for running AI workloads, offering high-speed network fabric and a wide range of GPU and CPU compute options for small- to large-scale model building, training, and production deployments.

AWS Lake Formation Automates Data Lake Management - SDxCentral


Amazon Web Services (AWS) launched general availability of its fully-managed Lake Formation platform designed to help organizations better manage their data lakes. The service helps with the building, securing, and managing of those data repositories. Lake Formation, which was initially announced at the AWS re:Invent show late last year, is built on AWS' Glue extract, transform, and load (ETL) service. It automates the provisioning and configuring of storage; crawls the data to extract schema and metadata tags; automatically optimizes the partitioning of the data; and transforms the data into formats like Apache Parquet and ORC for easier analytics. Data can be ingested from different sources using pre-defined templates.

Data Integration and Machine Learning: 3 Real-World Use Cases


Learn how applying the concept of machine learning to capacity management can make the process more effective and efficient. Machine learning involves computers assimilating information and then drawing conclusions from that data without being explicitly programmed to do so. This technology has significant positive implications for businesses. Yet, machine learning can be improved even further. The answer lies in data integration.

How dataops improves data, analytics, and machine learning


Have you noticed that most organizations are trying to do a lot more with their data? Businesses are investing heavily in data science programs, self-service business intelligence tools, artificial intelligence programs, and organizational efforts to promote data-driven decision making. Some are developing customer-facing applications by embedding data visualizations into web and mobile products or collecting new forms of data from sensors (Internet of Things), wearables, and third-party APIs. Still others are harnessing intelligence from unstructured data sources such as documents, images, videos, and spoken language. Much of the work around data and analytics focuses on delivering value from it.

Introducing Dagster - Nick Schrock - Medium


Today the team at Elementl is proud to announce an early release of Dagster, an open-source library for building systems like ETL processes and ML pipelines. We believe they are, in reality, a single class of software system. We call them data applications. Dagster is a library for building these data applications. We define a data application as a graph of functional computations that produce and consume data assets.
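
The definition above, a graph of functional computations that produce and consume data assets, can be sketched in plain Python. This is only an illustration of the idea and does not use Dagster's API; the node names (`raw`, `cleaned`, `total`), the `GRAPH` layout, and the `run` helper are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract():
    # Produces the first data asset (hard-coded here for illustration).
    return [" 3", "1 ", "2"]


def clean(raw):
    # Consumes the "raw" asset and produces a cleaned one.
    return [int(s.strip()) for s in raw]


def total(cleaned):
    # Consumes the "cleaned" asset and produces a summary asset.
    return sum(cleaned)


# Hypothetical graph layout: asset name -> (function, upstream asset names).
GRAPH = {
    "raw": (extract, []),
    "cleaned": (clean, ["raw"]),
    "total": (total, ["cleaned"]),
}


def run(graph):
    """Execute every node after its dependencies and return all assets."""
    dependencies = {name: deps for name, (_, deps) in graph.items()}
    assets = {}
    for name in TopologicalSorter(dependencies).static_order():
        fn, deps = graph[name]
        assets[name] = fn(*(assets[d] for d in deps))
    return assets


if __name__ == "__main__":
    print(run(GRAPH))
```

Because each node is a pure function of its upstream assets, the same graph shape can describe either an ETL process or an ML pipeline, which is the point the announcement makes.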

Multi-Label Product Categorization Using Multi-Modal Fusion Models Machine Learning

In this study, we investigated multi-modal approaches using images, descriptions, and titles to categorize e-commerce products. Specifically, we examined late fusion models, where the modalities are fused at the decision level. Products were each assigned multiple labels, and the hierarchy in the labels was flattened and filtered. For our individual baseline models, we modified a CNN architecture to classify the description and title, and then modified Keras' ResNet-50 to classify the images, achieving F1 scores of 77.0%, 82.7%, and 61.0%, respectively. In comparison, our tri-modal late fusion model can classify products more accurately than single modal models can, improving the F1 score to 88.2%. Each modality complemented the shortcomings of the other modalities, demonstrating that increasing the number of modalities can be an effective method for improving the accuracy of multi-label classification problems.
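
Decision-level (late) fusion can be illustrated with a small sketch. The excerpt does not say how the study combined its three classifiers, so the weighted averaging below is an assumption, one common late-fusion scheme rather than the authors' method. The function name `late_fusion`, the 0.5 threshold, and the probabilities are made up for illustration.

```python
def late_fusion(per_modality_probs, weights=None, threshold=0.5):
    """Fuse per-label probabilities from several modality-specific classifiers.

    per_modality_probs: one list of per-label probabilities per modality.
    weights: optional per-modality weights (defaults to a uniform average).
    Returns the fused probabilities and the indices of the predicted labels.
    """
    n_modalities = len(per_modality_probs)
    if weights is None:
        weights = [1.0 / n_modalities] * n_modalities
    n_labels = len(per_modality_probs[0])
    fused = [
        sum(w * probs[j] for w, probs in zip(weights, per_modality_probs))
        for j in range(n_labels)
    ]
    # Multi-label decision: keep every label whose fused score clears the threshold.
    predicted = [j for j, p in enumerate(fused) if p >= threshold]
    return fused, predicted


if __name__ == "__main__":
    # Illustrative outputs of three independent classifiers for one product.
    description = [0.9, 0.2, 0.4]
    title = [0.8, 0.1, 0.7]
    image = [0.4, 0.3, 0.6]
    fused, predicted = late_fusion([description, title, image])
    print(fused, predicted)
```

In this made-up example the image classifier alone would miss label 0 and the description classifier alone would miss label 2, but the averaged scores recover both, which mirrors the complementarity the abstract describes.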

Extract, Shoehorn, and Load

Communications of the ACM

A lot of data is moved from system to system in an important and increasing part of the computing landscape. This is traditionally known as ETL (extract, transform, and load). While many systems are extremely good at this process, the source for the extraction and the destination for the load frequently have different representations for their data. It is common for this transformation to squeeze, truncate, or pad the data to make it fit into the target. This is really like using a shoehorn to fit into a shoe that is too small.
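
As a minimal sketch of the shoehorning described above, the hypothetical helper below forces a string into a fixed-width target column by truncating or padding it, and reports whether any data was lost. The name `shoehorn`, the column widths, and the sample values are assumptions for illustration, not taken from the article.

```python
def shoehorn(value: str, width: int, pad: str = " ") -> tuple[str, bool]:
    """Force `value` into exactly `width` characters.

    Returns the fitted string and a flag that is True when characters
    were dropped, so the caller can at least log the loss.
    """
    if len(value) > width:
        return value[:width], True  # truncate: information is lost
    return value.ljust(width, pad), False  # pad: nothing is lost


if __name__ == "__main__":
    print(shoehorn("Communications", 8))
    print(shoehorn("ACM", 5))
```

Returning the loss flag alongside the value is a design choice in this sketch: it makes truncation visible to the caller instead of silently altering the data during the load step.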

Talend and Qubole Serverless Platform for Machine Learning: Choosing Between a Cab vs Your Own Car - Talend Real-Time Open Source Data Integration Software


Before going into the world of integration, machine learning, etc., I would like to discuss with all of you a scenario many of you might experience when you live in a mega city. I lived in the London suburbs for almost 2 years (and it's a city quite close to my heart too), so let me use London as this story's background. When I moved to London, one question which came to my mind was whether I should buy a car or not. The public transport system in London is quite dense and amazing (Oh!!! I just love the amazing London Underground and I miss it in Toronto).

CloverDX Drinks and Data Meetup - London


Data integration software and ETL tools provided by the CloverDX platform (formerly known as CloverETL) offer solutions for data management tasks such as data integration, data migration, or data quality. CloverDX is a vital part of enterprise solutions such as data warehousing, business intelligence (BI) or master data management (MDM). CloverDX Designer (formerly known as CloverETL Designer) is a visual data transformation designer that helps define data flows and transformations in a quick, visual, and intuitive way. CloverDX Server (formerly known as CloverETL Server) is an enterprise ETL and data integration runtime environment. It offers a set of enterprise features such as automation, monitoring, user management, real-time ETL, data API services, clustering, or cloud data integration.

Data virtualization use cases cover more integration tasks


Gartner predicts that 60% of organizations will deploy data virtualization software as part of their data integration tool set by 2020. That's a big jump from the adoption rate of about 35% the consulting and market research company cited in a November 2018 report on the data virtualization market. But the technology "is rapidly gaining momentum," a group of four Gartner analysts wrote in the report. The analysts said data virtualization use cases are on the rise partly because IT teams are struggling to physically integrate a growing number of data silos, as relational database management system (DBMS) environments are augmented by big data systems and other new data sources. They also pointed to increased technology maturity that has removed deployment barriers for data virtualization users.