AITopics | data lineage

data lineage

The many layers of data lineage. What can we learn from Google Maps to…

#artificialintelligence | Jan-15-2023, 02:15:43 GMT

Having a map that shows how data evolves from its sources to its destination is the dream of any organisation. As in the gold rush, everyone is after the tool that connects columns, tables and dashboards within the warehouse. But like gold, this visualisation has always been considered a privilege in the data ecosystem. Defining the lineage has been a manual task that is not accessible to everyone. Usually, only those who work daily with the data transformation processes are aware of the actual flow of data, and typically this lineage is a mix of what is in their minds, documented information, and whatever they can dig out of different tools' metadata.

artificial intelligence, lineage, warehouse, (18 more...)

Technology: Information Technology > Artificial Intelligence (0.70)

What is Data Governance? Top Data Governance Tools for Data Science and Machine Learning Research in 2022

#artificialintelligence | Nov-11-2022, 23:25:27 GMT

Data governance is the process of developing internal data standards and enacting rules that govern who has access to data and how it is used in analytical applications and business operations. A good data governance program guarantees that data is reliable, consistent, and accessible, and that its use complies with applicable rules and regulations on data protection. In addition to master data management (MDM) projects, it frequently includes data quality improvement initiatives. Data governance software offers features that facilitate the formulation of data governance policies, the construction of business glossaries and data catalogs, data mapping and classification, workflow management, collaboration, and process documentation. It can be used in conjunction with MDM, metadata management, and data quality solutions. Data governance aims to promote confident decisions supported by solid data resources. Building policies that define data ownership, duties, and delegates is the goal of data governance.

data catalog, data governance, platform, (12 more...)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Integration (1.00)
(2 more...)

Why Data Cleaning Is Failing Your ML Models - And What To Do About It

#artificialintelligence | Nov-2-2022, 02:11:55 GMT

Precise endeavors must be carried out to exacting standards in clean environments. Surgeons scrub in, rocket scientists work in clean rooms, and data scientists… well, we try our best. We've all heard the platitude "garbage in, garbage out," so we spend most of our time doing the most tedious part of the job: data cleaning. Unfortunately, no matter how hard we scrub, poor data quality is often too pervasive and invasive for a quick shower. Our research across the data stacks of more than 150 organizations shows an average of 70 impactful data incidents a year for every 1,000 tables in an environment.
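
To make the reported rate concrete, it can be scaled to an environment of any size. The Python sketch below is only an illustration: the function name `expected_incidents` and the example table counts are hypothetical, and it assumes incidents scale linearly with the number of tables, which the excerpt does not claim.

```python
def expected_incidents(table_count: int, incidents_per_1000_tables: float = 70.0) -> float:
    """Scale the quoted yearly average (70 incidents per 1,000 tables)
    to an environment with `table_count` tables.

    Assumption: incidents grow linearly with the number of tables.
    """
    if table_count < 0:
        raise ValueError("table_count must be non-negative")
    return table_count * incidents_per_1000_tables / 1000.0


if __name__ == "__main__":
    # Hypothetical environment sizes, chosen only for illustration.
    for tables in (250, 1_000, 5_000):
        estimate = expected_incidents(tables)
        print(f"{tables} tables: about {estimate:.1f} incidents per year")
```

Under that linear assumption, the estimate is simply the table count multiplied by 0.07.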

data quality, data warehouse, dataset, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Quality > Data Cleaning (0.62)

A "Glass Box" Approach to Responsible Machine Learning - insideBIGDATA

#artificialintelligence | Jul-28-2022, 18:20:10 GMT

Machine learning doesn't always have to be an abstruse technology. The multi-parameter and hyper-parameter methodology of complex deep neural networks, for example, is only one type of this cognitive computing manifestation. There are other machine learning varieties (and even some involving deep neural networks) in which the results of models, how they were determined, and which intricacies influenced them are much more transparent. It all depends on how well organizations understand their data provenance. Comprehending just about everything that happened to the training data for models, as well as to the production data that models encounter, is integral to explaining, refining, and improving their results.

data provenance, minnick, responsible machine learning, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)

Tracer: a machine learning approach to data lineage

#artificialintelligence | Jul-2-2022, 03:20:45 GMT

The data lineage problem entails inferring the source of a data item. Unfortunately, most of the existing work in this area relies on metadata, code analysis, or data annotations. In contrast, our primary focus is to present a machine learning solution that uses the data itself to infer the lineage. This thesis will formally define the data lineage problem, specify the underlying assumptions under which we solved it, and provide a detailed description of how our system works.

data lineage, tracer

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Model and data lineage in machine learning experimentation

#artificialintelligence | Sep-21-2021, 23:42:14 GMT

Modern quantitative finance is based around the approach of pattern recognition in historical data. This approach requires teams of scientists to work in a collaborative and regulated setting in order to develop models that can be used to make trading predictions. With the growing influence of this field, both participants and regulators are looking to put in place mechanisms to understand how and why models have been developed, for reasons such as regulatory compliance and model reproducibility. We refer to this tractability problem as lineage. The challenge of reproducibility and lineage in machine learning (ML) is three-fold: code lineage, data lineage, and model lineage.
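
The three strands named above (code, data, and model lineage) can be tied together in a single record per experiment. The Python sketch below is a minimal illustration under stated assumptions, not the approach of the linked article: `ExperimentLineage`, `fingerprint`, and `record_lineage` are hypothetical names, and content hashing with SHA-256 is just one possible way to identify data and model artifacts.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact bytes given."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class ExperimentLineage:
    """One record linking the three kinds of lineage (hypothetical schema)."""
    code_version: str  # e.g. a version-control commit identifier
    data_hash: str     # fingerprint of the training data
    model_hash: str    # fingerprint of the serialized model


def record_lineage(code_version: str, data: bytes, model: bytes) -> ExperimentLineage:
    """Build a lineage record from a code version and raw artifact bytes."""
    return ExperimentLineage(
        code_version=code_version,
        data_hash=fingerprint(data),
        model_hash=fingerprint(model),
    )


if __name__ == "__main__":
    # Placeholder inputs, invented for this example.
    record = record_lineage("commit-abc123", b"price,volume\n1.0,10\n", b"model-weights")
    print(record)
```

Because the record is built only from the inputs, two runs with the same code version, data, and model bytes produce equal records, while any change to the data shows up as a different `data_hash`.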

artifact, information, lineage, (16 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry:

Banking & Finance (0.47)
Retail > Online (0.40)
Law (0.34)
Government (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Data Lineage in Machine Learning: Methods and Best Practices - neptune.ai

#artificialintelligence | Aug-29-2021, 14:30:27 GMT

Data is supposed to be an organization's most treasured asset. However, it wasn't this way until recently, so very few people have experience in handling data and leveraging it to create more value. As managers become more data-fluent, many organizations are adopting the practice of tracking data lineage, which has become a steady support for driving organizations towards data efficiency. Data lineage is the story behind the data. It tracks the data from its creation point to the points of consumption.
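
Tracking data from its creation point to its points of consumption is often modelled as a directed graph, where each edge says that one dataset feeds another. The Python sketch below is a minimal illustration of that idea, not a method taken from the linked article: the `LineageGraph` class and the dataset names are hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """A tiny lineage graph: an edge source -> target means target is derived from source."""

    def __init__(self) -> None:
        # Maps each dataset to the set of datasets it is directly derived from.
        self._parents: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        """Record that `target` is produced from `source`."""
        self._parents[target].add(source)

    def upstream(self, node: str) -> set[str]:
        """Return every dataset that `node` depends on, directly or indirectly."""
        seen: set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            # .get avoids creating empty entries for datasets without parents.
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


if __name__ == "__main__":
    # Hypothetical pipeline, invented for this example.
    graph = LineageGraph()
    graph.add_edge("raw_orders", "clean_orders")
    graph.add_edge("raw_customers", "clean_orders")
    graph.add_edge("clean_orders", "revenue_dashboard")
    print(sorted(graph.upstream("revenue_dashboard")))
```

Walking the edges backwards from a consumption point, as `upstream` does, answers the question of where a dashboard's data originally came from.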

data lineage, metadata, transformation, (11 more...)

Industry: Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)

How to Break Data Silos to Drive Enterprise-Wide AI

#artificialintelligence | Jun-21-2021, 14:00:06 GMT

Not many people miss having to manually sort files, label papers, or search for lost forms in huge filing cabinets. That's because all these tasks have become way easier, faster, and more enjoyable since they've become digitized – computers and the internet have revolutionized the way businesses approach organization and task management. Similar to how computers and the internet made monotonous tasks faster and easier in every department, AI will transform work in every industry in the 21st century. Machine learning will automate away the most time-consuming and repetitive tasks across a company, along with offering predictions that will allow businesses to make better decisions ahead of time. Introducing these revolutionary processes takes time and specialized knowledge.

break data silo, data scientist, drive enterprise-wide ai, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.62)

How to Break Data Silos to Drive Enterprise-Wide AI - Splice Machine

#artificialintelligence | Jun-14-2021, 07:15:08 GMT

data scientist, drive enterprise-wide ai, feature store, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.70)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.36)

What characterises the HANA SQL Data Warehouse?

#artificialintelligence | Mar-17-2021, 19:45:58 GMT

As many articles and publications have noted, SAP offers three solutions for data warehousing. The SAP Business Warehouse (BW) was first released in 1997 and has therefore been a constant presence in the SAP Data Warehouse range for more than two decades. With HANA as a database platform, the HANA SQL Data Warehouse approach has been under development since 2015. It initially consisted of loosely coupled tools, but has since evolved into an open yet highly integrated set of tools and methods that can also be used to develop large data warehouse systems. Since 2019, the Data Warehouse Cloud has completed the SAP range as a SaaS offering. These three approaches are not in competition.

data warehouse, data warehousing, warehouse, (13 more...)

Industry: Information Technology > Software (0.34)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Integration (0.50)
Information Technology > Communications > Web (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.31)
