Data Catalog


Exploring LLM Capabilities in Extracting DCAT-Compatible Metadata for Data Cataloging

Busch, Lennart, Tebernum, Daniel, Velarde, Gissel

arXiv.org Artificial Intelligence

Efficient data exploration is crucial as data becomes increasingly important for accelerating processes, improving forecasts and developing new business models. Data consumers often spend 25-98% of their time searching for suitable data due to the exponential growth, heterogeneity and distribution of data. Data catalogs can support and accelerate data exploration by using metadata to answer user queries. However, as metadata creation and maintenance is often a manual process, it is time-consuming and requires expertise. This study investigates whether LLMs can automate metadata maintenance of text-based data and generate high-quality DCAT-compatible metadata. We tested zero-shot and few-shot prompting strategies with LLMs from different vendors for generating metadata such as titles and keywords, along with a fine-tuned model for classification. Our results show that LLMs can generate metadata comparable to human-created content, particularly on tasks that require advanced semantic understanding. Larger models outperformed smaller ones, fine-tuning significantly improved classification accuracy, and few-shot prompting yielded better results in most cases. Although LLMs offer a faster and more reliable way to create metadata, a successful application requires careful consideration of task-specific criteria and domain context.
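The zero-shot and few-shot strategies the study compares amount to two ways of constructing a prompt. A minimal sketch follows; the DCAT field names (title, description, keyword) come from the standard vocabulary, but the prompt wording and example documents are illustrative assumptions, not the authors' actual prompts.

```python
# Sketch of the two prompting strategies; the wording is hypothetical.
DCAT_FIELDS = ("title", "description", "keyword")

def zero_shot_prompt(document: str) -> str:
    """Ask the model directly, with no worked examples."""
    fields = ", ".join(DCAT_FIELDS)
    return (
        f"Extract DCAT-compatible metadata ({fields}) as JSON "
        f"from the following document:\n\n{document}"
    )

def few_shot_prompt(document: str, examples: list) -> str:
    """Prepend (document, metadata) pairs so the model can imitate them."""
    shots = "\n\n".join(
        f"Document: {doc}\nMetadata: {meta}" for doc, meta in examples
    )
    return (
        "Extract DCAT-compatible metadata for the last document, "
        f"following the examples.\n\n{shots}\n\n"
        f"Document: {document}\nMetadata:"
    )

# Invented example pair and target document, for illustration only.
examples = [("Monthly air quality readings for Berlin.",
             '{"title": "Berlin Air Quality", "keyword": ["air", "berlin"]}')]
prompt = few_shot_prompt("Daily river levels for the Rhine.", examples)
```

The few-shot variant simply places worked examples before the target document and ends at "Metadata:", so the model's completion is the metadata itself.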


Leveraging Retrieval Augmented Generative LLMs For Automated Metadata Description Generation to Enhance Data Catalogs

Singh, Mayank, Kumar, Abhijeet, Donaparthi, Sasidhar, Karambelkar, Gayatri

arXiv.org Artificial Intelligence

Data catalogs serve as repositories for organizing and accessing a diverse collection of data assets, but their effectiveness hinges on the ease with which business users can look up relevant content. Unfortunately, many data catalogs within organizations suffer from limited searchability due to inadequate metadata such as asset descriptions. Hence, there is a need for a content-generation solution to enrich and curate metadata in a scalable way. This paper explores the challenges associated with metadata creation and proposes a prompt-enrichment idea: leveraging existing metadata content through a retrieval-based few-shot technique tied to generative large language models (LLMs). The paper also considers fine-tuning an LLM on existing content and compares the behavior of few-shot pretrained LLMs (Llama, GPT-3.5) with a few-shot fine-tuned LLM (Llama2-7b), evaluating their performance on accuracy, factual grounding, and toxicity. Our preliminary results show more than 80% Rouge-1 F1 for the generated content, implying that 87-88% of instances were accepted as is or curated with minor edits by data stewards. By automatically generating accurate descriptions for tables and columns, the research provides an overall framework for enterprises to scale metadata curation and enrich their data catalogs, thereby vastly improving searchability and overall usability.
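Rouge-1 F1, the metric the paper reports, is the harmonic mean of unigram precision and recall between a generated description and a reference one. A minimal sketch using whitespace tokens:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap Rouge-1 F1 between generated and reference text."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Invented column descriptions, for illustration only.
score = rouge1_f1(
    "customer identifier column unique per account",
    "unique identifier for each customer account",
)
```

Production evaluation libraries such as `rouge-score` additionally apply stemming and tokenization rules, so scores from this simplified version will not match theirs exactly.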


LEDD: Large Language Model-Empowered Data Discovery in Data Lakes

An, Qi, Ying, Chihua, Zhu, Yuqing, Xu, Yihao, Zhang, Manwei, Wang, Jianmin

arXiv.org Artificial Intelligence

Data discovery in data lakes with ever-increasing datasets has long been recognized as a major challenge in data management, especially for semantic table search and hierarchical global catalog generation. While large language models (LLMs) facilitate the processing of data semantics, challenges remain in architecting an end-to-end system that comprehensively exploits LLMs for these two semantics-related tasks. In this demo, we propose LEDD, an end-to-end system with an extensible architecture that leverages LLMs to provide hierarchical global catalogs with semantic meaning and semantic table search for data lakes. Specifically, LEDD can return semantically related tables based on a natural-language specification. These features make LEDD an ideal foundation for downstream tasks such as model training and schema linking for text-to-SQL. LEDD also provides a simple Python interface to facilitate the extension and replacement of data discovery algorithms.
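The semantic table search LEDD describes can be approximated by ranking table descriptions against a natural-language query by embedding similarity. The sketch below substitutes a bag-of-words vector and cosine similarity for the LLM embeddings LEDD actually uses; the table names and descriptions are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for an LLM embedding: a simple bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query: str, tables: dict, k: int = 3) -> list:
    """Return the k table names whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(tables, key=lambda n: cosine(q, embed(tables[n])),
                    reverse=True)
    return ranked[:k]

tables = {
    "orders": "customer purchase orders with timestamps and totals",
    "sensors": "iot sensor readings temperature humidity",
    "payroll": "employee salary and payroll records",
}
top = semantic_search("customer purchases", tables, k=1)
```

Swapping `embed` for a real embedding model is the point of LEDD's extensible Python interface: the ranking logic stays the same while the representation improves.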


Metadata driven development realises "smart manufacturing" of data ecosystems – blog 3 - Solita Data

#artificialintelligence

This is the third part of the blog series. The first blog focused on the maturity model and explained how the large monolithic data warehouses were created. The second blog focused on metadata-driven development, or "smart manufacturing," of data ecosystems. This third blog talks about reverse engineering, or how existing data assets can be discovered to accelerate the development of new data products. Companies face increasing pressure to address data silos to reduce cost, improve agility, and accelerate innovation, but they struggle to deliver value from their data assets. Many companies have hundreds of systems containing thousands of databases, hundreds of thousands of tables, millions of columns, and millions of lines of code across many different technologies. The starting point is a "data spaghetti" that nobody knows well.


AWS launches DataZone, a new ML-based data management service • TechCrunch

#artificialintelligence

At its re:Invent conference, AWS today announced Amazon DataZone, a new data management service that can help enterprises catalog, discover, share and -- most importantly -- govern their data. The nifty part here is that AWS is using machine learning to help businesses build these data catalogs and generate the metadata to make it searchable. "To unlock the full power, the full value of data, we need to make it easy for the right people and applications to find, access and share the right data when they need it -- and to keep data safe and secure," AWS CEO Adam Selipsky said in today's keynote. The tool will provide users with fine-grained controls to manage and govern this data. Ensuring that the right users have access to the right data, without compromising personally identifiable information, for example, has long been a major problem for enterprises, and it has only gotten harder as the amount of data has increased.


What is Data Governance? Top Data Governance Tools for Data Science and Machine Learning Research in 2022

#artificialintelligence

The process of developing internal data standards and enacting rules governing who has access to data and how it is utilized for analytical applications and business operations is known as data governance. A good data governance program guarantees that data is reliable, consistent, and accessible, and that its use complies with applicable rules and regulations regarding data protection. It frequently includes data quality improvement initiatives alongside master data management (MDM) projects. Data governance software offers features that facilitate the formulation of governance policies, the construction of business glossaries and data catalogs, data mapping and classification, workflow management, collaboration, and process documentation. Such software can be used in conjunction with MDM, metadata management, and data quality solutions. Data governance aims to promote confident decisions supported by solid data resources; building policies that define data ownership, duties, and delegates is its goal.


Data Discovery for ML Engineers / DataScienceCentral.com

#artificialintelligence

Real-world production ML systems consist of two main components: data and code. Data is clearly the leader and is rapidly taking center stage; it defines the quality of almost any ML-based product, more so than code or any other aspect. In Feature Store as a Foundation for Machine Learning, we discussed how feature stores are an integral part of the machine learning workflow. They improve the ROI of data engineering, reduce cost per model, and accelerate model-to-market by simplifying feature definition and extraction.


Alation Acquires Artificial Intelligence Vendor Lyngo Analytics

#artificialintelligence

Alation Inc., the leader in enterprise data intelligence solutions, today announced the acquisition of Lyngo Analytics, a Los Altos, Calif.-based data insights company. The acquisition will elevate the business user experience within the data catalog, scale data intelligence, and help organizations drive data culture. Lyngo Analytics CEO and co-founder Jennifer Wu and CTO and co-founder Joachim Rahmfeld will join the company. Lyngo Analytics uses a natural language interface to empower users to discover data and insights by asking questions in simple, familiar business terms. Alation offers the most intelligent and user-friendly machine-learning data catalog on the market.


Data Catalog

#artificialintelligence

Data is key for the success of any business, and this is more relevant than ever before in the current crisis that industry and mankind are facing. Data insights will be a key driver in dealing with the situation of COVID-19 and it will be instrumental in finding the cure as well. Data insights are also important for the financial industry to read the current and upcoming market trends as events unfold every day. After spending two decades of my career in the financial industry, I have realized that most firms lag in data maturity, and this crisis is revealing many loopholes in their governance process. As I start my journey into retail and transportation with my recent client, I am realizing that never before was data so important for the retail sector, and especially for grocers as it is now.


Application of Artificial Intelligence in Business Transformation

#artificialintelligence

Artificial Intelligence, or AI, is the use of algorithms that simulate human behavior to perform cognitive functions. AI is used to solve problems through interaction, learning, visual perception, planning, reasoning, and natural language processing. Artificial Intelligence is a broad and generic term for computer software that engages in human-like processes. Thus, calling an application AI might be correct but will not cover its specifics. The most widely held notion about AI comes from sci-fi movies.