AITopics

2510.17843

Country: Asia > China (0.29)

Genre:

Research Report (0.64)
Workflow (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Rafiei, Davood, Heisler, Morgan Lindsay, Zhang, Weiwei, Pourreza, Mohammadreza, Zhang, Yong

Do LLMs Align with My Task? Evaluating Text-to-SQL via Dataset Alignment

arXiv.org Artificial Intelligence Oct-7-2025

Supervised Fine-Tuning (SFT) is an effective method for adapting Large Language Models (LLMs) to downstream tasks. However, variability in training data can hinder a model's ability to generalize across domains. This paper studies the problem of dataset alignment for Natural Language to SQL (NL2SQL or text-to-SQL), examining how well SFT training data matches the structural characteristics of target queries and how this alignment impacts model performance. We hypothesize that alignment can be accurately estimated by comparing the distributions of structural SQL features across the training set, target data, and the model's predictions prior to SFT. Through comprehensive experiments on three large cross-domain NL2SQL benchmarks and multiple model families, we show that structural alignment is a strong predictor of fine-tuning success. When alignment is high, SFT yields substantial gains in accuracy and SQL generation quality; when alignment is low, improvements are marginal or absent. These findings highlight the importance of alignment-aware data selection for effective fine-tuning and generalization in NL2SQL tasks.
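The abstract above does not say which structural features or which distance the authors use, so the following is only a minimal sketch of the general idea of comparing feature distributions between two sets of SQL queries. The feature list (`FEATURES`), the smoothing constant, and the choice of Jensen-Shannon divergence are all assumptions made for illustration, not the paper's method. The same pairwise comparison could be applied to any two of the training set, the target data, and the model's pre-SFT predictions.

```python
import math
import re
from collections import Counter

# Hypothetical structural features, detected by simple keyword patterns.
# A real system would more likely use a SQL parser; this is an assumption.
FEATURES = {
    "join": r"\bJOIN\b",
    "where": r"\bWHERE\b",
    "group_by": r"\bGROUP\s+BY\b",
    "order_by": r"\bORDER\s+BY\b",
    "having": r"\bHAVING\b",
    "limit": r"\bLIMIT\b",
    "subquery": r"\(\s*SELECT\b",
    "aggregate": r"\b(?:COUNT|SUM|AVG|MIN|MAX)\s*\(",
    "set_op": r"\b(?:UNION|INTERSECT|EXCEPT)\b",
}


def feature_distribution(queries, smoothing=1e-6):
    """Return a smoothed probability distribution over FEATURES."""
    counts = Counter()
    for query in queries:
        for name, pattern in FEATURES.items():
            counts[name] += len(re.findall(pattern, query, flags=re.IGNORECASE))
    total = sum(counts.values()) + smoothing * len(FEATURES)
    return {name: (counts[name] + smoothing) / total for name in FEATURES}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    def kl(a, b):
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a)

    mixture = {k: 0.5 * (p[k] + q[k]) for k in p}
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def alignment(queries_a, queries_b):
    """Illustrative alignment score: 1 means identical feature distributions."""
    return 1.0 - js_divergence(
        feature_distribution(queries_a), feature_distribution(queries_b)
    )


if __name__ == "__main__":
    train = [
        "SELECT name FROM users WHERE age > 30",
        "SELECT COUNT(*) FROM orders GROUP BY user_id",
    ]
    target = ["SELECT city FROM users WHERE age < 20 GROUP BY city"]
    print(f"alignment(train, target) = {alignment(train, target):.3f}")
```

Under these assumptions, a score close to 1 means the two query sets use structural constructs in similar proportions, and a score close to 0 means they share almost none.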

large language model, machine learning, natural language, (19 more...)

2510.04919

Country:

Asia (0.68)
North America > Canada > Alberta (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

WIRED Mar-19-2025, 15:27:27 GMT

Nvidia Bets Big on Synthetic Data

Nvidia has acquired synthetic data firm Gretel for nine figures, according to two people with direct knowledge of the deal. The acquisition price exceeds Gretel's most recent valuation of $320 million, the sources say, though the exact terms of the purchase remain unknown. Gretel and its team of approximately 80 employees will be folded into Nvidia, where its technology will be deployed as part of the chip giant's growing suite of cloud-based, generative AI services for developers. The acquisition comes as Nvidia has been rolling out synthetic data generation tools, so that developers can train their own AI models and fine-tune them for specific apps. In theory, synthetic data could create a near-infinite supply of AI training data and help solve the data scarcity problem that has been looming over the AI industry since ChatGPT went mainstream in 2022, although experts say using synthetic data in generative AI comes with its own risks.

developer, machine learning, natural language, (11 more...)

WIRED

Industry: Information Technology > Hardware (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.50)

#artificialintelligence Apr-9-2023, 08:10:51 GMT

Synthetic Data Is About To Transform Artificial Intelligence

Imagine if it were possible to produce infinite amounts of the world's most valuable resource, cheaply and quickly. What dramatic economic transformations and opportunities would result? This is a reality today. It is called synthetic data. Synthetic data is not a new idea, but it is now approaching a critical inflection point in terms of real-world impact. It is poised to upend the entire value chain and technology stack for artificial intelligence, with immense economic implications. Data is the lifeblood of modern artificial intelligence. Getting the right data is both the most important and the most challenging part of building powerful AI.

dataset, fidelity, synthetic data, (15 more...)

Industry:

Banking & Finance (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.95)
Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)

Xie, Qianqian, Huang, Jimin, Saha, Tulika, Ananiadou, Sophia

GRETEL: Graph Contrastive Topic Enhanced Language Model for Long Document Extractive Summarization

arXiv.org Artificial Intelligence Aug-21-2022

Recently, neural topic models (NTMs) have been incorporated into pre-trained language models (PLMs), to capture the global semantic information for text summarization. However, in these methods, there remain limitations in the way they capture and integrate the global semantic information. In this paper, we propose a novel model, the graph contrastive topic enhanced language model (GRETEL), that incorporates the graph contrastive topic model with the pre-trained language model, to fully leverage both the global and local contextual semantics for long document extractive summarization. To better capture and incorporate the global semantic information into PLMs, the graph contrastive topic model integrates the hierarchical transformer encoder and the graph contrastive learning to fuse the semantic information from the global document context and the gold summary. To this end, GRETEL encourages the model to efficiently extract salient sentences that are topically related to the gold summary, rather than redundant sentences that cover sub-optimal topics. Experimental results on both general domain and biomedical datasets demonstrate that our proposed method outperforms SOTA methods.

gold summary, representation, summarization, (15 more...)

2208.09982

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.94)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligence Jun-13-2022, 01:15:05 GMT

Synthetic Data Is About To Transform Artificial Intelligence

These people do not exist. These faces were artificially generated using a form of deep learning known as generative adversarial networks (GANs). Synthetic data like this is becoming increasingly indistinguishable from real-world data. Imagine if it were possible to produce infinite amounts of the world's most valuable resource, cheaply and quickly. What dramatic economic transformations and opportunities would result? This is a reality today. It is called synthetic data. Synthetic data is not a new idea, but it is now approaching a critical inflection point in terms of real-world impact.

dataset, fidelity, synthetic data, (15 more...)

Industry:

Banking & Finance (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.95)
Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)

Prado-Romero, Mario Alfonso, Stilo, Giovanni

GRETEL: A unified framework for Graph Counterfactual Explanation Evaluation

arXiv.org Artificial Intelligence Jun-6-2022

Machine Learning (ML) systems are a building block of the modern tools that affect our daily life in several application domains. Due to their black-box nature, those systems are rarely adopted in application domains (e.g. health, finance) where understanding the decision process is of paramount importance. Explanation methods were developed to explain how an ML model reached a specific decision for a given case/instance. Graph Counterfactual Explanation (GCE) is one of the explanation techniques adopted in the Graph Learning domain. Existing works on Graph Counterfactual Explanations diverge mostly in the problem definition, application domain, test data, and evaluation metrics, and most do not compare exhaustively against other counterfactual explanation techniques present in the literature. We present GRETEL, a unified framework to develop and test GCE methods in several settings. GRETEL is a highly extensible evaluation framework that promotes Open Science and the reproducibility of evaluations by providing a set of well-defined mechanisms to easily integrate and manage real and synthetic datasets, ML models, state-of-the-art explanation techniques, and evaluation measures. To present GRETEL, we show the experiments conducted to integrate and test several synthetic and real datasets with several existing explanation techniques and base ML models.

graph counterfactual explanation evaluation, machine learning, natural language, (3 more...)

doi: 10.1145/3511808.3557608

2206.02957

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligence May-2-2022, 13:42:53 GMT

Career roadmap: Machine learning scientist

Like machine learning engineers, machine learning scientists are in high demand in today's job market. That's because organizations are eager to adopt machine learning-powered tools to enhance the value of their data and analytics and add automation to processes. Amy Steier, principal machine learning scientist at the developer tools provider, Gretel.ai. Demand for machine learning technologies is on the rise, according to market research. Potential applications include customer segmentation and investment prediction in the financial services sector; image analytics, drug discovery and personalized treatment in healthcare; and inventory planning and cross-channel marketing in retail.

gretel, scientist, steier, (15 more...)

Country: North America > United States > California > San Diego County > San Diego (0.05)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting (0.71)
Banking & Finance (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Weber, Marcus, Fackeldey, Konstantin

The Mathematics of Comparing Objects

arXiv.org Artificial Intelligence Jan-14-2022

After reading two different crime stories, an artificial intelligence concludes that in both stories the police found the murderer purely by chance.

algorithm, characteristic, mathematics, (14 more...)

2201.07032

Country:

Europe > Germany > Berlin (0.04)
Europe > Germany > Lower Saxony > Gottingen (0.04)

Genre: Research Report (0.82)

Industry:

Media > Film (0.93)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

#artificialintelligence Dec-30-2021, 21:56:03 GMT

Gretel.ai - Privacy Engineering as a Service

Keeping pace with development velocity requires faster access to data. Gretel is accelerating access to data with data privacy tools that bypass blockers and fuel Machine Learning and AI applications.

gretel, privacy engineering

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.55)