AITopics | automated data science

automated data science

Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

Neural Information Processing Systems, Dec-26-2025, 07:42:17 GMT

As the field of automated machine learning (AutoML) advances, it becomes increasingly important to incorporate domain knowledge into these systems. We present an approach for doing so by harnessing the power of large language models (LLMs). Specifically, we introduce Context-Aware Automated Feature Engineering (CAAFE), a feature engineering method for tabular datasets that uses an LLM to iteratively generate additional semantically meaningful features based on the description of the dataset. The method produces both Python code for creating new features and explanations for the utility of the generated features. Despite being methodologically simple, CAAFE improves performance on 11 out of 14 datasets, boosting mean ROC AUC performance from 0.798 to 0.822 across all datasets, similar to the improvement achieved by using a random forest instead of logistic regression on our datasets. Furthermore, CAAFE is interpretable: it provides a textual explanation for each generated feature. CAAFE paves the way for more extensive semi-automation in data science tasks and emphasizes the significance of context-aware solutions that can extend the scope of AutoML systems to semantic AutoML. We release our code, a simple demo and a Python package.

automated data science, context-aware automated feature engineering, dataset, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)

Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

Neural Information Processing Systems, Jan-19-2025, 14:07:21 GMT

automated data science, context-aware automated feature engineering, dataset, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

Guo, Siyuan, Deng, Cheng, Wen, Ying, Chen, Hechang, Chang, Yi, Wang, Jun

arXiv.org Artificial Intelligence, May-28-2024

In this work, we investigate the potential of large language model (LLM) based agents to automate data science tasks, with the goal of comprehending task requirements, then building and training the best-fit machine learning models. Despite their widespread success, existing LLM agents are hindered by generating unreasonable experiment plans within this scenario. To this end, we present DS-Agent, a novel automatic framework that harnesses an LLM agent and case-based reasoning (CBR). In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on expert knowledge from Kaggle and facilitate consistent performance improvement through the feedback mechanism. Moreover, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm to adapt past successful solutions from the development stage for direct code generation, significantly reducing the demand on the foundational capabilities of LLMs. Empirically, DS-Agent with GPT-4 achieves a 100% success rate in the development stage, while attaining a 36% improvement in average one-pass rate across alternative LLMs in the deployment stage. In both stages, DS-Agent achieves the best rank in performance, costing $1.60 and $0.13 per run with GPT-4, respectively. Our data and code are open-sourced at https://github.com/guosyjlu/DS-Agent.
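The case-based reasoning cycle mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not DS-Agent's implementation: the `Case` record, the word-overlap similarity, and the `revise` and `evaluate` callbacks are all hypothetical stand-ins (in a real agent an LLM would adapt the plan and a training run would supply the feedback score).

```python
from dataclasses import dataclass


@dataclass
class Case:
    task: str     # natural-language task description
    plan: str     # solution plan that was used for the task
    score: float  # feedback obtained when the plan was executed


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two task descriptions (an assumed metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(case_base, task):
    """Retrieve: return the stored case whose task is most similar to the new one."""
    return max(case_base, key=lambda case: jaccard(case.task, task))


def cbr_step(case_base, task, revise, evaluate):
    """One retrieve -> reuse/revise -> evaluate -> retain iteration."""
    nearest = retrieve(case_base, task)
    plan = revise(nearest.plan, task)           # hypothetical: adapt the old plan
    score = evaluate(plan)                      # hypothetical: execution feedback
    case_base.append(Case(task, plan, score))   # retain the new experience
    return plan, score
```

Calling `cbr_step` repeatedly grows the case base, so later tasks can draw on earlier solutions; how plans are revised and scored is left entirely to the caller.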

automated data science, submission, torch, (14 more...)

arXiv.org Artificial Intelligence

arXiv:2402.17453

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Jilin Province (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)

Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

Hollmann, Noah, Müller, Samuel, Hutter, Frank

arXiv.org Artificial Intelligence, Sep-28-2023

As the field of automated machine learning (AutoML) advances, it becomes increasingly important to incorporate domain knowledge into these systems. We present an approach for doing so by harnessing the power of large language models (LLMs). Specifically, we introduce Context-Aware Automated Feature Engineering (CAAFE), a feature engineering method for tabular datasets that uses an LLM to iteratively generate additional semantically meaningful features based on the description of the dataset. The method produces both Python code for creating new features and explanations for the utility of the generated features. Despite being methodologically simple, CAAFE improves performance on 11 out of 14 datasets, boosting mean ROC AUC performance from 0.798 to 0.822 across all datasets, similar to the improvement achieved by using a random forest instead of logistic regression on our datasets. Furthermore, CAAFE is interpretable: it provides a textual explanation for each generated feature. CAAFE paves the way for more extensive semi-automation in data science tasks and emphasizes the significance of context-aware solutions that can extend the scope of AutoML systems to semantic AutoML. We release our code (https://github.com/automl/CAAFE), a simple demo (https://colab.research.google.com/drive/1mCA8xOAJZ4MaB_alZvyARTMjhl6RZf0a) and a Python package (https://pypi.org/project/caafe/).
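The iterative idea in this abstract can be sketched as a loop that tries one candidate feature at a time and measures ROC AUC on held-out data. The sketch below is an illustration under stated assumptions, not the CAAFE package API: the candidate features are hand-written stand-ins for LLM-generated code, the nearest-centroid scorer stands in for a real downstream model, the keep-only-if-it-improves rule is an assumption of this sketch, and the toy data and all function names are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scores(train_X, train_y, test_X):
    """Stand-in model: squared distance to class-0 centroid minus distance to class-1."""
    def centroid(label):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    c0, c1 = centroid(0), centroid(1)
    return [dist2(x, c0) - dist2(x, c1) for x in test_X]


def evaluate(rows, labels, feature_fns):
    """Build the feature matrix, fit on the first half, report ROC AUC on the second."""
    X = [[fn(r) for fn in feature_fns] for r in rows]
    half = len(X) // 2
    scores = centroid_scores(X[:half], labels[:half], X[half:])
    return roc_auc(labels[half:], scores)


def select_features(rows, labels, base_fns, candidates):
    """Try candidates one by one; keep a feature only if held-out ROC AUC improves."""
    kept, accepted = list(base_fns), []
    best = evaluate(rows, labels, kept)
    for name, fn in candidates:
        score = evaluate(rows, labels, kept + [fn])
        if score > best:
            kept.append(fn)
            accepted.append(name)
            best = score
    return accepted, best


def make_toy_data(n=200, seed=0):
    """Hypothetical dataset: the label depends on weight / height**2 (a BMI-like ratio)."""
    rng = random.Random(seed)
    rows = [{"weight_kg": rng.uniform(50, 100),
             "height_m": rng.uniform(1.5, 2.0),
             "noise": rng.gauss(0, 1)} for _ in range(n)]
    labels = [1 if r["weight_kg"] / r["height_m"] ** 2 > 25 else 0 for r in rows]
    return rows, labels
```

A caller would pass the raw columns as `base_fns` and proposals such as `("bmi", lambda r: r["weight_kg"] / r["height_m"] ** 2)` as `candidates`; the returned score can never fall below the baseline because rejected features are discarded.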

automated data science, context-aware automated feature engineering, language model, (1 more...)

arXiv.org Artificial Intelligence

arXiv:2305.03403

Genre: Research Report (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Automated Data Science and Machine Learning Platforms Market – increasing demand with …

#artificialintelligence, Dec-31-2022, 01:15:52 GMT

Automated Data Science and Machine Learning Platforms report aims to facilitate business growth with in-depth understanding of business …

automated data science

#artificialintelligence

Industry: Media > News (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Top 10 Automated Data Science and Machine Learning Platforms in 2020

#artificialintelligence, Mar-20-2020, 05:33:20 GMT

The employment of Data Science and Machine Learning technologies is at a peak. The market offers many software products and tools with innovative features that bring the efficiency of new-age data technologies and can potentially increase a business's efficiency and value proposition. With continuous evolution at scale, such solutions too get revamped over time. Now is the era of automated data science and machine learning software that not only enhances the operational proficiency of such tools but also assists data scientists with great potential. It helps automate the repetitive and mundane tasks within ML or data science processes without compromising model performance and productivity. Therefore, here is a list of the top 10 automated data science and machine learning software products presented by some key players in the respective market.

data science, machine learning platform, science and machine learning platform, (12 more...)

#artificialintelligence

Industry: Health & Medicine (0.30)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Get Ahead of Automated Machine Learning (AutoML) to Accelerate Your AI Roadmap

#artificialintelligence, Feb-18-2019, 15:02:19 GMT

Being great at data science to keep your business ahead of the competition curve is finally becoming more affordable and less complex to manage as open source technology becomes commonplace. In this POV, we'll explore the emergence of Automated Machine Learning (AutoML), which is making it much more feasible to use machine learning algorithms to develop machine learning algorithms. This is how quickly the AI industry is progressing today. We are already seeing the data science community explore ways to make analytics and machine learning tasks cheaper, faster, easier and increasingly autonomous and self-remediating. Business leaders should prepare for automated data science to become commonplace – not necessarily as a way to entirely replace data scientists, but to significantly boost their capabilities and provide a starting point to ML. AutoML is a step in this direction.
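One simple form of this automation is a search over model configurations scored on held-out data. The sketch below is only an illustration and is not taken from the article: the one-dimensional k-nearest-neighbour classifier and the function names are hypothetical choices made to keep the example self-contained.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, valid, k):
    """Fraction of validation points that k-NN classifies correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in valid) / len(valid)


def select_k(train, valid, candidates):
    """Automated model selection: pick the k with the best validation accuracy."""
    return max(candidates, key=lambda k: accuracy(train, valid, k))
```

Real AutoML systems search far larger spaces (model families, preprocessing steps, hyperparameters) with smarter strategies, but the loop of propose, train, validate and keep the best follows the same pattern.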

automl, data science, data scientist, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

'Automated Data Science' to offer a competitive edge to enterprises - CIOL

#artificialintelligence, Nov-16-2018, 03:54:05 GMT

According to a recent Indian jobs study, data science is one of the top and fastest-growing fields in India, and its relevance is increasing in almost every sector. Reports from NASSCOM suggest that India's data industry will reach $16 billion by 2025 from the present level of $2 billion. At its core, data science is the science of examining raw data and applying statistical techniques to draw business-related conclusions and predict business outcomes. In every organization, there are opportunities to implement data science and transform the way business is carried out. Leading analysts like Gartner and Forrester have cited 2018 as a milestone year for organizations, with over 70% of them expected to leverage data science for Business Optimization.

artificial intelligence, data science, machine learning, (8 more...)

#artificialintelligence

Country: Asia > India (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.55)

Cartoon: Data Scientist was the sexiest job of the 21st century until …

#artificialintelligence, Jul-15-2018, 01:41:11 GMT

We revisit our popular cartoon, which has not lost any relevancy. A few years ago the Harvard Business Review article by Thomas Davenport and DJ Patil proclaimed "Data Scientist: The Sexiest Job of the 21st Century". But here is what may be coming ... Data Scientist: "I thought I had the sexiest job of the 21st century". This cartoon was ably drawn by Jon Carter.

Here are more KDnuggets posts on Data Science automation:

Automated Machine Learning vs Automated Data Science
The Current State of Automated Machine Learning
Automated Data Science & Machine Learning: An Interview with the Auto-sklearn Team
Contest Winner: Winning the AutoML Challenge with Auto-sklearn
Contest 2nd Place: Automating Data Science
Data Science Automation: Debunking Misconceptions

See also the KDnuggets tag Automated Data Science and the KDnuggets Big Data, Data Mining, and Data Science Cartoon page.

More recent KDnuggets Cartoons:

Cartoon: FIFA World Cup Football and Machine Learning
Cartoon: GDPR first effect on Privacy ...

artificial intelligence, data mining, machine learning, (16 more...)

#artificialintelligence

Industry:

Information Technology (1.00)
Leisure & Entertainment > Sports > Soccer (0.99)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.62)

Contest 2nd Place: Automated Data Science and Machine Learning in Digital Advertising

#artificialintelligence, Nov-6-2016, 19:15:25 GMT

Editor's note: This blog post was an entrant in the recent KDnuggets Automated Data Science and Machine Learning blog contest, where it tied for second place. Digital Advertising provides an exciting playground for machine learning in general and automated predictive modeling in particular. An increasing proportion of digital advertising is delivered through real-time bidding ad exchanges. Ad exchanges connect sellers of ad placements (usually websites with ad space to monetize) and buyers (usually technology firms like Dstillery, operating on behalf of consumer brands and agencies). The goals of the buyers vary.

artificial intelligence, automated data science, machine learning, (11 more...)

#artificialintelligence

Industry:

Marketing (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)
