 data prep


AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework

Fan, Meihao, Fan, Ju, Tang, Nan, Cao, Lei, Li, Guoliang, Du, Xiaoyong

arXiv.org Artificial Intelligence

Answering natural language (NL) questions about tables, known as Tabular Question Answering (TQA), is crucial because it allows users to quickly and efficiently extract meaningful insights from structured data, effectively bridging the gap between human language and machine-readable formats. Many of these tables are derived from web sources or real-world scenarios, which require meticulous data preparation (or data prep) to ensure accurate responses. However, preparing such tables for NL questions introduces new requirements that extend beyond traditional data preparation. This question-aware data preparation involves specific tasks such as column augmentation and filtering tailored to particular questions, as well as question-aware value normalization or conversion, highlighting the need for a more nuanced approach in this context. Because each of the above tasks is unique, a single model (or agent) may not perform effectively across all scenarios. In this paper, we propose AutoPrep, a large language model (LLM)-based multi-agent framework that leverages the strengths of multiple agents, each specialized in a certain type of data prep, ensuring more accurate and contextually relevant responses. Given an NL question over a table, AutoPrep performs data prep through three key components: (1) the Planner, which determines a logical plan outlining a sequence of high-level operations; (2) the Programmer, which translates this logical plan into a physical plan by generating the corresponding low-level code; and (3) the Executor, which executes the generated code to process the table. To support this multi-agent framework, we design a novel Chain-of-Clauses reasoning mechanism for high-level operation suggestion, and a tool-augmented method for low-level code generation.
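The Planner, Programmer, and Executor split described in the abstract can be illustrated with a toy sketch. This is not AutoPrep's implementation: the paper's agents are LLM-based, whereas the sketch below substitutes simple keyword rules, and every name in it (`Operation`, `planner`, `programmer`, `executor`, `auto_prep_sketch`, the operation names) is a hypothetical chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable

# A table is a list of rows; each row maps column name -> raw string value.
Table = list[dict[str, str]]
Step = Callable[[Table], Table]


@dataclass(frozen=True)
class Operation:
    """One high-level step of the logical plan (hypothetical structure)."""
    name: str
    columns: tuple[str, ...]


def planner(question: str, table: Table) -> list[Operation]:
    """Stand-in for the LLM Planner: keep columns named in the question."""
    q = question.lower()
    relevant = tuple(c for c in table[0] if c.lower() in q)
    return [
        Operation("filter_columns", relevant),
        Operation("normalize_values", relevant),
    ]


def programmer(op: Operation) -> Step:
    """Stand-in for the Programmer: map a logical operation to code."""
    if op.name == "filter_columns":
        return lambda t: [{c: row[c] for c in op.columns} for row in t]
    if op.name == "normalize_values":
        # Strip whitespace and thousands separators from the chosen columns.
        return lambda t: [
            {c: v.strip().replace(",", "") if c in op.columns else v
             for c, v in row.items()}
            for row in t
        ]
    raise ValueError(f"unknown operation: {op.name}")


def executor(table: Table, steps: list[Step]) -> Table:
    """Executor: apply each generated step to the table in order."""
    for step in steps:
        table = step(table)
    return table


def auto_prep_sketch(question: str, table: Table) -> Table:
    logical_plan = planner(question, table)
    physical_plan = [programmer(op) for op in logical_plan]
    return executor(table, physical_plan)


table = [
    {"City": "Oslo", "Population": " 709,000 ", "Mayor": "A"},
    {"City": "Bergen", "Population": "289,000", "Mayor": "B"},
]
prepared = auto_prep_sketch("Which city has the largest population?", table)
print(prepared)
```

Separating the logical plan from the generated steps means the same plan could be inspected or re-executed without re-planning, which is one plausible motivation for the three-component design.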


Global Big Data Conference

#artificialintelligence

Redbird, a New York-based enterprise analytics operating system, announced it has raised $7.6 million in an oversubscribed seed round. The Redbird platform allows non-technical users to automate and unify analytics work without writing code and connects all data sources into a no-code environment for data prep, wrangling, analysis, reporting, and data science, according to a company release. Though the company touts its platform's no-code features as friendly for non-technical users, it also aims to make life easier for data professionals: "Even for technical teams with these [data] skill sets, it can be challenging and time consuming, ultimately distracting them from higher value work. We created Redbird with the goal of making it easier for organizations who would like all of their employees to be equipped with a more unified, automated, and accessible approach to doing this type of work," said Erin Tavgac, Redbird CEO and co-founder. Data engineers may find Redbird helpful for building data integrations, managing ETL workflows, provisioning data views, and maintaining data science models.


Building AI Models for High-Frequency Streaming Data – Part Two - KDnuggets

#artificialintelligence

AI continues making headlines in the data science community, and predictive models are front and center in engineering applications such as autonomous driving and equipment monitoring. Introducing AI models into engineering systems can be challenging, however, especially when predictions must be reported in near real-time on data from multiple sensors. Many data scientists have implemented machine or deep learning algorithms on static data or in batch, but what considerations must you make when building models for a streaming environment? In this post, we will discuss these considerations. If streaming movies or music comes to mind, you've got the right idea! Data is incoming continuously, but instead of simply watching, actions must be taken based on the information.


Building AI Models for High-Frequency Streaming Data - KDnuggets

#artificialintelligence

We hear about AI everywhere. Machine learning models are now incorporated into several applications, such as medical devices and automated vehicles. These systems include many sensors, streaming data from hardware. The model is applied to the data in the stream and predictions are sent to a dashboard, database, or another device (repeatedly!). Data prep and model development challenges are exacerbated with such high-frequency, time-series data.
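The loop described here, where a model is applied repeatedly to incoming sensor data and each prediction is passed downstream, might look roughly like the following. This is a minimal sketch under stated assumptions, not the article's method: the sliding-window mean stands in for feature extraction, a fixed threshold stands in for a trained model, and the names `stream_predictions`, `window_size`, and `threshold` are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def stream_predictions(
    readings: Iterable[float],
    window_size: int = 3,
    threshold: float = 10.0,
) -> Iterator[tuple[float, bool]]:
    """Yield (window mean, alert flag) once per reading after warm-up."""
    window: deque[float] = deque(maxlen=window_size)  # oldest value drops out
    for value in readings:
        window.append(value)
        if len(window) < window_size:
            continue  # not enough history yet to form a full window
        mean = sum(window) / window_size
        # Placeholder "model": flag the window if its mean exceeds a threshold.
        yield mean, mean > threshold


# Each yielded prediction could be forwarded to a dashboard or database.
results = list(stream_predictions([9.0, 9.0, 9.0, 12.0, 15.0]))
for mean, alert in results:
    print(f"mean={mean:.1f} alert={alert}")
```

Because the function is a generator, it handles one reading at a time and never needs the full series in memory, which matches the continuous nature of the stream.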


The predictive value of social media data - MODULE 3 - Data Prep: Preparing the Training Data

#artificialintelligence

Machine learning runs the world. It generates predictions for each individual customer, employee, voter, and suspect, and these predictions drive millions of business decisions more effectively, determining whom to call, mail, approve, test, diagnose, warn, investigate, incarcerate, set up on a date, or medicate. But, to make this work, you've got to bridge what is a prevalent gap between business leadership and technical know-how. Launching machine learning is as much a management endeavor as a technical one. Its success relies on a very particular business leadership practice.


The AI Ecosystem is a MESS

#artificialintelligence

Over the last several years, there's been a rush to find out how to integrate AI into businesses, and it's no secret that doing so could offer huge comparative advantages. But for all the hype, AI in businesses is still very much in the early phase. Our team hails from Uber, Google, Facebook and Adobe where we've seen both the positives and challenges of deploying AI across business lines. Most companies don't have the same resources to build in-house tools, deeply measure results and fund extensive research. Our goal with this blog is to use our in-depth knowledge of the AI space to make sense of the ecosystem, cut through the hype, and provide insights that can help you with your AI investment decisions across the pipeline.


Power BI - AI and Q&A updates, decomposition tree, and data prep. (Microsoft Ignite)

#artificialintelligence

A demonstration of the latest Power BI capabilities that help you discover and explore insights in your data, including automated machine learning for predictive modelling. It also covers Power BI's role, as part of the Power Platform, in helping you prepare your data across sources. At Microsoft Ignite 2019, this was session THR2285: Microsoft Power BI and the Power Platform: Dataflows, AI, and new visualizations. Justyna Lucznik is a Program Manager for the AI features of Power BI. She has worked on AI visualizations like the decomposition tree and key influencers.


ML and BI Are Coming Together, Gartner Says

#artificialintelligence

The convergence of machine learning and business intelligence is upon us, as BI tool makers increasingly are exposing ML capabilities to users, and users are performing ML activities in their BI tools. That's according to the latest Gartner report on analytics and BI tools, which was released this week. In its February 11 Magic Quadrant for Analytics and Business Intelligence (ABI) Platforms, the storied Stamford, Connecticut analyst firm did its best to quantify and qualify the trends in the sector. While BI and ML have largely existed on parallel tracks, with BI seeking to report what happened and ML seeking to predict what will happen, Gartner sees the two disciplines converging, at least as far as the toolsets are concerned. Not all ML work will occur within BI tools, of course.


AutoML on Databricks: Augmenting Data Science from Data Prep to Operationalization - The Databricks Blog

#artificialintelligence

Thousands of data science jobs are going unfilled today as global demand for the talent greatly outstrips supply. Every day, businesses pay the price of the data scientist shortage in missed opportunities and slow innovation. For organizations to realize the full potential of machine learning, data teams have to build hundreds of predictive models a year. For most enterprises, only a fraction of that number is actually achieved due to understaffed data science teams. Databricks can help data science teams be more productive by automating various steps of the data science workflow – including feature engineering, hyperparameter tuning, model search, and deployment – for a fully controlled and transparent augmented ML experience.


"Above the Trend Line" – Your Industry Rumor Central for 12/17/2019 - insideBIGDATA

#artificialintelligence

Above the Trend Line: your industry rumor central is a recurring feature of insideBIGDATA. In this column, we present a variety of short time-critical news items grouped by category such as M&A activity, people movements, funding news, industry partnerships, customer wins, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz. Our intent is to provide you a one-stop source of late-breaking news to help you keep abreast of this fast-paced ecosystem. We're working hard on your behalf with our extensive vendor network to give you all the latest happenings. Be sure to Tweet Above the Trend Line articles using the hashtag: #abovethetrendline.