AITopics | data science code

data science code

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors

Yang, Zhiyu, Wang, Shuo, Yan, Yukun, Deng, Yang

arXiv.org Artificial Intelligence | Mar-28-2025

LLMs are transforming software development, yet current code generation and code repair benchmarks mainly assess syntactic and functional correctness in simple, single-error cases. LLMs' capabilities to autonomously find and fix runtime logical errors in complex data science code remain largely unexplored. To address this gap, we introduce DSDBench: the Data Science Debugging Benchmark, the first benchmark for systematic evaluation of LLMs on multi-hop error tracing and multi-bug detection in data science code debugging. DSDBench adapts datasets from existing data science task benchmarks, such as DABench and MatPlotBench, featuring realistic data science debugging tasks with automatically synthesized multi-hop, multi-bug code snippets. DSDBench includes 1,117 annotated samples with 741 cause-effect error pairs and runtime error messages. Evaluations of state-of-the-art LLMs on DSDBench show significant performance gaps, highlighting challenges in debugging logical runtime errors in data science code. DSDBench offers a crucial resource to evaluate and improve LLMs' debugging and reasoning capabilities, enabling more reliable AI-assisted data science in the future. DSDBench is publicly available at https://github.com/KevinCL16/DSDBench.
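To make the idea of a cause-effect error pair concrete, here is a minimal sketch of a multi-hop bug in a toy pipeline. It is not taken from DSDBench: the data, the function names (`load_rows`, `clean`, `mean_temperature`) and the report format are all hypothetical, chosen only to show a runtime error surfacing in a different function from the one that contains the bug.

```python
import traceback


def load_rows():
    # Toy records standing in for a CSV file (hypothetical data).
    return [
        {"city": "A", "temp": "21.5"},
        {"city": "B", "temp": "n/a"},
        {"city": "C", "temp": "19.0"},
    ]


def clean(rows):
    # CAUSE: the bug lives here. Missing values are passed through as the
    # string "n/a" instead of being dropped or converted.
    cleaned = []
    for row in rows:
        value = row["temp"]
        cleaned.append(float(value) if value != "n/a" else value)
    return cleaned


def mean_temperature(values):
    # EFFECT: the runtime error is raised here, one hop away from the cause.
    return sum(values) / len(values)


def run_pipeline():
    try:
        return mean_temperature(clean(load_rows()))
    except TypeError as exc:
        # The innermost Python frame tells us where the error surfaced,
        # which is not where the faulty line is.
        frames = traceback.extract_tb(exc.__traceback__)
        return {
            "error_type": type(exc).__name__,
            "effect_function": frames[-1].name,
            "message": str(exc),
        }


report = run_pipeline()
print(report)
```

The traceback points at `mean_temperature`, while the fix belongs in `clean`. Tracing back from the effect to the cause is the kind of step the abstract calls multi-hop error tracing.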

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

arXiv:2503.22388

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

How I Started Tracking My ML Experiments Like a Pro

#artificialintelligence | Nov-25-2020, 14:50:12 GMT

Line 5: We import the mlflow library.
Line 6: Here, we import the relevant mlflow.sklearn module. Which module to import depends entirely on the package the model is built on. The complete list of available modules can be found in the official MLflow Python API documentation.
Line 7: Autologging is a recently introduced experimental feature that makes the MLflow integration hassle-free. This function automatically logs all the parameters and metrics and saves the model artifacts in one place.

documentation, mlflow, tracking, (10 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

AI Is Compelling, But AI And Data Science Operations Must Improve

#artificialintelligence | Nov-20-2018, 12:33:46 GMT

AI technology is starting to work really well. Unfortunately, I've found that the management of machine learning code, data sets and models -- and the integration of these into operational processes -- falls well short of enterprise standards. This can create blockers to adoption and reduce successful outcomes, even in organizations that have adopted AI. But organizations can take specific measures to mitigate the difficulties. I'll identify some wish-list items that could improve things.

artificial intelligence, data scientist, machine learning, (12 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.98)

MLflow: A platform for managing the machine learning lifecycle

#artificialintelligence | Jul-18-2018, 12:22:39 GMT

Check out the "Model lifecycle management" sessions at the Strata Data Conference in New York, September 11-13, 2018. Hurry--early price ends July 27. Although machine learning (ML) can produce fantastic results, using it in practice is complex. Beyond the usual challenges in software development, machine learning developers face new challenges, including experiment management (tracking which parameters, code, and data went into a result); reproducibility (running the same code and environment later); model deployment into production; and governance (auditing models and data used throughout an organization). These workflow challenges around the ML lifecycle are often the top obstacle to using ML in production and scaling it up within an organization.
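As a rough illustration of experiment management as described above (recording which parameters, code, and data went into a result), here is a minimal standard-library sketch. It is not MLflow's API: `fingerprint`, `log_run`, and `load_runs` are hypothetical helpers, and the parameter and metric values are placeholders, not real results.

```python
import hashlib
import json
import tempfile
import uuid
from pathlib import Path


def fingerprint(payload: bytes) -> str:
    """Short SHA-256 digest used to identify a code or data snapshot."""
    return hashlib.sha256(payload).hexdigest()[:12]


def log_run(run_dir, params, metrics, code_text, data_bytes):
    """Write one experiment record as JSON and return its path."""
    record = {
        "run_id": uuid.uuid4().hex[:8],
        "params": params,
        "metrics": metrics,
        "code_hash": fingerprint(code_text.encode("utf-8")),
        "data_hash": fingerprint(data_bytes),
    }
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / f"run_{record['run_id']}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path


def load_runs(run_dir):
    """Read back every record written by log_run."""
    return [json.loads(p.read_text()) for p in sorted(Path(run_dir).glob("run_*.json"))]


with tempfile.TemporaryDirectory() as tmp:
    data = b"x,y\n1,2\n3,4\n"                # placeholder dataset
    code = "model = fit(data, alpha=alpha)"  # placeholder training code
    log_run(tmp, {"alpha": 0.1}, {"rmse": 0.42}, code, data)
    log_run(tmp, {"alpha": 0.5}, {"rmse": 0.37}, code, data)
    runs = load_runs(tmp)
    best = min(runs, key=lambda r: r["metrics"]["rmse"])
    print(best["params"], best["metrics"])
```

Hashing the code and data alongside the parameters means two runs can later be compared knowing whether they differed only in configuration. A real tracking tool adds much more, such as environment capture and artifact storage.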

artificial intelligence, machine learning, mlflow, (14 more...)

#artificialintelligence

Country: North America > United States > New York (0.25)

Industry: Education (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Introducing MLflow: an Open Source Machine Learning Platform - The Databricks Blog

#artificialintelligence | Jun-5-2018, 23:47:36 GMT

Everyone who has tried to do machine learning development knows that it is complex. Beyond the usual concerns in software development, machine learning (ML) development comes with multiple new challenges. It's hard to track experiments. Machine learning algorithms have dozens of configurable parameters, and whether you work alone or on a team, it is difficult to track which parameters, code, and data went into each experiment to produce a model. It's hard to reproduce results.

artificial intelligence, machine learning, mlflow, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Day 14 of 365 Days of Data Science Code

#artificialintelligence | Feb-3-2018, 02:40:11 GMT

From Apache Beam (Dataflow) batch and streaming to wide and deep neural networks, I've started the journey of committing data science code to GitHub. Disclaimer: I'm currently focused on quantity first, and then on stretching towards code that others can use. I'll be writing mostly in Python, but I am an R lover, so you'll see R occasionally as well.

data science code, machine learning, social media, (2 more...)

#artificialintelligence

Industry: Information Technology > Services (0.40)

Technology:

Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.78)
Information Technology > Communications > Social Media (0.76)

Why You Should Forget 'for-loop' for Data Science Code and Embrace Vectorization

@machinelearnbot | Nov-29-2017, 22:05:10 GMT

We have all used for-loops for the majority of tasks that need iteration over a long list of elements. I am sure almost everybody who is reading this article wrote their first code for matrix or vector multiplication using a for-loop back in high school or college. The for-loop has served the programming community long and steadily. However, it comes with some baggage and is often slow in execution when it comes to processing large data sets (many millions of records, as in this age of Big Data). This is particularly true for interpreted languages like Python, where, if the body of your loop is simple, the interpreter overhead of the loop itself can account for a substantial share of the total running time.
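The contrast can be sketched with a dot product computed two ways. This is a minimal example assuming NumPy is installed; the snippet above does not name a library, so NumPy is my choice for illustration, and the array size and helper names (`loop_dot`, `vectorized_dot`) are arbitrary.

```python
import time

import numpy as np


def loop_dot(a, b):
    # Explicit Python loop: every multiply-add passes through the interpreter.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def vectorized_dot(a, b):
    # Vectorized: the loop runs inside NumPy's compiled code.
    return float(np.dot(a, b))


rng = np.random.default_rng(seed=0)
a = rng.random(200_000)
b = rng.random(200_000)

start = time.perf_counter()
slow = loop_dot(a, b)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = vectorized_dot(a, b)
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.4f}s")
```

Both functions return the same value up to floating-point rounding. The measured timings depend on the machine and the NumPy build, so treat them as indicative only.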

data mining, machine learning, science code and embrace vectorization, (10 more...)

@machinelearnbot

Industry: Education (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.37)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.37)

Why you should forget 'for-loop' for data science code and embrace vectorization

@machinelearnbot | Nov-24-2017, 19:20:17 GMT

artificial intelligence, machine learning, science code and embrace vectorization, (7 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.37)
