 plotly


VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

Ni, Yuansheng, Nie, Ping, Zou, Kai, Yue, Xiang, Chen, Wenhu

arXiv.org Artificial Intelligence

Large language models (LLMs) often struggle with visualization tasks such as plotting diagrams and charts, where success depends on both code correctness and visual semantics. Existing instruction-tuning datasets lack execution-grounded supervision and offer limited support for iterative code correction, resulting in fragile and unreliable plot generation. We present VisCode-200K, a large-scale instruction tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback, enabling models to revise faulty code using runtime feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4o-mini. We further adopt a self-debug evaluation protocol to assess iterative repair, demonstrating the benefits of feedback-driven learning for executable, visually accurate code generation.
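The feedback-driven repair idea in this abstract can be illustrated with a minimal sketch: run a snippet, and if it fails, hand the code and its traceback to a repair step and try again. This is not the VisCoder evaluation harness; `run_snippet`, `self_debug`, `toy_repair`, and the `max_rounds` parameter are hypothetical names chosen for illustration, and the repair step is a toy stand-in for a model call.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_snippet(code: str) -> Optional[str]:
    """Execute code in a fresh namespace.

    Returns None on success, or the traceback text if any exception is raised.
    """
    try:
        exec(compile(code, "<generated>", "exec"), {"__name__": "__generated__"})
        return None
    except Exception:
        return traceback.format_exc()


def self_debug(
    code: str,
    repair: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Run code; on failure, pass code and traceback to `repair` and retry.

    Returns (final_code, succeeded, number_of_repair_rounds_used).
    """
    for rounds_used in range(max_rounds + 1):
        error = run_snippet(code)
        if error is None:
            return code, True, rounds_used
        if rounds_used < max_rounds:
            code = repair(code, error)
    return code, False, max_rounds


def toy_repair(code: str, error: str) -> str:
    """Hypothetical stand-in for a model call.

    Fixes one known typo when the traceback mentions a NameError;
    otherwise returns the code unchanged.
    """
    if "NameError" in error:
        return code.replace("valeus", "values")
    return code


# A faulty snippet: the second line misspells the variable name.
faulty = "values = [1, 2, 3]\ntotal = sum(valeus)\n"
fixed_code, ok, rounds = self_debug(faulty, toy_repair)
```

In a real setting the `repair` callable would wrap a model request that receives the failing code and the runtime error; the loop structure stays the same.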


A Visualization Framework for Exploring Multi-Agent-Based Simulations Case Study of an Electric Vehicle Home Charging Ecosystem

Christensen, Kristoffer, Jørgensen, Bo Nørregaard, Ma, Zheng Grace

arXiv.org Artificial Intelligence

Multi-agent-based simulations (MABS) of electric vehicle (EV) home charging ecosystems generate large, complex, and stochastic time-series datasets that capture interactions between households, grid infrastructure, and energy markets. These interactions can lead to unexpected system-level events, such as transformer overloads or consumer dissatisfaction, that are difficult to detect and explain through static post-processing. This paper presents a modular, Python-based dashboard framework, built using Dash by Plotly, that enables efficient, multi-level exploration and root-cause analysis of emergent behavior in MABS outputs. The system features three coordinated views (System Overview, System Analysis, and Consumer Analysis), each offering high-resolution visualizations such as time-series plots, spatial heatmaps, and agent-specific drill-down tools. A case study simulating full EV adoption with smart charging in a Danish residential network demonstrates how the dashboard supports rapid identification and contextual explanation of anomalies, including clustered transformer overloads and time-dependent charging failures. The framework facilitates actionable insight generation for researchers and distribution system operators, and its architecture is adaptable to other distributed energy resources and complex energy systems.


Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code

Galimzyanov, Timur, Titov, Sergey, Golubev, Yaroslav, Bogomolov, Egor

arXiv.org Artificial Intelligence

This paper introduces the human-curated PandasPlotBench dataset, designed to evaluate language models' effectiveness as assistants in visual data exploration. Our benchmark focuses on generating code for visualizing tabular data, such as a Pandas DataFrame, based on natural language instructions, complementing current evaluation tools and expanding their scope. The dataset includes 175 unique tasks. Our experiments assess several leading Large Language Models (LLMs) across three visualization libraries: Matplotlib, Seaborn, and Plotly. We show that shortening the tasks has a minimal effect on plotting capabilities, allowing for a user interface that accommodates concise user input without sacrificing functionality or accuracy. Another of our findings reveals that while LLMs perform well with popular libraries like Matplotlib and Seaborn, challenges persist with Plotly, highlighting areas for improvement. We hope that the modular design of our benchmark will broaden the current studies on generating visualizations. Our benchmark is available online: https://huggingface.co/datasets/JetBrains-Research/plot_bench. The code for running the benchmark is also available: https://github.com/JetBrains-Research/PandasPlotBench.


Interpreting Deep Neural Networks with the Package innsight

Koenen, Niklas, Wright, Marvin N.

arXiv.org Artificial Intelligence

The R package innsight offers a general toolbox for revealing variable-wise interpretations of deep neural networks' predictions with so-called feature attribution methods. Aside from the unified and user-friendly framework, the package stands out in three ways: It is generally the first R package implementing feature attribution methods for neural networks. Secondly, it operates independently of the deep learning library allowing the interpretation of models from any R package, including keras, torch, neuralnet, and even custom models. Despite its flexibility, innsight benefits internally from the torch package's fast and efficient array calculations, which builds on LibTorch (PyTorch's C++ backend) without a Python dependency. Finally, it offers a variety of visualization tools for tabular, signal, image data or a combination of these. Additionally, the plots can be rendered interactively using the plotly package.


A Solid Plan for Learning Data Science, Machine Learning, and Deep Learning - KDnuggets

#artificialintelligence

Here is a solid plan to do so. Enroll in The Data Science & Machine Learning Bootcamp in Python to start learning now. Python is the most popular language in Data Science, Machine Learning, and Deep Learning. It's fairly easy to understand. So I'd suggest that you start by familiarizing yourself with the language.


Top Python Libraries For Data Science with Free Courses

#artificialintelligence

Dask is a powerful open-source Python parallel computing framework. Dask scales Python programs from single-core local workstations to huge distributed cloud clusters. Dask provides a familiar user experience by replicating the APIs of other PyData ecosystem programs like Pandas, Scikit-learn, and NumPy. It also offers low-level APIs that allow programmers to execute bespoke algorithms concurrently.


Plotly and NVIDIA Partner to Integrate Dash and RAPIDS

#artificialintelligence

We're pleased to announce that Plotly and NVIDIA are partnering to bring GPU-accelerated Artificial Intelligence (AI) & Machine Learning (ML) to a vastly wider audience of business users. By integrating the Plotly Dash frontend with the NVIDIA RAPIDS backend, we are offering one of the highest performance AI & ML stacks available in Python today. This is all open-source and accessible in a few lines of Python code. Once you've created a Dash RAPIDS app on your desktop, get it into the hands of business users by uploading it to DEK. No IT or DevOps team required.


Best Python Libraries For data science In 2021

#artificialintelligence

Python is an interpreted, interactive, portable, and object-oriented programming language. This open-source, general-purpose language runs on many Unix variants, including Linux and macOS, and on Windows. Python has applications in hacking, computer vision, data visualisation, 3D Machine Learning, and robotics, and is a favourite of developers worldwide. Developed by the Google Brain team, TensorFlow is an open-source library used for deep learning applications. Originally developed for numerical computations, it offers a comprehensive and flexible ecosystem of tools, libraries, and community resources, enabling developers to build and deploy ML-based applications.


Top 10 Python Data Science Libraries - KDnuggets

#artificialintelligence

Python continues to lead the way when it comes to Machine Learning, AI, Deep Learning and Data Science tasks. Because of this, we've decided to start a series investigating the top Python libraries across several categories. Of course, these lists are entirely subjective as many libraries could easily place in multiple categories. Now, let's get onto the list (GitHub figures correct as of November 16th, 2018). "pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with 'relational' or 'labeled' data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python." "Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shell (à la MATLAB or Mathematica), web application servers, and various graphical user interface toolkits."


7 Dash Apps Bringing AI & ML to Sports Analytics

#artificialintelligence

To learn more about how to use Dash for AI/ML Applications in Sports Analytics, register for our upcoming webinar on April 21st with Sebastian, Plotly's Product Marketing Coordinator. As a sports fan, can you imagine this moment? It's the bottom of the ninth, two outs, 3–2 count, the batter focuses as he wags his bat over the plate… Countless hours and pure devotion by the athletes, coaches, and trainers lead up to the unfolding of these epic sports dramas. Here's a secret: the real heroes at the end of these contests… are often Data Scientists! That secret is spreading more every year: if you want the trophy at the end of your season, you must leverage Data Science, Machine Learning, and Artificial Intelligence in your organization's approach; you must grow beyond the "eye-test." The Golden State Warriors, powerhouses of the 2010s after 40 years of futility, built a numbers strategy that is being emulated across the league. The NFL even hosts $100k-prize Kaggle competitions! From attaching harnesses to rugby players for analyzing positioning and mitigating injury to the first-ever Sports Analytics academic major, experts, enthusiasts, and educators alike are learning to use cutting-edge tools to help their teams win and follow their favorite games.