

Evidence-Bound Autonomous Research (EviBound): A Governance Framework for Eliminating False Claims

Chen, Ruiying

arXiv.org Artificial Intelligence

LLM-based autonomous research agents report false claims: tasks marked "complete" despite missing artifacts, contradictory metrics, or failed executions. EviBound is an evidence-bound execution framework that eliminates false claims through dual governance gates requiring machine-checkable evidence. Two complementary gates enforce evidence requirements. The pre-execution Approval Gate validates acceptance criteria schemas before code runs, catching structural violations proactively. The post-execution Verification Gate validates artifacts via MLflow API queries (with recursive path checking) and optionally validates metrics when specified by acceptance criteria. Claims propagate only when backed by a queryable run ID, required artifacts, and FINISHED status. Bounded, confidence-gated retries (typically 1-2 attempts) recover from transient failures without unbounded loops. The framework was evaluated on 8 benchmark tasks spanning infrastructure validation, ML capabilities, and governance stress tests. Baseline A (Prompt-Level Only) yields 100% hallucination (8/8 claimed, 0/8 verified). Baseline B (Verification-Only) reduces hallucination to 25% (2/8 fail verification). EviBound (Dual Gates) achieves 0% hallucination: 7/8 tasks verified and 1 task correctly blocked at the approval gate, with approximately 8.3% execution overhead. This package includes execution trajectories, MLflow run IDs for all verified tasks, and a 4-step verification protocol. Research integrity is an architectural property, achieved through governance gates rather than emergent from model scale.
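The dual-gate idea from the abstract can be sketched in a few lines. This is a minimal illustration, not the framework's actual API: the schema fields, function names, and dict-based "run" record are all hypothetical stand-ins (the real framework queries MLflow for run status and artifacts).

```python
# Hypothetical sketch of EviBound-style dual gates (not the framework's
# real API). The approval gate rejects malformed acceptance criteria
# before execution; the verification gate lets a claim propagate only
# when backed by a run ID, FINISHED status, and all required artifacts.

REQUIRED_FIELDS = {"task_id", "required_artifacts", "expected_status"}

def approval_gate(criteria: dict) -> bool:
    """Pre-execution: catch structural violations in the criteria schema."""
    return REQUIRED_FIELDS.issubset(criteria)

def verification_gate(criteria: dict, run: dict) -> bool:
    """Post-execution: check the claimed evidence actually exists."""
    return (
        run.get("run_id") is not None
        and run.get("status") == criteria["expected_status"]
        and set(criteria["required_artifacts"]) <= set(run.get("artifacts", []))
    )

criteria = {
    "task_id": "task-1",
    "required_artifacts": ["model.pkl", "metrics.json"],
    "expected_status": "FINISHED",
}
good_run = {"run_id": "abc123", "status": "FINISHED",
            "artifacts": ["model.pkl", "metrics.json"]}
bad_run = {"run_id": "def456", "status": "FINISHED",
           "artifacts": ["model.pkl"]}  # metrics.json missing

assert approval_gate(criteria)
assert verification_gate(criteria, good_run)
assert not verification_gate(criteria, bad_run)  # claim blocked
```

A run with a missing artifact fails the verification gate, so its "complete" claim never propagates, which is the mechanism behind the 0% hallucination result.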


DeepTSF: Codeless machine learning operations for time series forecasting

Pelekis, Sotiris, Karakolis, Evangelos, Pountridis, Theodosios, Kormpakis, George, Lampropoulos, George, Mouzakitis, Spiros, Askounis, Dimitris

arXiv.org Artificial Intelligence

This paper presents DeepTSF, a comprehensive machine learning operations (MLOps) framework that aims to innovate time series forecasting through workflow automation and codeless modeling. DeepTSF automates key aspects of the ML lifecycle, making it an ideal tool for data scientists and MLOps engineers engaged in machine learning (ML) and deep learning (DL)-based forecasting. DeepTSF empowers users with a robust and user-friendly solution, and it is designed to integrate seamlessly with existing data analysis workflows, providing enhanced productivity and compatibility. The framework offers a front-end user interface (UI) suitable for data scientists as well as other higher-level stakeholders, enabling comprehensive understanding through insightful visualizations and evaluation metrics. The application of DeepTSF in real-life use cases of the I-NERGY project has already proven its efficacy in DL-based load forecasting, showcasing its significant added value in the electrical power and energy systems domain.

Historically, time-series modeling has been a prominent area of interest in academic research, with diverse applications in fields such as climate modeling [42], biological sciences [60], medicine [63], and commercial decision-making domains like retail [59], finance [56], and energy [49, 48]. Traditional approaches in this field have primarily focused on parametric statistical models, utilizing domain-expertise-driven techniques such as autoregressive models [13], exponential smoothing [23], and other methods that relied heavily on decomposing time series [8]. However, the advent of modern ML methods has introduced data-driven approaches for capturing temporal dynamics [37]. Among these methods, deep learning (DL) has gained significant traction, inspired by its remarkable achievements in areas like image classification [31], natural language processing [66], and reinforcement learning [34].

Deep neural networks, with their customized architectural assumptions or inductive biases [10], can effectively learn intricate data representations, eliminating the need for manual feature engineering and model design. The availability of open-source backpropagation frameworks [46, 1] has further simplified network training, allowing for flexible customization of network components and loss functions.


MLflow Empowering AI Training. MLflow is an open-source platform to…

#artificialintelligence

Artificial intelligence (AI) is intelligence -- perceiving, synthesizing, and inferring information -- demonstrated by machines. Today, AI is no longer a profound technology confined to the science lab. Instead, it is at amateurs' fingertips to create decent artwork, generate sophisticated conversation, and perform other intelligent tasks using DALL·E, Stable Diffusion, GPT-3, ChatGPT, Point·E, Whisper, etc. Have you ever wondered how a realistic image is generated from a natural language description? The intelligence comes from machine learning (ML), the study of computer algorithms that can improve automatically through experience and the use of data. These textbook algorithms are publicly available and ready to be used.


Announcing PyCaret 3.0 -- An open-source, low-code machine learning library in Python

#artificialintelligence

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that exponentially speeds up the experiment cycle and makes you more productive. Compared with other open-source machine learning libraries, PyCaret is an alternative low-code library that can replace hundreds of lines of code with just a few. This makes experiments fast and efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks in Python.


Managing Machine Learning Lifecycles with MLflow

#artificialintelligence

Model development and experimentation are part of any machine learning lifecycle. However, without careful planning, keeping track of experiments can become tedious and challenging, especially given the number of configurations we typically deal with. MLflow is a machine learning lifecycle framework that allows ML engineers and teams to keep track of their experiments. In PART 1 of the series, we are going to focus on the first two steps -- tracking experiments and sharing code. PART 2 will be dedicated to model packaging, while PART 3 will show how the concepts outlined in the previous parts can be used in a React web application. For now, let's try to understand what MLflow is, and what it can do for us!


Act like a Machine Learning Pro in Simple Way (PyCaret + mlflow)

#artificialintelligence

Build your own ML lab and become an ML professional in your boss's eyes, the simple way. Machine learning (ML) has been well known for a while now, since a massive number of companies want to merge their business with AI or data science. And within a data project, alongside the analysis, the most fun part is arguably the machine learning model.


Who needs MLflow when you have SQLite?

#artificialintelligence

I spent about six years working as a data scientist and tried to use MLflow (and other tools as well) several times to track my experiments; however, every time I tried it, I abandoned it a few days later. There were a few things I didn't like: it seemed like too much to have to start a web server just to look at my experiments, and I found the query feature extremely limiting (if my experiments are stored in a SQL table, why not let me query them with SQL?). I also found comparing experiments limited. I rarely have a project where a single metric (or a couple of metrics) is enough to evaluate a model. It's usually a combination of metrics and evaluation plots that I need to look at to assess a model.
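The author's point about SQL is easy to demonstrate. A minimal sketch of a one-table SQLite experiment tracker (the column names and run values are invented for illustration; this is the article's argument in miniature, not a drop-in MLflow replacement):

```python
import sqlite3

# One table, one row per run: "querying experiments" is just SQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE experiments (
        run_id        TEXT PRIMARY KEY,
        model         TEXT,
        learning_rate REAL,
        rmse          REAL,
        auc           REAL
    )
""")
runs = [
    ("run-1", "xgboost", 0.10, 0.45, 0.81),
    ("run-2", "xgboost", 0.01, 0.39, 0.85),
    ("run-3", "linear",  0.00, 0.52, 0.77),
]
conn.executemany("INSERT INTO experiments VALUES (?, ?, ?, ?, ?)", runs)

# Multi-metric comparison is a one-line query, no web server required:
best = conn.execute(
    "SELECT run_id, rmse, auc FROM experiments "
    "WHERE auc > 0.8 ORDER BY rmse ASC LIMIT 1"
).fetchone()
print(best)  # ('run-2', 0.39, 0.85)
```

Filtering on several metrics at once, the author's main complaint about MLflow's query feature, falls out of the `WHERE` clause for free.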


Learn to Streamline Your Machine Learning Workflow with MLFlow

#artificialintelligence

MLflow Pipelines also implements a cache-aware executor for pipeline steps. This ensures that steps are re-executed only when their corresponding code or configuration has changed. In addition, pipelines can be executed and their output examined via APIs and a command-line interface (CLI) provided by MLflow.
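The cache-aware idea can be sketched with a content fingerprint: hash a step's code and configuration, and skip execution when the hash has been seen before. This is an illustration of the concept only; MLflow's internal caching mechanism differs, and every name below is hypothetical.

```python
import hashlib
import json

# In-memory cache keyed by a fingerprint of (step code + configuration).
cache: dict = {}

def fingerprint(code: str, config: dict) -> str:
    """Stable hash of a step's code and configuration."""
    payload = code + json.dumps(config, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_step(name: str, code: str, config: dict, fn):
    """Execute a step only if its code or config changed since last run."""
    key = name + ":" + fingerprint(code, config)
    if key in cache:
        print(name, "- cached, skipping")
        return cache[key]
    print(name, "- executing")
    result = fn(config)
    cache[key] = result
    return result

split = lambda cfg: "split with test_size=" + str(cfg["test_size"])

first   = run_step("ingest", "v1", {"test_size": 0.2}, split)  # executes
second  = run_step("ingest", "v1", {"test_size": 0.2}, split)  # cache hit
changed = run_step("ingest", "v1", {"test_size": 0.3}, split)  # config changed, re-runs
```

Sorting the config keys before hashing keeps the fingerprint stable across dicts that differ only in insertion order.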


Complete MLOps Bootcamp

#artificialintelligence

If you're looking for a comprehensive, hands-on, project-based guide to learning MLOps (Machine Learning Operations), you've come to the right place. According to an Algorithmia survey, 85% of machine learning projects never reach production. Moreover, MLOps has grown exponentially in recent years: the MLOps market was estimated at $23.2 billion in 2019 and is projected to reach $126 billion by 2025. MLOps knowledge will therefore open up numerous professional opportunities for you.


How to Package and Distribute Machine Learning Models with MLFlow - KDnuggets

#artificialintelligence

One of the fundamental activities in every stage of the ML model life cycle is collaboration. Taking an ML model from conception to deployment requires participation and interaction between the different roles involved in constructing it. In addition, the nature of ML model development involves experimentation and the tracking of artifacts, metrics, model versions, etc., which demands effective organization for the correct maintenance of the ML model life cycle. Fortunately, there are tools for developing and maintaining a model's life cycle, such as MLflow. In this article, we will break down MLflow, its main components, and its characteristics.