custom metric
CADRE: Customizable Assurance of Data Readiness in Privacy-Preserving Federated Learning
Hiniduma, Kaveen, Li, Zilinghan, Sinha, Aditya, Madduri, Ravi, Byna, Suren
Privacy-Preserving Federated Learning (PPFL) is a decentralized machine learning approach where multiple clients train a model collaboratively. PPFL preserves the privacy and security of a client's data without exchanging it. However, ensuring that data at each client is of high quality and ready for federated learning (FL) is a challenge due to restricted data access. In this paper, we introduce CADRE (Customizable Assurance of Data Readiness) for federated learning (FL), a novel framework that allows users to define custom data readiness (DR) metrics, rules, and remedies tailored to specific FL tasks. CADRE generates comprehensive DR reports based on the user-defined metrics, rules, and remedies to ensure datasets are prepared for FL while preserving privacy. We demonstrate a practical application of CADRE by integrating it into an existing PPFL framework. We conducted experiments across six datasets and addressed seven different DR issues. The results illustrate the versatility and effectiveness of CADRE in ensuring DR across various dimensions, including data quality, privacy, and fairness. This approach enhances the performance and reliability of FL models as well as utilizes valuable resources.
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Qiang, Rushi, Zhuang, Yuchen, Li, Yinghao, K, Dingu Sagar V, Zhang, Rongzhi, Li, Changhao, Wong, Ian Shu-Hei, Yang, Sherry, Liang, Percy, Zhang, Chao, Dai, Bo
We introduce MLE-Dojo, a Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents in iterative machine learning engineering (MLE) workflows. Unlike existing benchmarks that primarily rely on static datasets or single-attempt evaluations, MLE-Dojo provides an interactive environment enabling agents to iteratively experiment, debug, and refine solutions through structured feedback loops. Built upon 200+ real-world Kaggle challenges, MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios such as data processing, architecture search, hyperparameter tuning, and code debugging. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning, facilitating iterative experimentation, realistic data sampling, and real-time outcome verification. Extensive evaluations of eight frontier LLMs reveal that while current models achieve meaningful iterative improvements, they still exhibit significant limitations in autonomously generating long-horizon solutions and efficiently resolving complex errors. Furthermore, MLE-Dojo's flexible and extensible architecture seamlessly integrates diverse data sources, tools, and evaluation protocols, uniquely enabling model-based agent tuning and promoting interoperability, scalability, and reproducibility. We open-source our framework and benchmarks to foster community-driven innovation towards next-generation MLE agents.
Jury: Evaluating performance of NLG models
Jury is an evaluation package for NLG systems. It allows using many metrics in one go. Also, it implements concurrency among evaluation metrics and supports evaluating with multiple predictions. Jury uses datasets package for metrics, and thus supports any metrics that datasets package has. Default evaluation metrics are, BLEU, METEOR and ROUGE-L.
Keras Metrics: Everything You Need To Know
Keras metrics are functions that are used to evaluate the performance of your deep learning model. Choosing a good metric for your problem is usually a difficult task. Lucky for you, this article explains all that! In Keras, metrics are passed during the compile stage as shown below. You can pass several metrics by comma separating them.
Good Data Analysis ML Universal Guides Google Developers
Deriving truth and insight from a pile of data is a powerful but error-prone job. The best data analysts and data-minded engineers develop a reputation for making credible pronouncements from data. But what are they doing that gives them credibility? I often hear adjectives like careful and methodical, but what do the most careful and methodical analysts actually do? This is not a trivial question, especially given the type of data that we regularly gather at Google. Not only do we typically work with very large data sets, but those data sets are extremely rich. That is, each row of data typically has many, many attributes. When you combine this with the temporal sequences of events for a given user, there are an enormous number of ways of looking at the data.
How to Use Metrics for Deep Learning with Keras in Python - Machine Learning Mastery
The Keras library provides a way to calculate and report on a suite of standard metrics when training deep learning models. In addition to offering standard metrics for classification and regression problems, Keras also allows you to define and report on your own custom metrics when training deep learning models. This is particularly useful if you want to keep track of a performance measure that better captures the skill of your model during training. In this tutorial, you will discover how to use the built-in metrics and how to define and use your own metrics when training deep learning models in Keras. Metrics and How to Use Custom Metrics for Deep Learning with Keras in Python Photo by Indi Samarajiva, some rights reserved.