 sklearn


Non-Negative Matrix Factorization Using Non-Von Neumann Computers

Borle, Ajinkya, Nicholas, Charles, Chukwu, Uchenna, Miri, Mohammad-Ali, Chancellor, Nicholas

arXiv.org Artificial Intelligence

Non-negative matrix factorization (NMF) is a matrix decomposition problem with applications in unsupervised learning. The general form of this problem (along with many of its variants) is NP-hard. In our work, we explore how this problem could be solved with an energy-based optimization method suitable for certain machines with non-von Neumann architectures. We used the Dirac-3, a device based on the entropy computing paradigm and made by Quantum Computing Inc., to evaluate our approach. Our formulations consist of (i) a quadratic unconstrained binary optimization model (QUBO, suitable for Ising machines) and (ii) a quartic formulation that allows for real-valued and integer variables (suitable for machines like the Dirac-3). Although current devices cannot solve large NMF problems, the results of our preliminary experiments are promising enough to warrant further research. For non-negative real matrices, we observed that a fusion approach, which first runs Dirac-3 and then feeds its results as the initial factor matrices to Scikit-learn's NMF procedure, outperforms Scikit-learn's NMF procedure on its own with default parameters, in terms of the error in the reconstructed matrices. For our experiments on non-negative integer matrices, we compared the Dirac-3 device to Google's CP-SAT solver (inside the OR-Tools package) and found that, for serial processing, Dirac-3 outperforms CP-SAT in a majority of the cases. We believe that future work in this area might be able to identify domains and variants of the problem where entropy computing (and other non-von Neumann architectures) could offer a clear advantage.
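The fusion idea in the abstract above (start an NMF solver from factor matrices produced elsewhere) can be illustrated with a small sketch. This is not the authors' pipeline: it swaps in the classic Lee-Seung multiplicative updates written in plain Python instead of Scikit-learn's solver, and the starting factors W0 and H0 are hand-picked stand-ins for what a device such as the Dirac-3 might return. In Scikit-learn itself, custom starting factors can be supplied with NMF(init="custom") and the W and H arguments of fit_transform.

```python
def matmul(A, B):
    # Plain-Python matrix product of two lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(X, W, H):
    # Frobenius norm of the reconstruction residual X - WH.
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5

def nmf_multiplicative(X, W, H, n_iter=200, eps=1e-12):
    # Lee-Seung multiplicative updates for the Frobenius objective.
    # W and H are the caller-supplied (warm-start) non-negative factors.
    W = [row[:] for row in W]
    H = [row[:] for row in H]
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H

# Hypothetical rank-1 example; W0 and H0 play the role of externally supplied factors.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
W0 = [[1.0], [1.0], [1.0]]
H0 = [[1.0, 1.0]]

error_before = frobenius_error(X, W0, H0)
W, H = nmf_multiplicative(X, W0, H0)
error_after = frobenius_error(X, W, H)
```

Because the updates only multiply by non-negative ratios, factors that start non-negative stay non-negative, which is why the quality of the starting point matters for where the solver ends up.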


Best Practices for Machine Learning Experimentation in Scientific Applications

Michelucci, Umberto, Venturini, Francesca

arXiv.org Artificial Intelligence

Machine learning (ML) is increasingly adopted in scientific research, yet the quality and reliability of results often depend on how experiments are designed and documented. Poor baselines, inconsistent preprocessing, or insufficient validation can lead to misleading conclusions about model performance. This paper presents a practical and structured guide for conducting ML experiments in scientific applications, focussing on reproducibility, fair comparison, and transparent reporting. We outline a step-by-step workflow, from dataset preparation to model selection and evaluation, and propose metrics that account for overfitting and instability across validation folds, including the Logarithmic Overfitting Ratio (LOR) and the Composite Overfitting Score (COS). Through recommended practices and example reporting formats, this work aims to support researchers in establishing robust baselines and drawing valid evidence-based insights from ML models applied to scientific problems.


Control Variates for Slate Off-Policy Evaluation: Supplementary Text

Vlassis, Nikos (Netflix), Chandrashekar, Ashok (WarnerMedia), Amat Gil, Fernando (Netflix), Kallus, Nathan (Cornell University and Netflix)

Neural Information Processing Systems

To train the lasso models we used sklearn.linear_model.Lasso (with parameters fit_intercept = False, max_iter = 500, tol = 1e-4, normalize = False, precompute = False, copy_X = False, [...]). Results are qualitatively very similar to what we obtained with tree-based models. [Footnote: The author was with Netflix when this work was concluded.]


Lightweight Baselines for Medical Abstract Classification: DistilBERT with Cross-Entropy as a Strong Default

Liu, Jiaqi, Wang, Tong, Liu, Su, Hu, Xin, Tong, Ran, Wang, Lanruo, Xu, Jiexi

arXiv.org Artificial Intelligence

This work evaluates lightweight methods for medical abstract classification to establish how well they can perform under a limited budget. On the public medical abstracts corpus, we fine-tune BERT base and DistilBERT with three objectives: cross entropy (CE), class-weighted CE, and focal loss, under identical tokenization, sequence length, optimizer, and schedule. DistilBERT with plain CE gives the strongest raw argmax trade-off, while a post hoc operating point selection (validation-calibrated, classwise thresholds) substantially improves deployed performance; under this tuned regime, focal loss benefits most. We report Accuracy, Macro F1, and Weighted F1, release evaluation artifacts, and include confusion analyses to clarify error structure. The practical takeaway is to start with a compact encoder and CE, then add lightweight calibration or thresholding when deployment requires higher macro balance.
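For readers unfamiliar with the three objectives named above, the following is a minimal per-example sketch in plain Python. It uses the standard textbook definitions of cross entropy, class-weighted cross entropy, and focal loss; the gamma default, the example weights, and the margin-based thresholding rule are illustrative assumptions, not values or procedures taken from the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target, class_weights=None):
    # Plain CE when class_weights is None, class-weighted CE otherwise.
    p = softmax(logits)[target]
    w = 1.0 if class_weights is None else class_weights[target]
    return -w * math.log(p)

def focal_loss(logits, target, gamma=2.0):
    # Focal loss: CE scaled by (1 - p)^gamma, which down-weights easy examples.
    p = softmax(logits)[target]
    return -((1.0 - p) ** gamma) * math.log(p)

def predict_with_thresholds(probs, thresholds):
    # One possible classwise-threshold rule (an assumption for illustration):
    # choose the class whose probability exceeds its own threshold by the most.
    margins = [p - t for p, t in zip(probs, thresholds)]
    return max(range(len(probs)), key=lambda k: margins[k])

logits = [2.0, 0.5, -1.0]   # hypothetical model outputs for three classes
ce = cross_entropy(logits, target=1)
weighted_ce = cross_entropy(logits, target=1, class_weights=[1.0, 3.0, 1.0])
focal = focal_loss(logits, target=1, gamma=2.0)
```

With gamma set to 0 the focal loss reduces to plain cross entropy, which is a quick sanity check when comparing the objectives.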


Cluster Workload Allocation: A Predictive Approach Leveraging Machine Learning Efficiency

Sliwko, Leszek

arXiv.org Artificial Intelligence

This research investigates how Machine Learning (ML) algorithms can assist in workload allocation strategies by detecting tasks with node affinity operators (referred to as constraint operators), which constrain their execution to a limited number of nodes. Using real-world Google Cluster Data (GCD) workload traces and the AGOCS framework, the study extracts node attributes and task constraints, then analyses them to identify suitable node-task pairings. It focuses on tasks that can be executed on either a single node or fewer than a thousand out of 12.5k nodes in the analysed GCD cluster. Task constraint operators are compacted, pre-processed with one-hot encoding, and used as features in a training dataset. Various ML classifiers, including Artificial Neural Networks, K-Nearest Neighbours, Decision Trees, Naive Bayes, Ridge Regression, Adaptive Boosting, and Bagging, are fine-tuned and assessed for accuracy and F1-scores. The final ensemble voting classifier model achieved 98% accuracy and a 1.5-1.8% misclassification rate for tasks with a single suitable node.
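Two of the building blocks mentioned above, one-hot encoding of a categorical feature and a voting ensemble, can be sketched in a few lines of plain Python. The operator names, labels, and individual classifier outputs below are hypothetical placeholders; the study's actual features come from Google Cluster Data traces and its ensemble is built from trained classifiers.

```python
from collections import Counter

def one_hot(value, categories):
    # Encode a single categorical value as a 0/1 vector over known categories.
    return [1 if value == c else 0 for c in categories]

def hard_vote(predictions):
    # Majority (hard) voting: return the label predicted most often.
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical constraint operators used only for illustration.
OPERATORS = ["equal", "not_equal", "less_than", "greater_than"]
encoded = one_hot("less_than", OPERATORS)

# Hypothetical labels from three separately trained classifiers for one task.
votes = ["single_node", "many_nodes", "single_node"]
ensemble_label = hard_vote(votes)
```

In Scikit-learn the corresponding tools are OneHotEncoder and VotingClassifier; the sketch only shows the underlying idea.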


Reinforcement Learning for Machine Learning Engineering Agents

Yang, Sherry, He-Yueya, Joy, Liang, Percy

arXiv.org Artificial Intelligence

Existing agents for solving tasks such as ML engineering rely on prompting powerful language models. As a result, these agents do not improve with more experience. In this paper, we show that agents backed by weaker models that improve via reinforcement learning (RL) can outperform agents backed by much larger but static models. We identify two major challenges with RL in this setting. First, actions can take a variable amount of time (e.g., executing code for different solutions), which leads to asynchronous policy gradient updates that favor faster but suboptimal solutions. To tackle variable-duration actions, we propose duration-aware gradient updates in a distributed asynchronous RL framework to amplify high-cost but high-reward actions. Second, using only test split performance as a reward provides limited feedback. A program that is nearly correct is treated the same as one that fails entirely. To address this, we propose environment instrumentation to offer partial credit, distinguishing almost-correct programs from those that fail early (e.g., during data loading). Environment instrumentation uses a separate static language model to insert print statements into an existing program to log the agent's experimental progress, from which partial credit can be extracted as reward signals for learning. Our experimental results on MLEBench suggest that performing gradient updates on a much smaller model (Qwen2.5-3B) trained with RL outperforms prompting a much larger model (Claude-3.5-Sonnet) with agent scaffolds, by an average of 22% across 12 Kaggle tasks.


Machine Learning Experiences: A story of learning AI for use in enterprise software testing that can be used by anyone

Cohoon, Michael, Furman, Debbie

arXiv.org Artificial Intelligence

This paper details the machine learning (ML) journey of a group of people focused on software testing. It tells the story of how this group progressed through an ML workflow (similar to the CRISP-DM process). This workflow consists of the following steps and can be used by anyone applying ML techniques to a project: gather the data; clean the data; perform feature engineering on the data; split the data into two sets, one for training and one for testing; choose a machine learning model; train the model; test the model; and evaluate the model's performance. By following this workflow, anyone can effectively apply ML to any project that they are doing.
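The workflow steps listed above can be walked through end to end on a toy problem. The sketch below is an assumption-laden illustration in plain Python: the synthetic data, the "pass" and "fail" labels, and the nearest-centroid model are all made up for the example and are not taken from the paper.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    # Shuffle a copy of the rows and split it into training and test sets.
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_centroids(train):
    # "Training": compute the mean feature vector of each label.
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    # Assign the label of the closest centroid (squared Euclidean distance).
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def accuracy(centroids, test):
    hits = sum(predict(centroids, f) == y for f, y in test)
    return hits / len(test)

# 1. Gather: hypothetical rows of (features, label), plus one incomplete row.
raw = [([float(i % 5)], "pass") for i in range(20)]
raw += [([10.0 + i % 5], "fail") for i in range(20)]
raw.append(([None], "pass"))
# 2. Clean: drop rows with missing values.
clean = [(f, y) for f, y in raw if None not in f]
# 3. Feature engineering: rescale the single feature to the range [0, 1].
top = max(f[0] for f, _ in clean)
rows = [([f[0] / top], y) for f, y in clean]
# 4. Split, 5-6. choose and train a model, 7-8. test and evaluate it.
train, test = train_test_split(rows)
model = fit_centroids(train)
score = accuracy(model, test)
```

Keeping each step in its own small function makes it easy to swap the toy model for a real one without touching the rest of the pipeline.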