Goto

Collaborating Authors

 random forest classifier


SM.1 Omittedproofs SM.1.1 ProofofProposition1 Proposition1. ThefunctionmC() = 2C(Mϵ()): X [1,c]satisfiesallpropertiesofapredictive multiplicitymetricinDefinition1

Neural Information Processing Systems

For clarity, we assume|Mϵ(xi)| = m. By the information inequality [1, Theorem 2.6.3] the mutual informationI(M;Y) between the random variablesM and Y (defined in Section 3) is non-negative, i.e.,I(M;Y) 0. On the other hand, we denote the c models in R(H,ϵ) which output scores are the "vertices" of c to be m1,,mc, then H(Y|M = mk) = 0, k [c]. H(Y|M) is minimized to 0 by setting the weightspm on those c models to be 1c and the rest to be0. Since this holds for the capacity-achievingPM, which in turn is the maximimum across input distributions,theconverseresultfollows. Theconsequence ofpredictivemultiplicity isthatthe sameindividual can betreated differently due toarbitrary and unjustified choices made during the training process (e.g., parameter initialization, random seed, dropoutprobability,etc.).




A Multilayered Approach to Classifying Customer Responsiveness and Credit Risk

Afolabi, Ayomide, Ogburu, Ebere, Kimitei, Symon

arXiv.org Machine Learning

AB S TRACT This study evaluates the performance of various classifiers in three distinct models: r esponse, r isk, and r esponse - r isk, concerning credit card mail campaigns and default prediction. In the r esponse model, the Extra Trees classifier demonstrates the highest recall level (79.1%), emphasizing its effectiveness in identifying potential responders to targeted credit card offers. Conversely, in the r isk model, the Random Forest classifier exhibits remarkable specificity of 84.1%, crucial for identifying customers least likely to default. Furthermore, in the multi - class r esponse - r isk model, the Random Forest classifier achieve s the highest accuracy (83.2%), indicating its efficacy in discerning both potential responders to credit card mail campaign and low - risk credit card users . In this study, we optimized various performance metrics to solve a specific credit risk and mail responsiveness business problem.


Modelling the Doughnut of social and planetary boundaries with frugal machine learning

Vrizzi, Stefano, O'Neill, Daniel W.

arXiv.org Artificial Intelligence

The 'Doughnut' of social and planetary boundaries has emerged as a popular framework for assessing environmental and social sustainability. Here, we provide a proof-of-concept analysis that shows how machine learning (ML) methods can be applied to a simple macroeconomic model of the Doughnut. First, we show how ML methods can be used to find policy parameters that are consistent with 'living within the Doughnut'. Second, we show how a reinforcement learning agent can identify the optimal trajectory towards desired policies in the parameter space. The approaches we test, which include a Random Forest Classifier and $Q$-learning, are frugal ML methods that are able to find policy parameter combinations that achieve both environmental and social sustainability. The next step is the application of these methods to a more complex ecological macroeconomic model.




Supplementary Materials Rashomon Capacity: A Metric for Predictive Multiplicity in Classification

Neural Information Processing Systems

(since we pick the log base to be 2). We now prove the converse statements. Individual fairness aims to ensure that "similar individuals are treated similarly." Predictive multiplicity allows different predictions from competing classifiers for the samples. Notably, neural networks with very narrows or wide layers have better reproducibility in their decision regions. The fact that multiple classifiers may yield distinct predictions to a target a sample while having statistically identical average loss performance can also cause security issues in machine learning.


Disjoint Generative Models

Lautrup, Anton Danholt, Rajabinasab, Muhammad, Hyrup, Tobias, Zimek, Arthur, Schneider-Kamp, Peter

arXiv.org Artificial Intelligence

We propose a new framework for generating cross-sectional synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The results are then combined post hoc by a joining operation that works in the absence of common variables/identifiers. The success of the framework is demonstrated through several case studies and examples on tabular data that helps illuminate some of the design choices that one may make. The principal benefit of disjoint generative models is significantly increased privacy at only a low utility cost. Additional findings include increased effectiveness and feasibility for certain model types and the possibility for mixed-model synthesis.


A Hybrid Machine Learning Framework for Optimizing Crop Selection via Agronomic and Economic Forecasting

Sindhur, Niranjan Mallikarjun, C, Pavithra, Muchikel, Nivya

arXiv.org Artificial Intelligence

Farmers in developing regions like Karnataka, India, face a dual challenge: navigating extreme market and climate volatility while being excluded from the digital revolution due to literacy barriers. This paper presents a novel decision support system that addresses both challenges through a unique synthesis of machine learning and human-computer interaction. We propose a hybrid recommendation engine that integrates two predictive models: a Random Forest classifier to assess agronomic suitability based on soil, climate, and real-time weather data, and a Long Short-Term Memory (LSTM) network to forecast market prices for agronomically viable crops. This integrated approach shifts the paradigm from "what can grow?" to "what is most profitable to grow?", providing a significant advantage in mitigating economic risk. The system is delivered through an end-to-end, voice-based interface in the local Kannada language, leveraging fine-tuned speech recognition and high-fidelity speech synthesis models to ensure accessibility for low-literacy users. Our results show that the Random Forest model achieves 98.5% accuracy in suitability prediction, while the LSTM model forecasts harvest-time prices with a low margin of error. By providing data-driven, economically optimized recommendations through an inclusive interface, this work offers a scalable and impactful solution to enhance the financial resilience of marginalized farming communities.