
ALICE: Combining Feature Selection and Inter-Rater Agreeability for Machine Learning Insights

arXiv.org Machine Learning

The use of Machine Learning (ML) models for decision-making has become the norm not only in tech but in virtually every business field, covering tasks such as search engine recommendations, customer churn prediction, credit risk scoring, energy load forecasting, and the deployment of personalized AI assistants. This comes at a time when developing ML models has become increasingly easy, both with the rise of free, open-source and user-friendly Python libraries such as Keras, scikit-learn and PyTorch, and with the rapid evolution of generative AI-based conversational chatbots such as ChatGPT, Gemini and Claude, which can provide coding assistance, if not ready-made modeling code. Such developments once again raise the question of interpretability in machine learning, which has been formulated in various ways in the literature and for which multiple solutions have been proposed, such as exploring causality (see Section 2.1), explainability (see Section 2.2), or abandoning black-box ML models altogether. From a philosophical standpoint, however, it is hard to see the benefit of highly model- or domain-specific, post-hoc, or complex solutions for gaining insight into the inner workings of machine learning models when the modeling task itself is growing ever more accessible to laypeople. A common way of categorizing ML models in this regard holds that parametric models descending from statistics and econometrics, such as Linear or Logistic Regression, are by nature more interpretable than their data-driven, non-parametric counterparts such as tree-based models or neural networks.


Churn Prediction Using Machine Learning

#artificialintelligence

One of the best-known and most useful case studies of churn prediction comes from the telecom industry. Telecom companies need to analyze all relevant customer data and develop a robust and accurate churn prediction model in order to retain customers and to form strategies for reducing attrition rates. This project uses the Telco Customer Churn dataset, which is available on Kaggle.

Two numerical columns:
1. MonthlyCharges: The amount charged to the customer monthly
2. TotalCharges: The total amount charged to the customer

Eighteen categorical columns:
1. CustomerID: Customer ID unique for each customer
2. gender: Whether the customer is a male or a female
3. SeniorCitizen: Whether the customer is a senior citizen or not (1, 0)
4. Partner: Whether the customer has a partner or not (Yes, No)
5. Dependents: Whether the customer has dependents or not (Yes, No)
6. Tenure: Number of months the customer has stayed with the company
7. PhoneService: Whether the customer has a phone service or not (Yes, No)
8. MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
9. InternetService: Customer's internet service provider (DSL, Fiber optic, No)
10. OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service)
11.
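
The column grouping described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual code: it covers only the columns named in the text (the listing is cut off after the tenth categorical column), it uses the column names as spelled here (the capitalization in the real Kaggle CSV may differ), and `SAMPLE_CSV`, `to_float`, `split_record` and `load_records` are hypothetical names introduced for the example. The two sample rows are made up and stand in for the real file; treating a blank numeric field as missing is likewise an assumption.

```python
import csv
import io

# Column groups as listed in the text (only the columns named there).
NUMERICAL = ["MonthlyCharges", "TotalCharges"]
CATEGORICAL = [
    "CustomerID", "gender", "SeniorCitizen", "Partner", "Dependents",
    "Tenure", "PhoneService", "MultipleLines", "InternetService",
    "OnlineSecurity",
]

# Hypothetical two-row sample standing in for the real CSV file.
SAMPLE_CSV = """CustomerID,gender,SeniorCitizen,Partner,Dependents,Tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,MonthlyCharges,TotalCharges
A-0001,Female,0,Yes,No,1,No,No phone service,DSL,No,30.0,30.0
A-0002,Male,1,No,No,0,Yes,No,Fiber optic,No,70.5,
"""


def to_float(value):
    """Convert a CSV field to float; blank fields become None (assumption)."""
    value = value.strip()
    return float(value) if value else None


def split_record(row):
    """Split one CSV row (a dict) into numerical and categorical parts."""
    numerical = {name: to_float(row[name]) for name in NUMERICAL}
    categorical = {name: row[name] for name in CATEGORICAL}
    return numerical, categorical


def load_records(text):
    """Parse CSV text and return a list of (numerical, categorical) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [split_record(row) for row in reader]


records = load_records(SAMPLE_CSV)
```

Keeping the two groups separate from the start makes later steps, such as scaling the numerical columns and encoding the categorical ones, easier to apply independently.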