

 gridsearchcv



Sentiment Classification of Thai Central Bank Press Releases Using Supervised Learning

arXiv.org Artificial Intelligence

Central bank communication plays a critical role in shaping economic expectations and monetary policy effectiveness. This study applies supervised machine learning techniques to classify the sentiment of press releases from the Bank of Thailand, addressing a gap in research that has focused primarily on lexicon-based approaches. My findings show that supervised learning can be an effective method, even with smaller datasets, and serves as a starting point for further automation. However, achieving higher accuracy and better generalization requires a substantial amount of labeled data, which is time-consuming to produce and demands expertise. Using models such as Naïve Bayes, Random Forest and SVM, this study demonstrates the applicability of machine learning to central bank sentiment analysis, with English-language communications from the Thai Central Bank as a case study.
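The abstract does not describe its pipeline, so the following is only a minimal sketch of how one of the named models, multinomial Naïve Bayes, can assign a sentiment label to short texts. It is not the study's implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the `positive`/`negative` labels, the four training sentences and the helper names `train_nb` and `predict_nb` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace (no stemming, no stop words)
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    log_prior, log_lik = {}, {}
    for label, n_docs in class_counts.items():
        log_prior[label] = math.log(n_docs / len(labels))
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        log_lik[label] = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_lik


def predict_nb(model, text):
    """Return the label with the highest log posterior score."""
    log_prior, log_lik = model
    scores = {}
    for label, prior in log_prior.items():
        # Words never seen in training are skipped
        scores[label] = prior + sum(
            log_lik[label][w] for w in tokenize(text) if w in log_lik[label]
        )
    return max(scores, key=scores.get)


# Invented toy sentences, loosely in the style of policy statements
train_docs = [
    "economic growth is strong and recovering",
    "exports recovered and growth improved",
    "growth slowed and risks increased",
    "weak demand and downside risks",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, "growth is recovering"))      # → positive
print(predict_nb(model, "downside risks increased"))  # → negative
```

Each class score is a log prior plus the summed log-likelihoods of the known words; working in log space avoids numerical underflow on long documents. With only four sentences the model is a toy, which mirrors the abstract's point that useful accuracy depends on a substantial amount of labeled data.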


Movie Revenue Prediction using Machine Learning Models

arXiv.org Artificial Intelligence

In the contemporary film industry, accurately predicting a movie's earnings is paramount for maximizing profitability. This project aims to develop a machine learning model that predicts movie earnings from input features such as the movie's name, MPAA rating, genre, year of release, IMDb rating, number of votes cast by viewers, director, writer, leading cast, country of production, budget, production company and runtime. Through a structured methodology involving data collection, preprocessing, analysis, model selection, evaluation and improvement, a robust predictive model is constructed. Linear Regression, Decision Tree, Random Forest Regression, Bagging, XGBoost and Gradient Boosting models have been trained and tested. Model improvement strategies include hyperparameter tuning and cross-validation. The resulting model offers promising accuracy and generalization, facilitating informed decision-making in the film industry to maximize profits.


Implementing Custom GridSearchCV and RandomSearchCV without scikit-learn

#artificialintelligence

Scikit-Learn offers two tools for hyperparameter tuning: GridSearchCV and RandomizedSearchCV. GridSearchCV performs an exhaustive search over specified parameter values for an estimator (or machine learning algorithm) and returns the best-performing hyperparameter combination. So all we need to do is specify the hyperparameters we want to experiment with and their ranges of values, and GridSearchCV evaluates every possible combination of those values using cross-validation. As such, we naturally limit our choice of hyperparameters and their ranges of values. Theoretically, we could specify a set of values for ALL hyperparameters of a model, but such a search consumes vast computing resources and time.
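A minimal sketch of what a scikit-learn-free version might look like, assuming each hyperparameter is given as a list of candidate values. `score_fn(params, train_idx, val_idx)` is a hypothetical callback standing in for "train the model on the training fold and return a score on the validation fold" (higher is better); the fold splitting, the function names and the toy scoring function are assumptions, not the original article's code. Grid search walks the full Cartesian product of the lists, while random search draws `n_iter` combinations from the same lists, with replacement.

```python
import itertools
import random


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds over range(n).
    Assumes the data have already been shuffled."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def cv_score(score_fn, params, n, k):
    """Mean validation score of one parameter combination across k folds."""
    scores = [score_fn(params, tr, va) for tr, va in k_fold_indices(n, k)]
    return sum(scores) / len(scores)


def grid_search(score_fn, param_grid, n, k=5):
    """Exhaustively score every combination in param_grid."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_score(score_fn, params, n, k)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def random_search(score_fn, param_grid, n, k=5, n_iter=10, seed=0):
    """Score n_iter combinations sampled (with replacement) from param_grid."""
    rng = random.Random(seed)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(param_grid[name]) for name in names}
        score = cv_score(score_fn, params, n, k)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "fit on train_idx, score on val_idx":
# it ignores the folds and peaks at max_depth=4, min_samples_leaf=2.
def toy_score(params, train_idx, val_idx):
    return -(params["max_depth"] - 4) ** 2 - (params["min_samples_leaf"] - 2) ** 2


grid = {"max_depth": [2, 4, 6, 8], "min_samples_leaf": [1, 2, 5]}
print(grid_search(toy_score, grid, n=100))  # → ({'max_depth': 4, 'min_samples_leaf': 2}, 0.0)
print(random_search(toy_score, grid, n=100, n_iter=5))
```

The grid version always scores all 12 combinations here; the random version scores only `n_iter` of them, which is why it scales better when the grid grows, at the cost of possibly missing the best point.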


How to Choose n_estimators in Random Forest? Get Solution

#artificialintelligence

Are you looking for how to choose n_estimators in a random forest? n_estimators defines the number of underlying decision trees in the Random Forest. The Random Forest algorithm is a bagging technique, in which we ensemble many weak learners to decrease the variance. n_estimators is therefore a hyperparameter of the Random Forest.
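One way to see why adding trees helps, and why the gains flatten out, is the standard result for averaging B identically distributed estimators, each with variance σ² and pairwise correlation ρ: Var = ρσ² + (1 - ρ)σ²/B. The second term shrinks as B grows, but the first does not. The sketch below evaluates that formula; the values σ² = 1.0 and ρ = 0.3, the 1% tolerance and the helper name `smallest_sufficient_n` are assumptions for illustration. In practice σ² and ρ are unknown, so people usually track out-of-bag or cross-validated error as n_estimators grows and stop where it plateaus.

```python
def ensemble_variance(sigma2, rho, n_estimators):
    """Variance of the average of n_estimators identically distributed trees,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_estimators


def smallest_sufficient_n(sigma2, rho, candidates, tol=0.01):
    """Smallest candidate whose ensemble variance is within a relative tol
    of the infinite-forest limit rho * sigma2 (assumes rho > 0)."""
    limit = rho * sigma2
    for n in sorted(candidates):
        if ensemble_variance(sigma2, rho, n) <= limit * (1 + tol):
            return n
    return max(candidates)  # no candidate reached the tolerance


# Assumed values: per-tree variance 1.0, correlation between trees 0.3
for n in [1, 10, 100, 500]:
    print(n, round(ensemble_variance(1.0, 0.3, n), 4))

print(smallest_sufficient_n(1.0, 0.3, [10, 50, 100, 200, 300, 500]))  # → 300
```

Past that point, extra trees mostly add training and prediction time. The floor ρσ² can only be lowered by making the trees less correlated (for example through `max_features`), not by adding more of them.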


E-Commerce Customer Churn Prediction - Analytics Vidhya

#artificialintelligence

This article was published as a part of the Data Science Blogathon. Customer churn, or attrition, is one of the most crucial problems for any business that directly sells to or serves customers. Be it telecom service providers, eCommerce or SaaS businesses, it is important to track and analyse how many customers are leaving the platform, how many are staying, and the reasons behind both. Knowing customer behaviour can greatly enhance decision-making and can further help reduce churn to improve profitability. In this article, we are going to analyse an eCommerce dataset and find the best model to predict customer churn. But before delving into the analysis, let's have a brief look at what churn is. Customer churn can be defined as the rate at which customers leave a platform or service.
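The definition above can be made concrete with the common period-based formula: churn rate = customers lost during the period / customers at the start of the period. A minimal sketch; the function names and the `churned` flag on each record are hypothetical, and the article's dataset may encode churn differently.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of customers at the start of a period who left during it."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


def churn_share(records, flag="churned"):
    """Share of records flagged as churned (1) in a labeled dataset."""
    if not records:
        raise ValueError("records must not be empty")
    return sum(r[flag] for r in records) / len(records)


print(churn_rate(1000, 50))  # → 0.05

# Hypothetical labeled records: every fifth customer churned
records = [{"customer_id": i, "churned": 1 if i % 5 == 0 else 0} for i in range(1, 21)]
print(churn_share(records))  # → 0.2
```

In a labeled dataset, the churn share is also the size of the positive class. When it is small, plain accuracy can look high even for a model that never predicts churn, so metrics such as recall or F1 are usually more informative when comparing churn models.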


Kaggle Master with Heart Attack Prediction Kaggle Project

#artificialintelligence

Kaggle Master with Heart Attack Prediction Kaggle Project: Kaggle is a machine learning and data science community. Become a Kaggle master with a real machine learning Kaggle project. Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle is a platform where data scientists can compete in machine learning challenges. These challenges can be anything from predicting housing prices to detect… Machine learning describes systems that make predictions using a model trained on real-world data. Machine learning is constantly being applied to new industries and ne… Data science includes preparing, analyzing, and processing data.


Hyperparameter Tuning of Decision Tree Classifier Using GridSearchCV

#artificialintelligence

Models can have many hyperparameters, which are settings fixed before training rather than learned from the data, and the best combination of them can be found using grid search. Grid search is a hyperparameter tuning technique that builds and evaluates a model for every combination of algorithm parameters specified in a grid. We might use 10-fold cross-validation to search for the best value of each hyperparameter. To find the best set of hyperparameters, we will use the grid search method.
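To make the cost of "every combination" concrete: grid search with k-fold cross-validation fits one model per candidate per fold. A small sketch, assuming a hypothetical decision-tree grid whose keys follow scikit-learn's `DecisionTreeClassifier` parameter names (`criterion`, `max_depth`, `min_samples_leaf`); the candidate values are arbitrary.

```python
import itertools
from math import prod

# Hypothetical grid; keys mirror DecisionTreeClassifier parameter names
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 4, 6, 8, None],
    "min_samples_leaf": [1, 5, 10],
}
n_folds = 10  # 10-fold cross-validation, as in the text

# Every combination in the grid is one candidate model configuration
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in itertools.product(*param_grid.values())]

n_candidates = len(candidates)
n_fits = n_candidates * n_folds  # one fit per candidate per fold

assert n_candidates == prod(len(v) for v in param_grid.values())
print(n_candidates, n_fits)  # → 30 300
print(candidates[0])  # → {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 1}
```

With the default `refit=True`, scikit-learn's `GridSearchCV` also fits one final model on the full training data using the best parameters, so this grid costs 301 fits. Adding a fourth hyperparameter with five values would raise that to 1,501.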


How to Check if a Classification Model is Overfitted using scikit-learn

#artificialintelligence

One of the hardest problems when dealing with Machine Learning algorithms is evaluating whether the trained model performs well on unseen samples. For example, a model may behave very well on a given dataset but fail to predict the correct values when deployed. This discrepancy between performance on the training data and on the test data can be due to different problems. One of the most common is overfitting. A model that fits the training set well but the test set poorly is said to be overfit to the training set, and a model that fits both sets poorly is said to be underfit.
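The rule in the last sentence can be written as a simple check on training and test scores. This is a minimal plain-Python sketch, not the article's scikit-learn code: a 1-nearest-neighbour classifier memorises a small invented dataset with two deliberately flipped (noisy) labels, so it scores perfectly on the training set but poorly on test points labeled by the clean rule. The thresholds in the hypothetical `diagnose_fit` helper (0.8 for "good enough", 0.1 for the acceptable gap) are arbitrary. With scikit-learn, the two scores would typically come from calling the fitted classifier's `score` method on the training and test splits.

```python
def predict_1nn(train_x, train_y, x):
    """Label of the closest training point (1-nearest neighbour)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def accuracy(train_x, train_y, xs, ys):
    """Fraction of points in (xs, ys) that the 1-NN model labels correctly."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(ys)


def diagnose_fit(train_score, test_score, good=0.8, max_gap=0.1):
    """Overfit: good on train, much worse on test. Underfit: poor on both.
    The thresholds are arbitrary placeholders."""
    if train_score < good and test_score < good:
        return "underfit"
    if train_score - test_score > max_gap:
        return "overfit"
    return "good fit"


# Invented 1-D data: the clean rule is label 1 when x >= 5, but the labels
# at x = 2 and x = 7 are flipped to simulate noise.
train_x = list(range(10))
train_y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
test_x = [0.4, 1.9, 2.1, 6.9, 7.1, 8.4]
test_y = [0, 0, 0, 1, 1, 1]  # labeled by the clean rule

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(train_acc, round(test_acc, 3), diagnose_fit(train_acc, test_acc))
# → 1.0 0.333 overfit
```

The model reproduces the two noisy training labels exactly and then repeats them on nearby test points, which is the train/test gap the text describes as overfitting.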


Analyzing Boston housing dataset

#artificialintelligence

I hope you are all safe and healthy. It's also been a while since I've gotten my hands dirty writing scripts and analyzing data. So, I'll start with something kinda light for me: analyzing one of the go-to datasets for projects and demos, the Boston housing dataset. The Boston housing dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts. It has 506 samples and 14 variables.