Data Dimensionality Reduction in the Age of Machine Learning - DATAVERSITY

#artificialintelligence

By Rosaria Silipo. Machine Learning is all the rage as companies try to make sense of the mountains of data they are collecting. Data is everywhere and proliferating at unprecedented speed. But more data is not always better. In fact, large amounts of data can not only considerably slow down system execution but can sometimes even produce worse performance in Data Analytics applications.


Seven Techniques for Data Dimensionality Reduction

@machinelearnbot

The recent explosion in data set size, both in number of records and in number of attributes, has triggered the development of a number of big data platforms as well as parallel data analytics algorithms. At the same time, it has pushed the adoption of data dimensionality reduction procedures. Indeed, more is not always better: large amounts of data can sometimes produce worse performance in data analytics applications. One of my most recent projects was about churn prediction, using the large data set from the 2009 KDD Challenge.
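One of the simplest such techniques is the low-variance filter: columns whose values barely change carry little information and can be dropped up front. Below is a minimal sketch in Python with scikit-learn; the file name churn.csv and the 0.01 variance threshold are illustrative assumptions, not details from the KDD 2009 project.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical input: a churn table with many numeric columns.
# "churn.csv" is a placeholder, not the actual KDD 2009 file.
df = pd.read_csv("churn.csv")
numeric = df.select_dtypes(include="number")

# Drop near-constant columns; the 0.01 threshold is an assumption
# and would normally be tuned for the data at hand.
selector = VarianceThreshold(threshold=0.01)
selector.fit(numeric)

reduced = numeric.loc[:, selector.get_support()]
print(f"Kept {reduced.shape[1]} of {numeric.shape[1]} numeric columns")
```

In practice a filter like this is only a first pass; columns surviving the variance cut can still be redundant and are often pruned further with correlation filters or projection methods.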


Beginners Guide To Learn Dimension Reduction Techniques

@machinelearnbot

This powerful quote by William Shakespeare applies well to techniques used in data science and analytics. Allow me to prove it with a short story. In May 2015, we conducted a Data Hackathon (a data science competition) in Delhi-NCR, India. We challenged participants to recognize human activity using the Human Activity Recognition Using Smartphones Data Set. The data set had 561 variables for training the model used to identify human activity in the test data set.
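With 561 variables, a natural first move is a projection technique such as PCA. The sketch below keeps just enough components to explain 95% of the variance; the synthetic stand-in matrix and the 0.95 target are illustrative assumptions, not the hackathon's actual setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative stand-in for the 561-feature HAR training matrix.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 561))

# Standardize first, since PCA is sensitive to feature scale.
X_scaled = StandardScaler().fit_transform(X_train)

# Keep enough components to explain 95% of the variance
# (the 0.95 target is an assumption, not from the competition).
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(f"Reduced from {X_train.shape[1]} to {X_reduced.shape[1]} components")
```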


Survival of the Fittest in PlayerUnknown BattleGround

arXiv.org Machine Learning

The goal of this paper was to predict placement in the multiplayer game PUBG (PlayerUnknown's Battlegrounds). In the game, up to one hundred players parachute onto an island and scavenge for weapons and equipment to kill others, while avoiding getting killed themselves. The available safe area of the game map decreases in size over time, directing surviving players into tighter areas to force encounters. The last player or team standing wins the round. In this paper specifically, we have tried to predict the placement of the player in this ultimate survival test. The data set has been taken from Kaggle. The entire data set has 29 attributes and one label (winPlacePerc); the training set has 4.5 million instances and the testing set has 1.9 million. winPlacePerc is a continuous variable, which makes predicting the survival of the fittest harder. To overcome this problem, we have applied multiple machine learning models to find the optimum prediction: LightGBM Regression (Light Gradient Boosting Machine Regression), MultiLayer Perceptron, M5P (an improvement on C4.5), and Random Forest. Mean Absolute Error has been used to measure the error rate. With the final predictions we achieved MAEs of 0.02047, 0.065, 0.0592, and 0.0634, respectively.
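As a rough illustration of the best-performing model, the sketch below trains a LightGBM regressor on a continuous target in [0, 1] and scores it with Mean Absolute Error. The synthetic data, feature count, and hyperparameters are assumptions for demonstration, not the paper's configuration.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Illustrative stand-in for the Kaggle PUBG features and winPlacePerc label;
# shapes and hyperparameters below are assumptions, not the paper's setup.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 28))   # placeholder predictor attributes
y = rng.uniform(size=10_000)        # winPlacePerc is continuous in [0, 1]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X_train, y_train)

# Clip predictions to the label's valid range before scoring.
preds = np.clip(model.predict(X_test), 0.0, 1.0)
print(f"MAE: {mean_absolute_error(y_test, preds):.5f}")
```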