In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on-par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game go using deep reinforcement learning. Machine learning is even being used to program self driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.
In 2019, AWS unveiled Amazon SageMaker Debugger, a SageMaker capability that enables you to automatically detect a variety of issues that may arise while a model is being trained. SageMaker Debugger captures model state data at specified intervals during a training job. With this data, SageMaker Debugger can detect training issues or anomalies by leveraging built-in or user-defined rules. In addition to detecting issues during the training job, you can analyze the captured state data afterwards to evaluate model performance and identify areas for improvement. This task is made easier with the newly launched XGBoost training report feature.
There are so many machine learning algorithms out there, how do you choose the best one for your problem? This question is going to have a different response based on the application and the data. Is it classification, regression, supervised, unsupervised, natural language processing, time series? There are so many avenues to take but in this article I am going to focus on on algorithm that I particularly find very interesting, XGBoost. XGBoost stands for extreme gradient boosting and is an open source library that provides an efficient and effective implementation of gradient boosting.
As we make tremendous advances in machine learning and artificial intelligence technosciences, there is a renewed understanding in the AI community that we must ensure that humans being are at the center of our deliberations so that we don't end in technology-induced dystopias. As strongly argued by Green in his book Smart Enough City, the incorporation of technology in city environs does not automatically translate into prosperity, wellbeing, urban livability, or social justice. There is a great need to deliberate on the future of the cities worth living and designing. There are philosophical and ethical questions involved along with various challenges that relate to the security, safety, and interpretability of AI algorithms that will form the technological bedrock of future cities. Several research institutes on human centered AI have been established at top international universities. Globally there are calls for technology to be made more humane and human-compatible. For example, Stuart Russell has a book called Human Compatible AI. The Center for Humane Technology advocates for regulators and technology companies to avoid business models and product features that contribute to social problems such as extremism, polarization, misinformation, and Internet addiction. In this paper, we analyze and explore key challenges including security, robustness, interpretability, and ethical challenges to a successful deployment of AI or ML in human-centric applications, with a particular emphasis on the convergence of these challenges. We provide a detailed review of existing literature on these key challenges and analyze how one of these challenges may lead to others or help in solving other challenges. The paper also advises on the current limitations, pitfalls, and future directions of research in these domains, and how it can fill the current gaps and lead to better solutions.
Marvel's Avengers: Endgame recently dethroned Avatar as the highest grossing movie in history and while there was no doubt about this movie becoming very successful, I want to understand what makes any given movie a success. I am using data from The Movie Database provided through kaggle. The data set is split into a train and test set with the train set containing 3,000 movies and the test set comprising 4,398. The train data set also contains the target variable revenue. Prequels and Sequels: Maybe unsurprisingly, movies that are either prequels or sequels to related movies earn on average a higher revenue than standalone movies.
Tennis is a popular sport worldwide, boasting millions of fans and numerous national and international tournaments. Like many sports, tennis has benefit ted from the popularity of rigorous record - keeping of game and player information, as well as the growth of machine learning methods for use in sports analytics. Of particular interest to bettors and betting companies alike is potential use of sports records to predict tennis match outcomes prior to match start. We compiled, cleaned, and used th e largest database of tennis match information to date to predict match outcome using fairly simple machine learning methods. Using such methods allows for rapid fit and prediction times to readily incorporate new data and make real - time predictions. We were able to predict match outcomes with upwards of 80% accuracy, much greater than predictions using betting odds alone, an d identify serve strength as a key predictor of match outcome. By combining prediction accuracies from three models, we were able t o nearly recreate a probability distribution based on average betting odds from betting companies, which indicates that betting companies are using similar information to assign odds to matches. These results demonstrate the capability of relatively simpl e machine learning models to quite accurately predict tennis match outcomes.
The goal of this paper was to predict the placement in the multiplayer game PUBG (playerunknown battleground). In the game, up to one hundred players parachutes onto an island and scavenge for weapons and equipment to kill others, while avoiding getting killed themselves. The available safe area of the game map decreases in size over time, directing surviving players into tighter areas to force encounters. The last player or team standing wins the round. In this paper specifically, we have tried to predict the placement of the player in the ultimate survival test. The data set has been taken from Kaggle. Entire dataset has 29 attributes which are categories to 1 label(winPlacePerc), training set has 4.5 million instances and testing set has 1.9 million. winPlacePerc is continuous category, which makes it harder to predict the survival of the fittest. To overcome this problem, we have applied multiple machine learning models to find the optimum prediction. Model consists of LightGBM Regression (Light Gradient Boosting Machine Regression), MultiLayer Perceptron, M5P (improvement on C4.5) and Random Forest. To measure the error rate, Mean Absolute Error has been used. With the final prediction we have achieved MAE of 0.02047, 0.065, 0.0592 and 0634 respectively.
Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), bias (boosting), or improve predictions (stacking). Ensemble learning helps improve machine learning results by combining several models. This approach allows the production of better predictive performance compared to a single model. That is why ensemble methods placed first in many prestigious machine learning competitions, such as the Netflix Competition, KDD 2009, and Kaggle. The Statsbot team wanted to give you the advantage of this approach and asked a data scientist, Vadim Smolyakov, to dive into three basic ensemble learning techniques.
When you want to purchase a new car, will you walk up to the first car shop and purchase one based on the advice of the dealer? You would likely browser a few web portals where people have posted their reviews and compare different car models, checking for their features and prices. You will also probably ask your friends and colleagues for their opinion. In short, you wouldn't directly reach a conclusion, but will instead make a decision considering the opinions of other people as well. Ensemble models in machine learning operate on a similar idea. They combine the decisions from multiple models to improve the overall performance.
Machine Learning (ML) is one of the most exciting and dynamic areas of modern research and application. The purpose of this review is to provide an introduction to the core concepts and tools of machine learning in a manner easily understood and intuitive to physicists. The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, and generalization before moving on to more advanced topics in both supervised and unsupervised learning. Topics covered in the review include ensemble models, deep learning and neural networks, clustering and data visualization, energy-based models (including MaxEnt models and Restricted Boltzmann Machines), and variational methods. Throughout, we emphasize the many natural connections between ML and statistical physics. A notable aspect of the review is the use of Python notebooks to introduce modern ML/statistical packages to readers using physics-inspired datasets (the Ising Model and Monte-Carlo simulations of supersymmetric decays of proton-proton collisions). We conclude with an extended outlook discussing possible uses of machine learning for furthering our understanding of the physical world as well as open problems in ML where physicists maybe able to contribute. (Notebooks are available at https://physics.bu.edu/~pankajm/MLnotebooks.html )