
Statistical Learning

Secure Collaborative XGBoost on Encrypted Data


Training a machine learning model requires a large quantity of high-quality data. One way to obtain it is to combine data from many different organizations or data owners. But data owners are often unwilling to share their data with each other due to privacy concerns, which can stem from business competition or from regulatory compliance requirements. The question is: how can we mitigate such privacy concerns? Secure collaborative learning enables many data owners to build robust models on their collective data without revealing their data to each other.
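The excerpt does not say which mechanism the article uses to keep each owner's data private, so the sketch below only illustrates the general idea with a different, swapped-in technique: additive secret sharing. XGBoost's split search works on sums of per-example gradient statistics, and a sum can be computed without any party seeing another party's inputs. The names `make_shares`, `secure_sum`, and the modulus `PRIME` are hypothetical; real-valued gradients would first need a fixed-point integer encoding, which is omitted here.

```python
import secrets

PRIME = 2**61 - 1  # modulus for share arithmetic (an assumption for this sketch)


def make_shares(value, n_parties):
    """Split a non-negative integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values):
    """Sum one private value per owner without any party seeing another's raw value."""
    n = len(private_values)
    # Owner i splits its value and sends share j to party j; each share alone looks random.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds up the shares it received from every owner.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Combining the partial sums reveals only the total.
    return sum(partial_sums) % PRIME


# Hypothetical per-owner gradient sums (already encoded as integers)
total = secure_sum([12, 30, 7])
```

Each party only ever handles uniformly random-looking shares, yet the combined result equals the plain sum of the owners' inputs.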

K-means Clustering from Scratch


Though there are many library implementations of the k-means algorithm in Python, I decided to use only NumPy in order to provide an instructive approach. NumPy is a popular Python library for numerical computation. We first create a class called Kmeans and pass a single constructor argument, k, to it. This argument is a hyperparameter: a parameter that the user sets before training the machine learning algorithm (here, the number of clusters).
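A minimal sketch of such a class, using only NumPy and Lloyd's algorithm (alternately assigning points to their nearest centroid and moving each centroid to the mean of its points). Apart from the class name `Kmeans` and the argument `k`, everything here (`max_iter`, `seed`, `fit`, `predict`, and initializing centroids from randomly chosen data points) is an assumption, not necessarily what the original post does.

```python
import numpy as np


class Kmeans:
    """K-means clustering (Lloyd's algorithm) using only NumPy."""

    def __init__(self, k, max_iter=100, seed=0):
        self.k = k                # number of clusters: the hyperparameter set before training
        self.max_iter = max_iter  # assumed cap on iterations
        self.seed = seed          # assumed seed for reproducible initialization
        self.centroids = None

    def _assign(self, X):
        # Squared Euclidean distance from every point to every centroid, shape (n, k)
        dists = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.seed)
        # Initialize centroids as k distinct data points chosen at random
        self.centroids = X[rng.choice(len(X), size=self.k, replace=False)]
        for _ in range(self.max_iter):
            labels = self._assign(X)
            new_centroids = self.centroids.copy()
            for j in range(self.k):
                members = X[labels == j]
                if len(members) > 0:  # keep the old centroid if its cluster is empty
                    new_centroids[j] = members.mean(axis=0)
            if np.allclose(new_centroids, self.centroids):
                break  # converged: centroids stopped moving
            self.centroids = new_centroids
        return self

    def predict(self, X):
        return self._assign(np.asarray(X, dtype=float))


# Usage on two well-separated (hypothetical) blobs
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
model = Kmeans(k=2).fit(X)
labels = model.predict(X)
```

The `k` passed to the constructor is fixed before `fit` runs, which is exactly what makes it a hyperparameter rather than a learned parameter like the centroids.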

Introduction to Machine Learning For Beginners [A to Z] 2020


To provide awareness of the two most integral branches of machine learning. To build appropriate neural models using a state-of-the-art Python framework. To build neural models from scratch, following step-by-step instructions. To build end-to-end solutions to real-world problems by choosing appropriate machine learning techniques from the pool available. To use ML evaluation methodologies to compare and contrast supervised and unsupervised ML algorithms using an established machine learning framework.

Council Post: Five Steps To Developing A Data Science Culture


You've developed a platform that's gaining significant customer traction and enabling you to collect vast amounts of transaction and user data. Word gets out about your software, you acquire more users, and feature requests start rolling in. As you develop and deliver those new features, you engage more users and collect even more data! There's tremendous value in that data, but narrow thinking may be limiting your ability to mine it for the insights you need to improve your product further, or even to develop new products that better meet your users' needs. Perhaps you've only gotten as far as creating simple plots and histograms of events, fault detection, and other simple rules-based alerting and reporting.

On Moving from Statistics to Machine Learning, the Final Stage of Grief


I've spent the last few months preparing for and applying to data science jobs. It's possible the data science world may reject me for my lack of both experience and a credential above a bachelor's degree, in which case I'll do something else. Regardless of what lies in store for my future, I think I've gotten a good grasp of the mindset underlying machine learning and how it differs from traditional statistics, so I thought I'd write about it for those with a background similar to mine who are considering a similar move.[1] This post is geared toward people who are excellent at statistics but don't really "get" machine learning and want to understand the gist of it in about 15 minutes of reading. If you have a traditional academic stats background (be it econometrics, biostatistics, psychometrics, etc.), there are two good reasons to learn more about data science. The world of data science is, in many ways, hiding in plain sight from the more academically minded quantitative disciplines.

Supporting the Math Behind Supporting Vector Machines!


A Support Vector Machine (SVM) is a powerful classifier that works with both linearly separable and non-linearly separable data (the latter through kernel functions). In an n-dimensional space, the separating hyperplane has dimension (n-1): a line in 2-D, a plane in 3-D. The goal of an SVM is to find the optimal hyperplane, the one that separates the classes so that the distance from the hyperplane to the nearest data points is maximized. To keep it simple, picture a road that separates the cars, buildings, and pedestrians on its left from those on its right, and is made as wide as possible. The cars and buildings closest to the edge of the road are the support vectors.
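To make the margin idea concrete, here is a NumPy sketch of a linear soft-margin SVM trained by subgradient descent on the primal objective 0.5*||w||^2 + C * sum(max(0, 1 - y_i (w.x_i + b))). This is a swapped-in, deliberately simple training method (libraries usually rely on dedicated optimization solvers), and the function name `train_linear_svm`, its default `C`, learning rate, and epoch count, and the toy data are all hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, C=10.0, lr=0.0005, epochs=10000):
    """Soft-margin linear SVM via full-batch subgradient descent.
    Labels y must be -1 or +1. Returns weight vector w and bias b."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # points inside the margin or on the wrong side
        # Subgradient of the regularizer plus the hinge terms of the active points
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b


# Hypothetical 2-D data: three points per class
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

w, b = train_linear_svm(X, y)
predictions = np.sign(X @ w + b)
margin_width = 2 / np.linalg.norm(w)  # width of the "road" between the two margin lines
# Points whose functional margin is close to 1 are (approximately) the support vectors
support = np.where(y * (X @ w + b) <= 1.1)[0]
```

Here the hyperplane is a line (n = 2, so dimension n-1 = 1). Because subgradient descent only approximates the optimum, the tolerance used to pick out the support vectors is deliberately loose.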

Artificial Intelligence revolutionizes the insurance industry


Pricing: Through predictive models (with algorithms such as random forest, linear regression, XGBoost, etc.), we can set insurance premiums in a more dynamic and precise way. More specifically, premiums can be personalized according to driving habits, geographic area, and commute distance. To the traditional price-setting variables, a new set of variables is added to improve the profitability of the portfolio. These variables depend on the company's own needs and capabilities and can range from competitors' prices to the policyholder's traffic record, driver's license age, and credit score, as well as external data systems and sources. The interesting thing here is the dynamism in determining the price: the models are updated as new data comes in over time, recognize patterns, and adjust the rate autonomously.
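As a minimal illustration of the linear-regression case, the sketch below fits premiums to a few rating factors by ordinary least squares with NumPy. The factor names, coefficients, and premiums are all made up (the premiums are generated from known coefficients so the fit can be checked), and the helpers `fit_premium_model` and `predict_premium` are hypothetical; a real pricing model would use far more data and variables.

```python
import numpy as np


def fit_premium_model(features, premiums):
    """Ordinary least squares fit of premium = b0 + features @ coefs."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, premiums, rcond=None)
    return coef


def predict_premium(coef, features):
    return coef[0] + np.asarray(features, dtype=float) @ coef[1:]


# Hypothetical rating factors: commute distance (km), license age (years), past traffic violations
features = np.array([
    [10.0, 2.0, 1.0],
    [25.0, 10.0, 0.0],
    [5.0, 20.0, 0.0],
    [40.0, 5.0, 2.0],
    [15.0, 8.0, 1.0],
    [30.0, 15.0, 0.0],
])
# Synthetic premiums generated from made-up coefficients so the fit can be verified
true_coef = np.array([300.0, 4.0, -6.0, 120.0])
premiums = true_coef[0] + features @ true_coef[1:]

coef = fit_premium_model(features, premiums)
quote = predict_premium(coef, [20.0, 3.0, 0.0])  # 300 + 4*20 - 6*3 = 362
```

In the dynamic setting described above, the same fit would simply be rerun as new policy and claims data arrive, so the quoted rate tracks the latest patterns.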

2 books to strengthen your command of python machine learning


This post is part of "AI education", a series of posts that review and explore educational content on data science and machine learning. Mastering machine learning is not easy, even if you're a crack programmer. I've seen many people with a solid background in writing software for other domains (gaming, web, multimedia, etc.) think that adding machine learning to their roster of skills would be another walk in the park. Every single one of them has been dismayed. I see two reasons why the challenges of machine learning are misunderstood. First, as the name suggests, machine learning is software that learns by itself rather than being instructed on every single rule by a developer.
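A toy illustration of that difference, with hypothetical data: instead of a developer hard-coding a rule such as "flag values of 50 or more", the function below learns the cut-off from labelled examples by picking the threshold that makes the fewest mistakes (a one-feature decision stump). The function name `learn_threshold` and the data are invented for illustration.

```python
def learn_threshold(values, labels):
    """Return the cut-off t that misclassifies the fewest labelled examples,
    predicting label 1 when value >= t."""
    best_t, best_errors = None, len(values) + 1
    for t in sorted(set(values)):
        errors = sum((v >= t) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical labelled examples: the rule is inferred from them, not written by hand
values = [5, 12, 18, 40, 55, 70]
labels = [0, 0, 0, 1, 1, 1]
threshold = learn_threshold(values, labels)
```

Feed it different examples and the learned rule changes with them, with no developer editing the code.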

Data Analytics and Mining for Dummies – Data Science Blog (English only)


Data analytics and mining is often perceived as an extremely tricky task reserved for data analysts and data scientists with thorough knowledge spanning several domains, such as mathematics, statistics, computer algorithms, and programming. However, several tools available today make it possible for novice programmers, or people with absolutely no algorithmic or programming expertise, to carry out data analytics and mining. One such tool is the KNIME Analytics Platform, which is very powerful and provides a graphical user interface and an assembly of nodes for ETL (Extraction, Transformation, Loading), modeling, data analysis, and visualization with little or no programming. KNIME, the Konstanz Information Miner, was developed at the University of Konstanz and is now popular with a large international community of developers. KNIME was originally made for commercial use, but it is now available as open-source software; it has been used extensively in pharmaceutical research since 2006 and is also a powerful data mining tool in the financial sector. It is also frequently used in the Business Intelligence (BI) sector.