

Image classification with FASHION MNIST: why convolutional neural networks outperform traditional…

#artificialintelligence

In the last decade, with the rise of deep learning, the field of image classification has experienced a renaissance. Traditional machine learning methods have been replaced by newer and more powerful deep learning algorithms, such as the convolutional neural network. However, to truly understand and appreciate deep learning, we must know why it succeeds where the other methods fail. In this article, we try to answer some of those questions by applying various classification algorithms to the Fashion MNIST dataset. Dataset information: Fashion MNIST was introduced in August 2017 by Zalando's research lab.
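
As a rough illustration of the kind of experiment described here (the article's own models are not shown), a minimal convolutional baseline on Fashion MNIST might look like the following Keras sketch:

```python
# Minimal sketch: a small CNN on Fashion MNIST with Keras.
# The architectures compared in the article are not specified here;
# this is just an illustrative baseline.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., None] / 255.0   # add channel dimension, scale to [0, 1]
x_test = x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```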


Regression Feature Selection using the Kydavra LassoSelector

#artificialintelligence

We all know Occam's razor: from a set of solutions, take the simplest one. This principle is applied in the regularization of linear models in machine learning. L1 regularization (also known as LASSO) tends to shrink the weights of the linear model to 0, while L2 regularization (known as Ridge) tends to keep the overall complexity as low as possible by minimizing the norm of the model's weight vector. One of Kydavra's selectors uses Lasso to select the best features. So let's see how to apply it.
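
Kydavra's LassoSelector is built on this idea. As a hedged illustration (not Kydavra's actual API), the LASSO-based selection it performs can be sketched with scikit-learn:

```python
# Illustrative sketch of LASSO-based feature selection with scikit-learn.
# Kydavra's LassoSelector wraps the same idea; this is not its exact API.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Features whose coefficients Lasso shrinks to (near) zero are dropped.
selector = SelectFromModel(Lasso(alpha=1.0), threshold=1e-5)
selector.fit(X, y)
selected = np.where(selector.get_support())[0]
print("Selected feature indices:", selected)
```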


Ensemble Feature Selection in Machine Learning by OptimalFlow

#artificialintelligence

Feature selection is a crucial part of the machine learning workflow. How well the features are selected directly affects the model's performance. So I wrote a handy Python library called OptimalFlow with an ensemble feature selection module, called autoFS, to simplify this process. OptimalFlow is an omni-ensemble automated machine learning toolkit based on the Pipeline Cluster Traversal Experiment (PCTE) approach; it helps data scientists build optimal models easily and automate the machine learning workflow with simple code. You can read its introduction in another story: "An Omni-ensemble Automated Machine Learning -- OptimalFlow".
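
The excerpt does not show autoFS itself, so the following is only a generic sketch of the ensemble idea, several selectors voting on features, written with plain scikit-learn rather than OptimalFlow's API:

```python
# Generic sketch of ensemble feature selection: several selectors vote,
# and features picked by a majority are kept. This illustrates the idea
# behind an autoFS-style module; it is not OptimalFlow's actual API.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)

selectors = [
    SelectKBest(f_classif, k=5),
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.5)),
    SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0)),
]

votes = np.zeros(X.shape[1], dtype=int)
for sel in selectors:
    sel.fit(X, y)
    votes += sel.get_support().astype(int)

kept = np.where(votes >= 2)[0]   # keep features chosen by at least 2 of 3 selectors
print("Features kept by majority vote:", kept)
```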


PyCaret 2.1 is here: What's new? - KDnuggets

#artificialintelligence

We are excited to announce PyCaret 2.1, the update for August 2020. PyCaret is an open-source, low-code machine learning library in Python that automates the machine learning workflow. It is an end-to-end machine learning and model management tool that speeds up the machine learning experiment cycle and makes you 10x more productive. Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with only a few. This makes experiments exponentially faster and more efficient.
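
A minimal PyCaret workflow looks roughly like the sketch below; the 'juice' dataset and 'Purchase' target come from PyCaret's bundled sample data and are used here purely for illustration:

```python
# Minimal PyCaret sketch (classification); dataset and target are illustrative.
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

data = get_data('juice')                   # built-in sample dataset
clf = setup(data=data, target='Purchase')  # one call prepares the whole pipeline
                                           # (may ask to confirm inferred data types
                                           #  when run interactively)
best = compare_models()                    # trains and ranks many candidate models
print(best)
```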


State of the Art in Automated Machine Learning

#artificialintelligence

In recent years, machine learning has been very successful in solving a wide range of problems. In particular, neural networks have reached human, and sometimes super-human, levels of ability in tasks such as language translation, object recognition, game playing, and even driving cars. With this growth in capability has come a growth in complexity. Data scientists and machine learning engineers must perform feature engineering, design model architectures, and optimize hyperparameters. Since the purpose of machine learning is to automate tasks normally done by humans, the natural next step is to automate the tasks of data scientists and engineers. This area of research is called automated machine learning, or AutoML. There have been many exciting developments in AutoML recently, and it's important to take a look at the current state of the art and learn about what's happening now and what's coming up in the future. InfoQ reached out to the following subject matter experts in the industry to discuss the current state and future trends in the AutoML space. InfoQ: What is AutoML and why is it important? Francesca Lazzeri: AutoML is the process of automating the time-consuming, iterative tasks of machine learning model development, including model selection and hyperparameter tuning.
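
To make "automating model selection and hyperparameter tuning" concrete, here is a minimal sketch of one slice of that work, automated hyperparameter search, using scikit-learn rather than any specific AutoML product mentioned by the experts:

```python
# Minimal sketch of automated hyperparameter tuning, one slice of what AutoML
# systems do; full AutoML tools also automate feature engineering and
# model/architecture selection.
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_digits(return_X_y=True)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 300),
                         "max_depth": randint(3, 20)},
    n_iter=10, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```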


The Top 6 Data Science Courses

#artificialintelligence

After writing an article on why everyone uses Kaggle and subsequently doing some further research on Kaggle myself, I realized that it offers several data science courses. Countless companies offer online courses, but the main reason I want to describe the top Kaggle courses specifically is that, outside of online courses, I have used Kaggle more than any other platform for learning data science -- viewing code, downloading data, and reading other Jupyter notebooks. LinkedIn offers courses, for example, but I would rather take courses from a website I have already learned from. I have practiced endless machine learning algorithms and their respective code thanks to the examples and data on Kaggle. So why not trust Kaggle with courses to teach data science, as well as to improve upon current data science knowledge?


Using Dropout with Neural Networks: Not A Magic Bullet

#artificialintelligence

Overfitting is an issue that occurs when a model shows high accuracy in predicting training data (the data used to build the model) but low accuracy in predicting test data (unseen data the model has not encountered before). This can be a particular problem when building a neural network on a small dataset: the network can be of such a size that it "overtrains" on the training data and therefore performs poorly when predicting new data. Dropout is used to prevent excessive "noise" in the network that artificially increases training accuracy without transferring any meaningful information to the output layer -- i.e. any increase in training accuracy comes from excessive training rather than from useful information in the model features themselves. Dropout renders certain nodes in the network inactive, thus forcing the network to look for more meaningful patterns that influence the output layer.
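
The article's own network is not reproduced here, but a minimal Keras sketch shows how dropout layers are typically inserted (the layer sizes and the 0.5 rate are illustrative assumptions):

```python
# Minimal sketch of dropout in a Keras model; sizes and rates are illustrative.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.5),   # randomly zeroes 50% of activations during training
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# Note: dropout is only active during training; Keras disables it at inference time.
```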


Classifying Breast Cancer Subtypes Using Deep Neural Networks Based on Multi-Omics Data

#artificialintelligence

With the high prevalence of breast cancer, it is urgent to find the intrinsic differences between its various subtypes in order to infer the underlying mechanisms. Given the available multi-omics data, their proper integration can improve the accuracy of breast cancer subtype recognition. In this study, DeepMO, a model using deep neural networks based on multi-omics data, was employed for classifying breast cancer subtypes. Three types of omics data, including mRNA data, DNA methylation data, and copy number variation (CNV) data, were collected from The Cancer Genome Atlas (TCGA). After data preprocessing and feature selection, each type of omics data was input into a deep neural network consisting of an encoding subnetwork and a classification subnetwork. On binary classification, DeepMO based on multi-omics data outperformed other methods in terms of accuracy and area under the curve (AUC). Moreover, compared with other methods using single omics data or multi-omics data, DeepMO also had higher prediction accuracy on multi-class classification. We also validated the effect of feature selection on DeepMO. Finally, we analyzed the enriched gene ontology (GO) terms and biological pathways of the significant genes discovered during feature selection. We believe that the proposed model is useful for multi-omics data analysis.
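
The paper's exact architecture and input dimensions are not given in this summary; the following Keras sketch only illustrates the described pattern of per-omics encoding subnetworks feeding a shared classification subnetwork, with all sizes assumed:

```python
# Illustrative sketch of a multi-input network in the spirit of DeepMO:
# each omics type gets its own encoding subnetwork, and the encodings are
# concatenated into a classification subnetwork. Layer sizes, input
# dimensions, and the number of subtypes are assumptions, not the paper's.
import tensorflow as tf

def encoder(input_dim, name):
    inp = tf.keras.Input(shape=(input_dim,), name=name)
    h = tf.keras.layers.Dense(256, activation="relu")(inp)
    h = tf.keras.layers.Dense(64, activation="relu")(h)
    return inp, h

mrna_in, mrna_enc = encoder(2000, "mrna")         # mRNA expression features
meth_in, meth_enc = encoder(2000, "methylation")  # DNA methylation features
cnv_in, cnv_enc = encoder(2000, "cnv")            # copy number variation features

merged = tf.keras.layers.Concatenate()([mrna_enc, meth_enc, cnv_enc])
h = tf.keras.layers.Dense(64, activation="relu")(merged)
out = tf.keras.layers.Dense(5, activation="softmax")(h)  # e.g. 5 subtypes

model = tf.keras.Model([mrna_in, meth_in, cnv_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```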


How to Build a Machine Learning Model

#artificialintelligence

Learning data science may seem intimidating, but it doesn't have to be that way. Let's make learning data science fun and easy. So the challenge is: how exactly do we make learning data science both fun and easy? Cartoons are fun, and since "a picture is worth a thousand words", why not make a cartoon about data science? With that goal in mind, I've set out to doodle on my iPad the elements that are required for building a machine learning model.