Understanding your data is often half the battle won. For any machine learning project, it helps immensely to analyze your data from different points of view. Summarising a dataset means understanding how your data looks when subjected to simple statistical analysis. To illustrate the various techniques, let us consider the Glass dataset from the R package mlbench. It has 214 observations containing examples of the chemical analysis of 7 different types of glass.
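A minimal sketch of such a summary, using only the Python standard library. The values below are illustrative stand-ins for one attribute (refractive index) of a glass-like dataset, not the actual mlbench data:

```python
import statistics

# Hypothetical refractive-index values -- illustrative only,
# not the real Glass dataset from mlbench.
ri = [1.5210, 1.5176, 1.5162, 1.5174, 1.5187, 1.5159, 1.5200]

summary = {
    "n": len(ri),
    "min": min(ri),
    "max": max(ri),
    "mean": statistics.mean(ri),
    "median": statistics.median(ri),
    "stdev": statistics.stdev(ri),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Running the same handful of statistics per attribute already reveals the scale, spread, and skew of each column before any modeling begins.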
It is a good idea to use small, well-understood datasets when getting started in machine learning or when learning a new tool. The Weka machine learning workbench provides a directory of small, well-understood datasets in its installation directory. In this post you will discover some of these small, well-understood datasets distributed with Weka, their details, and where to learn more about them. We will focus on a handful of datasets of differing types.

Standard Machine Learning Datasets Used For Practice in Weka
Photo by Marvin Foushee, some rights reserved.
There is one truth discovered by every data analyst: datasets are not always available. Often, just to find the specific chunk of data we are searching for, we have to scavenge the internet through dead links and obsolete, badly structured datasets. Sometimes the data cannot be found at all. One issue you may already have encountered is finding the information you were searching for, but not in the form of a dataset: perhaps it is summarized in a graph in a research paper, with no downloadable data behind it.
This work begins by establishing a mathematical formalization between different geometrical interpretations of Neural Networks, providing a first contribution. From this starting point, a new interpretation is explored, using the idea of implicit vector fields moving data as particles in a flow. A new architecture, Vector Fields Neural Networks (VFNN), is proposed based on this interpretation, with the vector field becoming explicit. A specific implementation of the VFNN using Euler's method to solve ordinary differential equations (ODEs) and Gaussian vector fields is tested. The first experiments present visual results highlighting the important features of the new architecture and provide another contribution: a geometrically interpretable regularization of the model parameters. The new architecture is then evaluated for different hyperparameters and inputs, with the objective of assessing their influence on model performance, computational time, and complexity. The VFNN model is compared against the well-known baseline models Naive Bayes, Feed-Forward Neural Networks, and Support Vector Machines (SVM), showing comparable or better results on different datasets. Finally, the conclusion raises many new questions and ideas for improving the model that could be used to increase its performance.
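To make the core idea concrete, here is a generic sketch (not the paper's implementation) of data points advected by a Gaussian vector field using explicit Euler steps; all function names, centers, and weights below are illustrative assumptions:

```python
import math

def gaussian_vector_field(x, centers, weights, sigma):
    """Illustrative 2-D vector field: a weighted sum of Gaussian bumps,
    V(x) = sum_i w_i * exp(-||x - mu_i||^2 / (2 * sigma^2)),
    where each w_i is a 2-D direction vector."""
    vx, vy = 0.0, 0.0
    for (mx, my), (wx, wy) in zip(centers, weights):
        d2 = (x[0] - mx) ** 2 + (x[1] - my) ** 2
        g = math.exp(-d2 / (2.0 * sigma ** 2))
        vx += wx * g
        vy += wy * g
    return vx, vy

def euler_flow(points, centers, weights, sigma, step=0.1, n_steps=10):
    """Move each data point along the field with explicit Euler steps:
    x_{t+1} = x_t + h * V(x_t)."""
    out = []
    for p in points:
        x = list(p)
        for _ in range(n_steps):
            vx, vy = gaussian_vector_field(x, centers, weights, sigma)
            x[0] += step * vx
            x[1] += step * vy
        out.append(tuple(x))
    return out
```

A point placed at a Gaussian center with a rightward weight vector drifts to the right, slowing as the bump's influence decays with distance.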
Many machine learning algorithms expect data to be scaled consistently. There are two popular methods that you should consider when scaling your data for machine learning: normalization and standardization. In this tutorial, you will discover how you can rescale your data for machine learning.

How To Prepare Machine Learning Data From Scratch With Python
Photo by Ondra Chotovinsky, some rights reserved.

Many machine learning algorithms expect the scale of the input and even the output data to be equivalent.
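Both methods can be sketched from scratch in a few lines of plain Python (min-max normalization rescales to [0, 1]; standardization rescales to zero mean and unit standard deviation):

```python
def normalize(values):
    """Min-max normalization: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: rescale to zero mean and unit standard deviation."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    std = var ** 0.5
    return [(v - mean) / std for v in values]

data = [50.0, 30.0, 70.0, 90.0, 10.0]
print(normalize(data))    # each value now lies in [0, 1]
print(standardize(data))  # mean ~0, sample standard deviation ~1
```

Normalization is a good default when the data does not follow a Gaussian distribution; standardization suits data with a roughly Gaussian shape.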