Collaborating Authors

dimensionality reduction

Principal Component Analysis (PCA) with Python Examples -- Tutorial


When implementing machine learning algorithms, including more features can actually degrade performance. Increasing the number of features does not always improve classification accuracy, a phenomenon known as the curse of dimensionality. Hence, we apply dimensionality reduction to improve classification accuracy by selecting an optimal set of lower-dimensional features. Principal component analysis (PCA) is essential for data science, machine learning, data visualization, statistics, and other quantitative fields. To understand dimensionality reduction, it helps to know about vectors, matrices, matrix transposes, eigenvalues, and eigenvectors.
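As a rough illustration of how those eigenvalues and eigenvectors come into play, here is a minimal PCA sketch using only numpy; the toy dataset and variable names are invented for this example:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 samples, 5 features that really live in a 2-D subspace.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
X += 0.05 * rng.normal(size=X.shape)

# 1. Center the data (PCA assumes zero-mean features).
Xc = X - X.mean(axis=0)

# 2. Covariance matrix and its eigendecomposition.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order

# 3. Sort eigenvectors by descending eigenvalue and keep the top k.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 2
X_reduced = Xc @ eigvecs[:, :k]          # project onto the top-k subspace

print(X_reduced.shape)                   # (200, 2)
```

Because the toy data is essentially rank 2, the top two eigenvalues capture nearly all of the variance, which is exactly the situation PCA exploits.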

Dimensionality Reduction: Principal Component Analysis


A dataset is made up of a number of features. As long as these features are related in some way to the target and are optimal in number, a machine learning model will be able to produce decent results after learning from the data. But if the number of features is high and most of them do not contribute to the model's learning, then the model's performance goes down and the time taken to output predictions increases. Transforming the original feature space into a lower-dimensional subspace is one way of performing dimensionality reduction, and this is exactly what Principal Component Analysis (PCA) does. So let's take a look at the concepts underlying PCA.
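A common way to decide how small that subspace can be is to look at the explained-variance ratio of each component. The sketch below (numpy only; the synthetic data and the 95% threshold are illustrative assumptions) picks the smallest number of components that retains 95% of the variance:

```python
import numpy as np

rng = np.random.default_rng(1)
# 300 samples with 10 features, but only ~3 directions carry real signal.
signal = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 10))
X = signal + 0.1 * rng.normal(size=signal.shape)

Xc = X - X.mean(axis=0)
# Singular values of the centered data give each component's variance.
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
var_ratio = s**2 / np.sum(s**2)
cum = np.cumsum(var_ratio)

# Smallest k that retains 95% of the total variance.
k = int(np.searchsorted(cum, 0.95) + 1)
print(k, cum[k - 1])
```

For this data, k comes out far below the original 10 features, showing how the uninformative directions can be dropped without losing much.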

Popular Machine Learning Algorithms - KDnuggets


When starting out with Data Science, there is so much to learn that it can become quite overwhelming. This guide will help aspiring data scientists and machine learning engineers gain better knowledge and experience. I will list different types of machine learning algorithms, which can be used with both Python and R. Linear Regression is the simplest machine learning algorithm in the supervised learning family. It is primarily used to solve regression problems and make predictions on a continuous dependent variable using knowledge from the independent variables. The goal of Linear Regression is to find the line of best fit, which can then predict the output for a continuous dependent variable.
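Finding that line of best fit can be sketched in a few lines of numpy via ordinary least squares; the synthetic data below (a known line y = 3x + 1 plus noise) is invented for this example:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data generated from a known line: y = 3x + 1, plus noise.
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=100)

# Ordinary least squares: add an intercept column and solve X beta = y.
X = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(X, y, rcond=None)[0]

print(round(slope, 2), round(intercept, 2))   # close to 3 and 1
y_pred = slope * 2.5 + intercept              # predict for a new input x = 2.5
```

The recovered slope and intercept land close to the true values used to generate the data, which is the "line of best fit" the excerpt describes.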

CS229: Machine Learning - AI Summary


CS229: Machine Learning Course Description This course provides a broad introduction to machine learning and statistical pattern recognition. Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering, dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs, practical advice); reinforcement learning and adaptive control. The course will also discuss recent applications of machine learning, such as robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing.

What is Machine Learning?


The term Machine Learning might sound a bit intimidating. It is sometimes associated with scarily accurate recommender systems or humanoid robots. However, the fundamentals of machine learning are actually not that complicated. If you, as a human, were given two (x, y) coordinates and had to draw a line through these points, you'd probably do this without much effort. However, if someone asked you to draw a line based on 1 million (x, y) coordinates, you'd probably politely tell this person to fuck off.
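A computer, on the other hand, handles the million-point case without complaint. As a minimal sketch (numpy only; the data-generating line y = 2x - 4 is invented for this example), a least-squares fit over a million points takes one call:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000
# A million noisy points scattered around the line y = 2x - 4.
x = rng.uniform(-5, 5, size=n)
y = 2.0 * x - 4.0 + rng.normal(scale=1.0, size=n)

# np.polyfit solves the least-squares line in a single call.
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)   # very close to 2 and -4
```

With a million samples the noise averages out, so the fitted coefficients sit extremely close to the true ones.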

Allowing Computers to learn from Data : Gold of 21st Century


As a data scientist, I think that machine learning, the application and science of algorithms that make sense of data, is the most exciting of all the fields within computer science. As a society, we are living in a time when data is abundant, and we can turn this data into knowledge by using self-learning algorithms from the field of machine learning. The plethora of powerful open-source algorithms developed in recent years makes this the best time ever to learn machine learning and to use these tools to spot patterns in data and anticipate future events, something that was hardly possible before these libraries were created. We live in an age of modern technology, and one resource is plentifully available to anyone: a large amount of structured and unstructured data. Machine learning evolved as a subfield of artificial intelligence in the second half of the 20th century.

Denoising Autoencoders (DAE) -- How To Use Neural Networks to Clean Up Your Data


The most common type of Autoencoder is the Undercomplete Autoencoder, which squeezes (encodes) data into fewer neurons (a lower dimension) while removing "unimportant" information. It achieves this by training an encoder and a decoder simultaneously so that the output neurons match the inputs as closely as possible.
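The idea can be sketched without any deep learning framework. Below is a minimal *linear* undercomplete autoencoder trained with plain gradient descent in numpy; the toy data, bottleneck size, and learning rate are all assumptions made for this illustration (a real DAE would add nonlinearities and noise to the inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
# Rank-2 data in 6 dimensions: a 2-neuron bottleneck can reconstruct it well.
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 6))

d, k, lr = 6, 2, 0.01
W_enc = rng.normal(scale=0.1, size=(d, k))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(k, d))   # decoder weights

for _ in range(3000):
    Z = X @ W_enc                 # encode into k bottleneck neurons
    X_hat = Z @ W_dec             # decode back to d dimensions
    err = X_hat - X               # reconstruction error
    # Gradients of the mean squared reconstruction error.
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
print(mse)   # small, since the data truly has rank 2
```

Training the encoder and decoder jointly drives the reconstruction error toward zero precisely because the "unimportant" directions of the data carry almost no variance, which is the compression the excerpt describes.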

Non-Linear Spectral Dimensionality Reduction Under Uncertainty Artificial Intelligence

In this paper, we consider the problem of non-linear dimensionality reduction under uncertainty, from both theoretical and algorithmic perspectives. Since real-world data usually contain measurements with uncertainties and artifacts, the input space in the proposed framework consists of probability distributions that model the uncertainty associated with each sample. We propose a new dimensionality reduction framework, called NGEU, which leverages uncertainty information and directly extends several traditional approaches, e.g., KPCA and MDA/KMFA, to receive probability distributions as inputs instead of the original data. We show that the proposed NGEU formulation admits a global closed-form solution, and we analyze, based on the Rademacher complexity, how the underlying uncertainties theoretically affect the generalization ability of the framework. Empirical results on different datasets show the effectiveness of the proposed framework.
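For context, classical KPCA, one of the point-based methods NGEU generalizes, can be sketched in a few lines of numpy. This is not the paper's method, just the standard RBF-kernel PCA baseline; the two-rings dataset and kernel width are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two concentric rings: linearly inseparable, a classic KPCA test case.
theta = rng.uniform(0, 2 * np.pi, size=200)
r = np.repeat([1.0, 3.0], 100)
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# RBF (Gaussian) kernel matrix.
sq = np.sum(X**2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / 2.0)

# Center the kernel in feature space: Kc = K - 1K - K1 + 1K1.
n = len(X)
one = np.full((n, n), 1.0 / n)
Kc = K - one @ K - K @ one + one @ K @ one

# Top eigenvectors of the centered kernel matrix give the embedding.
eigvals, eigvecs = np.linalg.eigh(Kc)
embedding = eigvecs[:, -2:] * np.sqrt(np.maximum(eigvals[-2:], 0))
print(embedding.shape)   # (200, 2)
```

NGEU's contribution is to replace each input point above with a probability distribution while, per the abstract, still retaining a closed-form solution.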


AAAI Conferences

As an important machine learning topic, dimensionality reduction has been widely studied and applied in many areas. A multitude of dimensionality reduction methods have been developed, among which unsupervised dimensionality reduction is more desirable when obtaining label information requires onerous work. However, most previous unsupervised dimensionality reduction methods require an affinity graph to be constructed beforehand, on which the subsequent dimensionality reduction steps are then performed. This separation of graph construction and dimensionality reduction leaves the dimensionality reduction process highly dependent on the quality of the input graph. In this paper, we propose a novel graph embedding method for unsupervised dimensionality reduction.
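To make the two-stage pipeline the abstract criticizes concrete, here is a minimal numpy sketch of the conventional approach: first build a k-nearest-neighbour affinity graph, then run a spectral embedding (Laplacian eigenmaps) on that fixed graph. The data, neighbourhood size, and binary edge weights are illustrative assumptions, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))

# Stage 1: construct a k-nearest-neighbour affinity graph beforehand.
k = 10
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
W = np.zeros_like(D)
for i in range(len(X)):
    nn = np.argsort(D[i])[1:k + 1]        # skip self at position 0
    W[i, nn] = 1.0
W = np.maximum(W, W.T)                    # symmetrise the graph

# Stage 2: dimensionality reduction on the fixed graph (Laplacian eigenmaps).
L = np.diag(W.sum(axis=1)) - W            # unnormalised graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)
Y = eigvecs[:, 1:3]                       # skip the constant eigenvector
print(Y.shape)                            # (100, 2)
```

Whatever errors stage 1 bakes into W are inherited by stage 2, which is exactly the dependence on input-graph quality that motivates learning the graph and the embedding jointly.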


AAAI Conferences

Robots with many sensors are capable of generating volumes of high-dimensional perceptual data. Making sense of this data and extracting useful knowledge from it is a difficult problem. For robots lacking proper models, trying to understand a stream of uninterpreted data is an especially acute problem. One critical step in linking raw uninterpreted perceptual data to cognition is dimensionality reduction. Current methods for reducing the dimension of data do not meet the demands of a robot situated in the world, and methods that use only perceptual data do not take full advantage of the interactive experience of an embodied robot agent. This work proposes a new scalable, incremental and active approach to dimensionality reduction suitable for extracting geometric knowledge from uninterpreted sensors and effectors. The proposed method uses distinctive state abstractions to organize early sensorimotor experience and sensorimotor embedding to incrementally learn accurate geometric representations based on experience. This approach is applied to the problem of learning the geometry of sensors, space, and objects. The result is evaluated using techniques from statistical shape analysis.