Highlights: In this post we are going to talk about vectors, the fundamental building blocks of linear algebra. We will give an intuitive definition of what vectors are, where we use them, and how we add them and multiply them by scalars. We also provide code examples demonstrating how to work with vectors in Python. So, what exactly is a vector?
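As a first taste, here is a minimal sketch (using NumPy, an assumption on my part) of the two operations mentioned above, vector addition and scalar multiplication:

```python
import numpy as np

# Two vectors in the 2D plane
u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

# Vector addition: the sum is taken component-wise
print(u + v)    # [4. 1.]

# Scalar multiplication: stretches (or flips) the vector
print(2.0 * u)  # [2. 4.]
```

Both operations act component-wise, which is exactly what makes them so convenient to express with NumPy arrays.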

In general, the goal of the loss function is to maximise the dot product between the input vector and the output vector, while minimising the dot product between the input vector and other, randomly sampled vectors. This makes the vectors corresponding to the input word and the output (context) word more similar. With CBOW, the idea is essentially the same but with a different formulation.
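A minimal sketch of this objective (the vectors below are illustrative toy values, not trained embeddings) might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy embedding vectors (assumed values, purely for illustration)
v_input   = np.array([0.2, 0.5, -0.1])   # input (center) word
v_context = np.array([0.3, 0.4,  0.0])   # true context word
v_random  = np.array([-0.5, 0.1, 0.9])   # randomly sampled negative word

# Negative-sampling style loss: reward a large input/context
# dot product, penalise a large input/negative dot product.
loss = -np.log(sigmoid(v_input @ v_context)) \
       - np.log(sigmoid(-(v_input @ v_random)))
print(loss)
```

Minimising this loss pushes `v_input` and `v_context` closer together while pushing `v_input` away from the random vector, which is exactly the behaviour described above.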

This post will be quite an interesting one. In it we introduce the linear transformation, which can also be seen as a simple function, and show how a 2D plane can be transformed into another one. Understanding these concepts is a crucial step towards more advanced linear algebra and machine learning methods. So, let's proceed and learn how to connect matrix-vector multiplication with a linear transformation.
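To preview the connection, here is a small sketch: a 2×2 matrix acting on vectors of the plane via matrix-vector multiplication, together with a check of the linearity properties.

```python
import numpy as np

# A 90-degree counter-clockwise rotation of the plane,
# written as a 2x2 matrix
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

v = np.array([1.0, 0.0])   # unit vector along the x-axis
print(A @ v)               # [0. 1.] -- rotated onto the y-axis

# Linearity: A(u + v) == A u + A v  and  A(c v) == c (A v)
u = np.array([2.0, 3.0])
assert np.allclose(A @ (u + v), A @ u + A @ v)
assert np.allclose(A @ (3.0 * v), 3.0 * (A @ v))
```

The two assertions are precisely the defining properties of a linear transformation: it preserves vector addition and scalar multiplication.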

In past posts, I've described how Recurrent Neural Networks (RNNs) can be used to learn patterns in sequences of inputs, and how the idea of unrolling can be used to train them. It turns out that there are some significant limitations to the types of patterns that a typical RNN can learn, due to the way their weight matrices are used. As a result, there has been a lot of interest in a variant of RNNs called Long Short-Term Memory networks (LSTMs). As I'll describe below, LSTMs have more control than typical RNNs over what they remember, which allows them to learn much more complex patterns. Let's start with what I mean by a "typical" RNN.
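By a "typical" RNN I mean a cell whose new hidden state is a fixed mix of the current input and the previous state through shared weight matrices. A minimal sketch (random, untrained weights, purely for illustration) of one such step, unrolled over a short sequence:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and weights for a vanilla RNN cell (untrained)
n_in, n_hidden = 3, 4
W_xh = rng.standard_normal((n_hidden, n_in)) * 0.1     # input  -> hidden
W_hh = rng.standard_normal((n_hidden, n_hidden)) * 0.1  # hidden -> hidden
b_h = np.zeros(n_hidden)

def rnn_step(x, h):
    """One step of a vanilla RNN: the same weight matrices are
    reused at every time step, which is the source of the
    limitations discussed above."""
    return np.tanh(W_xh @ x + W_hh @ h + b_h)

# Unroll over a short input sequence of 5 steps
h = np.zeros(n_hidden)
for x in rng.standard_normal((5, n_in)):
    h = rnn_step(x, h)
print(h)
```

Because `W_hh` is applied over and over at every step, information from early inputs is repeatedly squashed, which is the limitation LSTMs are designed to address.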

In the field of statistics and machine learning, the sum-of-squares cost, commonly referred to as \emph{ordinary least squares}, is a convenient choice of cost function because of its many nice analytical properties, though it is not always the best choice. However, it has long been known that \emph{ordinary least squares} is not robust to outliers. Several attempts to resolve this problem led to the creation of alternative methods that either did not fully resolve the \emph{outlier problem} or were computationally difficult. In this paper, we provide a very simple solution that makes \emph{ordinary least squares} less sensitive to outliers in data classification: \emph{scaling the augmented input vector by its length}. We present mathematical expositions of the \emph{outlier problem} using approximations and geometrical techniques, and numerical results to support the efficacy of our method.
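A minimal sketch of the preprocessing step described in the abstract, scaling the augmented input vector by its length (the function name and toy data are my own assumptions, not from the paper):

```python
import numpy as np

def scale_augmented(X):
    """Augment each input row with a bias component of 1, then
    divide the augmented vector by its Euclidean length, so every
    row ends up on the unit sphere. Sketch of the paper's idea."""
    ones = np.ones((X.shape[0], 1))
    Z = np.hstack([X, ones])                       # augmented inputs
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / norms

# Toy data: the second row is an extreme input (an "outlier")
X = np.array([[1.0, 2.0],
              [100.0, 200.0]])
Z = scale_augmented(X)
print(np.linalg.norm(Z, axis=1))  # every scaled row has length 1
```

After scaling, extreme inputs no longer dominate the squared-error cost through their magnitude alone, which is the intuition behind the reduced sensitivity to outliers.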