Lecture notes on high-dimensional data

Wegner, Sven-Ake

arXiv.org Artificial Intelligence 

The text below arose from a course on 'Mathematical Data Science' that I taught twice for final year BSc Mathematics students in the UK between 2019 and 2020. The notes presently cover the first part (roughly a third) of the course, focussing on the characteristics and peculiarities of high-dimensional data. An improved version of the notes appeared as part of the textbook [7]; we refer the reader in particular to [7, Chapters 8-12]. I would like to thank my former students who attended the course and helped me with their feedback to write these lecture notes.

Concrete examples are as follows. Each user can give a rating from one to five stars for each movie. If we name the users 1, 2, 3, ..., we can represent user j by a vector in R^d. When doing medical diagnostic tests, we can represent a subject by the vector containing her or his results. These can include integers like antibody counts, real numbers like temperature, pairs of real numbers like blood pressure, or binary values such as whether a subject has tested positive or negative for a certain infection.

Given such a high-dimensional data set A, classical tasks to analyze the data, or to make predictions based on it, involve computing distances between data points. This can be, for example, the classical Euclidean distance (or the distance induced by any other p-norm). However, if d is very large, we are faced with the following two obstructions.
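The distance computations mentioned above can be illustrated with a minimal sketch. The function name `p_distance` and the two rating vectors are hypothetical and chosen only for illustration; they are not taken from the notes. The sketch assumes that both data points are lists of numbers of equal length d and that p is at least 1, with p = infinity giving the maximum norm.

```python
import math


def p_distance(x, y, p=2.0):
    """Distance between x and y induced by the p-norm, for p >= 1 or p = inf."""
    if len(x) != len(y):
        raise ValueError("both data points must have the same dimension")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Coordinate-wise absolute differences |x_i - y_i|.
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Maximum norm: the largest coordinate-wise difference.
        return float(max(diffs))
    # (sum_i |x_i - y_i|^p)^(1/p)
    return sum(t ** p for t in diffs) ** (1.0 / p)


# Hypothetical star ratings (one to five) of d = 4 movies by two users.
user_1 = [5, 3, 1, 4]
user_2 = [4, 3, 2, 1]

d_euclid = p_distance(user_1, user_2)              # p = 2, Euclidean distance
d_manhattan = p_distance(user_1, user_2, p=1)      # p = 1
d_max = p_distance(user_1, user_2, p=math.inf)     # p = infinity
```

Here the coordinate-wise differences are 1, 0, 1 and 3, so the Euclidean distance is the square root of 11, the 1-norm distance is 5, and the maximum-norm distance is 3. Each call costs on the order of d operations, which is one reason why very large d matters in practice.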
