
 Nearest Neighbor Methods


Machine Learning in Medicine -- Part II

#artificialintelligence

In Part I of this course, we introduced the names of several common machine learning algorithms, such as decision trees, k-nearest neighbors, and neural networks, and discussed how they relate to one another. We proceeded to set up our project by downloading a public domain dataset, the 500 Cities dataset, and setting up a JavaScript machine learning library called the DRESS Kit. Next, we went through the data preparation process to extract useful data points from the dataset using several basic functions from the DRESS Kit, including DRESS.local (to load a local file) and DRESS.save. At the end of Part I, we created a JSON file, data.json. We also created a JSON file, measures.json.


Deep Adversarially-Enhanced k-Nearest Neighbors

arXiv.org Artificial Intelligence

Recent works have theoretically and empirically shown that deep neural networks (DNNs) have an inherent vulnerability to small perturbations. Applying the Deep k-Nearest Neighbors (DkNN) classifier, we observe a dramatically increasing robustness-accuracy trade-off as the layer goes deeper. In this work, we propose a Deep Adversarially-Enhanced k-Nearest Neighbors (DAEkNN) method which achieves higher robustness than DkNN and mitigates the robustness-accuracy trade-off in deep layers through two key elements. First, DAEkNN is based on an adversarially trained model. Second, DAEkNN makes predictions by leveraging a weighted combination of benign and adversarial training data. Empirically, we find that DAEkNN improves both the robustness and the robustness-accuracy trade-off on MNIST and CIFAR-10 datasets.


Clustering with UMAP: Why and How Connectivity Matters

arXiv.org Artificial Intelligence

Topology-based dimensionality reduction methods such as t-SNE and UMAP have seen increasing success and popularity in high-dimensional data. These methods have strong mathematical foundations and are based on the intuition that the topology in low dimensions should be close to that of high dimensions. Given that the initial topological structure is a precursor to the success of the algorithm, this naturally raises the question: What makes a "good" topological structure for dimensionality reduction? Insight into this will enable us to design better algorithms which take into account both local and global structure. In this paper, which focuses on UMAP, we study the effects of node connectivity (k-Nearest Neighbors vs. mutual k-Nearest Neighbors) and relative neighborhood (Adjacent via Path Neighbors) on dimensionality reduction. We explore these concepts through extensive ablation studies on four standard image and text datasets (MNIST, FMNIST, 20NG, and AG), reducing to 2 and 64 dimensions. Our findings indicate that a more refined notion of connectivity (mutual k-Nearest Neighbors with minimum spanning tree), together with a flexible method of constructing the local neighborhood (Path Neighbors), can achieve a much better representation than default UMAP, as measured by downstream clustering performance.
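
The difference between the two connectivity notions named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the Euclidean metric, and the toy points are all assumptions made for illustration, and the minimum spanning tree and Path Neighbors steps are not shown.

```python
import math


def knn_indices(points, k):
    """For each point, return the set of indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def knn_edges(points, k):
    """Plain kNN graph: keep edge (i, j) if either point lists the other."""
    nbrs = knn_indices(points, k)
    return {(min(i, j), max(i, j)) for i in range(len(points)) for j in nbrs[i]}


def mutual_knn_edges(points, k):
    """Mutual kNN graph: keep edge (i, j) only if each point lists the other."""
    nbrs = knn_indices(points, k)
    return {
        (i, j)
        for i in range(len(points))
        for j in nbrs[i]
        if i < j and i in nbrs[j]
    }


# Hypothetical toy data: three nearby points and one distant outlier.
points = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
print(sorted(knn_edges(points, k=1)))
print(sorted(mutual_knn_edges(points, k=1)))
```

On this toy data the mutual graph keeps fewer edges than the plain kNN graph and leaves some points disconnected, which is presumably why the abstract pairs mutual k-Nearest Neighbors with a minimum spanning tree.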


k-Nearest Twitter Neighbors

#artificialintelligence

I'm also a mathematics lecturer at Cal State East Bay, and have been fortunate to be able to work with my mentor Prateek Jain as a Data Science Fellow at SharpestMinds. This project was selected as a way for me to practice writing a machine learning algorithm from scratch (no scikit-learn allowed!) and to therefore deeply learn and understand the k-nearest neighbors algorithm, or kNN. If you're not already familiar with kNN, it's a nice ML algorithm to make your first deep dive with, because it's relatively intuitive. Zip codes are frequently useful proxies for individuals because people who live in the same neighborhood often have similar economic backgrounds and educational attainment, and are therefore also likely to share values and politics (not a guarantee, though!). So if you wanted to predict whether a particular piece of legislation would pass in an area, you might poll some of the area's constituents and assume most of those constituents' neighbors will feel similarly about your bill as do the majority of those you polled.


Self-supervised Consensus Representation Learning for Attributed Graph

arXiv.org Artificial Intelligence

Attempting to fully exploit the rich information of topological structure and node features for attributed graphs, we introduce a self-supervised learning mechanism to graph representation learning and propose a novel Self-supervised Consensus Representation Learning (SCRL) framework. In contrast to most existing works that only explore one graph, our proposed SCRL method treats the graph from two perspectives: topology graph and feature graph. We argue that their embeddings should share some common information, which could serve as a supervisory signal. Specifically, we construct the feature graph of node features via the k-nearest neighbor algorithm. Then graph convolutional network (GCN) encoders extract features from the two graphs respectively. A self-supervised loss is designed to maximize the agreement of the embeddings of the same node in the topology graph and the feature graph. Extensive experiments on real citation networks and social networks demonstrate the superiority of our proposed SCRL over the state-of-the-art methods on the semi-supervised node classification task. Meanwhile, compared with its main competitors, SCRL is rather efficient.


Quantitative Finance & Algorithmic Trading in Python

#artificialintelligence

Understand stock market fundamentals. Understand Modern Portfolio Theory. Understand stochastic processes and the famous Black-Scholes model. Understand Monte Carlo simulations. Understand Value-at-Risk (VaR). You should have an interest in quantitative finance as well as in mathematics and programming! This course is about the fundamentals of financial engineering. First of all, you will learn about stocks, bonds, and derivatives. The main goal of this course is to give you a better understanding of the mathematical models used in finance. The Markowitz model is the first step.


Character Spotting Using Machine Learning Techniques

arXiv.org Artificial Intelligence

This work presents a comparison of machine learning algorithms that are implemented to segment the characters of text presented as an image. The algorithms are designed to work on degraded documents with text that is not aligned in an organized fashion. The paper investigates the use of Support Vector Machines, K-Nearest Neighbor algorithm and an Encoder Network to perform the operation of character spotting. Character Spotting involves extracting potential characters from a stream of text by selecting regions bound by white space.


Algorithm Selection on a Meta Level

arXiv.org Artificial Intelligence

The problem of selecting an algorithm that appears most suitable for a specific instance of an algorithmic problem class, such as the Boolean satisfiability problem, is called instance-specific algorithm selection. Over the past decade, the problem has received considerable attention, resulting in a number of different methods for algorithm selection. Although most of these methods are based on machine learning, surprisingly little work has been done on meta learning, that is, on taking advantage of the complementarity of existing algorithm selection methods in order to combine them into a single superior algorithm selector. In this paper, we introduce the problem of meta algorithm selection, which essentially asks for the best way to combine a given set of algorithm selectors. We present a general methodological framework for meta algorithm selection as well as several concrete learning methods as instantiations of this framework, essentially combining ideas of meta learning and ensemble learning. In an extensive experimental evaluation, we demonstrate that ensembles of algorithm selectors can significantly outperform single algorithm selectors and have the potential to form the new state of the art in algorithm selection.


K-Nearest Neighbor (KNN) Algorithm

#artificialintelligence

"Tell me who your friends are and I will tell you who you are." As the saying goes, "A person is known by the company he keeps," and it sounds quite intuitive because people in the same company share similar interests. Likewise, data points that are close to a particular data point tend to have similar properties, and there is a high probability that they belong to the same class. These close data points are called the "neighbors" of that data point. So, today we are going to discuss a famous machine learning algorithm called "K-Nearest Neighbours", or simply KNN. "KNN is a supervised, non-parametric and lazy learning algorithm." Non-parametric means that no assumption is made about the underlying data distribution. In other words, the model structure is determined from the dataset.
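
The idea described above, labelling a point by the classes of its closest neighbors, can be sketched in a few lines of Python. This is a minimal illustration rather than the article's own code: the function name `knn_predict`, the Euclidean distance, the majority vote, and the toy data are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Lazy learning: nothing is fitted in advance; the training data is
    # simply scanned and ranked by distance at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.1, 5.0)))
```

Because the function stores no parameters and consults the labelled data directly, it reflects the "supervised, non-parametric and lazy" description in the quote. An odd `k` is a common way to avoid tied votes when there are two classes.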


Machine Learning for Telecom Customers Churn Prediction

#artificialintelligence

In this hands-on project, we will train several classification algorithms, such as Logistic Regression, Support Vector Machine, K-Nearest Neighbors, and Random Forest Classifier, to predict the churn rate of telecommunication customers. Machine learning helps companies analyze customer churn rate based on several factors, such as services subscribed by customers, tenure rate, and payment method. Predicting churn rate is crucial for these companies because the cost of retaining an existing customer is far less than that of acquiring a new one. Note: This course works best for learners who are based in the North America region.