Support Vector Machines
Very Deep Convolutional Neural Networks for Complex Land Cover Mapping Using Multispectral Remote Sensing Imagery
Despite recent advances of deep Convolutional Neural Networks (CNNs) in various computer vision tasks, their potential for classification of multispectral remote sensing images has not been thoroughly explored. In particular, applications of deep CNNs to optical remote sensing data have focused on the classification of very high-resolution aerial and satellite data, owing to the similarity of these data to the large datasets in computer vision. Accordingly, this study presents a detailed investigation of state-of-the-art deep learning tools for classification of complex wetland classes using multispectral RapidEye optical imagery. Specifically, we examine the capacity of seven well-known deep convnets, namely DenseNet121, InceptionV3, VGG16, VGG19, Xception, ResNet50, and InceptionResNetV2, for wetland mapping in Canada. In addition, the classification results obtained from deep CNNs are compared with those based on conventional machine learning tools, including Random Forest and Support Vector Machine, to further evaluate the efficiency of the former in classifying wetlands. The results illustrate that full training of convnets using five spectral bands outperforms the other strategies for all convnets. InceptionResNetV2, ResNet50, and Xception are distinguished as the top three convnets, providing state-of-the-art classification accuracies of 96.17%, 94.81%, and 93.57%, respectively. The classification accuracies obtained using Support Vector Machine (SVM) and Random Forest (RF) are 74.89% and 76.08%, respectively, considerably lower than those of the CNNs. Importantly, InceptionResNetV2 is consistently found to be superior to all other convnets, suggesting that the integration of Inception and ResNet modules is an efficient architecture for classifying complex remote sensing scenes such as wetlands.
Intercon World Keynote Dr. Ganapathi Pulipaka Receives a Top 50 Technology Leader Award for His Contributions to AI, Machine Learning, Mathematics, and Data Science
At the Intercon conference, Dr. GP gave a motivational keynote speech on Deep Reinforcement Learning and the landscape of machine learning and artificial intelligence that inspired the audience. He noted that the MIT Technology Review downloaded 16,625 publicly available research papers from the computer science and artificial intelligence section of arXiv through November 2018. Using natural language processing techniques on the abstracts, the words "constraint," "theory," "rule," "logic," "program," "learning," "network," "data," "task," and "performance" were evaluated to trace the recent reinforcement learning boom. Dr. GP said the trends show the rise of traditional neural networks in the 1950s and 1960s, symbolic approaches in the 1970s, knowledge-based and rule-based systems in the 1980s, support vector machines in the 1990s, and the reign of neural networks in the 2010s with the advent of heavy implementation of deep neural networks. Deep Traffic is a reinforcement learning simulation based on the 24,000 entries received in MIT's Deep Traffic competition on self-driving cars. In it, cars drive on a multi-lane freeway under a model-free, off-policy reinforcement learning process. The competition inspires data scientists and machine learning enthusiasts to evaluate Deep Q-Learning network variants and hyperparameter configurations, with episodic training amounting to 96.6 years of RL simulations and 572.2 million crowdsourced and optimized DQN hyperparameters used to train the agents successfully.
Understanding the Different Types of Machine Learning Models
Supervised learning models map inputs to outputs. Supervised learning is typically done in the context of classification, when we want to map inputs to discrete output labels, or regression, when we want to map inputs to a continuous output. Common algorithms in supervised learning include logistic regression, naive Bayes, support vector machines, artificial neural networks, and random forests. In both regression and classification, the goal is to find specific relationships or structure in the input data that allow us to effectively produce correct output data. Note that "correct" output is determined entirely from the training data, so while we do have a ground truth that our model will assume is true, this does not mean that data labels are always correct in real-world situations.
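The distinction between the two settings can be sketched with a toy example. The snippet below is only an illustration under stated assumptions: the data are made up, and a one-nearest-neighbour rule and a least-squares line stand in for the algorithms named above purely because they fit in a few lines of standard-library Python.

```python
def nearest_neighbor_label(train, query):
    """Classification: return the label of the training point closest to `query`."""
    _, closest_label = min(train, key=lambda pair: abs(pair[0] - query))
    return closest_label


def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labelled data: hours studied -> pass/fail (a discrete label).
labelled = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (8.0, "pass")]
predicted_label = nearest_neighbor_label(labelled, 7.0)

# Hypothetical labelled data: hours studied -> exam score (a continuous value).
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]
slope, intercept = fit_line(hours, scores)
predicted_score = slope * 5.0 + intercept
```

In both cases the "correct" answer is defined only by the labelled pairs supplied for training, which is why mislabelled training data carries straight through to the predictions.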
How to Predict Hotel Cancellations with Support Vector Machines and ARIMA
Hotel cancellations can cause issues for many businesses in the industry. Not only is there the lost revenue as a result of the customer canceling, but this can also cause difficulty in coordinating bookings and adjusting revenue management practices. Data analytics can help to overcome this issue by identifying the customers who are most likely to cancel, allowing a hotel chain to adjust its marketing strategy accordingly. To investigate how machine learning can aid in this task, the ExtraTreesClassifier, logistic regression, and support vector machine models were employed in Python to determine whether cancellations can be accurately predicted with these models. For this example, both hotels are based in Portugal.
Automatic Language Identification in Texts: A Survey
Jauhiainen, Tommi, Lui, Marco, Zampieri, Marcos, Baldwin, Timothy, Lindén, Krister
Language identification ("LI") is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used in the LI literature. We describe the features and methods using a unified notation, to make the relationships between methods clearer. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.
Quantum algorithms for Second-Order Cone Programming and Support Vector Machines
Kerenidis, Iordanis, Prakash, Anupam, Szilágyi, Dániel
Convex optimization is one of the central areas of study in computer science and mathematical optimization. The reason for the great importance of convex optimization is twofold. Firstly, starting with the seminal works of Khachiyan [25] and Karmarkar [18], efficient algorithms have been developed for a large family of convex optimization problems over the last few decades. Secondly, convex optimization has many real-world applications, and many optimization problems that arise in practice can be reduced to convex optimization [8]. There are three main classes of structured convex optimization problems: linear programs (LP), semidefinite programs (SDP), and second-order conic programs (SOCP). The fastest (classical) algorithms for these problems belong to the family of interior-point methods (IPM). Interior-point methods are iterative algorithms where the main computation in each step is the solution of a system of linear equations whose size depends on the dimension of the optimization problem. The size of structured optimization problems that can be solved in practice is therefore limited by the efficiency of linear system solvers: on a single computer, most open-source and commercial solvers can handle dense problems with up to tens of thousands of constraints and variables, or sparse problems with the same number of nonzero entries [30, 31]. In recent years, there has been tremendous interest in quantum linear algebra algorithms following the breakthrough algorithm of Harrow, Hassidim and Lloyd [16].
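For orientation, one common textbook way of writing a second-order cone program is shown below. The symbols $f$, $A_i$, $b_i$, $c_i$, $d_i$, $F$, and $g$ are generic problem data assumed here for illustration; they are not notation taken from the abstract above.

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & f^\top x \\
\text{subject to} \quad & \lVert A_i x + b_i \rVert_2 \le c_i^\top x + d_i, \qquad i = 1, \dots, m, \\
& F x = g.
\end{aligned}
```

When every $A_i = 0$, each cone constraint becomes the linear inequality $c_i^\top x + d_i \ge \lVert b_i \rVert_2$, so linear programs are a special case of SOCPs.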
Machine Learning as Ecology
Howell, Owen, Wenping, Cui, Marsland, Robert III, Mehta, Pankaj
Machine learning methods have had spectacular success on numerous problems. Here we show that a prominent class of learning algorithms, including Support Vector Machines (SVMs), has a natural interpretation in terms of ecological dynamics. We use these ideas to design new online SVM algorithms that exploit ecological invasions, and benchmark performance using the MNIST dataset. Our work provides a new ecological lens through which we can view statistical learning and opens the possibility of designing ecosystems for machine learning. Supplemental code is found at https://github.com/owenhowell20/EcoSVM.
Quadratic Surface Support Vector Machine with L1 Norm Regularization
Mousavi, Seyedahmad, Gao, Zheming, Han, Lanshan, Lim, Alvin
We propose $\ell_1$ norm regularized quadratic surface support vector machine models for binary classification in supervised learning. We establish their desired theoretical properties, including the existence and uniqueness of the optimal solution, reduction to the standard SVMs over (almost) linearly separable data sets, and detection of true sparsity pattern over (almost) quadratically separable data sets if the penalty parameter of $\ell_1$ norm is large enough. We also demonstrate their promising practical efficiency by conducting various numerical experiments on both synthetic and publicly available benchmark data sets.
Viability of machine learning to reduce workload in systematic review screenings in the health sciences: a working paper
Systematic reviews, which summarize and synthesize all the current research on a specific topic, are a crucial component of academia. They are especially important in the biomedical and health sciences, where they synthesize the state of medical evidence and draw conclusions about the best course of action for various diseases, pathologies, and treatments. Due to the immense amount of literature that exists, as well as the output rate of research, reviewing abstracts can be a laborious process. Automation may be able to significantly reduce this workload. Of course, such classifications are not easily automated due to the peculiar nature of written language. Machine learning may be able to help. This paper explored the viability and effectiveness of using machine learning modelling to classify abstracts according to specific exclusion/inclusion criteria, as would be done in the first stage of a systematic review. The specific task was deciding whether or not an abstract describes a randomized controlled trial (RCT), a very common classification made in systematic reviews in the healthcare field. Random training/testing splits of an n=2042 dataset of labelled abstracts were repeatedly created (1000 times in total), with a model trained and tested on each of these instances. A Bayes classifier and an SVM classifier were used and compared to simplistic, non-machine-learning approaches to textual classification. The SVM classifier was seen to be highly effective, yielding 90% accuracy, an F1 score of 0.84, and a potential workload reduction of 70%. This shows that machine learning has the potential to significantly change the abstract screening process in healthcare systematic reviews.
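The evaluation protocol described above (repeated random splits scored by accuracy and F1) can be sketched as follows. This is a minimal illustration, not the paper's code: the six-abstract dataset, the `keyword_rule` baseline, and all function names are hypothetical, and the keyword rule merely stands in for the kind of simplistic non-machine-learning approach mentioned in the abstract.

```python
import random


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for boolean labels, treating True as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def keyword_rule(abstract):
    """Hypothetical baseline: flag an abstract as an RCT if it mentions 'randomized'."""
    return "randomized" in abstract.lower()


def repeated_split_scores(dataset, n_repeats, test_fraction, seed=0):
    """Score the baseline on `n_repeats` random test splits of `dataset`."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        shuffled = dataset[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test = shuffled[:n_test]
        # A trainable model (e.g. an SVM) would be fitted on shuffled[n_test:] here.
        y_true = [label for _, label in test]
        y_pred = [keyword_rule(text) for text, _ in test]
        scores.append(accuracy_and_f1(y_true, y_pred))
    return scores


# Made-up abstracts labelled True if they describe an RCT.
toy_dataset = [
    ("A randomized controlled trial of drug A", True),
    ("Patients were randomized to two arms", True),
    ("A double-blind trial with random allocation", True),  # missed by the rule
    ("A retrospective cohort study", False),
    ("Case report of a rare condition", False),
    ("Non-randomized observational study", False),  # wrongly flagged by the rule
]
split_scores = repeated_split_scores(toy_dataset, n_repeats=5, test_fraction=0.5)
```

Averaging the per-split scores gives a more stable estimate than a single split, which is presumably the motivation for repeating the procedure many times.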
Twitter Sentiment on Affordable Care Act using Score Embedding
Mohsen Farhadloo, PhD, John Molson School of Business, Concordia University, mohsen.farhadloo@concordia.ca, August 21, 2019
Abstract
In this paper we introduce score embedding, a neural network based model to learn interpretable vector representations for words. Score embedding is a supervised method that takes advantage of the labeled training data and the neural network architecture to learn interpretable representations for words. Health care has been a controversial issue between political parties in the United States. In this paper we use the discussions on Twitter regarding different issues of the Affordable Care Act to identify the public opinion about the existing health care plans using the proposed score embedding. Our results indicate that our approach effectively incorporates the sentiment information and outperforms or is at least comparable to the state-of-the-art methods, and that the negative sentiment towards "TrumpCare" was consistently greater than neutral and positive sentiment over time.
1 Introduction
Sentiment analysis, as a type of text categorization, is the task of identifying the sentiment orientation of documents written in natural language; it assigns one of the predefined sentiment categories to a whole document or to pieces of the document such as phrases or sentences [23, 8]. Many studies used binary classification and reported high performance [18, 29, 24], and some studies have observed that the performance of the categorization decreases as the number of sentiment categories increases [2, 16, 3, 11]. Bag-Of-Words (BOW), a standard approach for text categorization, represents a document by a vector that indicates the words that appear in the document.
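The bag-of-words representation can be illustrated with a short sketch. The whitespace tokenizer, the example sentences, and the choice of word counts (rather than 0/1 indicators) are assumptions made for this illustration and are not taken from the paper.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a vector index (sorted for determinism)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a count vector with one entry per vocabulary word; word order is discarded."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[tok]] = n
    return vector


# Made-up example documents.
docs = ["health care plan", "care plan repeal plan"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
```

Clipping each entry to at most 1 would give the binary variant, which records only whether a word appears.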