Support Vector Machines
Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data
Hajiramezanali, Ehsan, Dadaneh, Siamak Zamani, Karbalayghareh, Alireza, Zhou, Mingyuan, Qian, Xiaoning
Precision medicine aims for personalized prognosis and therapeutics by utilizing recent genome-scale high-throughput profiling techniques, including next-generation sequencing (NGS). However, translating NGS data faces several challenges. First, NGS count data are often overdispersed, requiring appropriate modeling. Second, compared to the number of involved molecules and system complexity, the number of available samples for studying complex disease, such as cancer, is often limited, especially considering disease heterogeneity. The key question is whether we may integrate available data from all different sources or domains to achieve reproducible disease prognosis based on NGS count data. In this paper, we develop a Bayesian Multi-Domain Learning (BMDL) model that derives domain-dependent latent representations of overdispersed count data based on hierarchical negative binomial factorization for accurate cancer subtyping even if the number of samples for a specific cancer type is small. Experimental results from both our simulated and NGS datasets from The Cancer Genome Atlas (TCGA) demonstrate the promising potential of BMDL for effective multi-domain learning without "negative transfer" effects often seen in existing multi-task learning and transfer learning methods.
Optimal arrangements of hyperplanes for multiclass classification
Blanco, Víctor, Japón, Alberto, Puerto, Justo
In this paper, we present a novel approach to constructing multiclass classifiers by means of arrangements of hyperplanes. We propose several mixed-integer nonlinear programming formulations for the problem, using extensions of widely used measures of misclassification. We prove that kernel tools can be extended to these models. We detail some strategies that help solve the associated mathematical programming problems more efficiently. An extensive battery of experiments reveals the strength of our proposal compared with previously proposed methods.
Proactive Security: Embedded AI Solution for Violent and Abusive Speech Recognition
Shulby, Christopher Dane, Pombal, Leonardo, Jordão, Vitor, Ziolle, Guilherme, Martho, Bruno, Postal, Antônio, Prochnow, Thiago
Violence is an epidemic in Brazil and a problem on the rise worldwide. Mobile devices provide communication technologies which can be used to monitor and alert about violent situations. However, current solutions, like panic buttons or safe words, might increase the loss of life in violent situations. We propose an embedded artificial intelligence solution, using natural language and speech processing technology, to silently alert someone who can help in this situation. The corpus contains 400 positive phrases and 800 negative phrases, totaling 1,200 sentences, which are represented using two well-known feature extraction methods for natural language processing tasks, bag-of-words and word embeddings, and then classified with a support vector machine. We describe the proof-of-concept product in development with promising results, indicating a path towards a commercial product. More importantly, we show that model improvements via word embeddings and data augmentation techniques provide an intrinsically robust model. The final embedded solution also has a small footprint of less than 10 MB.
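The bag-of-words representation mentioned in this abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the phrases, the whitespace tokenizer, and the helper names `build_vocabulary` and `bag_of_words` are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = {tok for s in sentences for tok in s.lower().split()}
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def bag_of_words(sentence, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(sentence.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical phrases for illustration only (not from the paper's corpus).
corpus = ["please help me", "help me now", "see you at lunch"]
vocab = build_vocabulary(corpus)
features = [bag_of_words(s, vocab) for s in corpus]
```

Each sentence becomes a fixed-length vector of token counts, which is the kind of input a support vector machine can consume directly. Word order is discarded, which is one reason word embeddings may help.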
Vaimal - Machine Learning Add-In - Vortarus Technologies LLC
Vaimal is a machine learning add-in that allows you to train and deploy machine learning algorithms without programming. You can make predictions on new data using models that are trained on historical data. Vaimal allows you to create decision trees, support vector machines and neural networks, all within Excel. It also includes more powerful ensemble methods to combine models for even better predictive performance. The easy-to-use interface allows you to focus on your data without having to learn the mundane programming tasks required by common machine learning platforms.
Feature Selection and Comparison of Machine Learning Algorithms in Classification of Grazing and Rumination Behaviour in Sheep
Grazing and ruminating are the most important behaviours for ruminants, as they spend most of their daily time budget performing these. Continuous surveillance of eating behaviour is an important means for monitoring ruminant health, productivity and welfare. However, surveillance performed by human operators is prone to human variance, time-consuming and costly, especially on animals kept at pasture or free-ranging. The use of sensors to automatically acquire data, and software to classify and identify behaviours, offers significant potential in addressing such issues. In this work, data collected from sheep by means of an accelerometer/gyroscope sensor attached to the ear and collar, sampled at 16 Hz, were used to develop classifiers for grazing and ruminating behaviour using various machine learning algorithms: random forest (RF), support vector machine (SVM), k nearest neighbour (kNN) and adaptive boosting (Adaboost).
Scikit-Learn: A silver bullet for basic machine learning
Scikit-Learn is Python's core machine learning package, with most of the modules needed to support a basic machine learning project. The library provides a unified API (Application Programming Interface) that lets practitioners accomplish a regression or classification task in only a few lines of code. It is one of the few Python libraries that has kept its promise of keeping the algorithm and interface layers simple rather than complicating them to cover the entire machine learning feature landscape. The package is written largely in Python, and it incorporates the LIBSVM and LIBLINEAR libraries for its support vector machine and linear model implementations. The package depends on NumPy (for the ndarray construct) and SciPy (for sparse matrices); Pandas is optional and is mainly useful for dataframe handling.
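As a rough illustration of the "few lines" claim, the sketch below fits a linear support vector classifier through scikit-learn's estimator interface (`fit` and `predict`). The four training points and the variable names are invented for this example, and scikit-learn is assumed to be installed.

```python
from sklearn.svm import SVC

# Toy, linearly separable data invented for this sketch.
X_train = [[-1.0, 0.0], [0.0, 0.0], [2.0, 0.0], [3.0, 1.0]]
y_train = [-1, -1, 1, 1]

# Every scikit-learn estimator follows the same construct / fit / predict pattern.
model = SVC(kernel="linear")
model.fit(X_train, y_train)

# Classify two new points, one on each side of the training classes.
predictions = model.predict([[-2.0, 0.0], [4.0, 0.0]])
```

Swapping `SVC` for another estimator class typically leaves the `fit` and `predict` calls unchanged, which is what the unified API refers to.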
Average Margin Regularization for Classifiers
Adversarial robustness has become an important research topic given empirical demonstrations of the lack of robustness of deep neural networks. Unfortunately, recent theoretical results suggest that adversarial training induces a strict tradeoff between classification accuracy and adversarial robustness. In this paper, we propose and then study a new regularization for any margin classifier or deep neural network. We motivate this regularization by a novel generalization bound that shows a tradeoff in classifier accuracy between maximizing its margin and average margin. We thus call our approach an average margin (AM) regularization, and it consists of a linear term added to the objective. We theoretically show that for certain distributions AM regularization can both improve classifier accuracy and robustness to adversarial attacks. We conclude by using both synthetic and real data to empirically show that AM regularization can strictly improve both accuracy and robustness for support vector machines (SVMs) and deep neural networks, relative to unregularized classifiers and adversarially trained classifiers.
Embedding Geographic Locations for Modelling the Natural Environment using Flickr Tags and Structured Data
Jeawak, Shelan S., Jones, Christopher B., Schockaert, Steven
Meta-data from photo-sharing websites such as Flickr can be used to obtain rich bag-of-words descriptions of geographic locations, which have proven valuable, among others, for modelling and predicting ecological features. One important insight from previous work is that the descriptions obtained from Flickr tend to be complementary to the structured information that is available from traditional scientific resources. To better integrate these two diverse sources of information, in this paper we consider a method for learning vector space embeddings of geographic locations. We show experimentally that this method improves on existing approaches, especially in cases where structured information is available.
An intuitive introduction to support vector machines using R – Part 1
The output shows the function call, SVM type, kernel and cost (which is set to its default). In case you are wondering about gamma, although it's set to 0.5 here, it plays no role in linear SVMs. We'll say more about it in the sequel to this article, in which we'll cover more complex kernels. More interesting are the support vectors. In a nutshell, these are the training data points that specify the location of the decision boundary.
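Both points can be illustrated with a small sketch, written here in standard-library Python rather than the article's R. It assumes the usual kernel definitions (linear: the dot product; radial: exp(-gamma * squared distance)) and a toy dataset invented for this example. The weights `w` and offset `b` are derived by hand for that dataset, not fitted by an SVM solver.

```python
import math


def linear_kernel(u, v):
    """Dot product; note that gamma does not appear anywhere."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma):
    """Radial kernel; here gamma scales the squared distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# Toy, linearly separable data invented for this sketch.
points = [(-1.0, 0.0), (0.0, 0.0), (2.0, 0.0), (3.0, 1.0)]
labels = [-1, -1, 1, 1]

# Hand-derived maximum-margin boundary for this data: the line x = 1.
w = (1.0, 0.0)
b = -1.0


def functional_margin(x, y):
    """y * (w . x + b); equals 1 exactly for points lying on the margin."""
    return y * (linear_kernel(w, x) + b)


# Support vectors are the training points that sit on the margin.
support_vectors = [
    x for x, y in zip(points, labels)
    if abs(functional_margin(x, y) - 1.0) < 1e-9
]
```

Only (0, 0) and (2, 0) lie on the margin. Deleting the other two points would leave the boundary x = 1 unchanged, which is the sense in which the support vectors specify its location.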