 Support Vector Machines


Machine Learning for Biomedical Literature Triage

#artificialintelligence

Databases allow data to be stored in a consistent way, facilitating easy retrieval and enabling both complex searches and computation on the data. In the biomedical field, databases are also vital resources for scientists searching the literature. Over the past few years, researchers and users have noted a significant expansion of such literature databases [1]. For example, the free online database PubMed [2] currently holds over 22 million documents, and a simple keyword search can retrieve hundreds of thousands of documents. Analyzing the vast biomedical data currently available is a challenge addressed by studies such as [5] [6], as is using this data to identify relevant information for biomedical research [7] [8].


Single Class Universum-SVM

arXiv.org Artificial Intelligence

This paper extends the idea of Universum learning [1, 2] to single-class learning problems. We propose a Single Class Universum-SVM setting that incorporates a priori knowledge (in the form of additional data samples) into the single-class estimation problem. These additional data samples, or Universum, belong to the same application domain as the (positive) data samples from the single class of interest, but they follow a different distribution. The proposed methodology for single-class U-SVM is based on the known connection between binary classification and single-class learning formulations [3]. Several empirical comparisons are presented to illustrate the utility of the proposed approach.


Daily Digest September 16, 2019 – BioDecoded

#artificialintelligence

Researchers benchmarked 22 classification methods that automatically assign cell identities, including single-cell-specific and general-purpose classifiers. The performance of the methods is evaluated using 27 publicly available single-cell RNA sequencing datasets of different sizes, technologies, species, and levels of complexity. The general-purpose support vector machine classifier has overall the best performance across the different experiments. Researchers present a novel algorithm for predicting genetic ancestry using only variables that are routinely captured in electronic health records (EHRs), such as self-reported race and ethnicity, and condition billing codes. Using patients that have both genetic and clinical information at Columbia University / New York-Presbyterian Irving Medical Center, they developed a pipeline that uses only clinical data to predict the genetic ancestry of all patients of which more than 80% identify as other or unknown.


On Education Machine Learning: Support Vector Machines in R (SVM in R) - all courses

#artificialintelligence

You're looking for a complete Support Vector Machines course that teaches you everything you need to create an SVM model in R, right? You've found the right Support Vector Machines techniques course! How will this course help you? A Verifiable Certificate of Completion is presented to all students who undertake this advanced machine learning course. If you are a business manager, an executive, or a student who wants to learn and apply machine learning to real-world business problems, this course will give you a solid base by teaching you one of the advanced techniques of machine learning: Support Vector Machines.


Many Heads Are Better Than One: The Case For Ensemble Learning

#artificialintelligence

"The interests of truth require a diversity of opinions." Banks and lenders are increasingly turning to AI and machine learning to automate their core functions and make more accurate predictions in credit underwriting and fraud detection. ML practitioners can take advantage of a growing number of modeling algorithms, such as simple decision trees, random forests, gradient boosting machines, deep neural networks, and support vector machines. Each method has its strengths and weaknesses, which is why it often makes sense to combine ML algorithms to provide even greater predictive performance than any single ML method could provide on its own. This method of combining algorithms is known as ensembling.
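One simple form of ensembling is majority voting, sketched below in plain Python. The three rules, their thresholds, and the applicant fields are hypothetical stand-ins for trained models, chosen only to illustrate the idea; the article does not describe a specific combination scheme.

```python
from collections import Counter


def majority_vote(models, x):
    """Return the label predicted by the largest number of models for input x."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical weak classifiers; in practice these would be trained models
# such as a decision tree, a gradient boosting machine, and an SVM.
def income_rule(applicant):
    return "approve" if applicant["income"] >= 40000 else "decline"


def debt_rule(applicant):
    return "approve" if applicant["debt_ratio"] <= 0.4 else "decline"


def history_rule(applicant):
    return "approve" if applicant["late_payments"] == 0 else "decline"


models = [income_rule, debt_rule, history_rule]
applicant = {"income": 52000, "debt_ratio": 0.55, "late_payments": 0}

# Two of the three rules approve, so the ensemble approves.
decision = majority_vote(models, applicant)
print(decision)
```

Using an odd number of voters avoids ties in the two-class case. Weighted voting or stacking are common alternatives when the component models differ in reliability.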


Persistence B-Spline Grids: Stable Vector Representation of Persistence Diagrams Based on Data Fitting

arXiv.org Machine Learning

Over the last decades, many attempts have been made to optimally integrate machine learning (ML) and topological data analysis. A prominent problem in applying persistent homology to ML tasks is finding a vector representation of a persistence diagram (PD), which is a summary diagram for representing topological features. From the perspective of data fitting, a stable vector representation, persistence B-spline grid (PB), is proposed based on the efficient technique of progressive-iterative approximation for least-squares B-spline surface fitting. Meanwhile, we theoretically prove that the PB method is stable with respect to the metrics defined on the PD space, i.e., the $p$-Wasserstein distance and the bottleneck distance. The proposed method was tested on a synthetic dataset, datasets of randomly generated PDs, data of a dynamical system, and 3D CAD models.


Adversarial Robustness 360 Toolbox v1.0: A Milestone in AI Security

#artificialintelligence

Next week at AI Research Week, hosted by the MIT-IBM Watson AI Lab in Cambridge, MA, we will publish the first major release of the Adversarial Robustness 360 Toolbox (ART). Initially released in April 2018, ART is an open-source library for adversarial machine learning that provides researchers and developers with state-of-the-art tools to defend and verify AI models against adversarial attacks. ART v1.0 marks a milestone in AI security, introducing new features that extend ART to conventional machine learning models and a variety of data types beyond images. The number of reports on real-world exploitations using adversarial attacks against AI is growing, as in the case of anti-virus software, highlighting the importance of understanding, improving and monitoring the adversarial robustness of AI models. ART provides a comprehensive and growing set of tools to systematically assess and improve the robustness of AI models against adversarial attacks, including evasion and poisoning. In evasion attacks, the adversary crafts small changes to the original input to an AI model in order to influence its behaviour.
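The idea of an evasion attack can be illustrated with a minimal sketch against a linear classifier. This is plain Python and does not use ART's API; the weights, bias, input, and the `epsilon` budget are assumed values for illustration. Each feature is moved by at most `epsilon` in the direction that pushes the score toward the opposite class, which for a linear model is a step along the sign of the weights.

```python
def score(weights, bias, x):
    """Linear decision score w.x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    """Return +1 if the score is non-negative, otherwise -1."""
    return 1 if score(weights, bias, x) >= 0 else -1


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def evade(weights, bias, x, epsilon):
    """Perturb each feature by at most epsilon to push x toward the other class."""
    label = predict(weights, bias, x)
    return [xi - epsilon * label * sign(w) for w, xi in zip(weights, x)]


# Assumed toy model and input, not taken from the article.
weights, bias = [2.0, -1.0], -0.5
x = [0.6, 0.4]
x_adv = evade(weights, bias, x, epsilon=0.2)

print(predict(weights, bias, x), predict(weights, bias, x_adv))
```

In this toy case the small perturbation is enough to flip the predicted class. Whether a given budget succeeds depends on how far the input lies from the decision boundary.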


Instagram Fake and Automated Account Detection

arXiv.org Machine Learning

Fake engagement, which is used to increase the popularity of an account in an inorganic manner, is one of the significant problems in Online Social Networks (OSNs). Detecting fake engagement is crucial because it leads to loss of money for businesses, wrong audience targeting in advertising, inaccurate product prediction systems, and an unhealthy social network environment. This study concerns the detection of the fake and automated accounts that lead to fake engagement on Instagram. As far as we know, there is no publicly available dataset for fake and automated accounts. For this purpose, two datasets have been generated for the detection of fake and automated accounts. For the detection of these accounts, machine learning algorithms such as Naive Bayes, logistic regression, support vector machines and neural networks are applied. Additionally, for the detection of automated accounts, a cost-sensitive genetic algorithm is applied because of the unnatural bias in the dataset. To deal with the class imbalance in the fake account dataset, the SMOTE-NC algorithm is applied. For the automated and fake account detection problems, 86% and 96% are obtained, respectively.


A Note on Posterior Probability Estimation for Classifiers

arXiv.org Machine Learning

One of the central themes in the classification task is the estimation of the class posterior probability at a new point $\bf{x}$. The vast majority of classifiers output a score for $\bf{x}$, which is monotonically related to the posterior probability via an unknown relationship. There have been many attempts in the literature to estimate this relationship. Here, we provide a way to estimate the posterior probability without resorting to classification scores. Instead, we vary the prior probabilities of the classes in order to derive the ratio of the probability density functions (pdfs) at point $\bf{x}$, which is used directly to determine the class posterior probabilities. We consider here the binary classification problem.
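The last step, going from a pdf ratio to a posterior, follows from Bayes' rule in the binary case. The sketch below assumes the ratio at the point is already known; how that ratio is obtained by varying the priors is not detailed in the abstract, so it is taken as an input here. The function name and arguments are illustrative.

```python
def posterior_from_density_ratio(ratio, prior_1):
    """Posterior P(class 1 | x) for a binary problem.

    ratio   -- p(x | class 1) / p(x | class 2), assumed known at the point x
    prior_1 -- prior probability of class 1, strictly between 0 and 1
    """
    if ratio < 0:
        raise ValueError("a ratio of densities cannot be negative")
    if not 0.0 < prior_1 < 1.0:
        raise ValueError("prior_1 must lie strictly between 0 and 1")
    # Bayes' rule with both numerator and denominator divided by p(x | class 2).
    weighted = prior_1 * ratio
    return weighted / (weighted + (1.0 - prior_1))


# With equal priors, a point three times as likely under class 1
# gets a posterior of 3 / (3 + 1) for class 1.
p = posterior_from_density_ratio(3.0, 0.5)
print(p)
```

When the ratio equals 1 the data carry no information about the class at that point, and the posterior reduces to the prior.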