AITopics | Performance Analysis

Collaborating Authors

Performance Analysis

News Overviews Instructional Materials AI-Alerts Classics

A Local Density-Based Approach for Local Outlier Detection

arXiv.org Machine LearningJun-27-2016

This paper presents a simple but effective density-based outlier detection approach with the local kernel density estimation (KDE). A Relative Density-based Outlier Score (RDOS) is introduced to measure the local outlierness of objects, in which the density distribution at the location of an object is estimated with a local KDE method based on extended nearest neighbors of the object. Instead of using only $k$ nearest neighbors, we further consider reverse nearest neighbors and shared nearest neighbors of an object for density distribution estimation. Some theoretical properties of the proposed RDOS including its expected value and false alarm probability are derived. A comprehensive experimental study on both synthetic and real-life data sets demonstrates that our approach is more effective than state-of-the-art outlier detection methods.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1606.08538

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Dynamic Hierarchical Dirichlet Process for Abnormal Behaviour Detection in Video

Isupova, Olga, Kuzin, Danil, Mihaylova, Lyudmila

arXiv.org Machine LearningJun-27-2016

This paper proposes a novel dynamic Hierarchical Dirichlet Process topic model that considers the dependence between successive observations. Conventional posterior inference algorithms for this kind of models require processing of the whole data through several passes. It is computationally intractable for massive or sequential data. We design the batch and online inference algorithms, based on the Gibbs sampling, for the proposed model. It allows to process sequential data, incrementally updating the model by a new observation. The model is applied to abnormal behaviour detection in video sequences. A new abnormality measure is proposed for decision making. The proposed method is compared with the method based on the non- dynamic Hierarchical Dirichlet Process, for which we also derive the online Gibbs sampler and the abnormality measure. The results with synthetic and real data show that the consideration of the dynamics in a topic model improves the classification performance for abnormal behaviour detection.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

1606.08476

Country: North America > United States > Oregon (0.14)

Genre: Research Report (0.64)

Industry: Consumer Products & Services > Restaurants (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Anomaly detection in video with Bayesian nonparametrics

Isupova, Olga, Kuzin, Danil, Mihaylova, Lyudmila

arXiv.org Machine LearningJun-27-2016

A novel dynamic Bayesian nonparametric topic model for anomaly detection in video is proposed in this paper. Batch and online Gibbs samplers are developed for inference. The paper introduces a new abnormality measure for decision making. The proposed method is evaluated on both synthetic and real data. The comparison with a non-dynamic model shows the superiority of the proposed dynamic one in terms of the classification performance for anomaly detection.

data mining, detection, machine learning, (14 more...)

arXiv.org Machine Learning

1606.08455

Country:

North America > United States > Oregon (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Industry: Consumer Products & Services (0.47)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Cyberbullying Identification Using Participant-Vocabulary Consistency

Raisi, Elaheh, Huang, Bert

arXiv.org Machine LearningJun-26-2016

With the rise of social media, people can now form relationships and communities easily regardless of location, race, ethnicity, or gender. However, the power of social media simultaneously enables harmful online behavior such as harassment and bullying. Cyberbullying is a serious social problem, making it an important topic in social network analysis. Machine learning methods can potentially help provide better understanding of this phenomenon, but they must address several key challenges: the rapidly changing vocabulary involved in cyber- bullying, the role of social network structure, and the scale of the data. In this study, we propose a model that simultaneously discovers instigators and victims of bullying as well as new bullying vocabulary by starting with a corpus of social interactions and a seed dictionary of bullying indicators. We formulate an objective function based on participant-vocabulary consistency. We evaluate this approach on Twitter and Ask.fm data sets and show that the proposed method can detect new bullying vocabulary as well as victims and bullies.

artificial intelligence, cyberbullying identification, machine learning, (16 more...)

arXiv.org Machine Learning

1606.08084

Country: North America > United States > Virginia (0.15)

Genre: Research Report (0.70)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Add feedback

Discriminating sample groups with multi-way data

Lyu, Tianmeng, Lock, Eric F., Eberly, Lynn E.

arXiv.org Machine LearningJun-26-2016

High-dimensional linear classifiers, such as the support vector machine (SVM) and distance weighted discrimination (DWD), are commonly used in biomedical research to distinguish groups of subjects based on a large number of features. However, their use is limited to applications where a single vector of features is measured for each subject. In practice data are often multi-way, or measured over multiple dimensions. For example, metabolite abundance may be measured over multiple regions or tissues, or gene expression may be measured over multiple time points, for the same subjects. We propose a framework for linear classification of high-dimensional multi-way data, in which coefficients can be factorized into weights that are specific to each dimension. More generally, the coefficients for each measurement in a multi-way dataset are assumed to have low-rank structure. This framework extends existing classification techniques, and we have implemented multi-way versions of SVM and DWD. We describe informative simulation results, and apply multi-way DWD to data for two very different clinical research studies. The first study uses metabolite magnetic resonance spectroscopy data over multiple brain regions to compare patients with and without spinocerebellar ataxia, the second uses publicly available gene expression time-course data to compare treatment responses for patients with multiple sclerosis. Our method improves performance and simplifies interpretation over naive applications of full rank linear classification to multi-way data. An R package is available at https://github.com/lockEF/MultiwayClassification .

artificial intelligence, discriminating sample group, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1093/biostatistics/kxw057

1606.08046

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre:

Research Report > Experimental Study (0.86)
Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Neurology > Multiple Sclerosis (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

The Rise of Social Bots July 2016 Communications of the ACM

Communications of the ACMJun-25-2016, 15:57:28 GMT

Bots (short for software robots) have been around since the early days of computers. One compelling example of bots is chatbots, algorithms designed to hold a conversation with a human, as envisioned by Alan Turing in the 1950s.33 The dream of designing a computer algorithm that passes the Turing test has driven artificial intelligence research for decades, as witnessed by initiatives like the Loebner Prize, awarding progress in natural language processing.a Many things have changed since the early days of AI, when bots like Joseph Weizenbaum's ELIZA,39 mimicking a Rogerian psychotherapist, were developed as demonstrations or for delight. Today, social media ecosystems populated by hundreds of millions of individuals present real incentives--including economic and political ones--to design algorithms that exhibit human-like behavior. Such ecosystems also raise the bar of the challenge, as they introduce new dimensions to emulate in addition to content, including the social network, temporal activity, diffusion patterns, and sentiment expression. A social bot is a computer algorithm that automatically produces content and interacts with humans on social media, trying to emulate and possibly alter their behavior. Social bots have inhabited social media platforms for the past few years.7,24

artificial intelligence, machine learning, natural language, (19 more...)

Communications of the ACM

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Indiana > Monroe County > Bloomington (0.05)
North America > United States > Texas (0.04)
(2 more...)

Genre:

Personal (0.68)
Research Report (0.46)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Turing's Test (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

A gentle introduction to Naïve Bayes classification using R

#artificialintelligenceJun-25-2016, 05:51:17 GMT

Now that we have a model, we can do some predicting. We do this by feeding our test data into our model and comparing the predicted party affiliations with the known ones. The latter is done via the wonderfully named confusion matrix – a table in which true and predicted values for each of the predicted classes are displayed in a matrix format.

artificial intelligence, machine learning, probability, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Large-Scale Kernel Methods for Independence Testing

Zhang, Qinyi, Filippi, Sarah, Gretton, Arthur, Sejdinovic, Dino

arXiv.org Machine LearningJun-25-2016

Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nystrom and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large scale methods give comparable performance with existing methods whilst using significantly less computation time and memory.

approximation, kernel, large-scale kernel method, (13 more...)

arXiv.org Machine Learning

1606.07892

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Oregon > Benton County > Corvallis (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

NYC Data Science Academy

#artificialintelligenceJun-23-2016, 00:15:51 GMT

They are currently in the NYC Data Science Academy 12 week full time Data Science Bootcamp program taking place between January 11th to April 1st, 2016. This post is based on their fourth class project - Machine learning(due on the 8th week of the program). The Higgs Boson Challenge, hosted by Kaggle, asked the data scientist community to utilize machine learning to accurately predict if a particle was a Higgs-Boson particle or not; more specifically if a signal detected was either a'tau tau decay of a Higgs boson' or just'background'. The datasets provided were the training and test set with 250,000 and 550,000 observations, respectively. The training set contained all the same features as the test with two additional columns of'Label' and'Weight' that gave the accurate classifiers to help train our models.

artificial intelligence, machine learning, node, (14 more...)

#artificialintelligence

Industry: Education (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

Add feedback

An effective approach for classification of advanced malware with high accuracy

Sharma, Ashu, Sahay, Sanjay K.

arXiv.org Artificial IntelligenceJun-22-2016

Combating malware is very important for software/systems security, but to prevent the software/systems from the advanced malware, viz. metamorphic malware is a challenging task, as it changes the structure/code after each infection. Therefore in this paper, we present a novel approach to detect the advanced malware with high accuracy by analyzing the occurrence of opcodes (features) by grouping the executables. These groups are made on the basis of our earlier studies [1] that the difference between the sizes of any two malware generated by popular advanced malware kits viz. PS-MPC, G2 and NGVCK are within 5 KB. On the basis of obtained promising features, we studied the performance of thirteen classifiers using N-fold cross-validation available in machine learning tool WEKA. Among these thirteen classifiers we studied in-depth top five classifiers (Random forest, LMT, NBT, J48 and FT) and obtain more than 96.28% accuracy for the detection of unknown malware, which is better than the maximum detection accuracy (95.9%) reported by Santos et al (2013). In these top five classifiers, our approach obtained a detection accuracy of 97.95% by the Random forest.

artificial intelligence, machine learning, malware, (18 more...)

arXiv.org Artificial Intelligence

1606.06897

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

Add feedback