AITopics

doi: 10.1016/j.compbiomed.2018.12.014

1902.05064

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Warwickshire (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Bars, Batiste Le, Kalogeratos, Argyris

A Probabilistic Framework to Node-level Anomaly Detection in Communication Networks

arXiv.org Machine Learning Feb-12-2019

Abstract: In this paper, we consider the task of detecting abnormal communication volume at the node level in communication networks. The signal of the communication activity is modeled by means of a clique stream: each communication event is instantaneous and activates an undirected subgraph spanning a set of equally participating nodes. We present a probabilistic framework to model and assess the communication volume observed at any single node. Specifically, we employ nonparametric regression to learn the probability that a node takes part in a given event, knowing the set of other nodes involved. On top of that, we present a concentration inequality around the estimated volume of events in which a node could participate, which in turn allows us to build an efficient and interpretable anomaly scoring function. Finally, the superior performance of the proposed approach is demonstrated empirically on real-world sensor network data, as well as on synthetic communication activity consistent with that setting.

I. INTRODUCTION

Monitoring the activity in communication networks has become a popular area of research, and particular attention has been paid to detection tasks such as spotting events or anomalies. An effective way to represent the communication activity is via a dynamic graph in which the entities are nodes and each communication event (or simply event) is represented by a set of connecting edges that appear within a specific time interval.
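As a rough illustration of the node-level scoring idea in this entry, the Python sketch below turns a node's observed event count into an anomaly score. It assumes the per-event participation probabilities have already been estimated (the paper uses nonparametric regression for that step) and that participations are independent, and it substitutes the generic Hoeffding bound for the paper's own concentration inequality. The name `hoeffding_anomaly_score` is hypothetical.

```python
import math


def hoeffding_anomaly_score(probs, observed):
    """Score how surprising an observed event count is for one node.

    probs: estimated probabilities p_i that the node joins event i
           (assumed to come from some upstream model, e.g. a regression).
    observed: number of events the node actually joined.
    Returns -log of the two-sided Hoeffding tail bound, clipped at 0,
    so 0 means "not surprising" and larger values mean more anomalous.
    """
    n = len(probs)
    if n == 0:
        return 0.0
    expected = sum(probs)          # expected volume E[X] = sum of p_i
    t = abs(observed - expected)   # deviation from the expected volume
    # Hoeffding (independent Bernoulli indicators in [0, 1]):
    #   P(|X - E[X]| >= t) <= 2 * exp(-2 * t^2 / n)
    log_bound = math.log(2.0) - 2.0 * t * t / n
    return max(0.0, -log_bound)
```

For example, with 100 events each joined with probability 0.5, observing 50 participations scores 0, while observing 90 gives a score of about 31. Working with the log of the bound keeps the score interpretable (it reads as "how many nats of surprise") and avoids underflow for large deviations.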

detection, inequality, node, (16 more...)

1902.04521

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Jordan (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (1.00)

Industry:

Telecommunications (0.66)
Information Technology (0.48)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)

A Machine Learning based Robust Prediction Model for Real-life Mobile Phone Data

Sarker, Iqbal H.

Real-life mobile phone data may contain noisy instances, a fundamental issue when building a prediction model, with many potential negative consequences: the complexity of the inferred model may increase, the model may overfit, and its overall prediction accuracy may therefore decrease. In this paper, we address these issues and present a robust prediction model for the real-life mobile phone data of individual users, in order to improve the prediction accuracy of the model. In our robust model, we first identify and eliminate the noisy instances in the training dataset by determining a dynamic noise threshold using a naive Bayes classifier and a Laplace estimator; this threshold may differ from user to user according to their unique behavioral patterns. We then apply the most popular rule-based machine learning classification technique, the decision tree, to the resulting noise-free dataset to build the prediction model. Experimental results on real-life mobile phone datasets (e.g., phone call logs) of individual users show the effectiveness of our robust model in terms of precision, recall, and F-measure.
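A minimal sketch of the noise-filtering step, assuming categorical contextual features and a user-specific threshold supplied by the caller. The paper determines that threshold dynamically per user, and its exact rule is not reproduced here; the function names are hypothetical, and the decision tree that would then be trained on the filtered data is left out.

```python
from collections import Counter


def nb_posteriors(X, y, alpha=1.0):
    """Posterior P(y_i | x_i) for each training instance under a
    categorical naive Bayes model with Laplace (add-alpha) smoothing."""
    classes = sorted(set(y))
    n_features = len(X[0])
    class_counts = Counter(y)
    # feature_counts[c][j][v]: number of class-c instances with feature j == v
    feature_counts = {c: [Counter() for _ in range(n_features)] for c in classes}
    values = [set() for _ in range(n_features)]
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            feature_counts[yi][j][v] += 1
            values[j].add(v)
    n = len(y)
    posts = []
    for xi, yi in zip(X, y):
        scores = {}
        for c in classes:
            # Laplace-smoothed prior times Laplace-smoothed likelihoods
            p = (class_counts[c] + alpha) / (n + alpha * len(classes))
            for j, v in enumerate(xi):
                p *= (feature_counts[c][j][v] + alpha) / (
                    class_counts[c] + alpha * len(values[j]))
            scores[c] = p
        posts.append(scores[yi] / sum(scores.values()))
    return posts


def filter_noisy(X, y, threshold):
    """Drop instances whose own label gets a posterior below `threshold`."""
    posts = nb_posteriors(X, y)
    keep = [i for i, p in enumerate(posts) if p >= threshold]
    return [X[i] for i in keep], [y[i] for i in keep]
```

An instance whose label contradicts what the rest of the user's data predicts for the same context receives a low posterior and is removed before the classifier is trained.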

mobile phone data, prediction model, probability, (13 more...)

1902.07588

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Oceania > Australia > Victoria > Melbourne (0.14)
North America > United States > New York (0.05)
(21 more...)

Genre: Research Report (0.64)

Industry:

Telecommunications (0.93)
Information Technology (0.93)
Materials > Metals & Mining (0.67)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (1.00)
(2 more...)

Yelp Food Identification via Image Feature Extraction and Classification

Sun, Fanbo, Gu, Zhixiang, Feng, Bo

Yelp has been one of the most popular local service search engines in the US since 2004. It is powered by crowd-sourced text reviews and photo reviews. Restaurant customers and business owners upload photos to Yelp, reviewing or advertising food, drinks, or interior and exterior decoration. Relying on human editors to label food photos is clearly inefficient, an issue that innovative machine learning approaches should address. In this paper, we present a simple but effective approach that can identify up to ten kinds of food from raw photos in the challenge dataset. We use 1) image pre-processing techniques, including filtering and image augmentation, 2) feature extraction via convolutional neural networks (CNN), and 3) three classification algorithms. We then examine how classification accuracy changes as we tune the parameters of the augmentation, the CNN, and the classifiers. Our experimental results show that this simple but effective approach can identify up to 10 food types from images.

accuracy, classification, dataset, (9 more...)

1902.05413

Country:

North America > United States > Indiana > Monroe County > Bloomington (0.16)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.84)

Industry: Consumer Products & Services (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

KTBoost: Combined Kernel and Tree Boosting

Sigrist, Fabio

Boosting algorithms [Freund et al., 1996, Friedman et al., 2000, Mason et al., 2000, Friedman, 2001, Bühlmann and Hothorn, 2007] enjoy great popularity in both applied data analysis and machine learning research due to the high predictive accuracy observed on a wide range of data sets [Chen and Guestrin, 2016]. Boosting additively combines base learners by sequentially minimizing a risk functional. To the best of our knowledge, except for one reference [Hothorn et al., 2010], the large majority of boosting algorithms use only one type of function as base learners. In this article, we relax this assumption by combining trees [Breiman et al., 1984] and reproducing kernel Hilbert space (RKHS) regression functions [Schölkopf and Smola, 2001, Berlinet and Thomas-Agnan, 2011] as base learners, and we empirically show that this combination of different base learners results in increased predictive accuracy compared to both tree-only and kernel-only boosting. To date, regression trees are the most common choice of base learners for boosting in both applied data analysis and machine learning research. In particular, a lot of effort has been made in recent years to develop tree-based boosting methods that scale to large data [Chen and Guestrin, 2016, Ke et al., 2017, Ponomareva et al., 2017, Prokhorenkova et al., 2018]. On the other hand, kernel machines show state-of-the-art predictive accuracy on many data sets as they can, for instance, achieve near-optimal test error [Belkin et al., 2018b,a], and kernel classifiers parallel the behaviors of deep networks as noted in Zhang
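The idea of mixing base-learner types can be illustrated with a toy, pure-Python sketch for one-dimensional regression under squared loss. In each iteration it fits both a regression stump and a Gaussian-kernel ridge regression to the current residuals and keeps whichever lowers the training loss more. This is a simplified stand-in, not the KTBoost implementation: the stump depth, the kernel bandwidth `gamma`, the penalty `lam`, the shrinkage `nu`, and all function names are assumptions made for illustration.

```python
import math


def fit_stump(x, r):
    """Least-squares regression stump (one split) on 1-D inputs x for targets r."""
    xs = sorted(set(x))
    best = None
    for s in ((a + b) / 2 for a, b in zip(xs, xs[1:])):
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, s, ml, mr)
    if best is None:  # all inputs identical: fall back to a constant fit
        m = sum(r) / len(r)
        return lambda z: m
    _, s, ml, mr = best
    return lambda z: ml if z <= s else mr


def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(n):
            if i != col:
                f = M[i][col] / M[col][col]
                for j in range(col, n + 1):
                    M[i][j] -= f * M[col][j]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_kernel_ridge(x, r, lam=0.1, gamma=1.0):
    """Kernel ridge regression with a Gaussian kernel: solve (K + lam*I) a = r."""
    k = lambda a, b: math.exp(-gamma * (a - b) ** 2)
    n = len(x)
    A = [[k(x[i], x[j]) + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    coef = solve(A, r)
    return lambda z: sum(c * k(xi, z) for c, xi in zip(coef, x))


def ktboost_sketch(x, y, n_iter=20, nu=0.1):
    """Boosting with squared loss that picks a tree or a kernel learner per step."""
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    learners = []
    for _ in range(n_iter):
        r = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        candidates = [fit_stump(x, r), fit_kernel_ridge(x, r)]

        def loss(h):
            # training loss after a shrunken step nu * h
            return sum((yi - pi - nu * h(xi)) ** 2 for xi, yi, pi in zip(x, y, pred))

        h = min(candidates, key=loss)
        learners.append(h)
        pred = [pi + nu * h(xi) for xi, pi in zip(x, pred)]
    return lambda z: f0 + nu * sum(h(z) for h in learners)
```

Because the stump is a least-squares fit to the residuals, each step never increases the training loss, and the kernel learner is chosen only when it does even better; a real implementation would also use validation data to pick `n_iter`.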

base learner, kernel, tree and kernel, (14 more...)

1902.03999

Country:

Europe > Switzerland > Zug > Zug (0.04)
Europe > Italy > Apulia > Bari (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)

Leguia, Marc G., Levnajic, Zoran, Todorovski, Ljupco, Zenko, Bernard

Reconstructing dynamical networks via feature ranking

Empirical data on real complex systems are becoming increasingly available. Parallel to this is the need for new methods of reconstructing (inferring) the topology of networks from time-resolved observations of their node dynamics. Methods based on physical insights often rely on strong assumptions about the properties and dynamics of the scrutinized network. Here, we use insights from machine learning to design a new method of network reconstruction that makes essentially no such assumptions. Specifically, we interpret the available trajectories (data) as features and use two independent feature ranking approaches, Random forest and RReliefF, to rank the importance of each node for predicting the value of each other node, which yields the reconstructed adjacency matrix. We show that our method is fairly robust to coupling strength, system size, trajectory length, and noise. We also find that the reconstruction quality strongly depends on the dynamical regime.
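The "rank every node as a feature for predicting every other node" idea can be sketched in a few lines. To stay dependency-free, the sketch below replaces Random forest and RReliefF with a much cruder importance score, the absolute Pearson correlation between node j at time t and node i at time t+1. The one-step-ahead target and the function names are assumptions, so this shows only the shape of the method, not its accuracy.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = math.sqrt(sum((u - ma) ** 2 for u in a))
    vb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (va * vb) if va > 0 and vb > 0 else 0.0


def reconstruct(traj):
    """traj[t][j] is the state of node j at time t.

    Returns score[i][j]: importance of node j (at time t) for predicting
    node i (at time t+1). Self-loops are left at 0.
    """
    n = len(traj[0])
    past, future = traj[:-1], traj[1:]
    score = [[0.0] * n for _ in range(n)]
    for i in range(n):
        target = [row[i] for row in future]
        for j in range(n):
            if i != j:
                score[i][j] = abs(pearson([row[j] for row in past], target))
    return score
```

Thresholding `score`, or keeping the top-k entries in each row, would then yield a binary adjacency matrix. Swapping the correlation for a model-based importance (as the paper does) lets the ranking capture nonlinear couplings that a correlation would miss.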

coupling strength, node, reconstruction, (15 more...)

1902.03896

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Gultchin, Limor, Patterson, Genevieve, Baym, Nancy, Swinger, Nathaniel, Kalai, Adam Tauman

Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops

Why is humor so difficult for machine learning and AI systems to understand? In light of recent studies in psychology showing that individual words can be humorous (Engelthaler & Hills, 2017; Westbury et al., 2016), and in light of the fact that Word Embeddings (WEs) have been shown to capture numerous properties of words (e.g., Mikolov et al., 2013), it is natural to study whether and how WEs capture humor. First, we find that individual-word humor possesses many aspects of humor that have been discussed in general theories of humor, and that many of these aspects are captured by WEs. To understand more deeply which features of humor WEs capture, and to what extent, we draw on existing theories of humor to define a number of candidate features of word humor. Interestingly, many of these theories can be applied to word humor.
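One simple way to probe whether embeddings encode humor is a linear "humor direction". The sketch below is an illustration rather than the paper's analysis: it takes the difference between the mean vector of words rated funny and that of words rated not funny, and scores any word by its projection onto that direction. The function names and the use of class means are assumptions.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]


def humor_direction(funny, serious):
    """Difference of class means: a crude linear 'humor axis' in embedding space."""
    mf, ms = mean_vector(funny), mean_vector(serious)
    return [a - b for a, b in zip(mf, ms)]


def humor_score(vec, direction):
    """Projection (dot product) of a word vector onto the humor direction."""
    return sum(a * b for a, b in zip(vec, direction))
```

If humor ratings correlate with this projection on held-out words, the embedding space carries at least a linearly accessible humor signal; a regression model fitted to the ratings would be the natural next step.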

humor rating, participant, vector, (15 more...)

1902.02783

Country:

Asia > India (0.05)
North America > United States > New York (0.04)
North America > United States > Maryland (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Therapeutic Area (0.49)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Crime Linkage Detection by Spatio-Temporal-Textual Point Processes

Zhu, Shixiang, Xie, Yao

Crimes emerge out of complex interactions of behaviors and situations; thus there are complex linkages between crime incidents. Solving the puzzle of crime linkage is a highly challenging task because we often only have limited information from indirect observations such as records, text descriptions, and associated time and locations. We propose a new modeling and learning framework for detecting linkage between crime events using spatio-temporal-textual data, which are highly prevalent in the form of police reports. We capture the notion of modus operandi (M.O.) by introducing a multivariate marked point process and handling the complex text jointly with the time and location. The model is able to discover the latent space that links the crime series. The model fitting is achieved by a computationally efficient Expectation-Maximization (EM) algorithm. In addition, we explicitly reduce the bias in the text documents in our algorithm. Our numerical results using real data from the Atlanta Police show that our method has competitive performance relative to the state-of-the-art. Our results, including variable selection, are highly interpretable and may bring insights into M.O. extraction.
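How a point-process model links events can be sketched with a Hawkes-style triggering kernel. In the sketch below, each event is a tuple `(t, x, y, keywords)`; an earlier event j raises the intensity at event i through an exponential decay in time, a Gaussian kernel in space, and the cosine similarity of their keyword vectors, and an EM-style E-step turns these contributions into the probability that i was triggered by j. The kernel forms, the parameters `mu`, `alpha`, `beta`, `sigma`, and the function names are all assumptions; the paper's model, including its text handling and bias reduction, is richer.

```python
import math


def cosine(a, b):
    """Cosine similarity of two keyword vectors (0 if either is all zeros)."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def trigger(ev_i, ev_j, alpha=1.0, beta=1.0, sigma=1.0):
    """Contribution of event j to the intensity at event i (hypothetical kernel)."""
    (ti, xi, yi, wi), (tj, xj, yj, wj) = ev_i, ev_j
    if tj >= ti:  # only earlier events can trigger later ones
        return 0.0
    temporal = beta * math.exp(-beta * (ti - tj))
    d2 = (xi - xj) ** 2 + (yi - yj) ** 2
    spatial = math.exp(-d2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    return alpha * temporal * spatial * cosine(wi, wj)


def linkage_probabilities(events, mu=0.1):
    """E-step responsibilities: P[i][j] = prob. event i was triggered by event j,
    P[i][i] = prob. event i came from the constant background rate mu."""
    n = len(events)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        g = [trigger(events[i], events[j]) for j in range(n)]
        lam = mu + sum(g)  # conditional intensity at event i
        for j in range(n):
            P[i][j] = g[j] / lam
        P[i][i] = mu / lam
    return P
```

Linking each event to its most probable parent (the argmax of row i, with the diagonal meaning "no parent") gives a crude crime-series grouping; a full EM algorithm would alternate this step with an M-step that re-estimates the parameters.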

crime event, crime series, keyword, (15 more...)

1902.0044

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.34)
Asia > Middle East > Jordan (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Schubert, Erich, Zimek, Arthur

ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"

arXiv.org Machine Learning Feb-10-2019

This paper documents the release of the ELKI data mining framework, version 0.7.5. ELKI is open-source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. In order to achieve high performance and scalability, ELKI offers data index structures such as the R*-tree that can provide major performance gains. ELKI is designed to be easy to extend for researchers and students in this domain, and welcomes contributions of additional methods. ELKI aims at providing a large collection of highly parameterizable algorithms, in order to allow easy and fair evaluation and benchmarking of algorithms. We will first outline the motivation for this release and the plans for the future, and then give a brief overview of the new functionality in this version. We also include an appendix presenting an overview of all implemented functionality.

density-based clustering, survey article, upstream oil & gas, (22 more...)

1902.03616

Country:

North America > United States > Wisconsin (0.14)
North America > United States > Virginia (0.13)
North America > United States > New York (0.13)
(3 more...)

Genre: Research Report > Experimental Study (0.45)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(4 more...)

Agrawal, Rakshit, de Alfaro, Luca, Ballarin, Gabriele, Moret, Stefano, Di Pierro, Massimo, Tacchini, Eugenio, Della Vedova, Marco L.

Identifying Fake News from Twitter Sharing Data: A Large-Scale Study

arXiv.org Machine Learning Feb-10-2019

Social networks offer a ready channel for fake and misleading news to spread and exert influence. This paper examines the performance of different reputation algorithms when applied to a large and statistically significant portion of the news that is spread via Twitter. Our main result is that simple crowdsourcing-based algorithms are able to identify a large portion of fake or misleading news, while incurring only very low false positive rates for mainstream websites. We believe that these algorithms can be used as the basis of practical, large-scale systems for indicating to consumers which news sites deserve careful scrutiny and skepticism.
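A crowdsourcing-style reputation signal can be sketched as label propagation on the user-site sharing graph: a user's score is the mean score of the sites they share, and an unlabeled site's score is the mean score of its sharers, with a few seed sites held fixed. This is a generic stand-in, not one of the specific algorithms evaluated in the paper; the function name, the ±1 seed encoding, and the iteration count are assumptions.

```python
def propagate(shares, seeds, n_iter=20):
    """shares: dict user -> set of sites that user shared.
    seeds: dict site -> +1.0 (reliable) or -1.0 (fake); these stay fixed.
    Returns a score per site; negative values lean toward 'fake'."""
    sites = {s for ss in shares.values() for s in ss}
    site_score = {s: seeds.get(s, 0.0) for s in sites}
    for _ in range(n_iter):
        # a user's reputation is the mean score of what they share
        user_score = {u: sum(site_score[s] for s in ss) / len(ss)
                      for u, ss in shares.items() if ss}
        sharers = {s: [] for s in sites}
        for u, ss in shares.items():
            for s in ss:
                sharers[s].append(user_score[u])
        # an unlabeled site inherits the mean reputation of its sharers
        for s in sites:
            if s not in seeds:
                site_score[s] = sum(sharers[s]) / len(sharers[s])
    return site_score
```

Sites shared mainly by users who also share known hoax sources drift toward negative scores, which is the intuition behind using sharing behavior to flag sites that deserve scrutiny.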

algorithm, dataset, tweet, (16 more...)

1902.07207

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
Europe > Italy (0.04)
North America > United States > New York (0.04)
(5 more...)

Genre: Research Report > New Finding (0.47)

Industry: Media > News (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)