AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Discovering Spammers in Social Networks

Zhu, Yin (Hong Kong University of Science and Technology (HKUST)) | Wang, Xiao (Renren Inc.) | Zhong, Erheng (Hong Kong University of Science and Technology (HKUST)) | Liu, Nathan N. (Hong Kong University of Science and Technology (HKUST)) | Li, He (Renren Inc.) | Yang, Qiang (Hong Kong University of Science and Technology (HKUST))

AAAI ConferencesJul-21-2012

As the popularity of the social media increases, as evidenced in Twitter, Facebook and China's Renren, spamming activities also picked up in numbers and variety. On social network sites, spammers often disguise themselves by creating fake accounts and hijacking normal users' accounts for personal gains. Different from the spammers in traditional systems such as SMS and email, spammers in social media behave like normal users and they continue to change their spamming strategies to fool anti spamming systems. However, due to the privacy and resource concerns, many social media websites cannot fully monitor all the contents of users, making many of the previous approaches, such as topology-based and content-classification-based methods, infeasible to use. In this paper, we propose a novel method for spammer detection in social networks that exploits both social activities as well as users' social relations in an innovative and highly scalable manner. The proposed method detects spammers following collective activities based on users' social actions and relations. We have empirically tested our method on data from Renren.com, which is the largest social network in China, and demonstrated that our new method can improve the detection performance significantly.

machine learning, spam filtering, spammer, (18 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.34)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Predicting Disease Transmission from Geo-Tagged Micro-Blog Data

Sadilek, Adam (University of Rochester) | Kautz, Henry (University of Rochester) | Silenzio, Vincent (University of Rochester)

AAAI ConferencesJul-21-2012

These results far outperform alternative models. This work is an important step towards the development Recent work has demonstrated that micro-blogging data can of automated methods that identify disease vectors, trace the be used to predict a variety of phenomena, including movie transmission between concrete individuals, and ultimately box-office revenues (Asur and Huberman 2010), elections help us understand and predict the spread of infectious diseases (Tumasjan et al. 2010), and flu epidemics (Lampos, De Bie, with fine granularity. It provides a foundation for and Cristianini 2010). Most research to date has focused on research on fundamental questions of public health, such predicting aggregate properties of the population from the as: How does an epidemic on a population scale emerge activity of the bloggers. A different kind of problem one can from low-level interactions between people in the course pose, however, is to predict the behavior or state of particular of their everyday lives? Can we identify a potentially noncooperative individuals within the social network. For instance, one individual who is a vector of a dangerous disease, could try to predict whether a person will go to a movie or i.e., a "Typhoid Mary"? What is the interaction between vote for a particular candidate based on micro-blog data. The friendship, location, and co-location in the spread of individual's own data may or may not be accessible.

artificial intelligence, machine learning, tweet, (19 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > Texas (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)

Add feedback

Multinomial Relation Prediction in Social Data: A Dimension Reduction Approach

Nori, Nozomi (University of Tokyo) | Bollegala, Danushka (University of Tokyo) | Kashima, Hisashi (University of Tokyo)

AAAI ConferencesJul-21-2012

The recent popularization of social web services has made them one of the primary uses of the World Wide Web. An important concept in social web services is social actions such as making connections and communicating with others and adding annotations to web resources. Predicting social actions would improve many fundamental web applications, such as recommendations and web searches. One remarkable characteristic of social actions is that they involve multiple and heterogeneous objects such as users, documents, keywords, and locations. However, the high-dimensional property of such multinomial relations poses one fundamental challenge, that is, predicting multinomial relations with only a limited amount of data. In this paper, we propose a new multinomial relation prediction method, which is robust to data sparsity. We transform each instance of a multinomial relation into a set of binomial relations between the objects and the multinomial relation of the involved objects. We then apply an extension of a low-dimensional embedding technique to these binomial relations, which results in a generalized eigenvalue problem guaranteeing global optimal solutions. We also incorporate attribute information as side information to address the “cold start” problem in multinomial relation prediction. Experiments with various real-world social web service datasets demonstrate that the proposed method is more robust against data sparseness as compared to several existing methods, which can only find sub-optimal solutions.

artificial intelligence, machine learning, relation, (20 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Africa > Senegal > Kolda Region > Kolda (0.05)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (0.69)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.57)
(2 more...)

Add feedback

Causal Inference on Time Series using Structural Equation Models

Peters, Jonas, Janzing, Dominik, Schölkopf, Bernhard

arXiv.org Machine LearningJul-21-2012

Causal inference uses observations to infer the causal structure of the data generating system. We study a class of functional models that we call Time Series Models with Independent Noise (TiMINo). These models require independent residual time series, whereas traditional methods like Granger causality exploit the variance of residuals. There are two main contributions: (1) Theoretical: By restricting the model class (e.g. to additive noise) we can provide a more general identifiability result than existing ones. This result incorporates lagged and instantaneous effects that can be nonlinear and do not need to be faithful, and non-instantaneous feedbacks between the time series. (2) Practical: If there are no feedback loops between time series, we propose an algorithm based on non-linear independence tests of time series. When the data are causally insufficient, or the data generating process does not satisfy the model assumptions, this algorithm may still give partial results, but mostly avoids incorrect answers. An extension to (non-instantaneous) feedbacks is possible, but not discussed. It outperforms existing methods on artificial and real data. Code can be provided upon request.

artificial intelligence, machine learning, time sery, (16 more...)

arXiv.org Machine Learning

1207.5136

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Hierarchical Clustering using Randomly Selected Similarities

Eriksson, Brian

arXiv.org Machine LearningJul-19-2012

The problem of hierarchical clustering items from pairwise similarities is found across various scientific disciplines, from biology to networking. Often, applications of clustering techniques are limited by the cost of obtaining similarities between pairs of items. While prior work has been developed to reconstruct clustering using a significantly reduced set of pairwise similarities via adaptive measurements, these techniques are only applicable when choice of similarities are available to the user. In this paper, we examine reconstructing hierarchical clustering under similarity observations at-random. We derive precise bounds which show that a significant fraction of the hierarchical clustering can be recovered using fewer than all the pairwise similarities. We find that the correct hierarchical clustering down to a constant fraction of the total number of items (i.e., clusters sized O(N)) can be found using only O(N log N) randomly selected pairwise similarities in expectation.

artificial intelligence, machine learning, similarity, (18 more...)

arXiv.org Machine Learning

1207.4748

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Models of Disease Spectra

Rezek, Iead, Beckmann, Christian

arXiv.org Machine LearningJul-19-2012

Case vs control comparisons have been the classical approach to the study of neurological diseases. However, most patients will not fall cleanly into either group. Instead, clinicians will typically find patients that cannot be classified as having clearly progressed into the disease state. For those subjects, very little can be said about their brain function on the basis of analyses of group differences. To describe the intermediate brain function requires models that interpolate between the disease states. We have chosen Gaussian Processes (GP) regression to obtain a continuous spectrum of brain activation and to extract the unknown disease progression profile. Our models incorporate spatial distribution of measures of activation, e.g. the correlation of an fMRI trace with an input stimulus, and so constitute ultra-high multi-variate GP regressors. We applied GPs to model fMRI image phenotypes across Alzheimer's Disease (AD) behavioural measures, e.g. MMSE, ACE etc. scores, and obtained predictions at non-observed MMSE/ACE values. The overall model confirmed the known reduction in the spatial extent of activity in response to reading versus false-font stimulation. The predictive uncertainty indicated the worsening confidence intervals at behavioural scores distance from those used for GP training. Thus, the model indicated the type of patient (what behavioural score) that would need to included in the training data to improve models predictions.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Machine Learning

1207.4674

Country:

North America > United States (0.46)
North America > Canada (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)

Add feedback

Distributed Strongly Convex Optimization

Tsianos, Konstantinos I., Rabbat, Michael G.

arXiv.org Machine LearningJul-19-2012

A lot of effort has been invested into characterizing the convergence rates of gradient based algorithms for non-linear convex optimization. Recently, motivated by large datasets and problems in machine learning, the interest has shifted towards distributed optimization. In this work we present a distributed algorithm for strongly convex constrained optimization. Each node in a network of n computers converges to the optimum of a strongly convex, L-Lipchitz continuous, separable objective at a rate O(log (sqrt(n) T) / T) where T is the number of iterations. This rate is achieved in the online setting where the data is revealed one at a time to the nodes, and in the batch setting where each node has access to its full local dataset from the start. The same convergence rate is achieved in expectation when the subgradients used at each node are corrupted with additive zero-mean noise.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1207.3031

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.50)

Add feedback

Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions

Agarwal, Alekh, Negahban, Sahand, Wainwright, Martin J.

arXiv.org Machine LearningJul-18-2012

Stochastic optimization algorithms have many desirable features for large-scale machine learning, and accordingly have been the focus of renewed and intensive study in the last several years (e.g., see the papers [26, 4, 10, 30] and references therein). The empirical efficiency of these methods is backed with strong theoretical guarantees, providing sharp bounds on their convergence rates. These convergence rates are known to depend on the structure of the underlying objective function, with faster rates being possible for objective functions that are smooth and/or (strongly) convex, or optima that have desirable features such as sparsity. More precisely, for an objective function that is strongly convex, stochastic gradient descent enjoys a convergence rate ranging from O(1/T), when features vectors are extremely sparse, to O(d/T) when feature vectors are dense [11, 19, 12]. Such results are of significant interest, because the strong convexity condition is satisfied for many common machine learning problems, including boosting, least squares regression, support vector machines and generalized linear models, among other examples. A complementary type of condition is that of sparsity, either exact or approximate, in the optimal solution.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1207.4421

Country:

North America > United States > Massachusetts (0.28)
North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Add feedback

Expectation-Propagation for Likelihood-Free Inference

Barthelmé, Simon, Chopin, Nicolas

arXiv.org Machine LearningJul-18-2012

Many models of interest in the natural and social sciences have no closed-form likelihood function, which means that they cannot be treated using the usual techniques of statistical inference. In the case where such models can be efficiently simulated, Bayesian inference is still possible thanks to the Approximate Bayesian Computation (ABC) algorithm. Although many refinements have been suggested, ABC inference is still far from routine. ABC is often excruciatingly slow due to very low acceptance rates. In addition, ABC requires introducing a vector of "summary statistics", the choice of which is relatively arbitrary, and often require some trial and error, making the whole process quite laborious for the user. We introduce in this work the EP-ABC algorithm, which is an adaptation to the likelihood-free context of the variational approximation algorithm known as Expectation Propagation (Minka, 2001). The main advantage of EP-ABC is that it is faster by a few orders of magnitude than standard algorithms, while producing an overall approximation error which is typically negligible. A second advantage of EP-ABC is that it replaces the usual global ABC constraint on the vector of summary statistics computed on the whole dataset, by n local constraints of the form that apply separately to each data-point. As a consequence, it is often possible to do away with summary statistics entirely. In that case, EP-ABC approximates directly the evidence (marginal likelihood) of the model. Comparisons are performed in three real-world applications which are typical of likelihood-free inference, including one application in neuroscience which is novel, and possibly too challenging for standard ABC techniques.

approximation, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1107.5959

Country:

North America > United States (0.92)
Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (0.45)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Add feedback

Ultrametric Model of Mind, II: Application to Text Content Analysis

Murtagh, Fionn

arXiv.org Artificial IntelligenceJul-16-2012

In a companion paper, Murtagh (2012), we discussed how Matte Blanco's work linked the unrepressed unconscious (in the human) to symmetric logic and thought processes. We showed how ultrametric topology provides a most useful representational and computational framework for this. Now we look at the extent to which we can find ultrametricity in text. We use coherent and meaningful collections of nearly 1000 texts to show how we can measure inherent ultrametricity. On the basis of our findings we hypothesize that inherent ultrametricty is a basis for further exploring unconscious thought processes.

data mining, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

1201.2719

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.66)

Industry: Transportation > Air (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science > Data Mining (0.82)

Add feedback