SpamCop: A Spam Classification & Organization Program

AAAI Conferences

Patrick Pantel and Dekang Lin Department of Computer Science University of Manitoba Winnipeg, Manitoba Canada R3T 2N2 Abstract We present a simple, yet highly accurate, spam filtering program, called SpamCop, which is able to identify about 92% of the spams while misclassifying only about 1.16% of the nonspam emails. SpamCop treats an email message as a multiset of words and employs a na'fve Bayes algorithm to determine whether or not a message is likely to be a spam. Compared with keyword-spotting rules, the probabilistic approach taken in SpamCop not only offers high accuracy, but also overcomes the brittleness suffered by the keyword spotting approach. Introduction With the explosive growth of the Internet, so too comes the proliferation of spams. Spammers collect a plethora of email addresses without the consent of the owners of these addresses.


AIS-BN: An Adaptive Importance Sampling Algorithm for Evidential Reasoning in Large Bayesian Networks

Journal of Artificial Intelligence Research

Stochastic sampling algorithms, while an attractive alternative to exact algorithms in very large Bayesian network models, have been observed to perform poorly in evidential reasoning with extremely unlikely evidence. To address this problem, we propose an adaptive importance sampling algorithm, AIS-BN, that shows promising convergence rates even under extreme conditions and seems to outperform the existing sampling algorithms consistently. Three sources of this performance improvement are (1) two heuristics for initialization of the importance function that are based on the theoretical properties of importance sampling in finite-dimensional integrals and the structural advantages of Bayesian networks, (2) a smooth learning method for the importance function, and (3) a dynamic weighting function for combining samples from different stages of the algorithm. We tested the performance of the AIS-BN algorithm along with two state of the art general purpose sampling algorithms, likelihood weighting (Fung & Chang, 1989; Shachter & Peot, 1989) and self-importance sampling (Shachter & Peot, 1989). We used in our tests three large real Bayesian network models available to the scientific community: the CPCS network (Pradhan et al., 1994), the PathFinder network (Heckerman, Horvitz, & Nathwani, 1990), and the ANDES network (Conati, Gertner, VanLehn, & Druzdzel, 1997), with evidence as unlikely as 10^-41. While the AIS-BN algorithm always performed better than the other two algorithms, in the majority of the test cases it achieved orders of magnitude improvement in precision of the results. Improvement in speed given a desired precision is even more dramatic, although we are unable to report numerical results here, as the other algorithms almost never achieved the precision reached even by the first few iterations of the AIS-BN algorithm.


Nonparametric Bayesian inference of the microcanonical stochastic block model

arXiv.org Machine Learning

A principled approach to characterize the hidden structure of networks is to formulate generative models, and then infer their parameters from data. When the desired structure is composed of modules or "communities", a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e. the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: 1. Deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, that not only remove limitations that seriously degrade the inference on large networks, but also reveal structures at multiple scales; 2. A very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges, but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.


Variational Bayes Approximations for Clustering via Mixtures of Normal Inverse Gaussian Distributions

arXiv.org Machine Learning

Parameter estimation for model-based clustering using a finite mixture of normal inverse Gaussian (NIG) distributions is achieved through variational Bayes approximations. Univariate NIG mixtures and multivariate NIG mixtures are considered. The use of variational Bayes approximations here is a substantial departure from the traditional EM approach and alleviates some of the associated computational complexities and uncertainties. Our variational algorithm is applied to simulated and real data. The paper concludes with discussion and suggestions for future work.


Elon Musk, DeepMind and AI researchers promise not to develop robot killing machines

The Independent

Elon Musk and many of the world's most respected artificial intelligence researchers have committed not to build autonomous killer robots. The public pledge not to make any "lethal autonomous weapons" comes amid increasing concern about how machine learning and AI will be used on the battlefields of the future. The signatories to the new pledge – which includes the founders of DeepMind, a founder of Skype, and leading academics from across the industry – promise that they will not allow the technology they create to be used to help create killing machines. The I.F.O. is fuelled by eight electric engines, which is able to push the flying object to an estimated top speed of about 120mph. The giant human-like robot bears a striking resemblance to the military robots starring in the movie'Avatar' and is claimed as a world first by its creators from a South Korean robotic company Waseda University's saxophonist robot WAS-5, developed by professor Atsuo Takanishi and Kaptain Rock playing one string light saber guitar perform jam session A man looks at an exhibit entitled'Mimus' a giant industrial robot which has been reprogrammed to interact with humans during a photocall at the new Design Museum in South Kensington, London Electrification Guru Dr. Wolfgang Ziebart talks about the electric Jaguar I-PACE concept SUV before it was unveiled before the Los Angeles Auto Show in Los Angeles, California, U.S The Jaguar I-PACE Concept car is the start of a new era for Jaguar.