Europe
A Suffix Tree Approach to Email Filtering
Pampapathi, Rajesh M., Mirkin, Boris, Levene, Mark
Just as email traffic has increased over the years since its in ception, so has the proportion that is unsolicited; some estimations have plac ed the proportion as high as 60%, and the average cost of this to business at arou nd $2000 per year, per employee (see [29] for a range of numbers and statis tics on spam). Unsolicited emails - commonly know as spam - have thereby become a daily feature of every email user's inbox; and regardless of advan ces in email filtering, spam continues to be a problem in a similar way to comp uter viruses which constantly reemerge in new guises. This leaves the res earch community with the task of continually investigating new approac hes to sorting the welcome emails (known as ham) from the unwelcome spam. W e present just such an approach to email classification and fi ltering based on a well studied data structure, the suffix tree (see [1 6] for a brief introduction). The approach is similar to many existing one s, in that it uses training examples to construct a model or profile of the class and its features, then uses this to make decisions as to the class of new example s; but it differs in the depth and extent of the anaysis. For a good overview of a number of text classification methods, see [26, 1, 31]. Using a suffix tree, we are able to compare not only single word s, as in most current approaches, but substrings of an arbitrary len gth.
On Self-Regulated Swarms, Societal Memory, Speed and Dynamics
Ramos, Vitorino, Fernandes, Carlos, Rosa, Agostinho C.
Wasps, bees, ants and termites all make effective use of their environment and resources by displaying collective "swarm" intelligence. Termite colonies - for instance - build nests with a complexity far beyond the comprehension of the individual termite, while ant colonies dynamically allocate labor to various vital tasks such as foraging or defense without any central decision-making ability. Recent research suggests that microbial life can be even richer: highly social, intricately networked, and teeming with interactions, as found in bacteria. What strikes from these observations is that both ant colonies and bacteria have similar natural mechanisms based on Stigmergy and Self-Organization in order to emerge coherent and sophisticated patterns of global foraging behavior. Keeping in mind the above characteristics we propose a Self-Regulated Swarm (SRS) algorithm which hybridizes the advantageous characteristics of Swarm Intelligence as the emergence of a societal environmental memory or cognitive map via collective pheromone laying in the landscape (properly balancing the exploration/exploitation nature of our dynamic search strategy), with a simple Evolutionary mechanism that trough a direct reproduction procedure linked to local environmental features is able to self-regulate the above exploratory swarm population, speeding it up globally.
Hiding Satisfying Assignments: Two are Better than One
Achlioptas, D., Jia, H., Moore, C.
The evaluation of incomplete satisfiability solvers depends critically on the availability of hard satisfiable instances. A plausible source of such instances consists of random k-SAT formulas whose clauses are chosen uniformly from among all clauses satisfying some randomly chosen truth assignment A. Unfortunately, instances generated in this manner tend to be relatively easy and can be solved efficiently by practical heuristics. Roughly speaking, for a number of different algorithms, A acts as a stronger and stronger attractor as the formula's density increases. Motivated by recent results on the geometry of the space of satisfying truth assignments of random k-SAT and NAE-k-SAT formulas, we introduce a simple twist on this basic model, which appears to dramatically increase its hardness. Namely, in addition to forbidding the clauses violated by the hidden assignment A, we also forbid the clauses violated by its complement, so that both A and compliment of A are satisfying. It appears that under this "symmetrization" the effects of the two attractors largely cancel out, making it much harder for algorithms to find any truth assignment. We give theoretical and experimental evidence supporting this assertion.
Binary Encodings of Non-binary Constraint Satisfaction Problems: Algorithms and Experimental Results
A non-binary Constraint Satisfaction Problem (CSP) can be solved directly using extended versions of binary techniques. Alternatively, the non-binary problem can be translated into an equivalent binary one. In this case, it is generally accepted that the translated problem can be solved by applying well-established techniques for binary CSPs. In this paper we evaluate the applicability of the latter approach. We demonstrate that the use of standard techniques for binary CSPs in the encodings of non-binary problems is problematic and results in models that are very rarely competitive with the non-binary representation. To overcome this, we propose specialized arc consistency and search algorithms for binary encodings, and we evaluate them theoretically and empirically. We consider three binary representations; the hidden variable encoding, the dual encoding, and the double encoding. Theoretical and empirical results show that, for certain classes of non-binary constraints, binary encodings are a competitive option, and in many cases, a better one than the non-binary representation.
Reasoning about Action: An Argumentation - Theoretic Approach
We present a uniform non-monotonic solution to the problems of reasoning about action on the basis of an argumentation-theoretic approach. Our theory is provably correct relative to a sensible minimisation policy introduced on top of a temporal propositional logic. Sophisticated problem domains can be formalised in our framework. As much attention of researchers in the field has been paid to the traditional and basic problems in reasoning about actions such as the frame, the qualification and the ramification problems, approaches to these problems within our formalisation lie at heart of the expositions presented in this paper.
Pure Nash Equilibria: Hard and Easy Games
Gottlob, G., Greco, G., Scarcello, F.
We investigate complexity issues related to pure Nash equilibria of strategic games. We show that, even in very restrictive settings, determining whether a game has a pure Nash Equilibrium is NP-hard, while deciding whether a game has a strong Nash equilibrium is SigmaP2-complete. We then study practically relevant restrictions that lower the complexity. In particular, we are interested in quantitative and qualitative restrictions of the way each player's payoff depends on moves of other players. We say that a game has small neighborhood if the utility function for each player depends only on (the actions of) a logarithmically small number of other players. The dependency structure of a game G can be expressed by a graph DG(G) or by a hypergraph H(G). By relating Nash equilibrium problems to constraint satisfaction problems (CSPs), we show that if G has small neighborhood and if H(G) has bounded hypertree width (or if DG(G) has bounded treewidth), then finding pure Nash and Pareto equilibria is feasible in polynomial time. If the game is graphical, then these problems are LOGCFL-complete and thus in the class NC2 of highly parallelizable problems.
MAP estimation via agreement on (hyper)trees: Message-passing and linear programming
Wainwright, Martin J., Jaakkola, Tommi S., Willsky, Alan S.
We develop and analyze methods for computing provably optimal {\em maximum a posteriori} (MAP) configurations for a subclass of Markov random fields defined on graphs with cycles. By decomposing the original distribution into a convex combination of tree-structured distributions, we obtain an upper bound on the optimal value of the original problem (i.e., the log probability of the MAP assignment) in terms of the combined optimal values of the tree problems. We prove that this upper bound is tight if and only if all the tree distributions share an optimal configuration in common. An important implication is that any such shared configuration must also be a MAP configuration for the original distribution. Next we develop two approaches to attempting to obtain tight upper bounds: (a) a {\em tree-relaxed linear program} (LP), which is derived from the Lagrangian dual of the upper bounds; and (b) a {\em tree-reweighted max-product message-passing algorithm} that is related to but distinct from the max-product algorithm. In this way, we establish a connection between a certain LP relaxation of the mode-finding problem, and a reweighted form of the max-product (min-sum) message-passing algorithm.