AITopics | Europe

Hilbert space embeddings and metrics on probability measures

Sriperumbudur, Bharath K., Gretton, Arthur, Fukumizu, Kenji, Schölkopf, Bernhard, Lanckriet, Gert R. G.

arXiv.org Machine Learning, Jan-29-2010

A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings: we denote this as $\gamma_k$, indexed by the kernel function $k$ that defines the inner product in the RKHS. We present three theoretical properties of $\gamma_k$. First, we consider the question of determining the conditions on the kernel $k$ for which $\gamma_k$ is a metric: such $k$ are denoted characteristic kernels. Unlike pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously published conditions may apply only in restricted circumstances (e.g. on compact domains), and are difficult to check, our conditions are straightforward and intuitive: bounded continuous strictly positive definite kernels are characteristic. Alternatively, if a bounded continuous kernel is translation-invariant on $\mathbb{R}^d$, then it is characteristic if and only if the support of its Fourier transform is the entire $\mathbb{R}^d$. Second, we show that there exist distinct distributions that are arbitrarily close in $\gamma_k$. Third, to understand the nature of the topology induced by $\gamma_k$, we relate $\gamma_k$ to other popular metrics on probability measures, and present conditions on the kernel $k$ under which $\gamma_k$ metrizes the weak topology.
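As an illustration of the distance between distribution embeddings described in this abstract, the following is a minimal sketch of a sample-based estimate of the squared distance, using the standard expansion of the squared RKHS norm into three kernel averages. The Gaussian kernel, the bandwidth `sigma`, the biased plug-in estimator, and all function names are assumptions made for this example; none of them is taken from the paper.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for two points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma=1.0):
    """Average of k(x, y) over all pairs drawn from the two samples."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def gamma_k_squared(xs, ys, sigma=1.0):
    """Biased plug-in estimate of the squared embedding distance.

    Uses ||mu_P - mu_Q||^2 = E k(X, X') + E k(Y, Y') - 2 E k(X, Y),
    with each expectation replaced by a sample average.
    """
    return (
        mean_kernel(xs, xs, sigma)
        + mean_kernel(ys, ys, sigma)
        - 2.0 * mean_kernel(xs, ys, sigma)
    )


if __name__ == "__main__":
    sample_p = [(0.0,), (1.0,)]
    sample_q = [(5.0,), (6.0,)]
    print(gamma_k_squared(sample_p, sample_p))  # identical samples
    print(gamma_k_squared(sample_p, sample_q))  # well-separated samples
```

The estimate is zero when the two samples are identical and grows as the samples move apart. Each call costs time quadratic in the sample sizes, so this form is only practical for small samples.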

characterization, kernel, probability measure, (14 more...)

arXiv.org Machine Learning

0907.5309

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > New York (0.04)
(11 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Janus: Automatic Ontology Builder from XSD Files

Bedini, Ivan, Nguyen, Benjamin, Gardarin, Georges

arXiv.org Artificial Intelligence, Jan-27-2010

The construction of a reference ontology for a large domain remains a hard human task. The process is sometimes assisted by software tools that facilitate information extraction from a textual corpus. Despite the widespread use of XML Schema files on the internet, especially in the B2B domain, tools that offer a complete semantic analysis of XML schemas are rare. In this paper we introduce Janus, a tool for automatically building a reference knowledge base starting from XML Schema files. Janus also provides several useful views to simplify B2B application integration.

artificial intelligence, information management, natural language, (18 more...)

arXiv.org Artificial Intelligence

1001.4892

Country: Europe > France (0.18)

Genre: Research Report (0.40)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Text Relatedness Based on a Word Thesaurus

Tsatsaronis, G., Varlamis, I., Vazirgiannis, M.

Journal of Artificial Intelligence Research, Jan-25-2010

The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. Such a measure that captures well both aspects of text relatedness may help in many tasks, such as text retrieval, classification and clustering. In this paper we present a new approach for measuring the semantic relatedness between words based on their implicit semantic links. The approach exploits only a word thesaurus in order to devise implicit semantic links between words. Based on this approach, we introduce Omiotis, a new measure of semantic relatedness between texts which capitalizes on the word-to-word semantic relatedness measure (SR) and extends it to measure the relatedness between texts. We gradually validate our method: we first evaluate the performance of the semantic relatedness measure between individual words, covering word-to-word similarity and relatedness, synonym identification and word analogy; then, we proceed with evaluating the performance of our method in measuring text-to-text semantic relatedness in two tasks, namely sentence-to-sentence similarity and paraphrase recognition. Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness on the selected tasks and data sets, and competes well against corpus-based and hybrid approaches.

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2880

AI Access Foundation

10636

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Greece (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report > New Finding (0.92)

Industry: Education > Health & Safety > School Nutrition (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Classifying Network Data with Deep Kernel Machines

Tang, Xiao, Zhu, Mu

arXiv.org Machine Learning, Jan-22-2010

Inspired by a growing interest in analyzing network data, we study the problem of node classification on graphs, focusing on approaches based on kernel machines. Conventionally, kernel machines are linear classifiers in the implicit feature space. We argue that linear classification in the feature space of kernels commonly used for graphs is often not enough to produce good results. When this is the case, one naturally considers nonlinear classifiers in the feature space. We show that repeating this process produces something we call "deep kernel machines." We provide some examples where deep kernel machines can make a big difference in classification performance, and point out some connections to various recent literature on deep architectures in artificial intelligence and machine learning.

artificial intelligence, kernel machine, machine learning, (17 more...)

arXiv.org Machine Learning

1001.4019

Country:

North America > Canada (0.28)
North America > United States (0.28)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Industry:

Energy (0.69)
Law (0.68)
Telecommunications > Networks (0.61)
Information Technology > Networks (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

A parameter-free hedging algorithm

Chaudhuri, Kamalika, Freund, Yoav, Hsu, Daniel

arXiv.org Artificial Intelligence, Jan-18-2010

We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online-learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large. In this paper, we offer a clean solution by proposing a novel and completely parameter-free algorithm for DTOL. We introduce a new notion of regret, which is more natural for applications with a large number of actions. We show that our algorithm achieves good performance with respect to this new notion of regret; in addition, it also achieves performance close to that of the best bounds achieved by previous algorithms with optimally-tuned parameters, according to previous notions of regret.
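For context on the learning rate parameter mentioned in this abstract, the following is a minimal sketch of the classical Hedge (exponential weights) scheme, which is the kind of earlier algorithm that needs such a parameter. It is not the parameter-free algorithm proposed in the paper, and it measures regret against the single best action rather than the new notion of regret the paper introduces. The function names and the toy loss sequence are assumptions made for this example.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge (exponential weights) over a sequence of loss vectors.

    loss_rounds: list of rounds, each a list with one loss in [0, 1] per action.
    eta: learning rate, which has to be chosen by hand.
    Returns (learner's cumulative expected loss, cumulative loss per action).
    """
    n_actions = len(loss_rounds[0])
    cumulative = [0.0] * n_actions
    learner_loss = 0.0
    for losses in loss_rounds:
        # Weight each action by exp(-eta * cumulative loss).
        # Subtracting the minimum keeps the exponentials well scaled.
        best = min(cumulative)
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        # The learner suffers the expected loss under its distribution.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, cumulative


def regret_to_best(loss_rounds, eta):
    """Learner's cumulative loss minus that of the best single action."""
    learner_loss, cumulative = hedge(loss_rounds, eta)
    return learner_loss - min(cumulative)


if __name__ == "__main__":
    rounds = [[0.0, 1.0]] * 50  # action 0 is always better
    for eta in (0.0, 0.1, 1.0):
        print(eta, regret_to_best(rounds, eta))
```

Running the loop at the bottom shows how strongly the regret depends on `eta`: with `eta = 0` the learner never moves away from the uniform distribution, while larger values concentrate weight on the better action faster. That sensitivity is the tuning problem the abstract refers to.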

algorithm, best action, lnn, (15 more...)

arXiv.org Artificial Intelligence

0903.2851

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Poland (0.04)

Genre: Research Report (0.40)

Industry: Education (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Detecting Botnets Through Log Correlation

Al-Hammadi, Yousof, Aickelin, Uwe

arXiv.org Artificial Intelligence, Jan-15-2010

Botnets, which consist of thousands of compromised machines, can cause significant threats to other systems by launching Distributed Denial of Service (DDoS) attacks, keylogging, and backdoors. In response to these threats, new effective techniques are needed to detect the presence of botnets. In this paper, we have used an interception technique to monitor Windows Application Programming Interface (API) function calls made by communication applications and store these calls with their arguments in log files. Our algorithm detects botnets based on monitoring abnormal activity by correlating the changes in log file sizes from different hosts. Recently, an explosive growth of coordinated attacks has been noticed [1][6].
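The idea of correlating changes in log file sizes across hosts can be sketched as follows. The abstract does not say which correlation measure or threshold is used, so the Pearson coefficient, the `threshold` value, the host names, and the sample sizes below are all assumptions made for illustration, not the authors' method.

```python
import math


def size_changes(sizes):
    """Per-interval change in log file size from a series of size readings."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_hosts(log_sizes, threshold=0.9):
    """Return host pairs whose log size changes correlate at or above threshold.

    log_sizes: dict mapping host name to log file sizes sampled at the same times.
    """
    hosts = sorted(log_sizes)
    changes = {host: size_changes(log_sizes[host]) for host in hosts}
    flagged = []
    for i, first in enumerate(hosts):
        for second in hosts[i + 1:]:
            if pearson(changes[first], changes[second]) >= threshold:
                flagged.append((first, second))
    return flagged


if __name__ == "__main__":
    # Hypothetical readings: host_a and host_b show the same burst of logged calls.
    readings = {
        "host_a": [0, 10, 20, 120, 130],
        "host_b": [5, 14, 25, 130, 139],
        "host_c": [0, 50, 60, 65, 115],
    }
    print(correlated_hosts(readings))
```

Hosts whose logs grow in step are flagged as a pair, on the reasoning that bots responding to the same commands generate bursts of API calls at the same time.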

artificial intelligence, log file size, machine learning, (13 more...)

arXiv.org Artificial Intelligence

1001.2665

Country:

Europe > United Kingdom (0.15)
North America > United States (0.14)
Europe > France (0.14)

Genre: Research Report (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.99)

Dendritic Cells for Real-Time Anomaly Detection

Greensmith, Julie, Aickelin, Uwe

arXiv.org Artificial Intelligence, Jan-14-2010

Intrusion detection systems (IDS) are a method used in computer security for detection of unauthorised use of machines. The Danger Project proposed by Aickelin et al. (2003) aims to improve on results previously seen with artificial immune systems (AIS) by applying concepts from the Danger Theory to IDS. Danger theory proposes that exposure to danger signals or pathogenic bacteria causes the activation of the immune system, not pattern matching of antigen. The cells responsible for combining these various signals are Dendritic cells. We use the 'signals plus context' processing power of Dendritic Cells (DCs) to perform anomaly detection.

antigen, artificial intelligence, data mining, (16 more...)

arXiv.org Artificial Intelligence

1001.2405

Country: Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.15)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.63)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence (1.00)

Dendritic Cells for Anomaly Detection

Greensmith, Julie, Twycross, Jamie, Aickelin, Uwe

arXiv.org Artificial Intelligence, Jan-14-2010

Artificial immune systems, more specifically the negative selection algorithm, have previously been applied to intrusion detection. The aim of this research is to develop an intrusion detection system based on a novel concept in immunology, the Danger Theory. Dendritic Cells (DCs) are antigen presenting cells and key to the activation of the human immune system. DCs combine signals from the host tissue and correlate these signals with proteins known as antigens. In algorithmic terms, individual DCs perform multi-sensor data fusion based on time-windows. The whole population of DCs asynchronously correlates the fused signals with a secondary data stream. The behaviour of human DCs is abstracted to form the DC Algorithm (DCA), which is implemented using an immune inspired framework, libtissue. This system is used to detect context switching for a basic machine learning dataset and to detect outgoing portscans in real-time. Experimental results show a significant difference between an outgoing portscan and normal traffic.

data mining, evolutionary algorithm, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CEC.2006.1688374

1001.2411

Country:

North America > United States (0.93)
Europe (0.68)

Genre:

Research Report > Experimental Study (0.66)
Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)

Comparing Simulation Output Accuracy of Discrete Event and Agent Based Models: A Quantitive Approach

Majid, Mazlina Abdul, Aickelin, Uwe, Siebers, Peer-Olaf

arXiv.org Artificial Intelligence, Jan-13-2010

In our research we investigate the output accuracy of discrete event simulation models and agent based simulation models when studying human centric complex systems. In this paper we focus on human reactive behaviour, as it is possible in both modelling approaches to implement human reactive behaviour in the model by using standard methods. As a case study we have chosen the retail sector, and here in particular the operations of the fitting room in the womenswear department of a large UK department store. In our case study we looked at ways of determining the efficiency of implementing new management policies for the fitting room operation through modelling the reactive behaviour of staff and customers of the department. First, we have carried out a validation experiment in which we compared the results from our models to the performance of the real system. This experiment also allowed us to establish differences in output accuracy between the two modelling methods. In a second step, a multi-scenario experiment was carried out to study the behaviour of the models when they are used for the purpose of operational improvement. Overall we have found that for our case study example both discrete event simulation and agent based simulation have the same potential to support the investigation into the efficiency of implementing new management policies.

artificial intelligence, customer, modeling & simulation, (16 more...)

arXiv.org Artificial Intelligence

1001.217

Country:

North America > United States (0.93)
Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.15)

Genre: Research Report > New Finding (1.00)

Industry: Retail (0.87)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Cooperative Automated Worm Response and Detection Immune Algorithm

Kim, Jungwon, Wilson, William, Aickelin, Uwe, McLeod, Julie

arXiv.org Artificial Intelligence, Jan-13-2010

The role of T-cells within the immune system is to confirm and assess anomalous situations and then either respond to or tolerate the source of the effect. To illustrate how these mechanisms can be harnessed to solve real-world problems, we present the blueprint of a T-cell inspired algorithm for computer security worm detection. We show how the three central T-cell processes, namely T-cell maturation, differentiation and proliferation, naturally map into this domain and further illustrate how such an algorithm fits into a complete immune inspired computer security system and framework.

antigen, artificial intelligence, information, (17 more...)

arXiv.org Artificial Intelligence

1001.2155

Country: Europe > United Kingdom > England (0.68)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.90)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Networks (0.93)
