
 Performance Analysis


Document Classification for Focused Topics

AAAI Conferences

Feature extraction is one of the fundamental challenges in improving the accuracy of document classification. While there is a large body of research literature on document classification, most existing approaches either do not achieve high classification accuracy or require massive training sets. In this paper, we propose a simple feature extraction algorithm that can achieve high document classification accuracy in the context of development-centric topics. Our algorithm exploits two distinct aspects of development-centric topics: first, most of these topics tend to be very focused (unlike semantically hard classification topics such as chemistry or banks); second, because of the local language and cultural underpinnings of these topics, authentic pages tend to use several region-specific features. Our algorithm uses popularity and rarity as two separate metrics to extract features that describe a topic. Given a topic, our output feature set comprises: (i) a list of popular keywords closely related to the topic; and (ii) a list of rare keywords closely related to the topic. We show that a simple joint classifier based on these two feature sets can achieve high classification accuracy, even though neither feature subset is sufficient on its own. We have tested our algorithm across a wide range of development-centric topics.
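The abstract does not say how popularity and rarity are measured, so the following Python is only a toy sketch under our own assumptions, not the authors' algorithm. Here popularity is the fraction of on-topic pages containing a token; a "rare" keyword is one that never appears in a background corpus but occurs in at least `min_rare_support` on-topic pages; and the hypothetical `classify` function accepts a page only if it contains at least one keyword from each list. All function names, thresholds and example pages are invented for illustration.

```python
from collections import Counter

def doc_counts(docs):
    """Number of documents in which each lower-cased, whitespace-split token appears."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return counts

def extract_features(topic_docs, background_docs, min_popularity=0.5, min_rare_support=2):
    """Hypothetical popularity/rarity feature extraction (assumed definitions, not the paper's)."""
    topic = doc_counts(topic_docs)
    background = doc_counts(background_docs)  # Counter returns 0 for unseen tokens
    n_topic, n_background = len(topic_docs), len(background_docs)
    # Popular: appears in many on-topic pages and is relatively more frequent there than in the background.
    popular = {
        w for w, c in topic.items()
        if c / n_topic >= min_popularity and c / n_topic > background[w] / n_background
    }
    # Rare: never seen in the background corpus, yet shared by several on-topic pages
    # (e.g. region-specific terms); popular keywords are removed so the two lists are disjoint.
    rare = {w for w, c in topic.items() if background[w] == 0 and c >= min_rare_support} - popular
    return popular, rare

def classify(doc, popular, rare):
    """Joint rule: on-topic only if the page has at least one popular AND one rare keyword."""
    tokens = set(doc.lower().split())
    return bool(tokens & popular) and bool(tokens & rare)

# Made-up example pages, for illustration only.
topic_pages = [
    "crop prices rise in vidarbha",
    "crop insurance for farmers in vidarbha",
    "crop loans for kharif sowing",
    "kharif crop sowing starts",
    "crop yields and rainfall",
]
background_pages = [
    "stock prices and bank loans",
    "football scores in europe",
    "insurance for cars and bank loans",
    "rainfall forecast in europe",
]
popular, rare = extract_features(topic_pages, background_pages)
```

With these invented pages, `popular` is `{"crop"}` and `rare` is `{"vidarbha", "kharif", "sowing"}`. A page such as "crop prices in europe" is rejected because it contains no rare keyword, and "vidarbha tourism guide" is rejected because it contains no popular one, which mirrors the abstract's point that neither list suffices on its own.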


Using Linked Data to Build Open, Collaborative Recommender Systems

AAAI Conferences

While recommender systems can greatly enhance the user experience, the entry barriers in terms of data acquisition are very high, making it hard for new service providers to compete with existing recommendation services. This paper proposes to build open recommender systems that can utilise Linked Data to mitigate the new-user, new-item and sparsity problems of collaborative recommender systems. We describe how to aggregate data about object-centred sociality from different sources and how to process it for collaborative recommendation. To demonstrate the validity of our approach, we augment the data from a closed collaborative music recommender system with Linked Data, and significantly improve its precision and recall.
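As a small illustration of how additional data can ease the new-item and sparsity problems, here is a minimal Python sketch of item-based collaborative filtering (cosine similarity between the sets of users who interacted with each item), run once on a "closed" dataset and once with extra (user, item) pairs. It assumes the Linked Data has already been fetched and that its user and item identifiers have been reconciled with those of the closed system; the function names and data are hypothetical, and this is not the paper's pipeline.

```python
from collections import defaultdict
from math import sqrt

def item_users(pairs):
    """Map each item to the set of users who interacted with it."""
    index = defaultdict(set)
    for user, item in pairs:
        index[item].add(user)
    return index

def recommend(pairs, user, top_n=3):
    """Item-based CF: score each unseen item by its cosine similarity to the user's items."""
    index = item_users(pairs)
    seen = {item for u, item in pairs if u == user}
    scores = defaultdict(float)
    for liked in seen:
        for candidate, users in index.items():
            if candidate in seen:
                continue
            overlap = len(index[liked] & users)
            if overlap:
                # Cosine similarity of two binary user vectors.
                scores[candidate] += overlap / sqrt(len(index[liked]) * len(users))
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

# Hypothetical data: interactions from a closed service, plus pairs aggregated from Linked Data.
closed = [("u1", "artist_a"), ("u1", "artist_b"), ("u2", "artist_a"), ("u2", "artist_c")]
linked = [("x1", "artist_b"), ("x1", "artist_d"), ("x2", "artist_c"), ("x2", "artist_d")]
```

Here `recommend(closed, "u1")` returns `["artist_c"]`, whereas `recommend(closed + linked, "u1")` returns `["artist_c", "artist_d"]`: `artist_d` has no interactions in the closed data and becomes recommendable only once the external pairs are added.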


Privacy Classification Systems: Recall and Precision Optimization as Enabler of Trusted Information Sharing

AAAI Conferences

Information is shared more extensively when users can confidently classify all of their information according to its desired degree of disclosure prior to transmission. While high-quality classification is relatively straightforward for structured data (e.g., credit card numbers, cookies, "confidential" reports), most consumer and business information is unstructured (e.g., Facebook posts, corporate email). All current technological approaches to classifying unstructured information seek to identify only the information that has the desired characteristics (i.e., to maximize the percentage of filtered content that actually requires privacy protection). This focus on boosting classifier Precision (P) causes technology solutions to miss sensitive information [i.e., Recall (R) is sacrificed to improve P]. Such privacy protection will fall short of user expectations no matter how "intelligent" the technology may be in extending beyond keywords to user meaning. Systems must simultaneously optimize both P and R in order to protect privacy well enough to encourage the free flow of personal and corporate information. This requires a socio-technical methodology in which the user is intimately involved in iterative privacy improvement. The approach is a general one: the classifier can be modified at any time when sampled measurements of P and R indicate that a change is needed. Matching the ever-evolving user privacy model to the technology solution (e.g., active machine learning) affords a technique for building and maintaining user trust.
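To make the precision/recall trade-off concrete, the Python sketch below computes P and R on a user-labelled sample and picks the decision threshold with the highest precision among those that reach a minimum recall. The scores, labels and function names are invented for illustration; the paper describes an iterative socio-technical methodology, not this specific procedure.

```python
def precision_recall(predicted, actual):
    """Precision and recall for parallel lists of booleans (True = sensitive)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def choose_threshold(scores, actual, min_recall):
    """Among thresholds meeting min_recall, return (threshold, P, R) with the highest precision."""
    best = None
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]  # flag as sensitive at or above the threshold
        p, r = precision_recall(predicted, actual)
        if r >= min_recall and (best is None or p > best[1]):
            best = (t, p, r)
    return best  # None if no threshold reaches min_recall

# Hypothetical classifier scores for eight sampled messages, with the user's labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, True, False, False]
```

Optimising for precision alone (`choose_threshold(scores, labels, 0.0)`) selects the threshold 0.8, which has P = 1.0 but R = 0.5 and therefore misses half of the sensitive messages; requiring `min_recall=1.0` selects 0.3, with R = 1.0 and P of about 0.67.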


The Web as a Privacy Lab

AAAI Conferences

The privacy dangers of data proliferation on the Web are well known. Information on the Web has facilitated the deanonymization of anonymous bloggers, the de-sanitization of government records and the identification of individuals based on search engine queries. What has received less attention is Web mining in support of privacy. In this position paper we argue that the very ability of Web data to breach privacy demonstrates its value as a laboratory for detecting privacy breaches before they happen. In addition, we argue that privacy-invasive services may become privacy-respecting by mining publicly available Web data, with little decrease in performance and efficiency.


Who's Calling? Demographics of Mobile Phone Use in Rwanda

AAAI Conferences

Despite the increasing ubiquity of mobile phones in the developing world, remarkably little is known about the structure and demographics of the mobile phone market. While a few qualitative studies have detailed social norms of phone use in specific communities (Donner 2007; Burrell 2009), and a handful of quantitative researchers have begun to analyze … But whereas in the general Rwandan populace males tend to be much better educated (76.3% of males are literate, but only 64.7% of females), among mobile phone users it is the women who achieve higher levels of education: the median woman completes secondary school, while the median man does not (t = 4.79). Table 1 shows a few statistics on asset ownership, with associated sampling error.


Mining Road Traffic Accident Data to Improve Safety: Role of Road-Related Factors on Accident Severity in Ethiopia

AAAI Conferences

Road traffic accidents (RTAs) are a major public health concern, resulting in an estimated 1.2 million deaths and 50 million injuries worldwide each year. In the developing world, RTAs are among the leading causes of death and injury; Ethiopia in particular experiences the highest rate of such accidents. Methods to reduce accident severity are therefore of great interest to traffic agencies and the public at large. In this work, we applied data mining techniques to link recorded road characteristics to accident severity in Ethiopia, and developed a set of rules that could be used by the Ethiopian Traffic Agency to improve safety.
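The abstract does not name the mining algorithm, so the following Python is only a hedged illustration of the general idea of turning records into rules. It enumerates single-condition rules of the form "attribute = value → severity = outcome" and keeps those with enough support (number of matching records) and confidence (fraction of matching records with that outcome). The attribute names and records are made up and are not drawn from Ethiopian accident data; every record is assumed to contain every attribute.

```python
from collections import Counter, defaultdict

def mine_rules(records, target, min_support=2, min_confidence=0.6):
    """Return (attribute, value, outcome, support, confidence) for single-condition rules."""
    rules = []
    attributes = sorted({key for record in records for key in record if key != target})
    for attribute in attributes:
        # Count the target outcomes observed for each value of this attribute.
        outcomes_by_value = defaultdict(Counter)
        for record in records:
            outcomes_by_value[record[attribute]][record[target]] += 1
        for value, outcomes in outcomes_by_value.items():
            total = sum(outcomes.values())
            for outcome, count in outcomes.items():
                confidence = count / total
                if count >= min_support and confidence >= min_confidence:
                    rules.append((attribute, value, outcome, count, confidence))
    # Most confident rules first, then by support, then alphabetically.
    return sorted(rules, key=lambda r: (-r[4], -r[3], r[0], r[1], r[2]))

# Made-up accident records, for illustration only.
records = [
    {"surface": "gravel", "lighting": "dark", "severity": "fatal"},
    {"surface": "gravel", "lighting": "daylight", "severity": "fatal"},
    {"surface": "gravel", "lighting": "dark", "severity": "fatal"},
    {"surface": "asphalt", "lighting": "daylight", "severity": "slight"},
    {"surface": "asphalt", "lighting": "daylight", "severity": "slight"},
    {"surface": "asphalt", "lighting": "dark", "severity": "serious"},
]
```

On these records, `mine_rules(records, "severity")` ranks `("surface", "gravel", "fatal", 3, 1.0)` first; three further rules pass the default thresholds with confidence 2/3. A real study would use richer rule learners and far more data; this only shows what support and confidence mean for such rules.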


Predicting Positive and Negative Links in Online Social Networks

arXiv.org Artificial Intelligence

We study online social networks in which relationships can be either positive (indicating relations such as friendship) or negative (indicating relations such as opposition or antagonism). Such a mix of positive and negative links arises in a variety of online settings; we study datasets from Epinions, Slashdot and Wikipedia. We find that the signs of links in the underlying social networks can be predicted with high accuracy, using models that generalize across this diverse range of sites. These models provide insight into some of the fundamental principles that drive the formation of signed links in networks, shedding light on theories of balance and status from social psychology; they also suggest social computing applications by which the attitude of one user toward another can be estimated from evidence provided by their relationships with other members of the surrounding social network.
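The models in the paper are learned from data; as a much simpler illustration of the structural-balance idea they draw on, the Python sketch below predicts the sign of a missing edge by letting each common neighbour vote with the product of the two edge signs ("the enemy of my enemy is my friend"). It treats the network as undirected, whereas the networks studied in the paper are directed, and the names and edges are hypothetical.

```python
from collections import defaultdict

def build_graph(signed_edges):
    """Undirected signed graph: graph[a][b] is +1 (positive) or -1 (negative)."""
    graph = defaultdict(dict)
    for a, b, sign in signed_edges:
        graph[a][b] = sign
        graph[b][a] = sign
    return graph

def predict_sign(graph, u, v):
    """Predict the sign of edge (u, v) by majority vote of balanced triads; None if undecided."""
    common = (set(graph.get(u, {})) & set(graph.get(v, {}))) - {u, v}
    if not common:
        return None
    # Each common neighbour w votes sign(u, w) * sign(w, v): a balanced triad has a positive product.
    vote = sum(graph[u][w] * graph[w][v] for w in common)
    if vote == 0:
        return None
    return 1 if vote > 0 else -1

# Hypothetical signed edges.
edges = [
    ("alice", "bob", 1), ("bob", "carol", 1),
    ("alice", "dave", -1), ("dave", "carol", -1),
    ("alice", "erin", 1), ("erin", "frank", -1),
]
graph = build_graph(edges)
```

Here `predict_sign(graph, "alice", "carol")` returns `1` (two balanced triads agree), `predict_sign(graph, "alice", "frank")` returns `-1`, and a pair with no common neighbours, such as `("bob", "frank")`, yields `None`.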


libtissue - implementing innate immunity

arXiv.org Artificial Intelligence

In a previous paper, the authors argued the case for incorporating ideas from innate immunity into artificial immune systems (AISs) and presented an outline of a conceptual framework for such systems. A number of key general properties observed in the biological innate and adaptive immune systems were highlighted, and how such properties might be instantiated in artificial systems was discussed in detail. The next logical step is to take these ideas and build a software system with which AISs with these properties can be implemented and experimentally evaluated. This paper reports on the results of that step: the libtissue system.


Information Fusion in the Immune System

arXiv.org Artificial Intelligence

The field of artificial immune systems (AISs) is an emerging, biologically inspired area of research that builds systems based on algorithms inspired by the biological immune system. AIS research has provided a number of general-purpose techniques and algorithms that have been successfully applied to a range of optimisation, classification and data mining problems. As with evolutionary algorithms and neural networks, AISs could also provide useful solutions to optimisation and classification problems in multi-sensor data fusion. Perhaps more interestingly, though, recent research in AISs [14,15,35,36] shows the importance of multilevel information in the construction of AISs. New models for AISs are emerging that are inspired by research in immunology into the role of the innate immune system in overall immune system dynamics. These AISs, which incorporate mechanisms inspired by both the innate and adaptive immune systems, are called second generation AISs. They stand in contrast to first generation AISs, which are inspired by adaptive immune system mechanisms only. One consequence of incorporating both innate and adaptive mechanisms, and one of the defining characteristics of second generation AISs, is the need for a multilevel problem representation and for multilevel interaction between the components of the AIS and the problem [36]. As systems that integrate multilevel information sources, second generation AISs have much in common with multi-sensor data fusion systems.


Integrating Innate and Adaptive Immunity for Intrusion Detection

arXiv.org Artificial Intelligence

Network Intrusion Detection Systems (NIDS) monitor a network with the aim of distinguishing malicious from benign activity on that network. While a wide range of approaches have met with varying levels of success, most IDSs rely on access to a database of known attack signatures written by security experts. To address the problem of false positive alerts, correlation algorithms are now used to add structure to sequences of IDS alerts. However, such techniques are of no help in discovering novel attacks or variations of known attacks, something the human immune system (HIS) is capable of doing in its own specialised domain. This paper presents a novel immune algorithm for application to an intrusion detection problem. The goal is to discover packets containing novel variations of attacks covered by an existing signature base.