Cost-sensitive C4.5 with post-pruning and competition
Xu, Zilong, Min, Fan, Zhu, William
Decision trees are an effective classification approach in data mining and machine learning. In applications, test costs and misclassification costs should be considered while inducing decision trees. Recently, cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX, and \lambda-ID3, have been proposed to deal with this issue. These algorithms handle only symbolic data. In this paper, we develop a decision tree algorithm inspired by C4.5 for numeric data. Our algorithm addresses two major issues. First, we develop the test-cost-weighted information gain ratio as the heuristic information. Guided by this heuristic, our algorithm picks, at each selection, the attribute that provides a higher gain ratio at a lower cost. Second, we design a post-pruning strategy that considers the tradeoff between test costs and misclassification costs of the generated decision tree. In this way, the total cost is reduced. Experimental results indicate that (1) our algorithm is stable and effective; (2) the post-pruning technique reduces the total cost significantly; (3) the competition strategy is effective in obtaining a cost-sensitive decision tree with low cost.
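The abstract does not state the exact form of the test-cost-weighted gain ratio, so the following is only a minimal sketch under an assumed weighting: the C4.5 gain ratio of a binary numeric split is multiplied by (1 + cost)^lam with lam <= 0, so that more expensive tests score lower. The function names, the parameter lam, and the weighting formula itself are illustrative assumptions, not the authors' definition.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """C4.5-style gain ratio for the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info


def cost_weighted_score(ratio, test_cost, lam=-1.0):
    """ASSUMED weighting (not from the source): lam <= 0 penalises costly tests."""
    return ratio * (1.0 + test_cost) ** lam


values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
ratio = gain_ratio(values, labels, threshold=2.0)
cheap = cost_weighted_score(ratio, test_cost=0.0)
expensive = cost_weighted_score(ratio, test_cost=1.0)
```

With this weighting, two attributes that separate the classes equally well are ranked by cost alone, which matches the stated goal of preferring attributes that give more gain ratio and cost less.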
A Comparative Study of Gaussian Mixture Model and Radial Basis Function for Voice Recognition
A comparative study of the application of Gaussian Mixture Model (GMM) and Radial Basis Function (RBF) models to biometric voice recognition is presented. The application of machine learning techniques to biometric authentication and recognition problems has gained widespread acceptance. In this research, a GMM was trained, using the Expectation Maximization (EM) algorithm, on a dataset containing 10 classes of vowels, and the model was used to predict the appropriate classes on a validation dataset. For experimental validity, the model was compared with two different versions of the RBF model using the same learning and validation datasets. The results showed very close recognition accuracy between the GMM and the standard RBF model, with the GMM outperforming the standard RBF by less than 1%; both models outperformed similar models reported in the literature. The DTREG version of RBF outperformed the other two models, producing 94.8% recognition accuracy. In terms of recognition time, the standard RBF was the fastest of the three models.
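To make the EM training step concrete, here is a minimal sketch of EM for a one-dimensional Gaussian mixture in plain Python. It is a toy illustration only: the study's vowel features, dimensionality, and number of components are not given in the abstract, and the data, initial means, and variance floor below are assumptions chosen for the example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, init_means, n_iter=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_floor, var)  # floor avoids a collapsing component
    return weights, means, variances


# Hypothetical, well-separated toy data: one cluster near 0 and one near 5.
samples = [-0.2, -0.1, 0.0, 0.1, 0.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances = em_gmm_1d(samples, [min(samples), max(samples)])
```

For classification, one common arrangement is to fit a separate mixture per class and assign a sample to the class whose mixture gives it the highest likelihood; the abstract does not say whether the study did this.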
LAGE: A Java Framework to reconstruct Gene Regulatory Networks from Large-Scale Continues Expression Data
Lu, Yang, Wang, Mengying, Zhu, Kenny Q., Yuan, Bo
LAGE is a systematic framework developed in Java. The motivation of LAGE is to provide a scalable and parallel solution for reconstructing Gene Regulatory Networks (GRNs) from continuous gene expression data for a very large number of genes. The basic idea of our framework follows the philosophy of divide-and-conquer. Specifically, LAGE recursively partitions genes into multiple overlapping communities of much smaller size and learns the intra-community GRNs separately before merging them together. In addition, the complete information on overlapping communities is produced as a byproduct, which could be used to mine meaningful functional modules in biological networks.
Secured Wireless Communication using Fuzzy Logic based High Speed Public-Key Cryptography (FLHSPKC)
Sarkar, Arindam, Mandal, J. K.
In this paper, secured wireless communication using fuzzy logic based high speed public-key cryptography (FLHSPKC) is proposed to address major issues such as computational safety, power management, and restricted memory usage in wireless communication. A Wireless Sensor Network (WSN) has several major constraints, such as an inadequate energy source, restricted computational capability, and limited memory. Conventional Elliptic Curve Cryptography (ECC), a form of public-key cryptography used in wireless communication, provides a level of security equivalent to other existing public-key algorithms while using smaller parameters, but it does not address all of these limitations of WSNs. Given an elliptic curve point P, an arbitrary integer k, and a modulus m, conventional ECC carries out the scalar multiplication kP mod m, which takes about 80% of the key computation time on a WSN. The proposed FLHSPKC scheme provides several novel strategies, including a soft-computing-based strategy, to speed up scalar multiplication in conventional ECC; this in turn shortens computation time and satisfies the power consumption and memory constraints without lowering the security level. A performance analysis of the different strategies under the FLHSPKC scheme and a comparison with existing conventional ECC methods are presented.
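For orientation, the scalar multiplication kP that dominates key computation can be sketched with the textbook double-and-add method. This is the conventional baseline only, not the FLHSPKC speed-up, which the abstract does not describe. The toy curve y^2 = x^3 + 2x + 2 over the integers mod 17 with base point (5, 1) is an assumption chosen for illustration and is far too small for real use; the modulus is named p here, where the abstract uses m.

```python
def point_add(P, Q, a, p):
    """Add two points on y^2 = x^3 + a*x + b over GF(p); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P), including doubling a point with y = 0
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p  # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def scalar_mult(k, P, a, p):
    """Compute kP by double-and-add, scanning the bits of k from least significant."""
    result = None
    addend = P
    while k > 0:
        if k & 1:
            result = point_add(result, addend, a, p)
        addend = point_add(addend, addend, a, p)
        k >>= 1
    return result


# Toy parameters (illustrative assumption, not from the source).
A, P_MOD = 2, 17
G = (5, 1)
double_g = scalar_mult(2, G, A, P_MOD)
```

Double-and-add needs about log2(k) doublings plus one addition per set bit of k, which is why this loop is the usual target of speed-up strategies. The modular inverse via `pow(x, -1, p)` requires Python 3.8 or later.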
Modeling the Effects of Transient Populations on Epidemics
Parikh, Nidhi Kiranbhai (Virginia Tech) | Shirole, Sushrut (Virginia Tech) | Swarup, Samarth (Virginia Tech)
A large number of transients visit big cities on any given day; they visit crowded areas and come in contact with many people. However, epidemiological studies have not paid much attention to the role of this subpopulation in disease spread. In the present work, we extend a synthetic population model of the Washington DC metro area to include leisure and business travelers. This approach involves combining Census data, activity surveys, and geospatial data to build a detailed minute-by-minute simulation of population interaction. We simulate a flu-like disease outbreak both with and without the transient population to evaluate the effect of the transients on outbreak size and peak day in terms of the number of residents infected. Results show that there are significantly more infections when transients are considered. We also evaluate interventions such as closing big museums and encouraging the use of hand sanitizers at those museums. Surprisingly, closing museums does not result in a significant difference in the epidemic. However, we find that if the use of hand sanitizer reduces infectivity and susceptibility to 80% or 60% of the original values, it is as effective as closing museums for a few days or entirely eliminating the effect of transients. If infectivity and susceptibility are reduced to 40% or 20%, the number of resident infections over the period of 120 days is reduced by 10% and 13%, respectively.
Learning to Select and Generalize Striking Movements in Robot Table Tennis
Muelling, Katharina (Max Planck Institute for Intelligent Systems) | Kober, Jens (Max Planck Institute for Intelligent Systems) | Kroemer, Oliver (Technische Universitaet Darmstadt) | Peters, Jan (Technische Universitaet Darmstadt)
Learning new motor tasks autonomously from interaction with a human being is an important goal for both robotics and machine learning. However, when moving beyond basic skills, most monolithic machine learning approaches fail to scale. In this paper, we take the task of learning table tennis as an example and present a new framework which allows a robot to learn cooperative table tennis from interaction with a human. Therefore, the robot first learns a set of elementary table tennis hitting movements from a human teacher by kinesthetic teach-in, which is compiled into a set of dynamical system motor primitives (DMPs). Subsequently, the system generalizes these movements to a wider range of situations using our mixture of motor primitives (MoMP) approach. The resulting policy enables the robot to select appropriate motor primitives as well as to generalize between them. Finally, the robot plays with a human table tennis partner and learns online to improve its behavior.
Improving Predictions with Hybrid Markets
Nagar, Yiftach (Massachusetts Institute of Technology) | Malone, Thomas W. (Massachusetts Institute of Technology)
Statistical models almost always yield predictions that are more accurate than those of human experts. However, humans are better at data acquisition and at recognizing atypical circumstances. We use prediction markets to combine predictions from groups of humans and artificial-intelligence agents and show that they are more robust than those from groups of humans or agents alone.
Generating Interpretable Hypotheses Based on Syllogistic Patterns
Hagimura, Takuya (Kobe University) | Seki, Kazuhiro (Kobe University) | Uehara, Kuniaki (Kobe University)
The ever-growing literature in biomedicine makes it virtually impossible for individuals to grasp all the information relevant to their interests. Since even experts' knowledge is likely to be incomplete, important associations among key biomedical concepts may remain unnoticed in the flood of information. Discovering such implicit, hidden knowledge is called hypothesis discovery. This paper reports our preliminary work on hypothesis discovery, which takes advantage of a syllogistic chain of relations extracted from existing knowledge (i.e., published literature). We consider such chains of relations as implicit patterns or rules to generate potential hypotheses. The generated hypotheses are then ranked according to their plausibility, judged from the reliability of the rule that generated the hypothesis and the analogical resemblance between new and existing knowledge. We discuss the validity of the proposed approach on the entire Medline database.
Studying Direct and Indirect Human Influence on Consensus in Swarms
Amraii, Saman Amirpour (University of Pittsburgh) | Chakraborty, Nilanjan (Carnegie Mellon University) | Lewis, Michael (University of Pittsburgh)
Many cooperative control problems, ranging from formation following to rendezvous to flocking, can be expressed as consensus problems. The ability of an operator to influence the development of consensus within a swarm therefore provides a basic test of the quality of human-swarm interaction (HSI). Two plausible approaches are Direct (dictate a desired value to swarm members) and Indirect (control or influence one or more swarm members, relying on existing control laws to propagate that influence). Both approaches have been followed by HSI researchers. The Indirect case uses standard consensus methods, where the operator exerts influence over a few robots and the swarm then reaches a consensus based on its intrinsic rules. The Direct method corresponds to flooding, in which the operator directly sends the intention to a subset of the swarm and the command then propagates through the remainder of the swarm as a privileged message. In this paper we compare these two methods with regard to their convergence time and properties in noisy and noiseless conditions with static and dynamic graphs. We have found that the average consensus method (indirect control) converges much more slowly than the flooding (direct) method, but it has more noise tolerance than simple flooding algorithms. We have also found that the convergence time of the consensus method behaves erratically when the graph's connectivity (Fiedler value) is high.
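The contrast between the two methods can be sketched on a small static, noiseless graph. This is a simplified illustration under stated assumptions, not the paper's experimental setup: the five-robot path graph, the step size eps, the tolerance, and the choice to model operator influence by pinning one leader robot to the target value are all hypothetical.

```python
from collections import deque


def flooding_rounds(adj, source):
    """Rounds needed for a command flooded from `source` to reach every robot.

    Assumes a connected graph; each round the message moves one hop (BFS depth).
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def consensus_steps(adj, leader, target, eps=0.3, tol=1e-3, max_steps=10000):
    """Iterations until every robot is within `tol` of `target`.

    The operator pins the leader to `target`; every other robot repeatedly
    moves toward the values held by its neighbours.
    """
    x = {v: 0.0 for v in adj}
    x[leader] = target
    for step in range(1, max_steps + 1):
        new_x = {}
        for v in adj:
            if v == leader:
                new_x[v] = target
            else:
                new_x[v] = x[v] + eps * sum(x[u] - x[v] for u in adj[v])
        x = new_x
        if max(abs(val - target) for val in x.values()) < tol:
            return step, x
    return max_steps, x


# Hypothetical swarm: five robots connected in a line, operator attached to robot 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rounds = flooding_rounds(path, source=0)
steps, final_values = consensus_steps(path, leader=0, target=1.0)
```

On this graph flooding finishes in as many rounds as the farthest robot is hops away, while the consensus update needs many more iterations to settle, which is consistent with the reported gap in convergence time. Noise tolerance, the other side of the tradeoff, is not modelled here.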
Detecting and Generating Ironic Comparisons: An Application of Creative Information Retrieval
Veale, Tony (Korea Advanced Institute of Science and Technology)
Ironic utterances promise an expected meaning that never arrives, and deliver instead a meaning that exposes the failure of our expectations. Though they can appear contextually inappropriate, ironic statements succeed when they subvert their context of use, so it is the context rather than the utterance that is shown to be incongruous. Every ironic statement thus poses two related questions: the first, “what is unexpected about my meaning?” helps us answer the second, “what is unexpected about my context of use?”. Like metaphor, irony is not overtly marked, and relies instead on a listener’s understanding of stereotypical norms to unpack its true meaning. In this paper we consider how irony relies upon and subverts our stereotypical knowledge of a domain, and show how this knowledge can be exploited to both recognize and generate ironic similes for a topic.