Industry
Evolutionary Churn Prediction in Mobile Networks Using Hybrid Learning
Yeshwanth, V. (Ericsson India Global Services Pvt. Ltd) | Raj, V. Vimal (Sri. Venkateswara College of Engineering) | Saravanan, M. (Ericsson India Global Services Pvt. Ltd)
Churn is the movement of customers from one mobile network operator to another. In today's competitive environment, retaining a customer is preferable to acquiring a new one. Because churn is more a social phenomenon than a mathematical one, existing models predict this behavioral quantity poorly. The value of churn prediction to a mobile operator depends on the accuracy of its predictions. This paper presents predictive modeling of customer behavior based on hybrid learning approaches for churn prediction in mobile networks. Our proposed framework aims at more accurate churner prediction than existing techniques by using a hybrid learning method, a combination of a tree induction system and genetic programming, to derive classification rules from customer behavior. Finally, we use game-theoretic techniques to understand the community effect of churn. We calculate a predicted score, which is the churn value of a mobile customer. The proposed model is used for prediction over various user-defined groupings based on usage time, location, and underlying social network, making it a pragmatic approach that models churn at a human level rather than a mathematical one. Post-evaluation results on a real-world dataset from a leading operator validate our findings.
Learning Parameters of the K-Means Algorithm From Subjective Human Annotation
Dutta, Haimonti (Columbia University) | Passonneau, Rebecca J. (Columbia University) | Lee, Austin (Columbia University) | Radeva, Axinia (Columbia University) | Xie, Boyi (Columbia University) | Waltz, David (Columbia University)
The New York Public Library is participating in the Chronicling America initiative to develop an online searchable database of historically significant newspaper articles. Microfilm copies of the papers are scanned and high-resolution OCR software is run on them. The text from the OCR provides a wealth of data and opinion for researchers and historians. However, the categorization of articles provided by the OCR engine is rudimentary, and a large number of the articles are labeled "editorial" without further categorization. To provide a more refined grouping of articles, unsupervised machine learning algorithms (such as K-Means) are being investigated. The K-Means algorithm requires tuning of parameters such as the number of clusters and the seeding mechanism so that the search is less prone to getting caught in a local minimum. We designed a pilot study to observe whether humans are adept at finding sub-categories. The subjective labels provided by humans are used as a guide to compare the performance of the automated clustering techniques. In addition, seeds provided by annotators are carefully incorporated into a semi-supervised K-Means algorithm (Seeded K-Means); empirical results indicate that this helps to improve performance and provides an intuitive sub-categorization of the articles labeled "editorial" by the OCR engine.
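The seeding idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that annotator seeds are used only to initialise the centroids, after which ordinary K-Means iterations run unchanged. The function names, the toy 2-D points, and the cluster names are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def seeded_kmeans(points, seeds, max_iter=100):
    """Cluster `points`, initialising each centroid from annotator seeds.

    points: list of numeric tuples (hypothetical feature vectors).
    seeds:  dict mapping a cluster name to indices of points that an
            annotator placed in that cluster (each list non-empty).
    Returns (labels, centroids); labels[i] is the cluster name of points[i].
    """
    names = sorted(seeds)
    # Assumption: seeds only fix the starting centroids.
    centroids = [centroid([points[i] for i in seeds[name]]) for name in names]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = centroid(members)
    return [names[a] for a in assignment], centroids


# Toy data: two well-separated groups, one seed index per cluster.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = {"local news": [0], "opinion": [3]}
labels, centroids = seeded_kmeans(points, seeds)
```

Because the starting centroids come from labeled examples rather than random draws, the number of clusters and their initial positions are both determined by the annotators, which is the tuning problem the abstract describes.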
Robustness of Filter-Based Feature Ranking: A Case Study
Altidor, Wilker (Florida Atlantic University) | Khoshgoftaar, Taghi M. (Florida Atlantic University) | Hulse, Jason Van (Florida Atlantic University)
The filter model of feature selection has been well studied. In previous studies, classification performance has traditionally been proposed as a way to evaluate filter solutions. In this study, a new method of comparing feature ranking techniques is presented, providing a straightforward approach for quantifying individual filters’ robustness to class noise. Six commonly used filters, plus one that is rarely used, are investigated regarding their ability to retain strong classification performance in the presence of class noise. Three classifiers and one classification performance metric are considered. The experimental results of this study show that Gain Ratio, one of the well-known and widely used filters, is very sensitive to class noise. ReliefF offers the best results with both the NB and kNN learners, while Signal-to-noise, though not as widely used in the literature as the others, outperforms all the filters with the SVM learner.
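To make the notion of a filter reacting to class noise concrete, the sketch below ranks features with a signal-to-noise score and then re-ranks them after a few labels are flipped. Several things here are assumptions rather than details from the abstract: the score uses the common definition |mean1 - mean0| / (std1 + std0), class noise is simulated by flipping chosen binary labels, and the toy data are invented. The study itself judges robustness through classification performance, which this sketch does not reproduce; it only shows how a ranking can change under noise.

```python
from statistics import mean, pstdev


def s2n_scores(X, y):
    """Signal-to-noise score per feature for binary labels 0/1.

    Assumed definition: |mean_pos - mean_neg| / (std_pos + std_neg).
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        diff = abs(mean(pos) - mean(neg))
        spread = pstdev(pos) + pstdev(neg)
        if spread > 0:
            scores.append(diff / spread)
        else:
            # Zero spread: perfectly separated or completely uninformative.
            scores.append(float("inf") if diff > 0 else 0.0)
    return scores


def rank_features(X, y):
    """Feature indices ordered from highest to lowest score."""
    scores = s2n_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def flip_labels(y, indices):
    """Return a copy of y with the labels at `indices` flipped (class noise)."""
    flipped = set(indices)
    return [1 - label if i in flipped else label for i, label in enumerate(y)]


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 4.0],
     [3.0, 2.0], [3.1, 5.5], [2.9, 1.5]]
y = [0, 0, 0, 1, 1, 1]

clean_ranking = rank_features(X, y)
noisy_ranking = rank_features(X, flip_labels(y, [0, 3]))
```

On this toy set, flipping two of six labels is enough to swap the order of the two features, which is the kind of sensitivity a robustness comparison is meant to expose.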
Special Track on Data Mining
Bisant, David (The Laboratory for Physical Sciences) | Eberle, William (Tennessee Technological University)
Data mining is the process of extracting hidden patterns from data. With data ever increasing in volume, mining it into usable information is becoming increasingly important. Data mining approaches are commonly used in a wide range of profiling services, including marketing, fraud detection, and scientific discovery. The FLAIRS Data Mining special track is devoted to data mining with the aim of presenting new and important contributions in this area.
A Cognitive Tutoring Agent with Automatic Reasoning Capabilities
Faghihi, Usef (University of Memphis) | Fournier-Viger, Philippe (National Cheng Kung University) | Nkambou, Roger (Université)
In this paper, we show how to make a cognitive tutoring agent capable of precise causal reasoning by integrating constraints with data mining algorithms. Putting constraints on recorded interactions between the agent and learners during learning activities allows data mining algorithms to extract the causes of the learners’ problems. Subsequently, the agent uses this information to provide useful and customized explanations to learners.
Difficulty Rating of Sudoku Puzzles by a Computational Model
Pelánek, Radek (Masaryk University Brno)
We discuss and evaluate metrics for difficulty rating of Sudoku puzzles. The correlation coefficient with human performance for our best metric is 0.95. The data on human performance were obtained from three web portals and comprise thousands of hours of human solving over 2000 problems. We provide a simple computational model of human solving activity and evaluate it over the collected data. Using the model, we show that there are two sources of problem difficulty: complexity of individual steps (logic operations) and structure of dependency among steps. Besides providing a very good Sudoku-tuned metric, we also discuss a metric with few Sudoku-specific details, which still provides good results (correlation coefficient is 0.88). Hence we believe that the approach should be applicable to difficulty rating of other constraint satisfaction problems.
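The evaluation described here compares a difficulty metric against human performance through a correlation coefficient. The sketch below shows how such a comparison could be computed, assuming Pearson's r (the abstract does not say which coefficient is used). The function name and the sample values are hypothetical and are not taken from the paper's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical numbers: a metric's score per puzzle and the mean
# human solving time (in minutes) for the same puzzles.
metric_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
solving_times = [4.0, 7.5, 9.0, 14.0, 16.5]

r = pearson(metric_scores, solving_times)
```

A value of r close to 1 means the metric orders and spaces puzzles much as human solving times do; the function raises ZeroDivisionError if either sequence is constant.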
Happy Movie: A Group Recommender Application in Facebook
Quijano-Sánchez, Lara (Universidad Complutense de Madrid) | Recio-Garcia, Juan A. (Universidad Complutense de Madrid) | Díaz-Agudo, Belén (Universidad Complutense de Madrid) | Jimenez-Diaz, Guillermo (Universidad Complutense de Madrid)
In this paper we introduce our recommender Happy Movie, a Facebook application for movie recommendation to groups. This system exploits information about the social relationships and behaviour of the users to provide better recommendations. Our previous work has shown that social factors improve the recommendation results. However, obtaining the social information required users to fill in many questionnaires, so we have moved to a social network environment where this information is easily available.
Active and Interactive Discovery of Goal Selection Knowledge
Powell, Jay (Indiana University) | Molineaux, Matthew (Knexus Research Corporation) | Aha, David William (Naval Research Laboratory)
If given manually-crafted goal selection knowledge, goal reasoning agents can dynamically determine which goals they should achieve in complex environments. These agents should instead learn goal selection knowledge through expert interaction. We describe T-ARTUE, a goal reasoning agent that performs case-based active and interactive learning to discover goal selection knowledge. We also report tests of its performance in a complex environment. We found that, under some conditions, T-ARTUE can quickly learn goal selection knowledge.
Extending Case-Based Planning with Behavior Trees
Palma, Ricardo (Universidad Complutense de Madrid) | González-Calero, Pedro Antonio (Universidad Complutense de Madrid) | Gómez-Martín, Marco Antonio (Universidad Complutense de Madrid) | Gómez-Martín, Pedro Pablo (Universidad Complutense de Madrid)
The combination of learning by demonstration and planning has proved an effective solution for real-time strategy games. Nevertheless, learning hierarchical plans from expert traces also has its limitations regarding the number of training traces required, and the absence of mechanisms for rapidly reacting to high priority goals. We propose to bring the game designer back into the loop, by allowing him to explicitly inject decision making knowledge, in the form of behavior trees, to complement the knowledge obtained from the traces. By providing a natural mechanism for designers to inject knowledge into the plan library, we intend to integrate the best of both worlds: learning from traces and hard-coded rules.
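For readers unfamiliar with behavior trees, the following sketch shows the basic composite nodes and how a designer-authored rule could take priority over a learned plan. It is a generic illustration under stated assumptions, not the system described in the abstract: the node classes, the `under_attack` flag, and the `follow_learned_plan` placeholder (standing in for a plan retrieved from the plan library) are all hypothetical.

```python
SUCCESS, FAILURE = "success", "failure"


class Action:
    """Leaf node: runs a function on the state; truthy result means success."""

    def __init__(self, fn):
        self.fn = fn

    def tick(self, state):
        return SUCCESS if self.fn(state) else FAILURE


class Sequence:
    """Succeeds only if every child succeeds, evaluated in order."""

    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS


class Selector:
    """Tries children in priority order; succeeds on the first success."""

    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE


def do(name):
    """Build an action that records `name` in the state log and succeeds."""
    def act(state):
        state["log"].append(name)
        return True
    return Action(act)


# Designer rule first (react to an attack), learned plan as the fallback.
tree = Selector(
    Sequence(Action(lambda s: s["under_attack"]), do("defend_base")),
    do("follow_learned_plan"),
)


def run(under_attack):
    state = {"under_attack": under_attack, "log": []}
    tree.tick(state)
    return state["log"]
```

Calling `run(True)` takes the hand-written branch, while `run(False)` falls through to the placeholder for the learned plan, which mirrors the split between hard-coded rules and knowledge obtained from traces.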
An Efficient Random Decision Tree Algorithm for Case-Based Reasoning Systems
Houeland, Tor Gunnar (Norwegian University of Science and Technology)
We present an efficient random decision tree algorithm for case-based reasoning systems. We combine this algorithm with a simple similarity measure based on domain knowledge to create a stronger hybrid algorithm. This combination is based on our general approach for combining lazy and eager learning methods. We evaluate the resulting algorithms on a case base of patient records in a palliative care domain. Our hybrid algorithm consistently produces a lower average error than the base algorithms.