Materials
A Semantic Metadirectory of Services Based on Web Mining Techniques
Fernández-Villamor, José Ignacio (Universidad Politecnica de Madrid) | Zemke, Tilo (Technische Universitaet Chemnitz) | Iglesias, Carlos Ángel (Universidad Politecnica de Madrid) | Garijo, Mercedes (Universidad Politecnica de Madrid)
In the current web, developers are able to create new applications by composing existing services from third-party vendors. However, the vast number of choices, technologies and repositories can make this a tedious task. This paper describes a semantic metadirectory of services that helps in the process of discovering services. We propose a semantic service discovery process and a description of existing service repositories such as Programmable Web and Yahoo Pipes, two repositories that provide many services developers can reuse to build new web applications. The challenges in integrating these repositories included defining a common model, identifying relevant data, and integrating and ranking the extracted data.
Reconstructing Pompeian Households
A database of objects discovered in houses in the Roman city of Pompeii provides a unique view of ordinary life in an ancient city. Experts have used this collection to study the structure of Roman households, exploring the distribution and variability of tasks in architectural spaces, but such approaches are necessarily affected by modern cultural assumptions. In this study we present a data-driven approach to household archeology, treating it as an unsupervised labeling problem. This approach scales to large data sets and provides a more objective complement to human interpretation.
A framework: Cluster detection and multidimensional visualization of automated data mining using intelligent agents
Jayabrabu, R., Saravanan, V., Vivekanandan, K.
Data mining techniques play a vital role in extracting required knowledge and finding unsuspected information, supporting strategic decisions in novel ways that are in turn understandable by domain experts. A generalized framework is proposed that takes non-domain experts into account during the mining process, aiming at better understanding, better decisions, and better discovery of new patterns. It uses intelligent agents to select suitable data mining techniques based on the user profile.
Gaussian process modulated renewal processes
Renewal processes are generalizations of the Poisson process on the real line, whose intervals are drawn i.i.d. from some distribution. Modulated renewal processes allow these distributions to vary with time, which introduces nonstationarity. In this work, we take a nonparametric Bayesian approach, modeling this nonstationarity with a Gaussian process. Our approach is based on the idea of uniformization, allowing us to draw exact samples from an otherwise intractable distribution. We develop a novel and efficient MCMC sampler for posterior inference. In our experiments, we test these methods on a number of synthetic and real datasets.
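To make the first sentence of this abstract concrete, the following is a minimal sketch of sampling a plain (unmodulated) renewal process. It is not the paper's Gaussian-process model or its uniformization-based sampler; the function name `sample_renewal_process` and the chosen interval distributions are assumptions made for illustration only.

```python
import random


def sample_renewal_process(t_end, draw_interval, rng):
    """Return event times in (0, t_end] whose gaps are i.i.d. draws."""
    times = []
    t = draw_interval(rng)
    while t <= t_end:
        times.append(t)
        t += draw_interval(rng)
    return times


rng = random.Random(0)

# Exponential intervals recover a homogeneous Poisson process (rate 2.0).
poisson_times = sample_renewal_process(10.0, lambda r: r.expovariate(2.0), rng)

# Gamma intervals with shape 3 and the same mean (0.5) give a renewal
# process that is not Poisson; its events are more regularly spaced.
gamma_times = sample_renewal_process(
    10.0, lambda r: r.gammavariate(3.0, 0.5 / 3.0), rng
)
```

A modulated renewal process would additionally let the interval distribution depend on time, which this sketch does not attempt.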
Application of Data Mining Techniques to a Selected Business Organisation with Special Reference to Buying Behaviour
Hilage, Tejaswini, Kulkarni, R. V.
Data mining is the exploration and analysis of large data sets in order to discover meaningful patterns and rules. Many organizations now use data mining techniques to find meaningful patterns in their databases. The present paper studies how data mining techniques can be applied to a large database. These techniques reveal certain behavioral patterns in the database, and the results of the analysis are useful for the organization. This paper examines the results of applying the association rule mining technique, the rule induction technique and the Apriori algorithm. These techniques are applied to the database of a shopping mall. Market basket analysis is performed with the above-mentioned techniques, and some important results, such as buying behavior, are found.
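As an illustration of the market basket analysis mentioned above, here is a minimal sketch of an Apriori-style level-wise search for frequent itemsets, together with rule confidence. The transactions are hypothetical toy baskets, not data from the paper, and the function names are assumptions introduced for this example.

```python
from itertools import combinations

# Hypothetical toy baskets, not data from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates are built from frequent (k-1)-itemsets."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s, transactions)
        k += 1
        # Join step: merge pairs of frequent itemsets into size-k candidates.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in result for sub in combinations(c, k - 1))
                 and support(c, transactions) >= min_support]
    return result


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


freq = frequent_itemsets(transactions, min_support=0.5)
butter_to_bread = confidence({"butter"}, {"bread"}, transactions)
```

With these toy baskets, every basket containing butter also contains bread, so the rule butter → bread has confidence 1.0.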
Pattern-Based Classification: A Unifying Perspective
Bringmann, Björn, Nijssen, Siegfried, Zimmermann, Albrecht
The use of patterns in predictive models is a topic that has received a lot of attention in recent years. Pattern mining can help to obtain models for structured domains, such as graphs and sequences, and has been proposed as a means to obtain more accurate and more interpretable models. Despite the large number of publications devoted to this topic, we believe that an overview of what has been accomplished in this area is missing. This paper presents our perspective on this evolving area. We identify the principles of pattern mining that are important when mining patterns for models and provide an overview of pattern-based classification methods. We categorize these methods along the following dimensions: (1) whether they post-process a pre-computed set of patterns or iteratively execute pattern mining algorithms; (2) whether they select patterns model-independently or whether the pattern selection is guided by a model. We summarize the results that have been obtained for each of these methods.
Exploiting Subgraph Structure in Multi-Robot Path Planning
Multi-robot path planning is difficult due to the combinatorial explosion of the search space with every new robot added. Complete search of the combined state-space soon becomes intractable. In this paper we present a novel form of abstraction that allows us to plan much more efficiently. The key to this abstraction is the partitioning of the map into subgraphs of known structure with entry and exit restrictions which we can represent compactly. Planning then becomes a search in the much smaller space of subgraph configurations. Once an abstract plan is found, it can be quickly resolved into a correct (but possibly sub-optimal) concrete plan without the need for further search. We prove that this technique is sound and complete and demonstrate its practical effectiveness on a real map. A contending solution, prioritised planning, is also evaluated and shown to have similar performance albeit at the cost of completeness. The two approaches are not necessarily conflicting; we demonstrate how they can be combined into a single algorithm which outperforms either approach alone.
Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning
Rupp, Matthias, Tkatchenko, Alexandre, Müller, Klaus-Robert, von Lilienfeld, O. Anatole
Cross-validation on 7165 molecules yields a mean absolute error of 9.9 kcal/mol, which is an order of magnitude more accurate than counting bonds or semiempirical quantum chemistry. We use the GDB data base, a library of nearly one billion organic molecules that are stable and synthetically accessible according to organic chemistry rules [15]. While potentially applicable to any stoichiometry, as a proof of principle we restrict ourselves to small organic molecules. Specifically, we define a controlled test-bed consisting of all 7165 organic molecules from the GDB data base with up to seven "heavy" atoms that contain C, N, O, or S, being saturated with hydrogen atoms. Atomization energies range from -800 to -2000 kcal/mol.
Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing
Oleson, David (CrowdFlower) | Sorokin, Alexander (CrowdFlower) | Laughlin, Greg (CrowdFlower) | Hester, Vaughn (CrowdFlower) | Le, John (CrowdFlower) | Biewald, Lukas (CrowdFlower)
Crowdsourcing is an effective tool for scalable data annotation in both research and enterprise contexts. Due to crowdsourcing’s open participation model, quality assurance is critical to the success of any project. Present methods rely on EM-style post-processing or manual annotation of large gold standard sets. In this paper we present an automated quality assurance process that is inexpensive and scalable. Our novel process relies on programmatic gold creation to provide targeted training feedback to workers and to prevent common scamming scenarios. We find that it decreases the amount of manual work required to manage crowdsourced labor while improving the overall quality of the results.
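The abstract does not spell out how programmatic gold is generated, so the following sketch covers only the generic idea it builds on: scoring workers against units with known answers. The gold labels, worker names, and the 0.7 threshold are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical gold answers (unit id -> correct label).
gold = {"u1": "yes", "u2": "no", "u3": "yes"}

# Hypothetical worker judgments: (worker, unit id, submitted label).
judgments = [
    ("alice", "u1", "yes"), ("alice", "u2", "no"), ("alice", "u3", "yes"),
    ("bob", "u1", "yes"), ("bob", "u2", "yes"), ("bob", "u3", "no"),
]


def gold_accuracy(judgments, gold):
    """Per-worker fraction of gold units answered correctly."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, unit, label in judgments:
        if unit in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[unit])
    return {w: correct[w] / seen[w] for w in seen}


accuracy = gold_accuracy(judgments, gold)

# Keep only workers at or above an assumed accuracy threshold.
trusted = sorted(w for w, acc in accuracy.items() if acc >= 0.7)
```

In this toy data, alice answers all three gold units correctly and bob only one, so only alice passes the assumed threshold.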
Online Planning to Control a Packaging Infeed System
Do, Minh (Palo Alto Research Center) | Lee, Lawrence (Palo Alto Research Center) | Zhou, Rong (Palo Alto Research Center) | Crawford, Lara (Palo Alto Research Center) | Uckun, Serdar (Palo Alto Research Center)
In this paper, we investigate a novel application of online planning and scheduling: controlling an automated infeeder for a packaging line of food and consumer packaged goods. In this system, products arrive continuously at high speed from the end of the production line and need to be arranged into a specific configuration for downstream primary and secondary packaging machines. In collaboration with a domain expert from the packaging industry, we developed an innovative design for a reconfigurable parallel infeed system using a matrix of interchangeable smart belts. We also adapted our online model-based Plantrol planner to this domain. Our planner can control various configurations of the new infeed system through simulation both in nominal planning and when runtime failures occur. We are also building a small physical prototype to validate the new design and our software framework.