Overview
Notes on a New Philosophy of Empirical Science
This book presents a methodology and philosophy of empirical science based on large-scale lossless data compression. In this view a theory is scientific if it can be used to build a data compression program, and it is valuable if it can compress a standard benchmark database to a small size, taking into account the length of the compressor itself. This methodology therefore includes an Occam principle as well as a solution to the problem of demarcation. Because of the fundamental difficulty of lossless compression, this type of research must be empirical in nature: compression can only be achieved by discovering and characterizing empirical regularities in the data. Because of this, the philosophy provides a way to reformulate fields such as computer vision and computational linguistics as empirical sciences: the former by attempting to compress databases of natural images, the latter by attempting to compress large text databases. The book argues that the rigor and objectivity of the compression principle should set the stage for systematic progress in these fields. The argument is especially strong in the context of computer vision, which is plagued by chronic problems of evaluation. The book also considers the field of machine learning. Here the traditional approach requires that the models proposed to solve learning problems be extremely simple, in order to avoid overfitting. However, the world may contain intrinsically complex phenomena, which would require complex models to understand. The compression philosophy can justify complex models because of the large quantity of data being modeled (if the target database is 100 Gb, it is easy to justify a 10 Mb model). The complex models and abstractions learned on the basis of the raw data (images, language, etc.) can then be reused to solve any specific learning problem, such as face recognition or machine translation.
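The scoring rule described above (compressed size of the benchmark plus the length of the compressor itself) can be illustrated with a minimal sketch. The function names, the `THEORIES` table, and the 50,000-byte figure charged to the zlib-based "theory" are illustrative assumptions, not values or code from the book.

```python
import zlib
from typing import Callable, Dict, List, Tuple

Compressor = Callable[[bytes], bytes]


def total_codelength(data: bytes, compress: Compressor, compressor_size: int) -> int:
    """Size of the compressed data plus the size of the compressor, in bytes."""
    return len(compress(data)) + compressor_size


def rank_theories(
    data: bytes, theories: Dict[str, Tuple[Compressor, int]]
) -> List[Tuple[str, int]]:
    """Return (name, total codelength) pairs, best (smallest) first."""
    scores = {
        name: total_codelength(data, compress, size)
        for name, (compress, size) in theories.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Two toy "theories". The sizes are hypothetical stand-ins for program length:
# storing the data verbatim needs no model, while the zlib-based compressor is
# charged an assumed 50,000 bytes.
THEORIES: Dict[str, Tuple[Compressor, int]] = {
    "identity": (lambda data: data, 0),
    "zlib": (zlib.compress, 50_000),
}

large_db = b"the quick brown fox " * 100_000  # about 2 MB of highly regular text
small_db = b"the quick brown fox " * 100      # about 2 KB of the same text
```

On the large database the zlib-based theory ranks first, because the regularity it captures saves far more than the compressor costs. On the small database the verbatim copy ranks first, which mirrors the point that a large model is only justified when the data being modeled is much larger.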
A Machine Learning Based Analytical Framework for Semantic Annotation Requirements
Hassanzadeh, Hamed, Keyvanpour, MohammadReza
The Semantic Web is an extension of the current web in which information is given well-defined meaning. The aim of the Semantic Web is to improve the quality and intelligence of the current web by converting its contents into machine-understandable form. Semantic-level information is therefore one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. Semantic Annotation faces many obstacles, such as multilinguality, scalability, and issues related to the diversity and inconsistency of content across web pages. Because Semantic Annotation systems must operate across a wide range of domains and in dynamic environments, automating the annotation process is one of the significant challenges in this domain. To overcome this problem, different machine learning approaches have been utilized, such as supervised learning, unsupervised learning and, more recently, semi-supervised learning and active learning. In this paper we present an inclusive layered classification of Semantic Annotation challenges and discuss the most important issues in this field. We also review and analyze machine learning applications for solving Semantic Annotation problems. To this end, the article closely studies and categorizes related research, both for better understanding and to arrive at a framework that maps machine learning techniques onto Semantic Annotation challenges and requirements.
The Case for Case-Based Transfer Learning
Klenk, Matthew (Navy Center for Applied Research in Artificial Intelligence) | Aha, David W. (Navy Center for Applied Research in Artificial Intelligence) | Molineaux, Matt (Knexus Research Corporation)
Case-based reasoning (CBR) is a problem-solving process in which a new problem is solved by retrieving a similar situation and reusing its solution. Transfer learning occurs when, after gaining experience from learning how to solve source problems, the same learner exploits this experience to improve performance and/or learning on target problems. In transfer learning, the differences between the source and target problems characterize the transfer distance. CBR can support transfer learning methods in multiple ways. We illustrate how CBR and transfer learning interact and characterize three approaches for using CBR in transfer learning: (1) as a transfer learning method, (2) for problem learning, and (3) to transfer knowledge between sets of problems. We describe examples of these approaches from our own and related work and discuss applicable transfer distances for each. We close with conclusions and directions for future research applying CBR to transfer learning.
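The retrieve-and-reuse cycle in the first sentence above can be sketched as follows. This is a minimal illustration, not the authors' system: the `Case` class, the numeric feature vectors, the Euclidean similarity measure, and the reuse of a solution without adaptation are all simplifying assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Case:
    problem: Sequence[float]  # assumed numeric description of a past problem
    solution: str             # the solution that worked for that problem


def retrieve(case_base: List[Case], problem: Sequence[float]) -> Case:
    """Retrieve the stored case whose problem is closest to the new problem."""
    return min(case_base, key=lambda case: math.dist(case.problem, problem))


def solve(case_base: List[Case], problem: Sequence[float]) -> str:
    """Reuse the retrieved case's solution unchanged (no adaptation step)."""
    return retrieve(case_base, problem).solution


# Hypothetical case base with two past problems.
case_base = [
    Case(problem=(0.0, 0.0), solution="plan A"),
    Case(problem=(10.0, 10.0), solution="plan B"),
]
```

Calling `solve(case_base, (1.0, 2.0))` retrieves the first case and returns its solution. In a transfer setting, the case base would be filled from source problems and queried with target problems, and a larger transfer distance would typically call for an adaptation step that this sketch omits.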
Human Natural Instruction of a Simulated Electronic Student
Kaochar, Tasneem (University of Arizona) | Peralta, Raquel Torres (University of Arizona) | Morrison, Clayton T. (University of Arizona) | Walsh, Thomas J. (University of Arizona) | Fasel, Ian R. (University of Arizona) | Beyon, Sumin (University of Arizona) | Tran, Anh (University of Arizona) | Wright, Jeremy (University of Arizona) | Cohen, Paul R. (University of Arizona)
Humans naturally use multiple modes of instruction while teaching one another. We would like our robots and artificial agents to be instructed in the same way, rather than programmed. In this paper, we review prior work on human instruction of autonomous agents and present observations from two exploratory pilot studies and the results of a full study investigating how multiple instruction modes are used by humans. We describe our Bootstrapped Learning User Interface, a prototype multi-instruction interface informed by our human-user studies.
SBVR Business Rules Generation from Natural Language Specification
Bajwa, Imran Sarwar (University of Birmingham) | Lee, Mark G. (University of Birmingham) | Bordbar, Behzad (University of Birmingham)
In this paper, we present a novel approach to translating natural language specifications into SBVR business rules. Business rules constrain the structure of a business or control the behaviour of a business process. In modern business modelling, one of the important phases is writing business rules. Typically, a business rule analyst has to manually write hundreds of business rules in a natural language (NL) and then manually translate the NL specification of all the rules into a particular rule language such as SBVR or OCL, as required. However, manually translating an NL rule specification into a formal representation such as an SBVR rule is not only difficult, complex and time-consuming but can also result in erroneous business rules. We propose an approach that automatically translates the NL (such as English) specification of business rules into SBVR (Semantic Business Vocabulary and Rules) rules. The major challenge in NL-to-SBVR translation was the complex semantic analysis of the English language. We use a rule-based algorithm for robust semantic analysis of English and to generate SBVR rules. Automated generation of SBVR-based business rules can make it easier and more efficient to constrain business aspects in typical business modelling.
Business Listing Classification Using Case Based Reasoning and Joint Probability
Sood, Sanjay (AT&T) | Kar, Parijat P. (AT&T)
One challenge of building and maintaining large-scale data management systems is managing data fusion from multiple data sources. Often, different data sources represent the same data element in slightly different ways. These differences may represent an error in the data or a disagreement between sources on the correct value that best represents the data point. When the quantity of data managed and fused becomes sufficiently large, manual review becomes impossible, and automated systems must be built to manage data fusion. Some of the traditional solutions use simple voting theory, Dempster-Shafer theory, fuzzy matching and incremental learning. This paper presents a novel approach to data fusion in the domain of business listings. The task at hand, business listing categorization, suffers from conflicting and incomplete data from disparate data sources. Given the need for a high degree of accuracy in this task, we use a combination of case-based reasoning, joint probability, and domain-specific rules to improve data accuracy above other methods.
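For context, the simple voting baseline named among the traditional solutions can be sketched as below. This is not the paper's method (which combines case-based reasoning, joint probability, and domain-specific rules); the function name, the use of `None` for missing values, and the choice to return `None` on a tie are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def majority_vote(values: Iterable[Optional[Hashable]]) -> Optional[Hashable]:
    """Fuse the values reported by several sources by simple majority.

    Missing values (None) are ignored. Returns None when no source reports a
    value or when the top two candidates are tied, so that such records can
    be routed to a more capable method.
    """
    counts = Counter(value for value in values if value is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: simple voting cannot decide
    return ranked[0][0]


# Hypothetical category labels reported by four sources for one listing.
reported = ["Restaurant", "Restaurant", "Cafe", None]
```

Here `majority_vote(reported)` returns `"Restaurant"`. The tie and missing-value cases show where voting alone breaks down on conflicting and incomplete data, which is the kind of situation the abstract says the task suffers from.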
On-line Planning and Scheduling: An Application to Controlling Modular Printers
Ruml, W., Do, M. B., Zhou, R., Fromherz, M. P.J.
We present a case study of artificial intelligence techniques applied to the control of production printing equipment. Like many other real-world applications, this complex domain requires high-speed autonomous decision-making and robust continual operation. To our knowledge, this work represents the first successful industrial application of embedded domain-independent temporal planning. Our system handles execution failures and multi-objective preferences. At its heart is an on-line algorithm that combines techniques from state-space planning and partial-order scheduling. We suggest that this general architecture may prove useful in other applications as more intelligent systems operate in continual, on-line settings. Our system has been used to drive several commercial prototypes and has enabled a new product architecture for our industrial partner. When compared with state-of-the-art off-line planners, our system is hundreds of times faster and often finds better plans. Our experience demonstrates that domain-independent AI planning based on heuristic search can flexibly handle time, resources, replanning, and multiple objectives in a high-speed practical application without requiring hand-coded control knowledge.
Learning with Support Vector Machines
Support Vector Machines have become a well-established tool within machine learning. They work well in practice and have now been used across a wide range of applications, from recognizing hand-written digits to face identification, text categorisation, bioinformatics, and database marketing. In this book we give an introductory overview of this subject. We start with a simple Support Vector Machine for performing binary classification before considering multi-class classification and learning in the presence of noise. We show that this framework can be extended to many other scenarios such as prediction with real-valued outputs, novelty detection and the handling of complex output structures such as parse trees.
A Human-Centric Approach to Group-Based Context-Awareness
Ghadiri, Nasser, Baraani-Dastjerdi, Ahmad, Ghasem-Aghaee, Nasser, Nematbakhsh, Mohammad A.
The emerging need for qualitative approaches in context-aware information processing calls for proper modeling of context information and efficient handling of the inherent uncertainty that results from human interpretation and usage. Many of the current approaches to context-awareness either lack a solid theoretical basis for modeling or ignore important requirements such as modularity, high-order uncertainty management and group-based context-awareness. Therefore, their real-world application and extensibility remain limited. In this paper, we present f-Context as a service-based context-awareness framework, based on language-action perspective (LAP) theory for modeling. Then we identify some of the complex, informational parts of context which contain high-order uncertainties due to differences between members of the group in defining them. An agent-based perceptual computer architecture is proposed for implementing f-Context that uses computing with words (CWW) for handling uncertainty. The feasibility of f-Context is analyzed using a realistic scenario involving a group of mobile users. We believe that the proposed approach can open the door to future research on context-awareness by offering a theoretical foundation based on human communication, and a service-based layered architecture which exploits CWW for context-aware, group-based and platform-independent access to information systems.
Survival of the flexible: explaining the recent dominance of nature-inspired optimization within a rapidly evolving world
Although researchers often comment on the rising popularity of nature-inspired meta-heuristics (NIM), there has been a paucity of data to directly support the claim that NIM are growing in prominence compared to other optimization techniques. This study presents evidence that the use of NIM is not only growing, but indeed appears to have surpassed mathematical optimization techniques (MOT) in several important metrics related to academic research activity (publication frequency) and commercial activity (patenting frequency). Motivated by these findings, this article discusses some of the possible origins of this growing popularity. I review different explanations for NIM popularity and discuss why some of these arguments remain unsatisfying. I argue that a compelling and comprehensive explanation should directly account for the manner in which most NIM success has actually been achieved, e.g. through hybridization and customization to different problem environments. By taking a problem lifecycle perspective, this paper offers a fresh look at the hypothesis that nature-inspired meta-heuristics derive much of their utility from being flexible. I discuss global trends within the business environments where optimization algorithms are applied and I speculate that highly flexible algorithm frameworks could become increasingly popular within our diverse and rapidly changing world.