"The problem of giving rules for producing true scientific statements has been replaced by the problem of finding efficient heuristic rules for culling the reasonable candidates for an explanation from an appropriate set of possible candidates [and finding methods for constructing the candidates]."
– B. Buchanan, quoted in Lindley Darden. Recent Work in Computational Scientific Discovery.
Thomas Kuhn proposed his paradigmatic view of scientific discovery five decades ago. The concept of paradigm has not only explained the progress of science, but has also become the central epistemic concept among STM scientists. Here, we adopt the principles of Kuhnian philosophy to construct a novel ontology aimed at classifying and evaluating the impact of STM scholarly articles. First, we explain how the Kuhnian cycle of science describes research at different epistemic stages. Second, we show how the Kuhnian cycle could be reconstructed into modular ontologies which classify scholarly articles according to their contribution to paradigm-centred knowledge. The proposed ontology and its scenarios are discussed. To the best of the authors' knowledge, this is the first attempt at creating an ontology for describing scholarly articles based on the Kuhnian paradigmatic view of science.
In parallel with the progressing digitalization of almost every area of life, artificial intelligence (AI) and analytics capabilities grew tremendously, enabling companies to transform random data trails into meaningful insights that helped them greatly improve business processes. Targeted marketing, location-based searches and personalized promotions became the name of the game. This eventually led to the ability to combine data from various sources into large datasets, and to mine them for granular user profiles of unprecedented detail in order to establish correlations between disparate aspects of consumer behaviour, making individual health risks and electoral choices ever more predictable – for those who held the data.
The statistical analysis of discrete data has been the subject of extensive statistical research dating back to the work of Pearson. In this survey we review some recently developed methods for testing hypotheses about high-dimensional multinomials. Traditional tests like the $\chi^2$ test and the likelihood ratio test can have poor power in the high-dimensional setting. Much of the research in this area has focused on finding tests with asymptotically Normal limits and developing (stringent) conditions under which tests have Normal limits. We argue that this perspective suffers from a significant deficiency: it can exclude many high-dimensional cases in which, despite having non-Normal null distributions, carefully designed tests can have high power. Finally, we illustrate that taking a minimax perspective and considering refinements of this perspective can lead naturally to powerful and practical tests.
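As a point of reference for the two traditional tests named above, the following is a minimal sketch of the Pearson $\chi^2$ statistic and the likelihood ratio statistic for a multinomial goodness-of-fit problem. The function names and the example counts are illustrative assumptions, not taken from the survey, and the sketch computes only the statistics, not p-values or the refined tests the survey discusses.

```python
import math

def pearson_chi2(counts, probs):
    """Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    n = sum(counts)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs))

def likelihood_ratio(counts, probs):
    """Likelihood ratio statistic: 2 * sum of observed * log(observed / expected).

    Empty cells contribute zero, following the convention 0 * log(0) = 0.
    """
    n = sum(counts)
    return 2.0 * sum(
        x * math.log(x / (n * p)) for x, p in zip(counts, probs) if x > 0
    )

# Hypothetical example: 30 draws over 3 cells, tested against a uniform null.
counts = [20, 10, 0]
null_probs = [1 / 3, 1 / 3, 1 / 3]
print(pearson_chi2(counts, null_probs))
print(likelihood_ratio(counts, null_probs))
```

When the number of cells is large relative to the sample size, many expected counts `n * p` are small, which is the regime in which the abstract says these classical statistics can lose power.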
Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science.
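One way to read "scientific consistency as an essential component for learning" is as an extra term in the training objective that penalizes predictions violating a known physical relationship. The sketch below assumes a hypothetical constraint (predictions must be non-decreasing along an ordered axis) and a hypothetical weight parameter; neither the constraint nor the names come from the paper.

```python
def mean_squared_error(pred, target):
    """Ordinary data-fit term."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def monotonicity_violation(pred):
    """Total amount by which consecutive predictions decrease.

    Assumed constraint for illustration: values should be non-decreasing.
    """
    return sum(max(0.0, a - b) for a, b in zip(pred, pred[1:]))

def theory_guided_loss(pred, target, weight=1.0):
    """Data-fit error plus a weighted penalty for physically inconsistent output."""
    return mean_squared_error(pred, target) + weight * monotonicity_violation(pred)

# Hypothetical predictions along an ordered axis.
consistent = [1.0, 2.0, 3.0]
inconsistent = [1.0, 3.0, 2.0]
target = [1.0, 2.5, 2.5]
print(theory_guided_loss(consistent, target))
print(theory_guided_loss(inconsistent, target))
```

Both predictions have the same squared error against this target, so the penalty term alone separates them, which is the sense in which a consistency term can prefer one model over another that fits the data equally well.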
Building on a survey of previous theories of serendipity and creativity, we advance a sequential model of serendipitous occurrences. We distinguish between serendipity as a service and serendipity in the system itself, clarify the role of invention and discovery, and provide a measure for the serendipity potential of a system. While a system can arguably not be guaranteed to be serendipitous, it can have a high potential for serendipity. Practitioners can use these theoretical tools to evaluate a computational system's potential for unexpected behaviour that may have a beneficial outcome. In addition to qualitative features of serendipity potential, the model also includes quantitative ratings that can guide development work. We show how the model is used in three case studies of existing and hypothetical systems, in the context of evolutionary computing, automated programming, and (next-generation) recommender systems. From this analysis, we extract recommendations for practitioners working with computational serendipity, and outline future directions for research.
Computing has moved from the era of the desktop app to the era of the web page to the era of the mobile app, and now to the latest paradigm shift, which seems to be happening now: the conversation. These providers will most likely sit at the center of an ecosystem which will handle NLP (Natural Language Processing), semantic analysis, and other core tasks such as location and calendar integration. Currently, there are "bits and pieces" for particulars like dialogs (IBM Dialog) and NLP (IBM AlchemyAPI) all the way to large SDKs for voice and digital assistants (Alexa, Siri, and Google). While the examples above are simplistic, they do provide some structure and a view into the basic text lines of voice and chat applications.
This book presents a methodology and philosophy of empirical science based on large scale lossless data compression. In this view a theory is scientific if it can be used to build a data compression program, and it is valuable if it can compress a standard benchmark database to a small size, taking into account the length of the compressor itself. This methodology therefore includes an Occam principle as well as a solution to the problem of demarcation. Because of the fundamental difficulty of lossless compression, this type of research must be empirical in nature: compression can only be achieved by discovering and characterizing empirical regularities in the data. Because of this, the philosophy provides a way to reformulate fields such as computer vision and computational linguistics as empirical sciences: the former by attempting to compress databases of natural images, the latter by attempting to compress large text databases. The book argues that the rigor and objectivity of the compression principle should set the stage for systematic progress in these fields. The argument is especially strong in the context of computer vision, which is plagued by chronic problems of evaluation. The book also considers the field of machine learning. Here the traditional approach requires that the models proposed to solve learning problems be extremely simple, in order to avoid overfitting. However, the world may contain intrinsically complex phenomena, which would require complex models to understand. The compression philosophy can justify complex models because of the large quantity of data being modeled (if the target database is 100 Gb, it is easy to justify a 10 Mb model). The complex models and abstractions learned on the basis of the raw data (images, language, etc.) can then be reused to solve any specific learning problem, such as face recognition or machine translation.
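The scoring rule described above, compressed size plus the length of the compressor itself, can be sketched in a few lines. The example below uses `zlib` from the Python standard library purely as a stand-in compressor; the function name `total_codelength`, the `model_size_bytes` parameter, and the toy data are assumptions for illustration and do not come from the book.

```python
import zlib

def total_codelength(data: bytes, model_size_bytes: int, level: int = 9) -> int:
    """Score a 'theory': bytes needed for the compressed data plus the compressor.

    zlib stands in for the theory's compressor; model_size_bytes is a
    hypothetical figure for the size of the program that embodies the theory.
    """
    compressed = zlib.compress(data, level)
    # Lossless check: the original data must be exactly recoverable.
    assert zlib.decompress(compressed) == data
    return len(compressed) + model_size_bytes

# Toy 'benchmark database' with an obvious regularity.
database = b"ab" * 1000

# A smaller total means a better theory under this criterion.
print(total_codelength(database, model_size_bytes=0))
print(total_codelength(database, model_size_bytes=100))
```

Counting the compressor's own size is what supplies the Occam principle: a larger model is only justified if it shortens the encoded data by more than it adds, which becomes easier to achieve as the database grows.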
In this article we have classified computational creativity research activities into three generations. Although the respective system developers were not necessarily targeting their research at computational creativity, we consider their works as contributions to this emerging field. Possibly, the first recognition of the implications of intelligent systems for creativity came with an AAAI Spring Symposium on AI and Creativity (Dartnall and Kim, 1993). We have here tried to chart the progress of the field by describing some sample projects. Our hope is that this article will provide some direction to interested researchers and help create a vision for the community.
A discovery system for detecting correspondences in data is described, based on the familiar induction methods of J. S. Mill. Given a set of observations, the system induces the "causally" related facts in these observations. Its application to empirical linguistic discovery is described. The paper is organized as follows. I begin the discussion by reviewing two developments, the transformationalists' critique of "discovery procedures" and naive inductivism, which have led to the neglect of discovery issues, arguing that more attention needs to be paid to discovery in linguistics.
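To make the reference to Mill's induction methods concrete, here is a minimal sketch of the method of agreement and of a joint agreement-and-difference filter over observations represented as sets of factors. It is not the system described in the paper; the data representation, function names, and example factors are assumptions for illustration.

```python
def method_of_agreement(observations):
    """Factors present in every observation in which the effect occurs."""
    positives = [set(factors) for factors, effect in observations if effect]
    if not positives:
        return set()
    return set.intersection(*positives)

def joint_method(observations):
    """Factors common to all positive cases and absent from all negative cases."""
    common = method_of_agreement(observations)
    negatives = [set(factors) for factors, effect in observations if not effect]
    seen_without_effect = set().union(*negatives)
    return common - seen_without_effect

# Hypothetical observations: (factors present, whether the effect occurred).
observations = [
    ({"a", "b", "c"}, True),
    ({"a", "c", "d"}, True),
    ({"c", "e"}, False),
]
print(method_of_agreement(observations))
print(joint_method(observations))
```

In this toy data, factors `a` and `c` accompany every occurrence of the effect, but `c` also appears in a case without the effect, so only `a` survives the joint filter as a candidate "causally" related fact.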
"Computing can change our ways of thinking about many things, mathematics, biology, engineering, administrative procedures, and many more. But my main concern is that it can change our thinking about ourselves: giving us new models, metaphors, and other thinking tools to aid our efforts to fathom the mysteries of the human mind and heart. The new discipline of Artificial Intelligence is the branch of computing most directly concerned with this revolution. By giving us new, deeper, insights into some of our inner processes, it changes our thinking about ourselves. It therefore changes some of our inner processes, and so changes what we are, like all social, technological and intellectual revolutions." This book, published in 1978 by Harvester Press and Humanities Press, has been out of print for many years, and is now online, produced from a scanned-in copy of the original, digitised by OCR software and made available in September 2001. Since then a number of notes and corrections have been added. Atlantic Highlands, NJ: Humanities Press