"The problem of giving rules for producing true scientific statements has been replaced by the problem of finding efficient heuristic rules for culling the reasonable candidates for an explanation from an appropriate set of possible candidates [and finding methods for constructing the candidates]."
– B. Buchanan, quoted in Lindley Darden. Recent Work in Computational Scientific Discovery.
The statistical analysis of discrete data has been the subject of extensive research dating back to the work of Pearson. In this survey we review some recently developed methods for testing hypotheses about high-dimensional multinomials. Traditional tests like the $\chi^2$ test and the likelihood ratio test can have poor power in the high-dimensional setting. Much of the research in this area has focused on finding tests with asymptotically Normal limits and on developing (stringent) conditions under which tests have Normal limits. We argue that this perspective suffers from a significant deficiency: it can exclude many high-dimensional cases in which, despite having non-Normal null distributions, carefully designed tests can have high power. Finally, we illustrate that taking a minimax perspective, and considering refinements of it, leads naturally to powerful and practical tests.
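The two classical statistics named above can be sketched in a few lines. This is a minimal illustration of Pearson's $\chi^2$ and the likelihood-ratio ($G$) statistics for a multinomial null, not the survey's proposed tests; the function names and the toy counts are my own.

```python
import math

def pearson_chi2(counts, probs):
    """Pearson's chi-squared statistic: sum over categories of (O - E)^2 / E,
    where E = n * p under the null multinomial with cell probabilities p."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))

def likelihood_ratio(counts, probs):
    """Likelihood-ratio (G) statistic: 2 * sum of O * log(O / E),
    with empty cells (O = 0) contributing zero."""
    n = sum(counts)
    return 2.0 * sum(o * math.log(o / (n * p))
                     for o, p in zip(counts, probs) if o > 0)

# Observed counts over d = 4 categories, tested against the uniform null.
counts = [30, 20, 25, 25]
probs = [0.25] * 4
chi2 = pearson_chi2(counts, probs)   # compare to chi^2 quantiles with d - 1 df
g = likelihood_ratio(counts, probs)
```

Both statistics are referred to a $\chi^2_{d-1}$ limit in the classical fixed-dimension regime; the point of the survey is that this approximation, and the power of these tests, can break down when the number of categories $d$ grows with the sample size.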
Building on a survey of previous theories of serendipity and creativity, we advance a sequential model of serendipitous occurrences. We distinguish between serendipity as a service and serendipity in the system itself, clarify the roles of invention and discovery, and provide a measure for the serendipity potential of a system. While a system arguably cannot be guaranteed to be serendipitous, it can have a high potential for serendipity. Practitioners can use these theoretical tools to evaluate a computational system's potential for unexpected behaviour that may have a beneficial outcome. In addition to qualitative features of serendipity potential, the model also includes quantitative ratings that can guide development work. We show how the model is used in three case studies of existing and hypothetical systems, in the context of evolutionary computing, automated programming, and (next-generation) recommender systems. From this analysis, we extract recommendations for practitioners working with computational serendipity, and outline future directions for research.
The dominant paradigm has shifted from the era of the desktop app to the era of the web page to the era of the mobile app, and the latest shift, which seems to be happening now, is to the conversation. These providers will most likely sit at the center of an ecosystem that handles NLP (Natural Language Processing), semantic analysis, and other core tasks such as location and calendar integration. Currently, there are "bits and pieces" for particulars like dialogs (IBM Dialog) and NLP (IBM AlchemyAPI), all the way up to large SDKs for voice and digital assistants (Alexa, Siri, and Google). While the examples above are simplistic, they do provide some structure and a view into the basic outlines of voice and chat applications.
A discovery system for detecting correspondences in data is described, based on the familiar induction methods of J. S. Mill. Given a set of observations, the system induces the "causally" related facts in these observations. Its application to empirical linguistic discovery is described. The paper is organized as follows. I begin by reviewing two developments, the transformationalists' critique of "discovery procedures" and naive inductivism, which have led to the neglect of discovery issues, and argue that more attention needs to be paid to discovery in linguistics.
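The core induction step can be illustrated with Mill's Method of Agreement: a circumstance present in every observation where an effect occurs is a candidate cause of that effect. This is a toy sketch of the idea, not the paper's actual system; the linguistic feature names in the example data are invented for illustration.

```python
def method_of_agreement(observations, effect):
    """Mill's Method of Agreement: return the circumstances common to
    every observation in which the given effect was observed.

    observations is a list of (circumstances, effects) pairs, each a set.
    """
    cases = [circ for circ, effects in observations if effect in effects]
    if not cases:
        return set()
    common = set(cases[0])
    for circ in cases[1:]:
        common &= circ  # keep only circumstances shared by all effect cases
    return common

# Toy linguistic-style observations: (circumstances, observed effects).
observations = [
    ({"vowel_final", "stress_initial"}, {"lenition"}),
    ({"vowel_final", "stress_final"}, {"lenition"}),
    ({"consonant_final", "stress_initial"}, set()),
]
candidates = method_of_agreement(observations, "lenition")
# → {"vowel_final"}: the only circumstance shared by all lenition cases
```

A fuller system along Mill's lines would combine this with the Method of Difference (comparing cases where the effect is absent) to prune spurious candidates.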