AITopics | Scientific Discovery

Scientific Discovery

"The problem of giving rules for producing true scientific statements has been replaced by the problem of finding efficient heuristic rules for culling the reasonable candidates for an explanation from an appropriate set of possible candidates [and finding methods for constructing the candidates]."
– B. Buchanan, quoted in Lindley Darden. Recent Work in Computational Scientific Discovery.

Hypothesis Testing in Speckled Data with Stochastic Distances

Nascimento, Abraão D. C. | Cintra, Renato J. | Frery, Alejandro C.

arXiv.org Machine Learning, Jul-12-2012

Images obtained with coherent illumination, as is the case with sonar, ultrasound-B, laser, and synthetic aperture radar (SAR), are affected by speckle noise, which reduces the ability to extract information from the data. Specialized techniques are required to deal with such imagery, which has been modeled by the G0 distribution. Under this model, regions with different degrees of roughness and mean brightness can be characterized by two parameters; a third parameter, the number of looks, is related to the overall signal-to-noise ratio. Assessing distances between samples is an important step in image analysis: the distances indicate how separable the samples are and, therefore, how well classification procedures can perform. This work derives and compares eight stochastic distances and assesses the performance of hypothesis tests that employ them together with maximum likelihood estimation. We conclude that tests based on the triangular distance have the closest empirical size to the theoretical one, while those based on the arithmetic-geometric distances have the best power. Since the power of tests based on the triangular distance is close to optimum, we conclude that the safest choice is to use this distance for hypothesis testing, even when compared with classical distances such as Kullback-Leibler and Bhattacharyya.
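To make three of the distances named in the abstract concrete, here is a minimal sketch that evaluates them on discrete probability vectors. This is an assumption made for illustration: the paper works with continuous G0 densities, so the sums below stand in for integrals, and the function names and the symmetrized form of the Kullback-Leibler distance are choices of this sketch rather than the authors' code.

```python
import math

def kullback_leibler(p, q):
    # Symmetrized Kullback-Leibler distance: 0.5 * sum (p_i - q_i) * log(p_i / q_i).
    # Assumes every entry of p and q is strictly positive.
    return 0.5 * sum((a - b) * math.log(a / b) for a, b in zip(p, q))

def bhattacharyya(p, q):
    # Negative log of the Bhattacharyya coefficient sum sqrt(p_i * q_i).
    return -math.log(sum(math.sqrt(a * b) for a, b in zip(p, q)))

def triangular(p, q):
    # Triangular distance: sum (p_i - q_i)^2 / (p_i + q_i), skipping empty cells.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

# Hypothetical example distributions (not data from the paper).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
distances = {
    "kullback_leibler": kullback_leibler(p, q),
    "bhattacharyya": bhattacharyya(p, q),
    "triangular": triangular(p, q),
}
```

All three are zero when the two distributions coincide and symmetric in their arguments, which is what lets them serve as test statistics for the hypothesis that two samples come from the same distribution.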

artificial intelligence, bayesian inference, machine learning, (18 more...)

doi: 10.1109/TGRS.2009.2025498

arXiv: 1207.2959

Country:

North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
North America > United States > New York (0.04)
South America > Brazil > Pernambuco > Recife (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.54)
(2 more...)

Hypothesis testing using pairwise distances and associated kernels (with Appendix)

Sejdinovic, Dino | Gretton, Arthur | Sriperumbudur, Bharath | Fukumizu, Kenji

arXiv.org Machine Learning, May-21-2012

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. The equivalence holds when energy distances are computed with semimetrics of negative type, in which case a kernel may be defined such that the RKHS distance between distributions corresponds exactly to the energy distance. We determine the class of probability distributions for which kernels induced by semimetrics are characteristic (that is, for which embeddings of the distributions to an RKHS are injective). Finally, we investigate the performance of this family of kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
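The equivalence described in the abstract can be checked numerically on small samples. The sketch below is a hedged illustration, not the authors' code: it assumes one-dimensional data with the semimetric |x - y|, an arbitrary center point `z0`, and biased (V-statistic) estimators; all function names are hypothetical. Under those assumptions the empirical energy distance equals twice the squared RKHS distance computed with the distance-induced kernel.

```python
def dist(x, y):
    # Semimetric of negative type used for this sketch: absolute difference.
    return abs(x - y)

def mean_pairwise(f, xs, ys):
    # Average of f over all pairs (x, y), including x == y pairs (V-statistic).
    return sum(f(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

def energy_distance(xs, ys):
    # 2 E d(X, Y) - E d(X, X') - E d(Y, Y'), with expectations taken over samples.
    return (2 * mean_pairwise(dist, xs, ys)
            - mean_pairwise(dist, xs, xs)
            - mean_pairwise(dist, ys, ys))

def induced_kernel(x, y, z0=0.0):
    # Kernel induced by the semimetric, centered at an arbitrary point z0.
    return 0.5 * (dist(x, z0) + dist(y, z0) - dist(x, y))

def mmd_squared(xs, ys):
    # Squared distance between the kernel mean embeddings of the two samples.
    k = induced_kernel
    return (mean_pairwise(k, xs, xs)
            + mean_pairwise(k, ys, ys)
            - 2 * mean_pairwise(k, xs, ys))

# Hypothetical samples chosen only to exercise the identity.
xs = [0.1, 0.5, 0.9, 1.3]
ys = [0.4, 1.1, 2.0]
gap = abs(energy_distance(xs, ys) - 2 * mmd_squared(xs, ys))
```

The terms involving `z0` cancel in `mmd_squared`, so the choice of center does not affect the result; `gap` is zero up to floating-point rounding.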

artificial intelligence, machine learning, scientific discovery, (17 more...)

arXiv: 1205.0411

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.41)

Discovery of Invariants through Automated Theory Formation

Llano, Maria Teresa | Ireland, Andrew | Pease, Alison

arXiv.org Artificial Intelligence, Jun-21-2011

Refinement is a powerful mechanism for mastering the complexities that arise when formally modelling systems. Refinement also brings with it additional proof obligations, which require a developer to discover properties relating to their design decisions. With the goal of reducing this burden, we have investigated how a general-purpose theory formation tool, HR, can be used to automate the discovery of such properties within the context of Event-B. Here we develop a heuristic approach to the automatic discovery of invariants and report on a series of experiments that we undertook to evaluate our approach. The set of heuristics developed provides systematic guidance in tailoring HR for a given Event-B development. These heuristics are based upon proof-failure analysis and have given rise to some promising results.

artificial intelligence, invariant, scientific discovery, (16 more...)

doi: 10.4204/EPTCS.55.1

arXiv: 1106.4090

Country:

Europe > Ireland (0.06)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Poland > Podlaskie Province > Bialystok (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.64)

Notes on a New Philosophy of Empirical Science

Burfoot, Daniel

arXiv.org Machine Learning, Apr-28-2011

This book presents a methodology and philosophy of empirical science based on large scale lossless data compression. In this view a theory is scientific if it can be used to build a data compression program, and it is valuable if it can compress a standard benchmark database to a small size, taking into account the length of the compressor itself. This methodology therefore includes an Occam principle as well as a solution to the problem of demarcation. Because of the fundamental difficulty of lossless compression, this type of research must be empirical in nature: compression can only be achieved by discovering and characterizing empirical regularities in the data. Because of this, the philosophy provides a way to reformulate fields such as computer vision and computational linguistics as empirical sciences: the former by attempting to compress databases of natural images, the latter by attempting to compress large text databases. The book argues that the rigor and objectivity of the compression principle should set the stage for systematic progress in these fields. The argument is especially strong in the context of computer vision, which is plagued by chronic problems of evaluation. The book also considers the field of machine learning. Here the traditional approach requires that the models proposed to solve learning problems be extremely simple, in order to avoid overfitting. However, the world may contain intrinsically complex phenomena, which would require complex models to understand. The compression philosophy can justify complex models because of the large quantity of data being modeled (if the target database is 100 Gb, it is easy to justify a 10 Mb model). The complex models and abstractions learned on the basis of the raw data (images, language, etc) can then be reused to solve any specific learning problem, such as face recognition or machine translation.
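The scoring rule in the abstract (compressed size plus the length of the compressor itself) can be sketched as a two-part code length. This is an illustrative assumption, not the book's benchmark procedure: `zlib` stands in for a theory-specific compressor, `model_size_bytes` is a hypothetical parameter for the compressor's own length, and the "100 Gb" and "10 Mb" figures are read here as decimal gigabytes and megabytes.

```python
import zlib

def total_code_length(data: bytes, model_size_bytes: int = 0) -> int:
    # Two-part score: bytes needed for the compressed data plus the bytes
    # needed to describe the compressor (the "theory") itself.
    return len(zlib.compress(data, 9)) + model_size_bytes

def model_overhead(model_size_bytes: int, database_size_bytes: int) -> float:
    # Fraction of the raw database size spent on describing the model.
    return model_size_bytes / database_size_bytes

# Regular data compresses well; the regularity is what the "theory" captures.
regular = b"ab" * 1000
score = total_code_length(regular)

# The abstract's example: a 10 MB model against a 100 GB database.
overhead = model_overhead(10 * 10**6, 100 * 10**9)
```

With these figures the model accounts for one ten-thousandth of the raw database size, which is the sense in which a large database can justify a complex model.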

artificial intelligence, machine learning, natural language, (23 more...)

arXiv: 1104.5466

Country:

Asia > India (0.13)
North America > Canada > Ontario > Toronto (0.13)
Africa > Madagascar (0.04)
(19 more...)

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Media (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
(11 more...)

A novel family of non-parametric cumulative based divergences for point processes

Seth, Sohan | Park, Il | Brockmeier, Austin | Semework, Mulugeta | Choi, John | Francis, Joseph | Principe, Jose

Neural Information Processing Systems, Dec-31-2010

Hypothesis testing on point processes has several applications, such as model fitting, plasticity detection, and non-stationarity detection. Standard tools for hypothesis testing include tests on the mean firing rate and the time-varying rate function. However, these statistics do not fully describe a point process, and thus the tests can be misleading. In this paper, we introduce a family of non-parametric divergence measures for hypothesis testing. We extend the traditional Kolmogorov-Smirnov and Cramér-von Mises tests to point processes via stratification. The proposed divergence measures compare the underlying probability structure and, thus, are zero if and only if the point processes are the same. This leads to a more robust test of hypothesis. We prove consistency and show that these measures can be efficiently estimated from data. We demonstrate an application of using the proposed divergence as a cost function to find optimally matched spike trains.
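For orientation, the classical two-sample Kolmogorov-Smirnov statistic that the paper builds on can be sketched as follows. This is only the traditional scalar version, assuming each observation is a single number (for example, one event time per trial); the stratified extension to point processes proposed in the paper is not implemented here, and the function name is hypothetical.

```python
import bisect

def ks_statistic(xs, ys):
    # Largest absolute gap between the two empirical CDFs, evaluated at
    # every observed value (the gap can only change at those points).
    xs, ys = sorted(xs), sorted(ys)

    def ecdf(sample, t):
        # Fraction of the sorted sample that is <= t.
        return bisect.bisect_right(sample, t) / len(sample)

    points = sorted(set(xs) | set(ys))
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in points)

# Hypothetical samples: identical samples give 0, disjoint samples give 1.
same = ks_statistic([1, 2, 3], [1, 2, 3])
disjoint = ks_statistic([1, 2], [3, 4])
```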

artificial intelligence, machine learning, point process, (17 more...)

Country: North America > United States (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.84)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.75)

Multiple Hypothesis Testing in Pattern Discovery

Hanhijärvi, Sami | Puolamäki, Kai | Garriga, Gemma C.

arXiv.org Machine Learning, Jun-29-2009

The problem of multiple hypothesis testing arises when more than one hypothesis must be tested simultaneously for statistical significance. This is a very common situation in many data mining applications. For instance, assessing simultaneously the significance of all frequent itemsets of a single dataset entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I errors). Our contribution in this paper is to extend the multiple hypothesis framework so that it can be used with a generic data mining algorithm. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive) in the strong sense. We evaluate the performance of our solution on both real and generated data. The results show that our method controls the FWER while maintaining the power of the test.
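As a point of reference for what strong FWER control looks like in code, here is a sketch of the standard Holm step-down procedure. It is a swapped-in textbook technique, not the method proposed in the paper; the function name and the example p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    # Holm step-down: visit p-values from smallest to largest and compare the
    # one at 0-based rank r against alpha / (m - r); stop at the first failure.
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every larger p-value is retained as well
    return reject

# Hypothetical p-values, one per tested pattern (e.g. one per itemset).
decisions = holm_reject([0.01, 0.04, 0.03, 0.005], alpha=0.05)
```

In this example the two smallest p-values pass their thresholds (0.0125 and about 0.0167), the third (0.03 against 0.025) fails, and the procedure stops there.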

data mining, machine learning, pattern recognition, (17 more...)

arXiv: 0906.5263

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report > Experimental Study (0.56)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
(2 more...)

Simultaneous Discovery of Conservation Laws and Hidden Particles With Smith Matrix Decomposition

Schulte, Oliver (Simon Fraser University)

AAAI Conferences, Jun-23-2009

Particle physics experiments, like the Large Hadron Collider in Geneva, can generate thousands of data points listing detected particle reactions. An important learning task is to analyze the reaction data for evidence of conserved quantities and hidden particles. This task involves latent structure in two ways: first, hypothesizing hidden quantities whose conservation determines which reactions occur, and second, hypothesizing the presence of hidden particles. We model this problem in the classic linear algebra framework of automated scientific discovery due to Valdes-Perez, Zytkow and Simon, where both reaction data and conservation laws are represented as matrices. We introduce a new criterion for selecting a matrix model for reaction data: find hidden particles and conserved quantities that rule out as many interactions among the nonhidden particles as possible. A polynomial-time algorithm for optimizing this criterion is based on the new theorem that hidden particles are required if and only if the Smith Normal Form of the reaction matrix R contains entries other than 0 or 1. To our knowledge this is the first application of Smith matrix decomposition to a problem in AI. Using data from particle accelerators, we compare our algorithm to the main model of particles in physics, known as the Standard Model: our algorithm discovers conservation laws that are equivalent to those in the Standard Model, and indicates the presence of a hidden particle (the electron antineutrino) in accordance with the Standard Model.
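The criterion stated in the abstract (hidden particles are required if and only if the Smith Normal Form contains entries other than 0 or 1) can be sketched for small integer matrices. The code below is an illustrative assumption rather than the paper's polynomial-time algorithm: it obtains the nonzero diagonal entries of the Smith Normal Form from greatest common divisors of k-by-k minors, which is exponential in general, and the toy matrices are not real reaction data.

```python
from itertools import combinations
from math import gcd

def det(m):
    # Integer determinant by cofactor expansion (adequate for small matrices).
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def invariant_factors(matrix):
    # Nonzero diagonal entries of the Smith Normal Form, computed as
    # d_k / d_(k-1), where d_k is the gcd of all k-by-k minors.
    rows, cols = len(matrix), len(matrix[0])
    factors, previous = [], 1
    for k in range(1, min(rows, cols) + 1):
        d_k = 0
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                d_k = gcd(d_k, det([[matrix[i][j] for j in c] for i in r]))
        if d_k == 0:
            break  # rank reached; all remaining diagonal entries are 0
        factors.append(d_k // previous)
        previous = d_k
    return factors

def needs_hidden_particles(reaction_matrix):
    # Criterion from the abstract: some diagonal entry is neither 0 nor 1.
    return any(f != 1 for f in invariant_factors(reaction_matrix))

# Toy integer matrices (hypothetical encodings, one reaction per row).
plain = [[1, 0], [0, 1]]
scaled = [[2, 0], [0, 3]]
```

Here `invariant_factors(plain)` is `[1, 1]` and `invariant_factors(scaled)` is `[1, 6]`, so only the second matrix triggers the criterion.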

matrix, particle, reaction, (16 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

North America > United States > New York (0.04)
North America > Canada > British Columbia (0.04)
North America > Canada > Alberta (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.34)

Evaluating Abductive Hypotheses using an EM Algorithm on BDDs

Inoue, Katsumi (National Institute of Informatics) | Sato, Taisuke (Tokyo Institute of Technology) | Ishihata, Masakazu (Tokyo Institute of Technology) | Kameya, Yoshitaka (Tokyo Institute of Technology) | Nabeshima, Hidetomo (University of Yamanashi)

AAAI Conferences, Jun-23-2009

Abductive inference is an important AI reasoning technique for finding explanations of observations, and it has recently been applied to scientific discovery. To find the best hypotheses among the many logically possible ones, we need to evaluate the hypotheses obtained from the process of hypothesis generation. We propose an abductive inference architecture combined with an EM algorithm working on binary decision diagrams (BDDs). This work opens a way of applying BDDs to compress multiple hypotheses and to select the most probable ones from among them. An implemented system has been applied to the inference of inhibition in metabolic pathways in the domain of systems biology.

algorithm, hypothesis, probability, (16 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Abductive Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.94)

Sequential Hypothesis Testing under Stochastic Deadlines

Frazier, Peter | Yu, Angela J.

Neural Information Processing Systems, Dec-31-2008

Most models of decision-making in neuroscience assume an infinite horizon, which yields an optimal solution that integrates evidence up to a fixed decision threshold; however, under most experimental as well as naturalistic behavioral settings, the decision has to be made before some finite deadline, which is often experienced as a stochastic quantity, either due to variable external constraints or internal timing uncertainty. In this work, we formulate this problem as sequential hypothesis testing under a stochastic horizon. We use dynamic programming tools to show that, for a large class of deadline distributions, the Bayes-optimal solution requires integrating evidence up to a threshold that declines monotonically over time. We use numerical simulations to illustrate the optimal policy in the special cases of a fixed deadline and one that is drawn from a gamma distribution.
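The effect of a declining decision threshold can be illustrated with a small sequential test that accumulates log-likelihood ratios. This is a sketch under stated assumptions: the linear threshold shape and its parameters are hypothetical and are not the Bayes-optimal policy, which the paper derives with dynamic programming; the function names and the evidence values are likewise made up for the example.

```python
def sequential_test(log_likelihood_ratios, threshold_at):
    # Accumulate evidence one observation at a time and stop as soon as it
    # crosses the (possibly time-varying) symmetric bound threshold_at(t).
    evidence = 0.0
    for t, llr in enumerate(log_likelihood_ratios, start=1):
        evidence += llr
        bound = threshold_at(t)
        if evidence >= bound:
            return "H1", t
        if evidence <= -bound:
            return "H0", t
    return "undecided", len(log_likelihood_ratios)

def fixed_threshold(t, level=3.0):
    # Infinite-horizon style rule: the same bound at every time step.
    return level

def declining_threshold(t, start=3.0, rate=0.5):
    # Hypothetical linear decline, floored at zero; illustrative only.
    return max(0.0, start - rate * (t - 1))

# The same weak evidence stream under both rules.
stream = [0.5, 0.5, 0.5, 0.5]
with_fixed = sequential_test(stream, fixed_threshold)
with_declining = sequential_test(stream, declining_threshold)
```

With the fixed bound the accumulated evidence (2.0 after four steps) never reaches 3.0, so no decision is made before the observations run out. With the declining bound the same evidence crosses the threshold at step 4 and a decision is committed.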

deadline, optimal policy, threshold, (15 more...)

Country: North America > United States > New Jersey > Mercer County > Princeton (0.04)

Technology: