 Scientific Discovery


Scientific discoveries inspire amid a turbulent 2016

The Japan Times

A number of the notable science stories of the past year are, quite literally, out of this world. For me, the story of the year has to be the August discovery of an Earth-like planet orbiting the closest star to our own. The star, Proxima Centauri, is just 4.2 light-years from Earth. The planet circling that star has been named Proxima Centauri b. Proxima Centauri b was discovered by astronomers working on a project called Pale Red Dot, who reported that the planet lies in the star's habitable zone, meaning that it could possess water and, maybe, life.


The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective

#artificialintelligence

Many people have found the table above to be useful for understanding two conceptual distinctions in the practice of data analysis. The article that discusses the table, and many other issues, is now in press. The in-press version can be found at OSF and at SSRN. Abstract: In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty, on the other hand. Among frequentists in psychology a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming, 2014).
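
The distinction the abstract draws between hypothesis testing and estimation with quantified uncertainty can be illustrated with a small Bayesian sketch. Everything below is an assumption made for illustration and does not come from the article: a binomial model with a uniform prior on the success rate, a point null at 0.5, a grid approximation for the posterior, and the hypothetical function names and data.

```python
import math


def bayes_factor_null(k, n, theta0=0.5):
    """Bayes factor for H0: theta = theta0 against H1: theta ~ Uniform(0, 1).

    k successes in n trials. Values above 1 favor the null.
    """
    likelihood_h0 = math.comb(n, k) * theta0 ** k * (1 - theta0) ** (n - k)
    # Under a uniform prior, the marginal likelihood of any k is 1 / (n + 1).
    marginal_h1 = 1.0 / (n + 1)
    return likelihood_h0 / marginal_h1


def posterior_interval(k, n, mass=0.95, grid_size=10001):
    """Equal-tailed credible interval for theta under a uniform prior (grid approximation)."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    weights = [t ** k * (1 - t) ** (n - k) for t in grid]
    total = sum(weights)
    tail = (1 - mass) / 2
    cumulative = 0.0
    lower = upper = None
    for t, w in zip(grid, weights):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = t
        if upper is None and cumulative >= 1 - tail:
            upper = t
            break
    return lower, upper


# Hypothetical data: 14 successes in 20 trials.
k, n = 14, 20
bf_null = bayes_factor_null(k, n)       # hypothesis testing: one number about the null
lower, upper = posterior_interval(k, n)  # estimation: a range of credible values
print(f"Bayes factor (null vs. alternative): {bf_null:.3f}")
print(f"95% credible interval for theta: ({lower:.3f}, {upper:.3f})")
```

The Bayes factor answers a yes-or-no style question about one specific value, while the interval reports which values of the rate remain credible and how uncertain the estimate is.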


This week's popular Artificial Intelligence news: December 15, 2016

#artificialintelligence

It's been a busy week. Here is our roundup of the most popular articles. Kuhn's book The Structure of Scientific Revolutions outlined an episodic model in which periods of "normal science" were interrupted by periods of "revolutionary science". Kuhn challenges us to consider new paradigms and to change the rules of the game, our standards and our best practices. Artificial intelligence (AI) and machine learning (ML) liberatingly deliver this new paradigm, putting the science back into security. It's quite clear that relying on endpoint protection solutions that only… CEVA creates new value by enhancing IoT and machine learning applications. To get the right data, we could look at two different kinds of inputs: explicit and implicit. Explicit means asking the user to provide information through straightforward questions such as "How much do you want to spend?"


Characteristics of Good Visual Analytics and Data Discovery Tools

@machinelearnbot

Visual Analytics and Data Discovery allow analysis of big data sets to find insights and valuable information. This is much more than just classical Business Intelligence (BI). See this article for more details and motivation: "Using Visual Analytics to Make Better Decisions: the Death Pill Exa...". Let's take a look at important characteristics to choose the right tool for your use cases. Several tools are available on the market for Visual Analytics and Data Discovery.


Nonparametric Detection of Anomalous Data Streams

arXiv.org Machine Learning

A nonparametric anomalous hypothesis testing problem is investigated, in which there are n sequences in total, s of which are anomalous and must be detected. Each typical sequence contains m independent and identically distributed (i.i.d.) samples drawn from a distribution p, whereas each anomalous sequence contains m i.i.d. samples drawn from a distribution q that is distinct from p. The distributions p and q are assumed to be unknown in advance. Distribution-free tests are constructed using maximum mean discrepancy as the metric, which is based on mean embeddings of distributions into a reproducing kernel Hilbert space. The probability of error is bounded as a function of the sample size m, the number s of anomalous sequences and the number n of sequences. It is then shown that with s known, the constructed test is exponentially consistent if m is greater than a constant factor of log n, for any p and q, whereas with s unknown, m should have an order strictly greater than log n. Furthermore, it is shown that no test can be consistent for arbitrary p and q if m is less than a constant factor of log n, and thus the order-level optimality of the proposed test is established. Numerical results are provided to demonstrate that our tests outperform (or perform as well as) the tests based on other competitive approaches under various cases.
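
A rough sketch of how a maximum mean discrepancy (MMD) test of this kind might look when s is known. This is an illustration under stated assumptions, not necessarily the paper's exact test: it scores each sequence by a biased MMD estimate against the pooled remaining sequences and flags the s highest scores. The Gaussian kernel, its bandwidth, the function names, and the synthetic data are all hypothetical choices.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on real numbers; the bandwidth is an assumed value."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def detect_anomalous(sequences, s, bandwidth=1.0):
    """Return the indices of the s sequences that differ most from the pooled rest."""
    scores = []
    for k, seq in enumerate(sequences):
        # Pool every sample that does not belong to sequence k.
        rest = [x for j, other in enumerate(sequences) if j != k for x in other]
        scores.append((mmd_squared(seq, rest, bandwidth), k))
    top = sorted(scores, reverse=True)[:s]
    return sorted(k for _, k in top)


# Synthetic data (an assumption): n = 6 sequences of m = 30 samples each.
# Typical sequences follow N(0, 1); sequence 2 is anomalous and follows N(3, 1).
rng = random.Random(0)
n, m = 6, 30
sequences = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
sequences[2] = [rng.gauss(3.0, 1.0) for _ in range(m)]

detected = detect_anomalous(sequences, s=1)
print(detected)
```

Neither p nor q is used by the detector; it sees only the samples, which is what makes the approach distribution-free.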


Key-Object – A New Paradigm in Search?

@machinelearnbot

As we are all fond of saying, innovation follows pain points. Are we missing something in our uber-critical search capabilities that needs to be resolved? A colleague recently pointed me to a slim volume "Structured Search for Big Data" by Mikhail Gilula (published by Elsevier and available on Amazon) that argues that not only are our search tools deficient but that a complete revamp of the underlying key-word NoSQL DB structure is what's required. Use Google, Amazon, or any of the other life-critical search tools we've become so reliant upon and you are using key-word search on NoSQL. The pain that Gilula identifies is the length of time it takes the consumer to research and select complex merchandise for best deals resulting from the imprecision of the search results.


Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

#artificialintelligence

The goal of genome-wide association studies (GWAS) (e.g. the WTCCC study [1]) is to examine the relationship between genetic markers such as single-nucleotide polymorphisms (SNPs) and individual traits, which are usually complex diseases or behavioral characteristics. Generally, a large number of statistical tests are performed in parallel, each SNP being individually tested for association [2,3,4]. The standard approach consists of computing individual, SNP-specific p-values corresponding to a statistical association test and comparing these p-values against some given significance threshold (say t*), meaning that precisely those SNPs with p-values smaller than t* are declared to be associated with the trait [4,5,6]. We refer to this approach as raw p-value thresholding (RPVT) and review some standard methods for choosing t* for the purpose of controlling multiple type I error rates (in particular, the family-wise error rate (FWER) and the expected number of false rejections (ENFR)) in the Methods Section. According to the GWAS catalog [7,8] (last accessed 03-07-2015), the more than 1,400 GWAS published so far have led to the identification of more than 11,000 SNPs associated with about 800 human diseases and anthropometric traits, using the p-value threshold t* = 1 × 10^-5.
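
Raw p-value thresholding as described here amounts to a single comparison per SNP. The sketch below is a minimal illustration with made-up p-values. The Bonferroni rule is used as one standard way of choosing t* to control the FWER; whether the article's Methods Section uses it is an assumption, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, num_tests):
    """Threshold t* that controls the family-wise error rate at level alpha."""
    return alpha / num_tests


def raw_p_value_thresholding(p_values, t_star):
    """Return the indices of the SNPs whose p-value is strictly smaller than t*."""
    return [i for i, p in enumerate(p_values) if p < t_star]


# Hypothetical p-values for five SNPs, one association test each.
p_values = [0.20, 0.004, 0.03, 0.0001, 0.65]
t_star = bonferroni_threshold(alpha=0.05, num_tests=len(p_values))
selected = raw_p_value_thresholding(p_values, t_star)
print(f"t* = {t_star:.4f}, SNPs declared associated: {selected}")
```

Each SNP is judged on its own p-value alone, so the threshold has to shrink as the number of tests grows. With hundreds of thousands of SNPs, this leads to very small values of t*.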


From America to Viagra: the art of finding what you're not looking for

The Japan Times

STOCKHOLM – It is serendipity: from America to Viagra, history is full of great discoveries helped along by chance, as more than a century of Nobel prizes can attest. Among the chance discoveries that have been honored with the prestigious prize are X-rays (physics, 1901), penicillin (medicine, 1945), fullerenes that paved the way for nanotechnology (chemistry, 1996), conductive polymers (chemistry, 2000), and the bacteria responsible for ulcers (medicine, 2005). But, as the father of pasteurization Louis Pasteur noted in 1854, "In the fields of observation, chance only favors the prepared mind" -- a remark made in reference to the discovery of the link between electricity and magnetism by Danish scientist Hans Christian Orsted. Orsted happened to notice that a compass needle deflected from magnetic north when an electric current from a battery was switched on and off -- a pioneering discovery in electromagnetism. Like Pasteur, Dutch scientist Pek Van Andel also believes in the unexpected.


Hypothesis Testing is a Bad Idea (my talk at Warwick, England, 2:30pm Thurs 15 Sept)

#artificialintelligence

This is the conference, and here's my talk (will do Google hangout, just as with my recent talks in Bern, Strasbourg, etc): Through a series of examples, we consider problems with classical hypothesis testing, whether performed using classical p-values or confidence intervals, Bayes factors, or Bayesian inference using noninformative priors. We locate the problem not in the use of any particular statistical method but rather with larger problems of deterministic thinking and a misguided version of Popperianism in which the rejection of a straw-man null hypothesis is taken as confirmation of a preferred alternative. We suggest solutions involving multilevel modeling and informative Bayesian inference. The post Hypothesis Testing is a Bad Idea (my talk at Warwick, England, 2:30pm Thurs 15 Sept) appeared first on Statistical Modeling, Causal Inference, and Social Science. The post Hypothesis Testing is a Bad Idea (my talk at Warwick, England, 2:30pm Thurs 15 Sept) appeared first on All About Statistics.