Scientific Discovery
Retail as we know it has moved into a new paradigm TechNative
It offered 'queue-less shopping', a self-service concept that allowed consumers to shop the store by pushing a metal trolley around the aisles rather than waiting in line at a counter to be served. Today, retailers are becoming increasingly reliant on customer experience innovations such as this to ensure their continuity, as the industry is entering the most transformational period of its experience in response to the current crisis hitting the UK high street. Already a disruptor with its convenience-focused online retail service, Amazon redoubled its efforts to disrupt brick-and-mortar retail outlets by launching its own physical store, Amazon Go, in 2016. By and large, Amazon Go resembled any other supermarket: products on shelves, arranged by aisles; an assortment of baskets and trolleys for transporting goods; and a bright, fresh, welcoming atmosphere to attract customers. Its revolutionary move, however, was to use intelligent innovations in IoT technology to provide the most convenient shopping experience yet.
The Everlasting Database: Statistical Validity at a Fair Price
Woodworth, Blake E., Feldman, Vitaly, Rosset, Saharon, Srebro, Nati
The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries. We propose a mechanism for answering an arbitrarily long sequence of potentially adaptive statistical queries, by charging a price for each query and using the proceeds to collect additional samples. Crucially, we guarantee statistical validity without any assumptions on how the queries are generated. We also ensure with high probability that the cost for $M$ non-adaptive queries is $O(\log M)$, while the cost to a potentially adaptive user who makes $M$ queries that do not depend on any others is $O(\sqrt{M})$.
Robust Hypothesis Testing Using Wasserstein Uncertainty Sets
Gao, Rui, Xie, Liyan, Xie, Yao, Xu, Huan
We develop a novel computationally efficient and general framework for robust hypothesis testing. The new framework features a new way to construct uncertainty sets under the null and the alternative distributions, which are sets centered around the empirical distribution defined via Wasserstein metric, thus our approach is data-driven and free of distributional assumptions. We develop a convex safe approximation of the minimax formulation and show that such approximation renders a nearly-optimal detector among the family of all possible tests. By exploiting the structure of the least favorable distribution, we also develop a tractable reformulation of such approximation, with complexity independent of the dimension of observation space and can be nearly sample-size-independent in general. Real-data example using human activity data demonstrated the excellent performance of the new robust detector.
Sling adds Discovery, Science to its lineup
Sling TV's lineup of available channels is getting bigger. The streaming TV service is adding nine new channels from Discovery Networks that offer live and on-demand content, including the flagship Discovery Channel and MotorTrend. The best news for Sling subscribers: some of the channels will be added to your package for free. Access to the channels will be split across Sling's two separate service packages, both of which cost $25 per month. Sling Blue will get Discovery Channel, Investigation Discovery and TLC.
The Structure of Optimal Private Tests for Simple Hypotheses
Canonne, Clément L., Kamath, Gautam, McMillan, Audra, Smith, Adam, Ullman, Jonathan
Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. This work answers a basic question about privately testing simple hypotheses: given two distributions $P$ and $Q$, and a privacy level $\varepsilon$, how many i.i.d. samples are needed to distinguish $P$ from $Q$ subject to $\varepsilon$-differential privacy, and what sort of tests have optimal sample complexity? Specifically, we characterize this sample complexity up to constant factors in terms of the structure of $P$ and $Q$ and the privacy level $\varepsilon$, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. Our result is an analogue of the classical Neyman-Pearson lemma in the setting of private hypothesis testing. We also give an application of our result to private change-point detection. Our characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, and thus our results have applications even when privacy is not a primary concern.
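To make the idea of a clamped log-likelihood ratio test concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' exact test: the abstract does not specify the construction, so this version uses one standard recipe (clamp each per-sample log-likelihood ratio to an interval, sum, add Laplace noise, threshold). The function names, the clamp bounds `lo` and `hi`, and the `threshold` parameter are all hypothetical, and `p` and `q` are assumed to be discrete distributions with full support, given as dictionaries.

```python
import math
import random


def clamped_llr(samples, p, q, lo, hi):
    """Sum of per-sample log-likelihood ratios log(p[x]/q[x]), each clamped to [lo, hi]."""
    total = 0.0
    for x in samples:
        llr = math.log(p[x] / q[x])
        total += min(hi, max(lo, llr))
    return total


def noisy_clamped_llr_test(samples, p, q, lo, hi, epsilon, threshold=0.0, rng=None):
    """Return "P" or "Q" from a noisy, clamped log-likelihood ratio statistic.

    Replacing one sample changes the clamped sum by at most (hi - lo), so
    Laplace noise with scale (hi - lo) / epsilon makes the released statistic
    epsilon-differentially private; thresholding is post-processing.
    """
    rng = rng or random.Random()
    stat = clamped_llr(samples, p, q, lo, hi)
    scale = (hi - lo) / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return "P" if stat + noise > threshold else "Q"


if __name__ == "__main__":
    # Hypothetical distributions over {0, 1}, chosen only for illustration.
    p = {0: 0.9, 1: 0.1}
    q = {0: 0.5, 1: 0.5}
    data = [0] * 40 + [1] * 5
    print(noisy_clamped_llr_test(data, p, q, lo=-1.0, hi=1.0, epsilon=1.0))
```

Clamping bounds how much any single sample can move the statistic, which is what lets a modest amount of noise hide one individual's contribution.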
Amazon hit by unexplained data leak just days before Black Friday
Amazon has suffered a customer data leak less than two days before Black Friday. Amazon customer service contacted people to warn them that their names and email addresses had been compromised, though it is not yet clear how many customers were affected or how it happened. An Amazon spokesperson told The Independent: "We have fixed the issue and informed customers who may have been impacted." The customer message stated: "We're contacting you to let you know that our website inadvertently disclosed your name and email address due to a technical error. "The issue has been fixed.
A tutorial on MDL hypothesis testing for graph analysis
Bloem, Peter, de Rooij, Steven
When analysing graph structure, it can be difficult to determine whether patterns found are due to chance, or due to structural aspects of the process that generated the data. Hypothesis tests are often used to support such analyses. These allow us to make statistical inferences about which null models are responsible for the data, and they can be used as a heuristic in searching for meaningful patterns. The minimum description length (MDL) principle [6, 4] allows us to build such hypothesis tests, based on efficient descriptions of the data. Broadly: we translate the regularity we are interested in into a code for the data, and if this code describes the data more efficiently than a code corresponding to the null model, by a sufficient margin, we may reject the null model. This is a frequentist approach to MDL, based on hypothesis testing. Bayesian approaches to MDL for model selection rather than model rejection are more common, but for the purposes of pattern analysis, a hypothesis testing approach provides a more natural fit with existing literature. We provide a brief illustration of this principle based on the running example of analysing the size of the largest clique in a graph. We illustrate how a code can be constructed to efficiently represent graphs with large cliques, and how the description length of the data under this code can be compared to the description length under a code corresponding to a null model to show that the null model is highly unlikely to have generated the data.
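The "sufficient margin" in the description above can be quantified with the no-hypercompression inequality: if the null model assigns the data a code length of $-\log_2 p_{\text{null}}(x)$ bits, the probability under the null that any fixed alternative code is shorter by $k$ or more bits is at most $2^{-k}$. The Python sketch below turns that bound into a rejection rule. The function names and the example bit counts are hypothetical, and the code lengths are assumed to have been computed elsewhere.

```python
def hypercompression_bound(null_bits: float, alt_bits: float) -> float:
    """Upper bound on the null-model probability of compressing at least this well.

    null_bits: code length of the data under the null model, in bits
               (assumed to equal -log2 of the null probability of the data).
    alt_bits:  code length under the alternative (pattern) code, in bits.
    """
    margin = null_bits - alt_bits
    if margin <= 0:
        # The alternative code is no shorter, so the bound is vacuous.
        return 1.0
    # No-hypercompression inequality: P(margin >= k) <= 2^(-k).
    return 2.0 ** (-margin)


def reject_null(null_bits: float, alt_bits: float, alpha: float = 0.05) -> bool:
    """Reject the null model when the bound falls at or below the significance level."""
    return hypercompression_bound(null_bits, alt_bits) <= alpha


if __name__ == "__main__":
    # Hypothetical code lengths: the pattern code saves 10 bits over the null code.
    print(hypercompression_bound(1000.0, 990.0))
    print(reject_null(1000.0, 990.0))
```

A saving of about 4.33 bits already pushes the bound to 0.05 or below, so the rule rejects at that level.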
Using Any Surface to Realize a New Paradigm for Wireless Communications
Wireless communications have undeniably shaped our everyday lives. We expect ubiquitous connectivity to the Internet, with increasing demands for higher data rates and low lag everywhere: at work, at home, on the road, even with massive crowds of Internet users around us. Despite impressive breakthroughs in almost every part of our wireless devices--from antennas and hardware to operating software--this demand is getting increasingly challenging to address. The large scale of research efforts and investment in the fifth generation (5G) of wireless communications reflects the enormity of the challenge. A valuable and seemingly unnoticed resource could be exploited to meet this goal.