Education
Probably Approximately Correct
The best we can hope for when it comes to most decisions is to be probably approximately correct: a high probability of being about right. In finance, analysts compare proposed capital costs with discounted anticipated future cash flows to calculate a net present value, a bunch of assumptions made in the hope of being probably approximately correct. Insurance is a hedge against a big loss; it's based on the probability of bad stuff happening, and the insurance company makes a little money if its calculations are probably approximately correct. A doctor takes a few data points and makes a diagnosis, hoping she is probably approximately correct. School facilities planners estimate future enrollment trends, and school boards then estimate the likelihood of community support for a construction bond; both hope to be probably approximately correct.
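The net present value calculation mentioned above fits in a few lines. This is a minimal sketch, assuming one cash flow per year received at the end of the year and a single constant discount rate; the function name `net_present_value` and the project figures are hypothetical. Evaluating the same project at two discount rates shows how much the answer hinges on the assumptions.

```python
def net_present_value(rate, initial_cost, cash_flows):
    """Return -initial_cost + sum of cash_flows[t-1] / (1 + rate)**t for t = 1..n.

    Assumes one cash flow per period, received at the end of each period.
    """
    discounted = sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))
    return discounted - initial_cost


# Hypothetical project: 1,000 up front, then 400 at the end of each of 3 years.
flows = [400, 400, 400]
for rate in (0.05, 0.10):
    print(f"rate={rate:.0%}  NPV={net_present_value(rate, 1000, flows):.2f}")
```

At a 5% rate this hypothetical project has a positive NPV (about 89.30); at 10% it turns slightly negative (about -5.26). The decision flips on an assumed rate, which is why the result is at best probably approximately correct.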
Computers will outperform doctors at diagnosing illnesses, says government technology adviser
In 2014, the government brought in a new curriculum, which included coding lessons for children. But Prof Susskind said that the development of new, "self-coding" systems meant that such lessons were obsolete. He added: "I belong to the school of thought who don't believe it's a particularly great use of people's time and energy to code. Our thesis is that the next generation of systems will be writing themselves. Automatic code generation is already very common.
"Low-level code generation is actually a great intellectual exercise, it's a bit like studying logic, but I don't believe that people learning to code in school will find in seven or eight years that they'll be employable for that reason alone."
Give a Robot a Fish
A sign on a door located on the ninth floor of UCLA's Boelter Hall reads, "Beware of Robot." Inside, stationed at the center of the room, is Tony. He stands more than 5 feet tall with a black torso, dark red rolling base, two large arms, an Internet router on his back and an Xbox One Kinect mounted on his head. Assembled part by part over the past year and costing more than $60,000, Tony has been programmed to open doors, fold clothes and assemble furniture. Surrounding him is a team of researchers who aim to eventually give him human-level cognition.
Amazon Enters Into Open-source Software World - TechStory
Amazon discreetly released a library called DSSTNE on GitHub under an open-source Apache license and made an entry into the world of open-source software for deep learning. DSSTNE (pronounced "Destiny") is an open source software library for training and deploying deep neural networks using GPUs. Amazon engineers built DSSTNE to solve deep learning problems at Amazon's scale. DSSTNE is built for production deployment of real-world deep learning applications, emphasizing speed and scale over experimental flexibility. "DSSTNE's network definition language is much simpler than Caffe's, as it would require only 33 lines of code to express the popular AlexNet image recognition model, whereas Caffe's language requires over 300 lines of code," Amazon wrote on the FAQ page.
Estimating Treatment Effects using Multiple Surrogates: The Role of the Surrogate Score and the Surrogate Index
Athey, Susan, Chetty, Raj, Imbens, Guido, Kang, Hyunseung
Estimating the long-term effects of treatments is of interest in many fields. A common challenge in estimating such treatment effects is that long-term outcomes are unobserved in the time frame needed to make policy decisions. One approach to overcome this missing data problem is to analyze treatment effects on an intermediate outcome, often called a statistical surrogate, if it satisfies the condition that treatment and outcome are independent conditional on the statistical surrogate. The validity of the surrogacy condition is often controversial. Here we exploit the fact that in modern datasets, researchers often observe a large number, possibly hundreds or thousands, of intermediate outcomes, thought to lie on or close to the causal chain between the treatment and the long-term outcome of interest. Even if none of the individual proxies satisfies the statistical surrogacy criterion by itself, using multiple proxies can be useful in causal inference. We focus primarily on a setting with two samples: an experimental sample containing data about the treatment indicator and the surrogates, and an observational sample containing information about the surrogates and the primary outcome. We state assumptions under which the average treatment effect can be identified and estimated with a high-dimensional vector of proxies that collectively satisfy the surrogacy assumption, derive the bias from violations of the surrogacy assumption, and show that even if the primary outcome is also observed in the experimental sample, there is still information to be gained from using surrogates.
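The two-sample surrogate index idea in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' estimator: it assumes a linear model for the long-term outcome given the surrogates, fit by ordinary least squares in the observational sample, then compares the average predicted outcome (the surrogate index) between treated and control units in the experimental sample. The function names and the simulated data are hypothetical, and the simulation is built so that the surrogacy condition holds by construction.

```python
import numpy as np


def fit_linear(S, y):
    """Least-squares fit of y on [1, S]; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(S)), S])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_linear(coef, S):
    return np.column_stack([np.ones(len(S)), S]) @ coef


def surrogate_index_ate(S_obs, y_obs, S_exp, w_exp):
    """Surrogate-index estimate of the average treatment effect.

    S_obs, y_obs: surrogates and long-term outcome (observational sample).
    S_exp, w_exp: surrogates and 0/1 treatment indicator (experimental sample).
    """
    coef = fit_linear(S_obs, y_obs)      # estimate E[Y | S] where Y is observed
    index = predict_linear(coef, S_exp)  # impute the surrogate index in the experiment
    return index[w_exp == 1].mean() - index[w_exp == 0].mean()


# Hypothetical simulation: treatment shifts 5 surrogates by 0.3,
# and Y depends on treatment only through the surrogates.
rng = np.random.default_rng(0)
d, beta = 5, np.array([1.0, 0.5, 0.0, -0.5, 2.0])
n_obs, n_exp = 20000, 20000
S_obs = rng.normal(size=(n_obs, d))
y_obs = S_obs @ beta + rng.normal(size=n_obs)
w_exp = rng.integers(0, 2, size=n_exp)
S_exp = rng.normal(size=(n_exp, d)) + 0.3 * w_exp[:, None]

true_ate = 0.3 * beta.sum()  # 0.9 in this simulation
estimate = surrogate_index_ate(S_obs, y_obs, S_exp, w_exp)
print(f"estimate={estimate:.3f}  true={true_ate:.3f}")
```

Because the outcome depends on treatment only through the surrogates here, the estimate should land close to 0.9. If surrogacy failed (for example, a direct effect of treatment on the outcome), the same procedure would be biased; the paper derives that bias.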
IBM's brilliant AI just helped teach a grad-level college course
A student in Ashok Goel's class last semester had a question: How long could the computer programs, or "agents," they were building take to solve problems? Since it was an online course, the student posted the question to the group discussion board. One teaching assistant replied, pointing to a portion of the assignment that set a 15-minute limit. The student clarified that their agent was running a little slow, and could take a bit longer. "It's fine if your agent takes a few minutes to run," she wrote.
SML: Syllabus
Scalable Machine Learning occurs when Statistics, Systems, Machine Learning and Data Mining are combined into flexible, often nonparametric, and scalable techniques for analyzing large amounts of data at internet scale. This class aims to teach methods which are going to power the next generation of internet applications. The class will cover systems and processing paradigms, an introduction to statistical analysis, algorithms for data streams, generalized linear methods (logistic models, support vector machines, etc.), large scale convex optimization, kernels, graphical models and inference algorithms such as sampling and variational approximations, and explore/exploit mechanisms. Applications include social recommender systems, real time analytics, spam filtering, topic models, and document analysis.
A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification
Tu, Enmei, Zhang, Yaqian, Zhu, Lin, Yang, Jie, Kasabov, Nikola
$k$ Nearest Neighbors ($k$NN) is one of the most widely used supervised learning algorithms to classify Gaussian distributed data, but it does not achieve good results when it is applied to nonlinear manifold distributed data, especially when only a very limited number of labeled samples is available. In this paper, we propose a new graph-based $k$NN algorithm which can effectively handle both Gaussian distributed data and nonlinear manifold distributed data. To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by constructing an $R$-level nearest-neighbor strengthened tree over the graph, and then compute a TRW matrix for similarity measurement purposes. After this, the nearest neighbors are identified according to the TRW matrix and the class label of a query point is determined by the sum of all the TRW weights of its nearest neighbors. To deal with online situations, we also propose a new algorithm to handle sequential samples based on a local neighborhood reconstruction. Comparison experiments are conducted on both synthetic data sets and real-world data sets to demonstrate the validity of the proposed new $k$NN algorithm and its improvements over other versions of $k$NN algorithms. Given the widespread appearance of manifold structures in real-world problems and the popularity of the traditional $k$NN algorithm, the proposed manifold version of $k$NN shows promising potential for classifying manifold-distributed data.
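The classification step described in the abstract (similarities from a tired random walk, then a vote summing the TRW weights of the nearest neighbors) can be sketched as follows. This is a simplified illustration under stated assumptions: the graph is a plain symmetric kNN graph with Gaussian weights rather than the paper's R-level nearest-neighbor strengthened tree, and the TRW matrix is assumed to be (I - alpha P)^(-1), the accumulated transition probabilities of a walk whose step probability decays by a factor alpha at each step. All function names and parameters (k_graph, k_vote, sigma, alpha) are hypothetical, and the online extension is not covered.

```python
import numpy as np


def knn_graph(X, k, sigma):
    """Symmetric kNN graph with Gaussian edge weights (assumed construction)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    n = len(X)
    W = np.zeros((n, n))
    nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    for i in range(n):
        W[i, nbrs[i]] = np.exp(-d2[i, nbrs[i]] / (2 * sigma ** 2))
    return np.maximum(W, W.T)


def trw_matrix(W, alpha=0.9):
    """Assumed TRW form: sum over t of (alpha * P)^t = (I - alpha * P)^-1."""
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    return np.linalg.inv(np.eye(len(W)) - alpha * P)


def trw_knn_predict(X, labels, k_graph=5, k_vote=5, sigma=1.0, alpha=0.9):
    """labels: int array with -1 for unlabeled points; returns a label per point."""
    U = trw_matrix(knn_graph(X, k_graph, sigma), alpha)
    labeled = np.flatnonzero(labels >= 0)
    classes = np.unique(labels[labeled])
    pred = labels.copy()
    for i in np.flatnonzero(labels < 0):
        # Nearest labeled neighbors by TRW similarity, then a TRW-weighted vote.
        top = labeled[np.argsort(U[i, labeled])[::-1][:k_vote]]
        scores = [U[i, top][labels[top] == c].sum() for c in classes]
        pred[i] = classes[int(np.argmax(scores))]
    return pred


# Hypothetical demo: two noisy concentric circles, one labeled point per circle.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
ring = np.column_stack([np.cos(t), np.sin(t)])
X = np.vstack([ring, 3 * ring]) + rng.normal(scale=0.05, size=(200, 2))
truth = np.repeat([0, 1], 100)
labels = np.full(200, -1)
labels[[0, 100]] = [0, 1]
pred = trw_knn_predict(X, labels)
print("accuracy:", (pred == truth).mean())
```

The walk stays on each ring, so the single label spreads along the manifold. A plain Euclidean nearest-neighbor rule on the same two labeled points would misclassify outer-ring points on the far side, which are closer to the inner labeled point.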
Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis
Huang, Dong, Lai, Jian-Huang, Wang, Chang-Dong
The clustering ensemble technique aims to combine multiple clusterings into a probably better and more robust clustering and has been receiving increasing attention in recent years. Existing clustering ensemble approaches have two main limitations. First, many approaches lack the ability to weight the base clusterings without access to the original data and can be significantly affected by low-quality or even ill clusterings. Second, they generally focus on the instance level or the cluster level of the ensemble system and fail to integrate multi-granularity cues into a unified model. To address these two limitations, this paper proposes to solve the clustering ensemble problem via crowd agreement estimation and multi-granularity link analysis. We present the normalized crowd agreement index (NCAI) to evaluate the quality of base clusterings in an unsupervised manner and thus weight the base clusterings in accordance with their clustering validity. To explore the relationship between clusters, the source aware connected triple (SACT) similarity is introduced with regard to their common neighbors and the source reliability. The experiments are conducted on eight real-world datasets, and the results demonstrate the effectiveness and robustness of the proposed methods.
Keywords: Clustering ensemble, Clustering aggregation, Weighted evidence accumulation clustering, Graph partitioning with multi-granularity link analysis
1. Introduction
Data clustering is a fundamental and very challenging problem in data mining and machine learning.
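The keywords mention weighted evidence accumulation clustering. A minimal sketch of that general idea, assuming the per-clustering weights are already given: build a co-association matrix in which each base clustering votes for the pairs of objects it puts together, scaled by its weight, then read off a consensus by linking pairs whose weighted agreement exceeds a threshold. The weights below are hand-picked stand-ins for a quality score such as NCAI (whose formula is not reproduced), the threshold-and-connected-components step is a simple substitute for the paper's graph partitioning, SACT similarity is not modeled, and all names are hypothetical.

```python
import numpy as np


def weighted_coassociation(clusterings, weights):
    """Entry (i, j) is the weighted fraction of base clusterings that put
    objects i and j in the same cluster."""
    weights = np.asarray(weights, dtype=float)
    n = len(clusterings[0])
    A = np.zeros((n, n))
    for labels, w in zip(clusterings, weights):
        labels = np.asarray(labels)
        A += w * (labels[:, None] == labels[None, :])
    return A / weights.sum()


def consensus_labels(A, threshold=0.5):
    """Connected components of the graph whose edges are pairs with A > threshold."""
    n = len(A)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly co-associated pairs
            i = stack.pop()
            for j in np.flatnonzero((A[i] > threshold) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


# Hypothetical ensemble of three base clusterings of six objects;
# the third is noisy and gets a low (hand-picked) weight.
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 2],
    [0, 1, 0, 1, 0, 1],
]
A = weighted_coassociation(base, weights=[1.0, 1.0, 0.2])
consensus = consensus_labels(A)
print(consensus)
```

Here the low-weight third clustering barely moves any entry of the matrix, so the consensus recovers the groups {0, 1, 2} and {3, 4, 5} that the first two clusterings largely agree on.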