Supervised Learning
Roger Federer ties a Wimbledon record set by Jimmy Connors
Looking in fine form after two days of rest, Roger Federer equaled Jimmy Connors' Open-era record by reaching his 14th Wimbledon quarterfinal and added to his own mark by making it at least that far at a Grand Slam tournament for the 48th time. Federer, a seven-time champion at the All England Club, has not dropped a set in the tournament through four matches after beating unseeded American Steve Johnson 6-2, 6-3, 7-5 at Centre Court on Monday. Johnson was making his debut in the fourth round of a major. The No. 3-seeded Federer hadn't played since Friday, when he was the only man to finish a third-round match. He next faces No. 9 Marin Cilic, the 2014 US Open champion, who advanced when Kei Nishikori retired from their fourth-round match.
Structured Prediction Energy Networks
Belanger, David, McCallum, Andrew
We introduce structured prediction energy networks (SPENs), a flexible framework for structured prediction. A deep architecture is used to define an energy function of candidate labels, and then predictions are produced by using back-propagation to iteratively optimize the energy with respect to the labels. This deep architecture captures dependencies between labels that would lead to intractable graphical models, and performs structure learning by automatically learning discriminative features of the structured output. One natural application of our technique is multi-label classification, which traditionally has required strict prior assumptions about the interactions between labels to ensure tractable learning and prediction. We are able to apply SPENs to multi-label problems with substantially larger label sets than previous applications of structured prediction, while modeling high-order interactions using minimal structural assumptions. Overall, deep learning provides remarkable tools for learning features of the inputs to a prediction problem, and this work extends these techniques to learning features of structured outputs. Our experiments provide impressive performance on a variety of benchmark multi-label classification tasks, demonstrate that our technique can be used to provide interpretable structure learning, and illuminate fundamental trade-offs between feed-forward and iterative structured prediction.
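The inference step described above, optimizing an energy with respect to the labels by gradient steps, can be sketched as follows. This is a minimal illustration rather than the authors' architecture: the quadratic energy, the parameters `b` and `W`, the step size, and all function names are assumptions standing in for the deep energy network.

```python
def energy(y, b, W):
    """Toy energy: local label scores plus pairwise label interactions."""
    n = len(y)
    local = -sum(b[i] * y[i] for i in range(n))
    pairwise = sum(W[i][j] * y[i] * y[j] for i in range(n) for j in range(i + 1, n))
    return local + pairwise


def energy_grad(y, b, W):
    """Gradient of the toy energy with respect to the relaxed labels y."""
    n = len(y)
    return [-b[i] + sum(W[i][j] * y[j] for j in range(n) if j != i) for i in range(n)]


def predict(b, W, steps=200, lr=0.1):
    """Projected gradient descent over labels relaxed to the interval [0, 1]."""
    y = [0.5] * len(b)
    for _ in range(steps):
        g = energy_grad(y, b, W)
        # Take a gradient step, then clip each label back into [0, 1].
        y = [min(1.0, max(0.0, yi - lr * gi)) for yi, gi in zip(y, g)]
    return y


# Hypothetical parameters: labels 0 and 1 both look plausible on their own,
# but the positive interaction W[0][1] penalizes switching both on together.
b = [2.0, 1.5, -1.0]
W = [[0.0, 3.0, 0.0],
     [3.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]

relaxed = predict(b, W)
labels = [round(v) for v in relaxed]
print(labels)  # [1, 0, 0]
```

In a full SPEN the hand-written gradient would be replaced by back-propagation through a learned energy network; the loop structure of iterative inference stays the same.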
Quantifying and Reducing Stereotypes in Word Embeddings
Bolukbasi, Tolga, Chang, Kai-Wei, Zou, James, Saligrama, Venkatesh, Kalai, Adam
Machine learning algorithms are optimized to model statistical properties of the training data. If the input data reflects stereotypes and biases of the broader society, then the output of the learning algorithm also captures these stereotypes. In this paper, we initiate the study of gender stereotypes in word embedding, a popular framework to represent text data. As their use becomes increasingly common, applications can inadvertently amplify unwanted stereotypes. We show across multiple datasets that the embeddings contain significant gender stereotypes, especially with regard to professions. We created a novel gender analogy task and combined it with crowdsourcing to systematically quantify the gender bias in a given embedding. We developed an efficient algorithm that reduces gender stereotype using just a handful of training examples while preserving the useful geometric properties of the embedding. We evaluated our algorithm on several metrics. While we focus on male/female stereotypes, our framework may be applicable to other types of embedding biases.
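One common way to make the idea of measuring and reducing a stereotype in an embedding concrete is to project word vectors onto a gender direction. The sketch below assumes that approach; the abstract does not spell out the algorithm, so this should not be read as the authors' method. The three-dimensional vectors are invented toy values, and the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def gender_direction(he, she):
    """Unit vector pointing from the 'she' vector towards the 'he' vector."""
    return normalize([a - b for a, b in zip(he, she)])


def bias(word_vec, direction):
    """Cosine between a word vector and the gender direction."""
    return dot(normalize(word_vec), direction)


def neutralize(word_vec, direction):
    """Remove the component along the gender direction, then renormalize."""
    p = dot(word_vec, direction)
    return normalize([w - p * d for w, d in zip(word_vec, direction)])


# Toy vectors (assumed values, not taken from any real embedding).
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
engineer = [0.6, 0.1, 0.8]

g = gender_direction(he, she)
before = bias(engineer, g)
after = bias(neutralize(engineer, g), g)
print(round(before, 3), round(after, 3))
```

With these toy values the projection of "engineer" onto the gender direction is clearly positive before neutralizing and zero afterwards, while the remaining coordinates keep their relative proportions.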
Predictive modelling, how to build ground-truth and extract features for action prediction? • /r/MachineLearning
I have a dataset of users, where each user has daily information about their activities (numerical values representing measurements of physical activity). In addition, each user has a boolean value for each day that indicates whether they took a particular action. The dataset is not fixed: new activity information and actions are added for each user every day. I want to build a model that predicts which users are likely to take the action in the near future (e.g. on any of the next 7 days). My approach is to build feature vectors representing each user's activity values over a period of time, and to use the action column as the source of ground truth.
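The approach described in the post can be sketched as a sliding window over each user's history. This is one possible construction under stated assumptions: the window length, the 7-day horizon default, the tuple layout of a daily record, and the function name are all hypothetical choices, not details from the post.

```python
def build_examples(days, window=3, horizon=7):
    """Turn one user's chronological records into (features, label) pairs.

    days: list of (activities, took_action) tuples, one per day, where
    activities is a list of numbers and took_action is a bool.
    """
    examples = []
    # Only use days that have a full history window behind them and a full
    # label horizon ahead of them.
    for t in range(window - 1, len(days) - horizon):
        history = days[t - window + 1 : t + 1]
        # Features: the activity values of the last `window` days, flattened.
        features = [value for activities, _ in history for value in activities]
        # Label: did the user take the action on any of the next `horizon` days?
        future = days[t + 1 : t + 1 + horizon]
        label = any(action for _, action in future)
        examples.append((features, label))
    return examples


# Toy history for one user: a single activity value per day.
user_days = [
    ([1.0], False),
    ([2.0], False),
    ([3.0], False),
    ([4.0], True),
    ([5.0], False),
    ([6.0], False),
]
examples = build_examples(user_days, window=2, horizon=2)
print(examples)
```

Because the most recent `horizon` days do not yet have a complete label, they are left out of the training examples; they can still be used as inputs when scoring users for the coming week.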
Resource Constrained Structured Prediction
Bolukbasi, Tolga, Chang, Kai-Wei, Wang, Joseph, Saligrama, Venkatesh
We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features at test time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing, and show strong performance in reducing feature costs without degrading accuracy.
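To illustrate the general idea of paying for costly features only when needed, here is a sketch of a simple confidence-gated heuristic. It is deliberately simplified to independent binary decisions and a fixed threshold, whereas the paper learns the adaptive policy for structured outputs; the scoring functions, costs, and threshold are hypothetical.

```python
def predict_with_budget(items, cheap_score, costly_score, threshold,
                        cheap_cost=1.0, costly_cost=10.0):
    """Score each item with cheap features first; fall back to costly
    features only when the cheap score is too close to the decision boundary.

    Returns the list of boolean predictions and the total feature cost spent.
    """
    predictions = []
    total_cost = 0.0
    for item in items:
        score = cheap_score(item)
        total_cost += cheap_cost
        if abs(score) < threshold:
            # Low confidence: compute the costly features for this item only.
            score = costly_score(item)
            total_cost += costly_cost
        predictions.append(score > 0)
    return predictions, total_cost


# Toy scorers (assumptions): the cheap score is the raw value, and the
# costly score applies a small correction to it.
items = [2.0, -0.1, -3.0]
predictions, cost = predict_with_budget(
    items,
    cheap_score=lambda x: x,
    costly_score=lambda x: x + 0.5,
    threshold=0.5,
)
print(predictions, cost)  # [True, True, False] 13.0
```

Only the ambiguous middle item triggers the costly computation, so the total cost is 13.0 instead of the 33.0 that computing every feature for every item would require.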
Linearly Independent Sets in Vector Spaces induced by Kernels • /r/MachineLearning
I hope this post is okay (if not, let me know). I'm attaching a PDF which rigorously defines my question. Briefly, what I'm wondering is this: for a set of data points {x1,...,xp} in a vector space (say, R^n), under what conditions is the set {k(x1,·),...,k(xp,·)} (where k(·,·) is a kernel function) linearly independent? What conditions must the set {x1,...,xp} and the kernel function satisfy to ensure independence? If there isn't an immediate answer to this question, I'll happily take recommendations for mathematical reading towards trying to answer it.
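One way to explore the question numerically, assuming "independent" means linearly independent in the reproducing kernel Hilbert space of a positive semidefinite kernel: the squared norm of a combination sum_i c_i k(x_i,·) equals c^T K c, where K is the Gram matrix with entries K_ij = k(x_i, x_j). The functions are therefore linearly independent exactly when K is nonsingular. The sketch below checks the rank of K for two kernels; the sample points, the bandwidth, and the helper names are assumptions.

```python
import math


def gram_matrix(points, kernel):
    return [[kernel(x, z) for z in points] for x in points]


def matrix_rank(matrix, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            f = m[i][c] / m[r][c]
            for j in range(c, n_cols):
                m[i][j] -= f * m[r][j]
        r += 1
    return r


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def is_independent(points, kernel):
    """True when the Gram matrix has full rank (up to the tolerance)."""
    return matrix_rank(gram_matrix(points, kernel)) == len(points)


# Three distinct points in R^2; the third is the sum of the first two.
points = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(is_independent(points, linear_kernel))  # False
print(is_independent(points, rbf_kernel))     # True
```

With the linear kernel the feature space is R^2 itself, so three points can never give independent functions. The Gaussian kernel is strictly positive definite, so any set of distinct points yields a nonsingular Gram matrix.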
Fearless Frenchman breaks hoverboard record, sets sights on the clouds
A fearless Frenchman, Franky Zapata, thinks one day people will be able to ride his hoverboard to pick up bread in the morning (it's a French thing). The jet ski champion on Saturday set a new Guinness World Record for the farthest hoverboard flight (yes, just like in the movies) off the coast of Sausset-les-Pins in the south of France. Mr. Zapata rode the 1,000 horsepower drone, standing on top of it, for 7,388 feet, or more than a mile. He hovered 165 feet above the surface of the water, "trailed by a fleet of boats and jet skis," as Guinness reports. His feat shattered the previous hoverboard travel record of 905 feet and 2 inches, set last year by Canadian inventor Catalin Alexandru Duru.
How To Extract Feature Vectors From Deep Neural Networks In Python Caffe
Convolutional Neural Networks are great at identifying all the information that makes an image distinct. When we train a deep neural network in Caffe to classify images, we specify a multilayered neural network with different types of layers like convolution, rectified linear unit, softmax loss, and so on. The last layer is the output layer that gives us the output tag with the corresponding confidence value. But sometimes it's useful to extract the feature vectors from various layers and use them for other purposes. Let's see how to do it in Python Caffe, shall we?
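As a rough sketch of what reading feature vectors from an intermediate layer can look like with the Caffe Python interface: after a forward pass, the activations of every layer are available in `net.blobs`. The helper below is an assumption about how one might wrap that, not the article's own code; the helper name, the input blob name "data", the file names, and the layer name "fc7" are hypothetical and depend on your network definition.

```python
import numpy as np


def extract_features(net, batch, layer, input_blob="data"):
    """Run a forward pass and return a copy of the activations of `layer`.

    net:   a loaded network exposing `blobs` and `forward()`, as pycaffe's
           caffe.Net does
    batch: preprocessed images, shaped the way the network input expects
    layer: name of the blob whose activations should be returned
    """
    batch = np.asarray(batch, dtype=np.float32)
    blob = net.blobs[input_blob]
    blob.reshape(*batch.shape)   # resize the input blob to fit this batch
    blob.data[...] = batch       # copy the images into the input blob
    net.forward()                # compute the activations of all layers
    # Copy, because the blob memory is overwritten by the next forward pass.
    return net.blobs[layer].data.copy()


# Usage with pycaffe (file names and the layer name are hypothetical):
# import caffe
# net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)
# features = extract_features(net, preprocessed_images, "fc7")
```

The returned array has one row per image in the batch, so it can be passed directly to another model, for example a nearest-neighbour search or a linear classifier.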
Multi-Instance Multi-Label Class Discovery: A Computational Approach for Assessing Bird Biodiversity
Briggs, Forrest (Facebook, Inc.) | Fern, Xiaoli Z. (Oregon State University) | Raich, Raviv (Oregon State University) | Betts, Matthew (Oregon State University)
Bioacoustic monitoring is a rapidly growing field, where the goal is to learn about organisms such as birds and marine mammals, by applying signal processing and machine learning to audio recordings. In this paper, we consider the problem of class discovery from bird bioacoustics data. Given a large collection of audio recordings of birds (and other sounds in the environment), our goal is to automatically select a subset of recordings to be manually labeled by human ...
Briggs et al. (2012b) proposed to represent audio recordings of bird sound in the multi-instance multi-label (MIML) framework (Zhou et al. 2012). In this formulation, an audio recording is transformed to a spectrogram, then automatically segmented into a collection of regions believed to be distinct utterances of bird sound. Each segment is then described by a feature vector that characterizes its shape, texture, and time/frequency profiles. A recording is represented as a set of segment feature vectors (instances).