Supervised Learning
Creating Images by Learning Image Semantics Using Vector Space Models
Heath, Derrall (Brigham Young University) | Ventura, Dan (Brigham Young University)
When dealing with images and semantics, most computational systems attempt to automatically extract meaning from images. Here we attempt to go the other direction and autonomously create images that communicate concepts. We present an enhanced semantic model that is used to generate novel images that convey meaning. We employ a vector space model and a large corpus to learn vector representations of words and then train the semantic model to predict word vectors that could describe a given image. Once trained, the model autonomously guides the process of rendering images that convey particular concepts. A significant contribution is that, because of the semantic associations encoded in these word vectors, we can also render images that convey concepts on which the model was not explicitly trained. We evaluate the semantic model with an image clustering technique and demonstrate that the model is successful in creating images that communicate semantic relationships.
Aggregating Inter-Sentence Information to Enhance Relation Extraction
Zheng, Hao (Beihang University) | Li, Zhoujun (Beihang University) | Wang, Senzhang (Beihang University) | Yan, Zhao (Beihang University) | Zhou, Jianshe (Capital Normal University)
Previous work for relation extraction from free text is mainly based on intra-sentence information. As relations might be mentioned across sentences, inter-sentence information can be leveraged to improve distantly supervised relation extraction. To effectively exploit inter-sentence information, we propose a ranking based approach, which first learns a scoring function based on a listwise learning-to-rank model and then uses it for multi-label relation extraction. Experimental results verify the effectiveness of our method for aggregating information across sentences. Additionally, to further improve the ranking of high-quality extractions, we propose an effective method to rank relations from different entity pairs. This method can be easily integrated into our overall relation extraction framework, and boosts the precision significantly.
A Generative Model of Words and Relationships from Multiple Sources
Hyland, Stephanie L. (Weill Cornell Graduate School of Medical Sciences/Memorial Sloan Kettering Cancer Center) | Karaletsos, Theofanis (Memorial Sloan Kettering Cancer Center) | Rätsch, Gunnar (Memorial Sloan Kettering Cancer Center)
Neural language models are a powerful tool to embed words into semantic vector spaces. However, learning such models generally relies on the availability of abundant and diverse training examples. In highly specialised domains this requirement may not be met due to difficulties in obtaining a large corpus, or the limited range of expression in average use. Such domains may encode prior knowledge about entities in a knowledge base or ontology. We propose a generative model which integrates evidence from diverse data sources, enabling the sharing of semantic information. We achieve this by generalising the concept of co-occurrence from distributional semantics to include other relationships between entities or words, which we model as affine transformations on the embedding space. We demonstrate the effectiveness of this approach by outperforming recent models on a link prediction task and demonstrating its ability to profit from partially or fully unobserved data training labels. We further demonstrate the usefulness of learning from different data sources with overlapping vocabularies.
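The idea of modelling a relationship as an affine transformation on the embedding space can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 2-d embeddings, the `treats` relation (a 90-degree rotation with zero offset), the choice to transform the subject vector, and the use of cosine similarity as the score.

```python
import math

def affine(matrix, offset, vec):
    """Apply the affine map x -> matrix @ x + offset to a vector."""
    return [sum(m * x for m, x in zip(row, vec)) + b
            for row, b in zip(matrix, offset)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 2-d embeddings, chosen by hand for the example.
embeddings = {
    "aspirin": [1.0, 0.0],
    "headache": [0.0, 1.0],
    "fracture": [-1.0, 0.0],
}

# Hypothetical relation: a 90-degree rotation with no translation.
treats = {"matrix": [[0.0, -1.0], [1.0, 0.0]], "offset": [0.0, 0.0]}

def score(subject, relation, obj):
    """Score a (subject, relation, object) triple: transform the subject
    embedding with the relation's affine map, then compare it to the object."""
    transformed = affine(relation["matrix"], relation["offset"],
                         embeddings[subject])
    return cosine(transformed, embeddings[obj])

print(score("aspirin", treats, "headache"))
print(score("aspirin", treats, "fracture"))
```

In this toy setup the triple whose object lines up with the transformed subject scores highest. In a learned model the matrices, offsets, and embeddings would all be fitted to data rather than set by hand.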
Robustness of Bayesian Pool-Based Active Learning Against Prior Misspecification
Cuong, Nguyen Viet (National University of Singapore) | Ye, Nan (Queensland University of Technology) | Lee, Wee Sun (National University of Singapore)
We study the robustness of active learning (AL) algorithms against prior misspecification: whether an algorithm achieves similar performance using a perturbed prior as compared to using the true prior. In both the average and worst cases of the maximum coverage setting, we prove that all alpha-approximate algorithms are robust (i.e., near alpha-approximate) if the utility is Lipschitz continuous in the prior. We further show that robustness may not be achieved if the utility is non-Lipschitz. This suggests we should use a Lipschitz utility for AL if robustness is required. For the minimum cost setting, we can also obtain a robustness result for approximate AL algorithms. Our results imply that many commonly used AL algorithms are robust against perturbed priors. We then propose the use of a mixture prior to alleviate the problem of prior misspecification. We analyze the robustness of the uniform mixture prior and show experimentally that it performs reasonably well in practice.
Who are alike? Use BigObject feature vector to find similarities
Cluster analysis is a common technique for grouping a set of objects so that the objects in the same group share certain attributes. It's commonly used in marketing and sales planning to define market segments. Here at BigObject we adopt a simple approach to exploring the similarities between objects. We simply calculate the "Feature Vector" based on given attributes and use the score to determine which objects are "alike." This is a simple example to show how to use BigObject to extract product features and then find similar products in your retail data.
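The general idea of scoring objects by their feature vectors can be sketched in plain Python. This does not use the BigObject API, and the product names, the three attributes, and the choice of cosine similarity as the score are all assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical catalog: each product is described by three made-up
# attributes, e.g. units sold to three customer segments.
products = {
    "espresso": [9.0, 1.0, 0.0],
    "latte": [8.0, 2.0, 0.0],
    "green_tea": [0.0, 1.0, 9.0],
}

def most_similar(name, catalog):
    """Return (other_name, score) for the product most similar to `name`."""
    target = catalog[name]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in catalog.items() if other != name]
    return max(scored, key=lambda pair: pair[1])

print(most_similar("espresso", products))
```

Cosine similarity ignores overall sales volume and compares only the shape of the attribute profile. A distance such as Euclidean would be a reasonable alternative when magnitude matters.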
Human dominoes record broken
On Thursday, Aaron's Inc., a Maryland-based appliance and electronics company, set a new Guinness World Record for the "largest human mattress dominoes" chain with 1,200 participants taking a total of 13 minutes and 38 seconds to complete the larger than life feat. Event organizers used two exhibit halls covering 70,000 square feet to set up 34 rows of mattresses. The first mattress was pushed over by Aaron's CEO John Robinson. "Breaking a Guinness World Records title has been a great team building event for the associates we have attending our National Managers meeting," said Robinson at the event. The event not only broke a world record but supported a great cause.
Machine Learning Methods: Classification without negative examples - EFavDB
Here, we discuss some methods for carrying out classification when only positive examples are available. The latter half of our discussion borrows heavily from W.S. Lee and B. Liu, Proc. Logistic regression is a commonly used tool for estimating the level sets of a Boolean function y on a set of feature vectors \textbf{F}: In a sense, you can think of it as a method for playing the game "Battleship" on whatever data set you're interested in. Consider now a situation where all training examples given are positive -- i.e., no negative examples are available.
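For reference, ordinary logistic regression with both positive and negative labels can be sketched as follows. This is a minimal batch gradient descent fit on a made-up one-dimensional data set. The function names, learning rate, and epoch count are illustrative assumptions, and the sketch does not implement the positive-only methods the post goes on to discuss.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Estimated probability that y = 1 for feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fit_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Made-up training set: small values are labelled 0, large values 1.
features = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
weights, bias = fit_logistic(features, labels)

print(predict_proba(weights, bias, [0.0]))  # below 0.5
print(predict_proba(weights, bias, [3.0]))  # above 0.5
```

The set of points where the predicted probability equals 0.5 is the estimated decision boundary, that is, one level set of the fitted function.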
Maximum margin classifier working in a set of strings
Koyano, Hitoshi, Hayashida, Morihiro, Akutsu, Tatsuya
Numbers and numerical vectors account for a large portion of data. However, recently the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem on the asymptotic behavior of a consensus sequence of strings, which is the counterpart of the mean of numerical vectors, as demonstrated in the probability theory on a metric space of strings developed by one of the authors and his colleague in a previous study [18]. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein-protein interactions using amino acid sequences.
First-order Methods for Geodesically Convex Optimization
Geodesic convexity generalizes the notion of (vector space) convexity to nonlinear metric spaces. But unlike convex optimization, geodesically convex (g-convex) optimization is much less developed. In this paper we contribute to the understanding of g-convex optimization by developing iteration complexity analysis for several first-order algorithms on Hadamard manifolds. Specifically, we prove upper bounds for the global complexity of deterministic and stochastic (sub)gradient methods for optimizing smooth and nonsmooth g-convex functions, both with and without strong g-convexity. Our analysis also reveals how the manifold geometry, especially sectional curvature, impacts convergence rates. To the best of our knowledge, our work is the first to provide global complexity analysis for first-order algorithms for general g-convex optimization.
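A first-order method on a manifold replaces the Euclidean step with a step along a geodesic. The sketch below illustrates this on a deliberately simple case chosen here, not taken from the paper: the positive reals with distance d(x, a) = |log(x / a)|, a flat one-dimensional Hadamard manifold. The objective is the mean of squared geodesic distances to a few points. It is g-convex but not convex in the usual sense, and its minimizer is the geometric mean. The function names, step size, and iteration count are illustrative assumptions.

```python
import math

def riemannian_gradient(x, points):
    """Riemannian gradient at x of f(x) = (1 / 2n) * sum_i log(x / a_i)^2
    under the metric ds = dx / x (Euclidean derivative scaled by x^2)."""
    return x * sum(math.log(x / a) for a in points) / len(points)

def exponential_map(x, tangent):
    """Move from x along the geodesic with initial velocity `tangent`."""
    return x * math.exp(tangent / x)

def geodesic_gradient_descent(points, start=1.0, step=0.5, iterations=60):
    """Gradient descent in which each update follows a geodesic."""
    x = start
    for _ in range(iterations):
        x = exponential_map(x, -step * riemannian_gradient(x, points))
    return x

# The minimizer is the geometric mean of the points, here 4.
print(geodesic_gradient_descent([1.0, 4.0, 16.0]))
```

Because the exponential map keeps every iterate on the manifold, the method never leaves the positive reals, whereas a plain Euclidean step with a large step size could produce a non-positive iterate.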