If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
Cover song identification is an important problem in the field of Music Information Retrieval. Most existing methods rely on hand-crafted features and sequence alignment methods, and further breakthrough is hard to achieve. In this paper, Convolutional Neural Networks (CNNs) are used for representation learning toward this task. We show that they could be naturally adapted to deal with key transposition in cover songs. Additionally, Temporal Pyramid Pooling is utilized to extract information on different scales and transform songs with different lengths into fixed-dimensional representations.
Lately, I've been working on a couple of scenarios that have reminded me of the importance of feature extraction in deep learning models. As a result, I would like to summarize some ideas I've outlined before about some of the principles of knowledge quality in deep learning and model and the applicability of representation learning to those scenarios. Understanding the characteristics of input datasets is an essential capability of machine learning algorithms. Given a specific input, machine learning models need to infer specific features about the data in order to perform some target actions. Representation learning or feature learning is the subdiscipline of the machine learning space that deals with extracting features or understanding the representation of a dataset.
The year 2019 was an excellent year for the developers, as almost all industry leaders open-sourced their machine learning tool kits. Open-sourcing not only help the users but also helps the tool itself as developers can contribute and add customisations that serve few complex applications. The benefit is mutual and also helps in accelerating the democratisation of ML. LIGHT (Learning in Interactive Games with Humans and Text) -- a large-scale fantasy text adventure game and research platform for training agents that can both talk and act, interacting either with other models or humans. The game uses natural language that's entirely written by the people who are playing the game.
Suppose that you have a very long list of string sequences, such as a list of amino acid structures ('PHE-SER-CYS', 'GLN-ARG-SER',…), product serial numbers ('AB121E', 'AB323', 'DN176'…), or users UIDs, and you are required to create a validation process of some kind that will detect anomalies in this sequence. An anomaly might be a string that follows a slightly different or unusual format than the others (whether it was created by mistake or on purpose) or just one that is extremely rare. To make things even more interesting, suppose that you don't know what is the correct format or structure that sequences suppose to follow. This is a relatively common problem (though with an uncommon twist) that many data scientists usually approach using one of the popular unsupervised ML algorithms, such as DBScan, Isolation Forest, etc. Many of these algorithms typically do a good job in finding anomalies or outliers by singling out data points that are relatively far from the others or from areas in which most data points lie.
When people create, it's not very often they achieve what they're looking for on the first try. Creating--whether it be a painting, a paper, or a machine learning model--is a process that has a starting point from which new elements and ideas are added and old ones are modified and discarded, sometimes again and again, until the work accomplishes its intended purpose: to evoke emotion, to convey a message, to complete a task. Since I began my work as a researcher, machine learning systems have gotten really good at a particular form of creation that has caught my attention: image generation. Looking at some of the images generated by systems such as BigGAN and ProGAN, you wouldn't be able to tell they were produced by a computer. In these advancements, my colleagues and I see an opportunity to help people create visuals and better express themselves through the medium--from improving the user experience when it comes to designing avatars in the gaming world to making the editing of personal photos and production of digital art in software like Photoshop, which can be challenging to those unfamiliar with such programs' capabilities, easier.
We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent. At least one of these representations is assumed to induce a Markov decision process (MDP), and the performance of the agent is measured in terms of cumulative regret against the optimal policy giving the highest average reward in this MDP representation. We propose an algorithm (UCB-MS) with O(sqrt(T)) regret in any communicating Markov decision process. The regret bound shows that UCB-MS automatically adapts to the Markov model. This improves over the currently known best results in the literature that gave regret bounds of order O(T (2/3)).
Visual Question answering is a challenging problem requiring a combination of concepts from Computer Vision and Natural Language Processing. Most existing approaches use a two streams strategy, computing image and question features that are consequently merged using a variety of techniques. Nonetheless, very few rely on higher level image representations, which can capture semantic and spatial relationships. In this paper, we propose a novel graph-based approach for Visual Question Answering. Our method combines a graph learner module, which learns a question specific graph representation of the input image, with the recent concept of graph convolutions, aiming to learn image representations that capture question specific interactions.
Many neurons in the brain, such as place cells in the rodent hippocampus, have localized receptive fields, i.e., they respond to a small neighborhood of stimulus space. What is the functional significance of such representations and how can they arise? Here, we propose that localized receptive fields emerge in similarity-preserving networks of rectifying neurons that learn low-dimensional manifolds populated by sensory inputs. Numerical simulations of such networks on standard datasets yield manifold-tiling localized receptive fields. More generally, we show analytically that, for data lying on symmetric manifolds, optimal solutions of objectives, from which similarity-preserving networks are derived, have localized receptive fields.
We propose a deep generative model for learning to distill the hidden factors of variation within a set of labeled observations into two complementary codes. One code describes the factors of variation relevant to solving a specified task. The other code describes the remaining factors of variation that are irrelevant to solving this task. The only available source of supervision during the training process comes from our ability to distinguish among different observations belonging to the same category. Concrete examples include multiple images of the same object from different viewpoints, or multiple speech samples from the same speaker.
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the surrounding sentences of an encoded passage. Sentences that share semantic and syntactic properties are thus mapped to similar vector representations. We next introduce a simple vocabulary expansion method to encode words that were not seen as part of training, allowing us to expand our vocabulary to a million words. After training our model, we extract and evaluate our vectors with linear models on 8 tasks: semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets.