Deep metric learning improves lab of origin prediction of genetically engineered plasmids


Genome engineering is undergoing unprecedented development and is now becoming widely available. To ensure responsible biotechnology innovation and to reduce misuse of engineered DNA sequences, it is vital to develop tools to identify the lab-of-origin of engineered plasmids. Genetic engineering attribution (GEA), the ability to make sequence-lab associations, would support forensic experts in this process. Here, we propose a method, based on metric learning, that ranks the most likely labs-of-origin whilst simultaneously generating embeddings for plasmid sequences and labs. These embeddings can be used to perform various downstream tasks, such as clustering DNA sequences and labs, as well as using them as features in machine learning models.
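The core idea — embedding plasmid sequences and labs into one shared space and ranking labs by similarity — can be sketched in a few lines. The embeddings and lab names below are toy values made up for illustration; in the actual system they would be learned jointly with a metric-learning objective (e.g. a triplet or contrastive loss):

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_labs(seq_emb, lab_embs):
    # Rank candidate labs by similarity of their embedding to the
    # plasmid-sequence embedding; higher similarity = more likely origin.
    scored = [(lab, cosine(seq_emb, emb)) for lab, emb in lab_embs.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy, hand-made embeddings (a real system learns these from sequence data).
labs = {
    "lab_A": [0.9, 0.1, 0.0],
    "lab_B": [0.1, 0.8, 0.3],
    "lab_C": [0.0, 0.2, 0.9],
}
plasmid = [0.8, 0.2, 0.1]

ranking = rank_labs(plasmid, labs)
print([lab for lab, _ in ranking])  # lab_A ranks first
```

Because sequences and labs live in the same space, the same vectors double as general-purpose features for the downstream clustering and classification tasks the abstract mentions.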

Sequence Transduction with Recurrent Neural Networks

Many machine learning tasks can be expressed as the transformation---or \emph{transduction}---of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since \emph{finding} the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.
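The way the transducer avoids a pre-defined alignment is by summing over every possible alignment with a forward recursion over a T-by-U output lattice. The sketch below is a deliberately simplified toy model (constant blank/label probabilities at every lattice node, not the paper's RNN-parameterised ones) that shows the marginalisation itself:

```python
from math import comb

def alignment_marginal(T, U, p_blank, p_label):
    """Total probability of a length-U output given T input frames,
    summed over all monotone alignments. Toy model: the same
    blank/label probabilities at every lattice node."""
    # alpha[t][u]: probability of having emitted u labels after frame t.
    alpha = [[0.0] * (U + 1) for _ in range(T + 1)]
    alpha[1][0] = 1.0
    for t in range(1, T + 1):
        for u in range(U + 1):
            if t == 1 and u == 0:
                continue
            stay = alpha[t - 1][u] * p_blank            # consume a frame
            emit = alpha[t][u - 1] * p_label if u > 0 else 0.0  # emit a label
            alpha[t][u] = stay + emit
    return alpha[T][U] * p_blank  # a final blank ends the output

# Closed-form check: C(T-1+U, U) paths, each with prob p_blank**T * p_label**U.
T, U, pb, pl = 3, 2, 0.6, 0.4
closed = comb(T - 1 + U, U) * pb**T * pl**U
print(abs(alignment_marginal(T, U, pb, pl) - closed) < 1e-9)  # True
```

In the real model the per-node probabilities come from combining a transcription network (over the input) with a prediction network (over the output so far), but the recursion that replaces a fixed alignment is exactly this one.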

r/MachineLearning - [Discussion] Google Patents "Generating output sequences from input sequences using neural networks"


Between this, the GAN evaluation paper that happened to be strikingly similar to a previously published paper by other authors, and DeepMind's PR machine withholding the crucial details that make their Go models so good, I am definitely more and more disappointed in DeepMind ...

Artificial Intelligence - Modules Part II


The diagram above relies on several concepts whose detailed explanation is beyond the scope of this article: Kernel (Filter), ReLU, Pooling, Convolution, Stride, Flatten, and Fully Connected layers. RNNs are widely used in text analysis, sequence models, time-series analysis, and video processing, whereas language modeling tasks require modeling a large number of possible values (words in the vocabulary) per input feature. RNNs and their variants are explained accessibly in Colah's blog.
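The defining property of an RNN — the same weights applied at every time step, with a hidden state carried forward — fits in a few lines. The weights and inputs below are made-up toy values, just to show the recurrence:

```python
from math import tanh

def rnn_step(x, h, W_xh, W_hh, b):
    # One step of a vanilla RNN: h' = tanh(W_xh @ x + W_hh @ h + b).
    # Reusing the same weights at every step is what lets an RNN
    # process sequences of arbitrary length.
    return [
        tanh(sum(wx * xi for wx, xi in zip(W_xh[i], x))
             + sum(wh * hj for wh, hj in zip(W_hh[i], h))
             + b[i])
        for i in range(len(h))
    ]

# Toy 2-dim input, 2-dim hidden state; weights are arbitrary.
W_xh = [[0.5, -0.3], [0.1, 0.4]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]

h = [0.0, 0.0]
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:  # a 3-step input sequence
    h = rnn_step(x, h, W_xh, W_hh, b)
print(h)  # final hidden state summarises the whole sequence
```

Variants such as LSTM and GRU replace the plain `tanh` update with gated ones, which is what makes them practical for the long sequences in the applications listed above.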

Mystical Tutor: A Magic: The Gathering Design Assistant via Denoising Sequence-to-Sequence Learning

AAAI Conferences

Procedural Content Generation (PCG) has seen heavy focus on the generation of levels for video games, aesthetic content, and on rule creation, but has seen little use in other domains. Recently, the ready availability of Long Short Term Memory Recurrent Neural Networks (LSTM RNNs) has seen a rise in text-based procedural generation, including card designs for Collectible Card Games (CCGs) like Hearthstone or Magic: The Gathering. In this work we present a mixed-initiative design tool, Mystical Tutor, that allows a user to type in a partial specification for a card and receive a full card design. This is achieved by using sequence-to-sequence learning as a denoising sequence autoencoder, allowing Mystical Tutor to learn how to translate from partial specifications to full designs.
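The denoising setup amounts to corrupting complete card texts into "partial specifications" and training the model to reconstruct the original. A minimal sketch of such a corruption function, assuming simple random token dropout (the card text and drop rate here are illustrative, not taken from the paper):

```python
import random

def corrupt(tokens, drop_prob=0.3, rng=None):
    # Make a "partial specification" by randomly dropping tokens;
    # a denoising sequence-to-sequence model is trained to
    # reconstruct the full card text from the corrupted version.
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept if kept else tokens[:1]  # never return an empty input

card = "flying when this creature enters the battlefield draw a card".split()
partial = corrupt(card)
print(" ".join(partial))
```

Training pairs are (partial, full): the decoder learns to fill the gaps, which at design time lets a user type a fragment of a card and receive a completed design.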