Dealing with the sheer size and complexity of today's massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck in such modern distributed computing environments is that some of the worker nodes may run slowly. These nodes, a.k.a. stragglers, can significantly slow down computation, as the slowest node may dictate the overall computation time. A recent computational framework, called encoded optimization, creates redundancy in the data to mitigate the effect of stragglers. In this paper we develop a novel mathematical understanding of this framework, demonstrating its effectiveness in much broader settings than was previously understood. We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs among convergence rate, data set size, accuracy, computational load (or data redundancy), and straggler toleration in this framework.
The artist and computer scientist Terence Broad built an autoencoder, a type of artificial neural network, and showed it the classic science-fiction film Blade Runner (1982). He trained the autoencoder to remember every individual frame of the film and to reconstruct each one as a memory, on view here. In the original film, a bounty hunter hunts down androids that are so well engineered that they are indistinguishable from humans. Here, we face a similar challenge, as we try to identify the original film within the AI program's perception of it. Terence Broad, Blade Runner--Autoencoded, 2016.
Text data requires special preparation before you can start using it for predictive modeling. First, the text must be parsed to extract words, a process called tokenization. Then the words need to be encoded as integers or floating-point values for use as input to a machine learning algorithm, a process called feature extraction (or vectorization). The scikit-learn library offers easy-to-use tools to perform both tokenization and feature extraction on your text data. In this tutorial, you will discover exactly how you can prepare your text data for predictive modeling in Python with scikit-learn.
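As a minimal sketch of the two steps above, scikit-learn's `CountVectorizer` performs tokenization and count-based feature extraction in one call; the example sentences here are invented for illustration.

```python
# Tokenization + feature extraction with scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The quick brown fox.", "The lazy dog."]  # toy corpus (illustrative)

vectorizer = CountVectorizer()       # lowercases, tokenizes, builds a vocabulary
X = vectorizer.fit_transform(docs)   # sparse matrix of integer word counts

print(sorted(vectorizer.vocabulary_))  # ['brown', 'dog', 'fox', 'lazy', 'quick', 'the']
print(X.toarray())                     # one row of counts per document
```

Each row of `X` is a document and each column a vocabulary word, so the result can be fed directly to any scikit-learn estimator; swapping in `TfidfVectorizer` yields floating-point TF-IDF weights instead of raw counts.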
We characterized the DNA-recognizing domains of the TAL effectors with respect to binding affinity and sequence specificity. To construct the staple proteins, we fused two TAL proteins via a custom peptide linker and tested their ability to connect two separate double-helical DNA domains. To create larger objects containing multiple staple-protein connections, we identified a set of rules for the optimal spacing between these connections. On the basis of these rules, we could create megadalton-scale objects realizing a variety of structural motifs, such as custom curvatures, vertices, and corners. Each of these objects was built from a set of 12 double-TAL staple proteins and a template DNA double strand with a designed sequence.