By way of explaining how a brain works logically, human associative memory is modeled with logical and memory neurons, corresponding to standard digital circuits. The resulting cognitive architecture incorporates basic psychological elements such as short-term and long-term memory. Novel to the architecture are memory searches using cues chosen pseudorandomly from short-term memory. Recalls, alternating with sensory images at many tens per second, are analyzed subliminally as an ongoing process to determine a direction of attention in short-term memory.
Information storage in DNA is the cornerstone of biology. Interestingly, prokaryotes can store information in specific loci in their DNA to remember encounters with invaders (such as bacteriophages, viruses that infect bacteria). Short samples of DNA from invaders are inserted as "spacers" into the CRISPR array. The array thus contains samples of invader DNA in a defined locus that is recognized by Cas proteins, which further process this information. This enables bacteria to respond adaptively and specifically to invading DNA that they have encountered before.
Memorization is worst-case generalization. Based on MacKay's information-theoretic model of supervised machine learning, this article discusses how to practically estimate the maximum size of a neural network given a training data set. First, we present four easily applicable rules to analytically determine the capacity of neural network architectures. This allows the efficiency of different network architectures to be compared independently of a task. Second, we introduce and experimentally validate a heuristic method to estimate the neural network capacity requirement for a given dataset and labeling. This allows an estimate of the required size of a neural network for a given problem. We conclude the article with a discussion of the consequences of sizing the network wrongly, which include both increased computational effort for training and reduced generalization capability.
Machine learning models based on neural networks and deep learning are being rapidly adopted for many purposes. What those models learn, and what they may share, is a significant concern when the training data may contain secrets and the models are public -- e.g., when a model helps users compose text messages using models trained on all users' messages. This paper presents exposure: a simple-to-compute metric that can be applied to any deep learning model for measuring the memorization of secrets. Using this metric, we show how to extract those secrets efficiently using black-box API access. Further, we show that unintended memorization occurs early, is not due to over-fitting, and is a persistent issue across different types of models, hyperparameters, and training strategies. We experiment with both real-world models (e.g., a state-of-the-art translation model) and datasets (e.g., the Enron email dataset, which contains users' credit card numbers) to demonstrate both the utility of measuring exposure and the ability to extract secrets. Finally, we consider many defenses, finding some ineffective (like regularization), and others to lack guarantees. However, by instantiating our own differentially-private recurrent model, we validate that by appropriately investing in the use of state-of-the-art techniques, the problem can be resolved, with high utility.
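As a rough illustration of how a rank-based exposure score could be computed, the sketch below assumes the definition exposure = log2(|R|) - log2(rank), where R is the set of candidate secrets and rank is the position of the inserted secret when all candidates are sorted by model log-perplexity. That formula is not stated in the abstract above and is an assumption here. The function name `exposure` and the scores are hypothetical, and a real measurement would obtain the scores by querying the model.

```python
import math


def exposure(secret_score, candidate_scores):
    """Rank-based exposure of one secret (illustrative sketch).

    secret_score: log-perplexity the model assigns to the inserted secret
        (lower means the model finds it more likely).
    candidate_scores: log-perplexities of every candidate in the search
        space, assumed here to include the secret itself.
    """
    # Rank is 1-based: one plus the number of candidates the model
    # considers strictly more likely than the secret.
    rank = 1 + sum(1 for s in candidate_scores if s < secret_score)
    return math.log2(len(candidate_scores)) - math.log2(rank)


# Hypothetical scores for a search space of eight candidates.
scores = [5.0, 3.0, 9.0, 7.0, 1.0, 8.0, 6.0, 4.0]
most_exposed = exposure(1.0, scores)   # secret ranked first
least_exposed = exposure(9.0, scores)  # secret ranked last
```

Under this assumed definition, a secret ranked first among the candidates gets the maximum score of log2(|R|), and a secret ranked last gets zero.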
This paper studies the relationship between the classification performed by deep neural networks and the $k$-NN decision in the embedding space of these networks. The simple but important connection shown here provides a better understanding of the relationship between the ability of neural networks to generalize and their tendency to memorize the training data, which are traditionally considered to contradict each other and are here shown to be compatible and complementary. Our results support the conjecture that deep neural networks approach Bayes optimal error rates.
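One way to probe the relationship described above is to compare a network's predicted labels with a $k$-NN vote over training embeddings. The sketch below is a minimal NumPy version under stated assumptions: the embeddings are already extracted, Euclidean distance is used, and ties are broken toward the lower class index. The names `knn_predict` and `agreement_rate` and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np


def knn_predict(train_emb, train_labels, query_emb, k=5):
    """Majority vote among the k nearest training embeddings."""
    # Squared Euclidean distances, shape (n_query, n_train).
    diff = query_emb[:, None, :] - train_emb[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    # Indices of the k closest training points for each query.
    nearest = np.argsort(dist2, axis=1)[:, :k]
    votes = train_labels[nearest]
    n_classes = int(train_labels.max()) + 1
    # Count votes per class; argmax breaks ties toward the lower index.
    counts = np.array([np.bincount(v, minlength=n_classes) for v in votes])
    return counts.argmax(axis=1)


def agreement_rate(net_pred, knn_pred):
    """Fraction of queries where the two decisions coincide."""
    return float(np.mean(net_pred == knn_pred))


# Toy embeddings: two well-separated clusters, one per class.
train_emb = np.array([[0, 0], [0, 1], [1, 0],
                      [10, 10], [10, 11], [11, 10]], dtype=float)
train_labels = np.array([0, 0, 0, 1, 1, 1])
query_emb = np.array([[0.2, 0.2], [10.5, 10.5]])

knn_pred = knn_predict(train_emb, train_labels, query_emb, k=3)
net_pred = np.array([0, 1])  # stand-in for the network's argmax output
rate = agreement_rate(net_pred, knn_pred)
```

In practice the stand-in `net_pred` would be replaced with the network's own predictions on the same queries, with embeddings taken from a chosen layer.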