# United States

### Deep biomarkers of aging and longevity: From research to applications

IMAGE: Using age predictors within specified age groups to infer causality and identify therapeutic interventions. The deep age predictors can help advance aging research by establishing causal relationships in nonlinear systems. Deep aging clocks can be used for identification of novel therapeutic targets, evaluating the efficacy of various interventions, data quality control, data economics, prediction of health trajectories, mortality, and many other applications. Dr. Alex Zhavoronkov from Insilico Medicine, Hong Kong Science and Technology Park, in Hong Kong, China & The Buck Institute for Research on Aging in Novato, California, USA as well as The Biogerontology Research Foundation in London, UK said "The recent hype cycle in artificial intelligence (AI) resulted in substantial investment in machine learning and increase in available talent in almost every industry and country." Over many generations humans have evolved to develop from a single-cell embryo within a female organism, come out, grow with the help of other humans, reach reproductive age, reproduce, take care of the young, and gradually decline.

### Spherical Text Embedding

Unsupervised text embedding has shown great power in a wide range of NLP tasks. While text embeddings are typically learned in the Euclidean space, directional similarity is often more effective in tasks such as word similarity and document clustering, which creates a gap between the training stage and usage stage of text embedding. To close this gap, we propose a spherical generative model based on which unsupervised word and paragraph embeddings are jointly learned. To learn text embeddings in the spherical space, we develop an efficient optimization algorithm with convergence guarantee based on Riemannian optimization. Our model enjoys high efficiency and achieves state-of-the-art performances on various text embedding tasks including word similarity and document clustering.

### Scalable Inference for Neuronal Connectivity from Calcium Imaging

Fluorescent calcium imaging provides a potentially powerful tool for inferring connectivity in neural circuits with up to thousands of neurons. However, a key challenge in using calcium imaging for connectivity detection is that current systems often have a temporal response and frame rate that can be orders of magnitude slower than the underlying neural spiking process. Bayesian inference based on expectation-maximization (EM) have been proposed to overcome these limitations, but they are often computationally demanding since the E-step in the EM procedure typically involves state estimation in a high-dimensional nonlinear dynamical system. In this work, we propose a computationally fast method for the state estimation based on a hybrid of loopy belief propagation and approximate message passing (AMP). The key insight is that a neural system as viewed through calcium imaging can be factorized into simple scalar dynamical systems for each neuron with linear interconnections between the neurons.

### Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of any-order derivatives when using function approximation. Furthermore, we propose a method to trade off bias and variance of higher order derivatives by discounting the impact of more distant causal dependencies. We demonstrate the correctness and utility of our estimator in analytically tractable MDPs and in meta-reinforcement-learning for continuous control.

### Improved Expressivity Through Dendritic Neural Networks

A typical biological neuron, such as a pyramidal neuron of the neocortex, receives thousands of afferent synaptic inputs on its dendrite tree and sends the efferent axonal output downstream. In typical artificial neural networks, dendrite trees are modeled as linear structures that funnel weighted synaptic inputs to the cell bodies. However, numerous experimental and theoretical studies have shown that dendritic arbors are far more than simple linear accumulators. That is, synaptic inputs can actively modulate their neighboring synaptic activities; therefore, the dendritic structures are highly nonlinear. In this study, we model such local nonlinearity of dendritic trees with our dendritic neural network (DENN) structure and apply this structure to typical machine learning tasks.

### PyTorch: An Imperative Style, High-Performance Deep Learning Library

Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it was designed from first principles to support an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

### Graphical Models for Inference with Missing Data

We address the problem of deciding whether there exists a consistent estimator of a given relation Q, when data are missing not at random. We employ a formal representation called Missingness Graphs' to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured. Using this representation, we define the notion of \textit{recoverability} which ensures that, for a given missingness-graph $G$ and a given query $Q$ an algorithm exists such that in the limit of large samples, it produces an estimate of $Q$ \textit{as if} no data were missing. We further present conditions that the graph should satisfy in order for recoverability to hold and devise algorithms to detect the presence of these conditions. Papers published at the Neural Information Processing Systems Conference.

### Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years.) However, humans are often reminded of past memories or mental states which are associated with the current mental state.

### Randomized Experimental Design for Causal Graph Discovery

We examine the number of controlled experiments required to discover a causal graph. Hauser and Buhlmann showed that the number of experiments required is logarithmic in the cardinality of maximum undirected clique in the essential graph. Their lower bounds, however, assume that the experiment designer cannot use randomization in selecting the experiments. We show that significant improvements are possible with the aid of randomization – in an adversarial (worst-case) setting, the designer can then recover the causal graph using at most O(log log n) experiments in expectation. This bound cannot be improved; we show it is tight for some causal graphs.