

Sparse Fourier Backpropagation in Cryo-EM Reconstruction

Neural Information Processing Systems

The presence of multiple structural states in the data represents a major bottleneck in existing processing pipelines, often requiring expert user supervision.


Modular Machine Learning with Applications to Genetic Circuit Composition

Wang, Jichi, Sontag, Eduardo D., Del Vecchio, Domitilla

arXiv.org Artificial Intelligence

In several applications, including in synthetic biology, one often has input/output data on a system composed of many modules, and although the modules' input/output functions and signals may be unknown, knowledge of the composition architecture can significantly reduce the amount of training data required to learn the system's input/output mapping. Learning the modules' input/output functions is also necessary for designing new systems from different composition architectures. Here, we propose a modular learning framework, which incorporates prior knowledge of the system's compositional structure to (a) identify the composing modules' input/output functions from the system's input/output data and (b) achieve this by using a reduced amount of data compared to what would be required without knowledge of the compositional structure. To achieve this, we introduce the notion of modular identifiability, which allows recovery of modules' input/output functions from a subset of the system's input/output data, and provide theoretical guarantees on a class of systems motivated by genetic circuits. We demonstrate the theory on computational studies showing that a neural network (NNET) that accounts for the compositional structure can learn the composing modules' input/output functions and predict the system's output on inputs outside of the training set distribution. By contrast, a neural network that is agnostic of the structure is unable to predict on inputs that fall outside of the training set distribution. By reducing the need for experimental data and allowing module identification, this framework offers the potential to ease the design of synthetic biological circuits and of multi-module systems more generally.
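A minimal sketch of the idea, not the paper's algorithm: two Hill-type modules in series (a form motivated by genetic circuits), where the learner observes only system-level input/output data but knows the series composition. The parameter names and the crude random hill-climb fit are illustrative assumptions.

```python
import random

# Hill-type dose-response, a common module shape in genetic circuits.
def hill(u, a, k, n=2.0):
    return a * u**n / (k**n + u**n)

TRUE = {"a1": 2.0, "k1": 0.5, "a2": 1.5, "k2": 1.0}  # hidden from the learner

def system(u, p):
    # The learner knows this series composition, not the parameters.
    return hill(hill(u, p["a1"], p["k1"]), p["a2"], p["k2"])

# Only system-level input/output data is observed (no module outputs).
us = [0.1 * i for i in range(1, 21)]
ys = [system(u, TRUE) for u in us]

def loss(p):
    return sum((system(u, p) - y) ** 2 for u, y in zip(us, ys))

# Structured fit: because the composition is hard-coded in `system`,
# a simple stochastic hill-climb over the four module parameters suffices.
random.seed(0)
best = {"a1": 1.0, "k1": 1.0, "a2": 1.0, "k2": 1.0}
loss_init = loss(best)
best_loss = loss_init
for _ in range(20000):
    cand = {k: max(1e-3, v + random.gauss(0.0, 0.1)) for k, v in best.items()}
    c = loss(cand)
    if c < best_loss:
        best, best_loss = cand, c
```

Because the composition architecture is baked into the model, the recovered parameters describe the individual modules, which can then be reused in other architectures.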




A Comprehensive Review on RNA Subcellular Localization Prediction

Zhang, Cece, Zhu, Xuehuan, Peterson, Nick, Wang, Jieqiong, Wan, Shibiao

arXiv.org Artificial Intelligence

The subcellular localization of RNAs, including long non-coding RNAs (lncRNAs), messenger RNAs (mRNAs), microRNAs (miRNAs) and other smaller RNAs, plays a critical role in determining their biological functions. For instance, lncRNAs are predominantly associated with chromatin and act as regulators of gene transcription and chromatin structure, while mRNAs are distributed across the nucleus and cytoplasm, facilitating the transport of genetic information for protein synthesis. Understanding RNA localization sheds light on processes like gene expression regulation with spatial and temporal precision. However, traditional wet lab methods for determining RNA localization, such as in situ hybridization, are often time-consuming, resource-demanding, and costly. To overcome these challenges, computational methods leveraging artificial intelligence (AI) and machine learning (ML) have emerged as powerful alternatives, enabling large-scale prediction of RNA subcellular localization. This paper provides a comprehensive review of the latest advancements in AI-based approaches for RNA subcellular localization prediction, covering various RNA types and focusing on sequence-based, image-based, and hybrid methodologies that combine both data types. We highlight the potential of these methods to accelerate RNA research, uncover molecular pathways, and guide targeted disease treatments. Furthermore, we critically discuss the challenges in AI/ML approaches for RNA subcellular localization, such as data scarcity and lack of benchmarks, and opportunities to address them. This review aims to serve as a valuable resource for researchers seeking to develop innovative solutions in the field of RNA subcellular localization and beyond.
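For the sequence-based family of methods the review covers, a common first step is turning an RNA sequence into a fixed-length k-mer frequency vector that any downstream classifier can consume. This sketch is generic and not tied to any specific tool; the example sequence is made up.

```python
from itertools import product

# k-mer frequency featurization of an RNA sequence (alphabet A, C, G, U).
def kmer_features(seq, k=2, alphabet="ACGU"):
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {km: 0 for km in kmers}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    total = max(1, len(seq) - k + 1)
    # Normalized frequencies give a length-invariant feature vector.
    return [counts[km] / total for km in kmers]

vec = kmer_features("AUGGCUAAGU")  # hypothetical 10-nt fragment
print(len(vec))
```

With k=2 over a 4-letter alphabet this yields a 16-dimensional vector whose entries sum to 1, regardless of sequence length.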


Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

Li, Jiazheng, Xu, Hainiu, Sun, Zhaoyue, Zhou, Yuxiang, West, David, Aloisi, Cesare, He, Yulan

arXiv.org Artificial Intelligence

Generating rationales that justify scoring decisions has been a promising way to facilitate explainability in automated scoring systems. However, existing methods do not match the accuracy of classifier-based methods, and the generated rationales often contain hallucinated information. To address these issues, we propose a novel framework capable of generating more faithful rationales and, more importantly, matching the performance of classifier-based black-box scoring systems. We first mimic the human assessment process by querying Large Language Models (LLMs) to generate a thought tree. We then summarise intermediate assessment decisions from each thought tree path for creating synthetic rationale data and rationale preference data. Finally, we utilise the generated synthetic data to calibrate LLMs through a two-step training process: supervised fine-tuning and preference optimization. Extensive experimental results demonstrate that our framework achieves a 38% assessment performance improvement in the QWK score compared to prior work while producing higher-quality rationales, as recognised by human evaluators and LLMs. Our work sheds light on the effectiveness of performing preference optimization using synthetic preference data obtained from thought tree paths.
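A rough illustration of the preference-data step, not the authors' pipeline: paths through a thought tree each end in a predicted score, and paths whose final score matches the gold label supply preferred rationales while mismatching paths supply rejected ones. All rationales and scores below are invented.

```python
# Hypothetical thought-tree paths: each records an intermediate assessment
# summary and the score that path arrives at.
paths = [
    {"rationale": "Key element A present; B missing.", "score": 2},
    {"rationale": "Key elements A and B present.", "score": 3},
    {"rationale": "No key elements identified.", "score": 0},
]
gold_score = 2

# Paths that agree with the gold label become preferred rationales;
# the rest become rejected rationales (the rationale preference data).
chosen = [p["rationale"] for p in paths if p["score"] == gold_score]
rejected = [p["rationale"] for p in paths if p["score"] != gold_score]

# Pair them up as (preferred, rejected) examples for preference
# optimization (e.g. DPO-style training after supervised fine-tuning).
pairs = [(c, r) for c in chosen for r in rejected]
print(len(pairs))
```

Each pair can then be fed to a standard preference-optimization trainer alongside the original scoring prompt.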


The Sci-Fi Dream of a 'Molecular Computer' Is Getting More Real

WIRED

"Chemists like me have been working on trying to turn molecules into machines for about 25 years now," says Leigh, an organic chemist from the University of Manchester in the United Kingdom. "You're building on all those that went before you." In 1936, English mathematician Alan Turing imagined an autonomous machine capable of carrying out any precisely coded algorithm. The hypothetical machine would read a strip of tape dotted with symbols that, when interpreted sequentially, would instruct the machine to act. It might transcribe, translate, or compute, turning code into a message, or a math problem into an answer.


Amortized Inference for Heterogeneous Reconstruction in Cryo-EM

Levy, Axel, Wetzstein, Gordon, Martel, Julien, Poitevin, Frederic, Zhong, Ellen D.

arXiv.org Artificial Intelligence

Cryo-electron microscopy (cryo-EM) is an imaging modality that provides unique insights into the dynamics of proteins and other building blocks of life. The algorithmic challenge of jointly estimating the poses, 3D structure, and conformational heterogeneity of a biomolecule from millions of noisy and randomly oriented 2D projections in a computationally efficient manner, however, remains unsolved. Our method, cryoFIRE, performs ab initio heterogeneous reconstruction with unknown poses in an amortized framework, thereby avoiding the computationally expensive step of pose search while enabling the analysis of conformational heterogeneity. Poses and conformation are jointly estimated by an encoder while a physics-based decoder aggregates the images into an implicit neural representation of the conformational space. We show that our method can provide one order of magnitude speedup on datasets containing millions of images without any loss of accuracy. We validate that the joint estimation of poses and conformations can be amortized over the size of the dataset. For the first time, we prove that an amortized method can extract interpretable dynamic information from experimental datasets.
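To see why amortization pays off, here is a toy analogue, not cryoFIRE itself: each "image" is a 1-D sinusoid whose phase stands in for pose. Per-image pose search must compare the image against every candidate pose, while an amortized encoder maps the image to its pose in a single pass (below, a closed-form stand-in for a trained encoder; all quantities are invented for illustration).

```python
import math

# Render a 16-sample "projection" of a template at a given phase ("pose").
def render(phase, n=16):
    return [math.sin(2 * math.pi * k / n + phase) for k in range(n)]

def search_pose(x, grid=360):
    # Conventional route: exhaustively compare against every candidate pose.
    best, best_err = 0.0, float("inf")
    for g in range(grid):
        ph = 2 * math.pi * g / grid
        err = sum((a - b) ** 2 for a, b in zip(x, render(ph)))
        if err < best_err:
            best, best_err = ph, err
    return best

def encode_pose(x):
    # Amortized route: one forward pass from image to pose. For this toy
    # signal, x[0] = sin(phase) and x[4] = cos(phase), so the "encoder"
    # is exact; in cryo-EM it would be a learned neural network.
    return math.atan2(x[0], x[4]) % (2 * math.pi)

true_phase = 1.3
x = render(true_phase)
print(round(encode_pose(x), 6))
```

The search costs grid × image-size operations per image, while the encoder's cost is a constant per image, which is the source of the order-of-magnitude speedup on million-image datasets.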


Self-replicating protein factories are a step towards artificial life

New Scientist

Tiny protein factories, or ribosomes, have been made to self-replicate outside a living cell for the first time. The achievement is a crucial step towards building self-replicating artificial cells from scratch and understanding how the first living things started reproducing themselves. Ribosomes are where the genetic code gets translated into proteins – complex molecules that make up the machinery of living cells. In order for the earliest life to get going, many researchers think ribosomes must have been able to assemble and replicate before there were cells.


Automatic post-picking using MAPPOS improves particle image detection from Cryo-EM micrographs

Norousi, Ramin, Wickles, Stephan, Leidig, Christoph, Becker, Thomas, Schmid, Volker J., Beckmann, Roland, Tresch, Achim

arXiv.org Machine Learning

Cryo-electron microscopy (cryo-EM) studies using single particle reconstruction are extensively used to reveal structural information on macromolecular complexes. Aiming at the highest achievable resolution, state-of-the-art electron microscopes automatically acquire thousands of high-quality micrographs. Particles are detected on and boxed out from each micrograph using fully- or semi-automated approaches. However, the obtained particles still require laborious manual post-picking classification, which is one major bottleneck for single particle analysis of large datasets. We introduce MAPPOS, a supervised post-picking strategy for the classification of boxed particle images, as an additional step complementing the already efficient automated particle-picking routines. MAPPOS employs machine learning techniques to train a robust classifier from a small number of characteristic image features. In order to accurately quantify the performance of MAPPOS we used simulated particle and non-particle images. In addition, we verified our method by applying it to an experimental cryo-EM dataset and comparing the results to the manual classification of the same dataset. Comparisons between MAPPOS and manual post-picking classification by several human experts demonstrated that merely a few hundred sample images are sufficient for MAPPOS to classify an entire dataset with a human-like performance. MAPPOS was shown to greatly accelerate the throughput of large datasets by reducing the manual workload by orders of magnitude while maintaining a reliable identification of non-particle images.
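A minimal sketch of supervised post-picking on a few hundred labeled examples, with invented two-dimensional features and a nearest-centroid stand-in for MAPPOS's actual classifier and feature set.

```python
import random

random.seed(2)

# Hypothetical features for a boxed image: (contrast, texture score).
def particle():      # particles: higher contrast and structured texture
    return (random.gauss(1.0, 0.1), random.gauss(1.0, 0.1))

def non_particle():  # ice/noise boxes: low contrast, little texture
    return (random.gauss(0.3, 0.1), random.gauss(0.2, 0.1))

# A few hundred hand-labeled boxes, as in the paper's setting.
train = [(particle(), 1) for _ in range(200)] + \
        [(non_particle(), 0) for _ in range(200)]

def centroid(label):
    pts = [f for f, y in train if y == label]
    return tuple(sum(c) / len(pts) for c in zip(*pts))

c1, c0 = centroid(1), centroid(0)

def classify(f):
    # Assign the box to whichever class centroid is closer in feature space.
    d1 = sum((a - b) ** 2 for a, b in zip(f, c1))
    d0 = sum((a - b) ** 2 for a, b in zip(f, c0))
    return 1 if d1 < d0 else 0

held_out = [(particle(), 1) for _ in range(100)] + \
           [(non_particle(), 0) for _ in range(100)]
acc = sum(classify(f) == y for f, y in held_out) / len(held_out)
print(acc)
```

Once the classifier is trained on the small labeled set, the entire dataset can be filtered automatically, which is where the orders-of-magnitude reduction in manual workload comes from.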