Goto

Collaborating Authors

 Biofuel


GraphGDel: Constructing and Learning Graph Representations of Genome-Scale Metabolic Models for Growth-Coupled Gene Deletion Prediction

Yang, Ziwei, Tamura, Takeyuki

arXiv.org Artificial Intelligence

In genome-scale constraint-based metabolic models, gene deletion strategies are essential for achieving growth-coupled production, where cell growth and target metabolite synthesis occur simultaneously. Despite the inherently networked nature of genome-scale metabolic models, existing computational approaches rely primarily on sequential data and lack graph representations that capture their complex relationships, as both well-defined graph constructions and learning frameworks capable of exploiting them remain largely unexplored. To address this gap, we present a twofold solution. First, we introduce a systematic pipeline for constructing graph representations from constraint-based metabolic models. Second, we develop a deep learning framework that integrates these graph representations with gene and metabolite sequence data to predict growth-coupled gene deletion strategies. Across three metabolic models of varying scale, our approach consistently outperforms established baselines, achieves improvements of 14.04%, 16.26%, and 13.18% in overall accuracy. The source code and example datasets are available at: https://github.com/MetNetComp/GraphGDel.



Advanced Deep Learning Methods for Protein Structure Prediction and Design

Wang, Tianyang, Zhang, Yichao, Deng, Ningyuan, Song, Xinyuan, Bi, Ziqian, Yao, Zheyu, Chen, Keyu, Li, Ming, Niu, Qian, Liu, Junyu, Peng, Benji, Zhang, Sen, Liu, Ming, Zhang, Li, Pan, Xuanhe, Wang, Jinlang, Feng, Pohsun, Wen, Yizhu, Yan, Lawrence KQ, Tseng, Hongming, Zhong, Yan, Wang, Yunze, Qin, Ziyuan, Jing, Bowen, Yang, Junjie, Zhou, Jun, Liang, Chia Xin, Song, Junhao

arXiv.org Artificial Intelligence

After AlphaFold won the Nobel Prize, protein prediction with deep learning once again became a hot topic. We comprehensively explore advanced deep learning methods applied to protein structure prediction and design. It begins by examining recent innovations in prediction architectures, with detailed discussions on improvements such as diffusion based frameworks and novel pairwise attention modules. The text analyses key components including structure generation, evaluation metrics, multiple sequence alignment processing, and network architecture, thereby illustrating the current state of the art in computational protein modelling. Subsequent chapters focus on practical applications, presenting case studies that range from individual protein predictions to complex biomolecular interactions. Strategies for enhancing prediction accuracy and integrating deep learning techniques with experimental validation are thoroughly explored. The later sections review the industry landscape of protein design, highlighting the transformative role of artificial intelligence in biotechnology and discussing emerging market trends and future challenges. Supplementary appendices provide essential resources such as databases and open source tools, making this volume a valuable reference for researchers and students.


Grangemouth could be converted into leading green fuels hub, Swinney says

The Guardian > Energy

There is a realistic chance that one of the UK's largest oil refineries can be converted into a hub for green chemicals, sustainable fuels and plastics, Scotland's first minister says. Grangemouth oil refinery, which is being shut down by its UK and Chinese owners PetroIneos this year with the loss of 400 jobs, could become a world leader in low carbon chemicals and green fuels, John Swinney told media on Wednesday. The refinery's closure, after 100 years of production, is expected to hit up to 2,000 jobs in the east of Scotland. Trade union leaders and policymakers see Grangemouth as a casestudy in ensuring the transition from oil and gas is fair and just. Swinney said workers and local businesses faced "enormous difficulties".


Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs

Kudva, Akshay, Tang, Wei-Ting, Paulson, Joel A.

arXiv.org Machine Learning

Designing modern industrial systems requires balancing several competing objectives, such as profitability, resilience, and sustainability, while accounting for complex interactions between technological, economic, and environmental factors. Multi-objective optimization (MOO) methods are commonly used to navigate these tradeoffs, but selecting the appropriate algorithm to tackle these problems is often unclear, particularly when system representations vary from fully equation-based (white-box) to entirely data-driven (black-box) models. While grey-box MOO methods attempt to bridge this gap, they typically impose rigid assumptions on system structure, requiring models to conform to the underlying structural assumptions of the solver rather than the solver adapting to the natural representation of the system of interest. In this chapter, we introduce a unifying approach to grey-box MOO by leveraging network representations, which provide a general and flexible framework for modeling interconnected systems as a series of function nodes that share various inputs and outputs. Specifically, we propose MOBONS, a novel Bayesian optimization-inspired algorithm that can efficiently optimize general function networks, including those with cyclic dependencies, enabling the modeling of feedback loops, recycle streams, and multi-scale simulations - features that existing methods fail to capture. Furthermore, MOBONS incorporates constraints, supports parallel evaluations, and preserves the sample efficiency of Bayesian optimization while leveraging network structure for improved scalability. We demonstrate the effectiveness of MOBONS through two case studies, including one related to sustainable process design. By enabling efficient MOO under general graph representations, MOBONS has the potential to significantly enhance the design of more profitable, resilient, and sustainable engineering systems.


Safe Reinforcement Learning for Real-World Engine Control

Bedei, Julian, Koch, Lucas, Badalian, Kevin, Winkler, Alexander, Schaber, Patrick, Andert, Jakob

arXiv.org Artificial Intelligence

This work introduces a toolchain for applying Reinforcement Learning (RL), specifically the Deep Deterministic Policy Gradient (DDPG) algorithm, in safety-critical real-world environments. As an exemplary application, transient load control is demonstrated on a single-cylinder internal combustion engine testbench in Homogeneous Charge Compression Ignition (HCCI) mode, that offers high thermal efficiency and low emissions. However, HCCI poses challenges for traditional control methods due to its nonlinear, autoregressive, and stochastic nature. RL provides a viable solution, however, safety concerns, such as excessive pressure rise rates, must be addressed when applying to HCCI. A single unsuitable control input can severely damage the engine or cause misfiring and shut down. Additionally, operating limits are not known a priori and must be determined experimentally. To mitigate these risks, real-time safety monitoring based on the k-nearest neighbor algorithm is implemented, enabling safe interaction with the testbench. The feasibility of this approach is demonstrated as the RL agent learns a control policy through interaction with the testbench. A root mean square error of 0.1374 bar is achieved for the indicated mean effective pressure, comparable to neural network-based controllers from the literature. The toolchain's flexibility is further demonstrated by adapting the agent's policy to increase ethanol energy shares, promoting renewable fuel use while maintaining safety. This RL approach addresses the longstanding challenge of applying RL to safety-critical real-world environments. The developed toolchain, with its adaptability and safety mechanisms, paves the way for future applicability of RL in engine testbenches and other safety-critical settings.


Predicting NOx emissions in Biochar Production Plants using Machine Learning

Köppel, Marius, Witzig, Niklas, Klausmann, Tim, Cerrato, Mattia, Schweitzer, Tobias, Weber, Jochen, Yilmaz, Erdem, Chimbo, Juan, del Campo, Bernardo, Davila, Lissete, Barreno, David

arXiv.org Artificial Intelligence

The global Biochar Industry has witnessed a surge in biochar production, with a total of 350k mt/year production in 2023. With the pressing climate goals set and the potential of Biochar Carbon Removal (BCR) as a climate-relevant technology, scaling up the number of new plants to over 1000 facilities per year by 2030 becomes imperative. However, such a massive scale-up presents not only technical challenges but also control and regulation issues, ensuring maximal output of plants while conforming to regulatory requirements. In this paper, we present a novel method of optimizing the process of a biochar plant based on machine learning methods. We show how a standard Random Forest Regressor can be used to model the states of the pyrolysis machine, the physics of which remains highly complex. This model then serves as a surrogate of the machine -- reproducing several key outcomes of the machine -- in a numerical optimization. This, in turn, could enable us to reduce NOx emissions -- a key regulatory goal in that industry -- while achieving maximal output still. In a preliminary test our approach shows remarkable results, proves to be applicable on two different machines from different manufacturers, and can be implemented on standard Internet of Things (IoT) devices more generally.


MoIN: Mixture of Introvert Experts to Upcycle an LLM

Tejankar, Ajinkya, Navaneet, KL, Panchal, Ujjawal, Pourahmadi, Kossar, Pirsiavash, Hamed

arXiv.org Artificial Intelligence

The goal of this paper is to improve (upcycle) an existing large language model without the prohibitive requirements of continued pre-training of the full-model. The idea is to split the pre-training data into semantically relevant groups and train an expert on each subset. An expert takes the form of a lightweight adapter added on the top of a frozen base model. During inference, an incoming query is first routed to the most relevant expert which is then loaded onto the base model for the forward pass. Unlike typical Mixture of Experts (MoE) models, the experts in our method do not work with other experts for a single query. Hence, we dub them "introvert" experts. Freezing the base model and keeping the experts as lightweight adapters allows extreme parallelism during training and inference. Training of all experts can be done in parallel without any communication channels between them. Similarly, the inference can also be heavily parallelized by distributing experts on different GPUs and routing each request to the GPU containing its relevant expert. We implement a proof-of-concept version of this method and show the validity of our approach.


A generalizable framework for unlocking missing reactions in genome-scale metabolic networks using deep learning

Liu, Xiaoyi, Yang, Hongpeng, Ai, Chengwei, Dong, Ruihan, Ding, Yijie, Yuan, Qianqian, Tang, Jijun, Guo, Fei

arXiv.org Artificial Intelligence

Incomplete knowledge of metabolic processes hinders the accuracy of GEnome-scale Metabolic models (GEMs), which in turn impedes advancements in systems biology and metabolic engineering. Existing gap-filling methods typically rely on phenotypic data to minimize the disparity between computational predictions and experimental results. However, there is still a lack of an automatic and precise gap-filling method for initial state GEMs before experimental data and annotated genomes become available. In this study, we introduce CLOSEgaps, a deep learning-driven tool that addresses the gap-filling issue by modeling it as a hyperedge prediction problem within GEMs. Specifically, CLOSEgaps maps metabolic networks as hypergraphs and learns their hyper-topology features to identify missing reactions and gaps by leveraging hypothetical reactions. This innovative approach allows for the characterization and curation of both known and hypothetical reactions within metabolic networks. Extensive results demonstrate that CLOSEgaps accurately gap-filling over 96% of artificially introduced gaps for various GEMs. Furthermore, CLOSEgaps enhances phenotypic predictions for 24 GEMs and also finds a notable improvement in producing four crucial metabolites (Lactate, Ethanol, Propionate, and Succinate) in two organisms. As a broadly applicable solution for any GEM, CLOSEgaps represents a promising model to automate the gap-filling process and uncover missing connections between reactions and observed metabolic phenotypes.


How Susceptible are LLMs to Influence in Prompts?

Anagnostidis, Sotiris, Bulian, Jannis

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein. As LLMs grow in capability, understanding their prompt-sensitivity becomes increasingly crucial for ensuring reliable and robust performance, particularly since evaluating these models becomes more challenging. In this work, we investigate how current models (Llama, Mixtral, Falcon) respond when presented with additional input from another model, mimicking a scenario where a more capable model -- or a system with access to more external information -- provides supplementary information to the target model. Across a diverse spectrum of question-answering tasks, we study how an LLM's response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model. Specifically, we explore the influence of the presence of an explanation, the stated authoritativeness of the source, and the stated confidence of the supplementary input. Our findings reveal that models are strongly influenced, and when explanations are provided they are swayed irrespective of the quality of the explanation. The models are more likely to be swayed if the input is presented as being authoritative or confident, but the effect is small in size. This study underscores the significant prompt-sensitivity of LLMs and highlights the potential risks of incorporating outputs from external sources without thorough scrutiny and further validation. As LLMs continue to advance, understanding and mitigating such sensitivities will be crucial for their reliable and trustworthy deployment.