Qian, Ying
Enhancing Retrosynthesis with Conformer: A Template-Free Method
Zhuang, Jiaxi, Zhang, Qian, Qian, Ying
Retrosynthesis plays a crucial role in organic synthesis and drug development, where the goal is to identify suitable reactants that can yield a target product molecule. Although existing methods have achieved notable success, they typically overlook the 3D conformational details and internal spatial organization of molecules. This oversight makes it challenging to predict reactants that conform to genuine chemical principles, particularly when dealing with complex molecular structures, such as polycyclic and heteroaromatic compounds. In response to this challenge, we introduce a novel transformer-based, template-free approach that incorporates 3D conformer data and spatial information. Our approach includes an Atom-align Fusion module that integrates 3D positional data at the input stage, ensuring correct alignment between atom tokens and their respective 3D coordinates. Additionally, we propose a Distance-weighted Attention mechanism that refines the self-attention process, restricting the model's focus to relevant atom pairs in 3D space. Extensive experiments on the USPTO-50K dataset demonstrate that our model outperforms previous template-free methods, setting a new benchmark for the field. A case study further highlights our method's ability to predict reasonable and accurate reactants.
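The abstract does not give the form of the Distance-weighted Attention, so the following is only a minimal sketch of the general idea, assuming a Gaussian penalty on interatomic distance is subtracted from the attention logits before the softmax. The function name, the `sigma` length scale, and the toy coordinates are all hypothetical and are not taken from the paper.

```python
import math

def distance_weighted_attention(scores, coords, sigma=2.0):
    """Row-wise softmax of attention logits biased by 3D distance.

    scores: n x n list of raw attention logits (e.g. q.k / sqrt(d)).
    coords: list of n (x, y, z) atom coordinates from one conformer.
    sigma:  assumed length scale, in the same units as coords.
    """
    n = len(coords)
    weights = []
    for i in range(n):
        # Assumed bias: subtract d_ij^2 / (2 sigma^2) so distant atoms get less weight.
        biased = [
            scores[i][j] - math.dist(coords[i], coords[j]) ** 2 / (2 * sigma ** 2)
            for j in range(n)
        ]
        # Numerically stable softmax over the biased logits.
        m = max(biased)
        exps = [math.exp(b - m) for b in biased]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy example: three atoms on a line, with uniform raw logits.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (6.0, 0.0, 0.0)]
scores = [[0.0] * 3 for _ in range(3)]
attn = distance_weighted_attention(scores, coords)
```

With uniform logits, the attention that atom 0 pays to the other atoms falls off with their distance, which is the qualitative behaviour the abstract describes.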
Physics-informed deep learning for infectious disease forecasting
Qian, Ying, Marty, Éric, Basu, Avranil, O'Dea, Eamon B., Wang, Xianqiao, Fox, Spencer, Rohani, Pejman, Drake, John M., Li, He
Accurate forecasting of contagious illnesses has become increasingly important to public health policymaking, and better prediction could prevent the loss of millions of lives. To better prepare for future pandemics, it is essential to improve forecasting methods and capabilities. In this work, we propose a new infectious disease forecasting model based on physics-informed neural networks (PINNs), an emerging area of scientific machine learning. The proposed PINN model incorporates dynamical systems representations of disease transmission into the loss function, thereby assimilating epidemiological theory and data using neural networks (NNs). Our approach is designed to prevent model overfitting, which often occurs when training deep learning models with observation data alone. We also employ an additional sub-network to account for mobility, vaccination, and other covariates that influence the transmission rate, a key parameter in the compartment model. To demonstrate the capability of the proposed model, we evaluate its performance using state-level COVID-19 data in California. Our simulation results show that the PINN model's predictions of the number of cases, deaths, and hospitalizations are consistent with existing benchmarks. In particular, the PINN model outperforms the basic NN model and a naive baseline forecast. We also show that the performance of the PINN model is comparable to a sophisticated Gaussian infection state space with time dependence (GISST) forecasting model that integrates the compartment model with a data observation model and a regression model for inferring parameters in the compartment model. Nonetheless, the PINN model offers a simpler structure and is easier to implement. Our results show that the proposed forecaster could potentially serve as a new computational tool to enhance the current capacity of infectious disease forecasting.
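To make the idea of putting disease dynamics into the loss concrete, here is a minimal sketch under stated assumptions. The abstract does not name the compartment model, so a basic SIR model is assumed; time derivatives are approximated with forward differences, whereas a real PINN would obtain them by automatic differentiation of the network output. All function names and parameter values are hypothetical.

```python
def sir_rhs(s, i, r, beta, gamma):
    """Right-hand side of an assumed SIR model (not necessarily the paper's)."""
    n = s + i + r
    new_infections = beta * s * i / n
    return -new_infections, new_infections - gamma * i, gamma * i

def pinn_loss(pred, observed_i, dt, beta, gamma, physics_weight=1.0):
    """Data misfit plus ODE residual for a predicted (S, I, R) trajectory.

    pred:       list of (S, I, R) tuples at evenly spaced times.
    observed_i: observed infectious counts at the same times.
    """
    # Data term: mean squared error on the infectious compartment.
    data = sum((p[1] - o) ** 2 for p, o in zip(pred, observed_i)) / len(pred)
    # Physics term: mean squared mismatch between finite-difference
    # derivatives and the SIR right-hand side.
    residual = 0.0
    for k in range(len(pred) - 1):
        rhs = sir_rhs(*pred[k], beta, gamma)
        for c in range(3):
            derivative = (pred[k + 1][c] - pred[k][c]) / dt
            residual += (derivative - rhs[c]) ** 2
    physics = residual / (len(pred) - 1)
    return data + physics_weight * physics

def euler_sir(s0, i0, r0, beta, gamma, dt, steps):
    """Forward-Euler SIR trajectory, used here only to build a toy example."""
    traj = [(s0, i0, r0)]
    for _ in range(steps):
        s, i, r = traj[-1]
        ds, di, dr = sir_rhs(s, i, r, beta, gamma)
        traj.append((s + dt * ds, i + dt * di, r + dt * dr))
    return traj

traj = euler_sir(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, dt=0.1, steps=50)
obs = [p[1] for p in traj]
loss_consistent = pinn_loss(traj, obs, 0.1, beta=0.3, gamma=0.1)
loss_wrong_beta = pinn_loss(traj, obs, 0.1, beta=0.6, gamma=0.1)
```

A trajectory that obeys the assumed dynamics incurs almost no physics penalty, while the same trajectory scored against a mismatched transmission rate is penalized even though it fits the data exactly. This is the sense in which the physics term acts as a regularizer against overfitting.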
Adaptive Inference: Theoretical Limits and Unexplored Opportunities
Hor, Soheil, Qian, Ying, Pilanci, Mert, Arbabian, Amin
This paper introduces the first theoretical framework for quantifying the size of the efficiency and performance gain opportunity offered by adaptive inference algorithms. We provide new approximate and exact bounds for the achievable efficiency and performance gains, supported by empirical evidence demonstrating the potential for 10-100x efficiency improvements in both Computer Vision and Natural Language Processing tasks without incurring any performance penalties. Additionally, we offer insights on improving achievable efficiency gains through the optimal selection and design of adaptive inference state spaces.
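The abstract does not spell out its bounds, so the sketch below only illustrates one common way to picture the opportunity: an idealized oracle that, for each input, runs the cheapest model that answers correctly. This oracle, the cost values, and the correctness table are illustrative assumptions and are not the paper's framework or data.

```python
def oracle_adaptive(costs, correct):
    """Average cost and accuracy of an idealized per-sample oracle.

    costs:   cost of each model, ordered from cheapest to most expensive.
    correct: one row per sample; row[m] is True if model m is correct.
    """
    total_cost = 0.0
    hits = 0
    for row in correct:
        # Cheapest correct model; if none is correct, fall back to the
        # cheapest model, since no choice would change the outcome.
        chosen = next((m for m, ok in enumerate(row) if ok), 0)
        total_cost += costs[chosen]
        hits += 1 if row[chosen] else 0
    n = len(correct)
    return total_cost / n, hits / n

# Hypothetical toy data: three models and five samples.
costs = [1.0, 10.0, 100.0]
correct = [
    [True, True, True],
    [False, True, True],
    [False, False, True],
    [True, False, True],
    [False, False, False],
]
avg_cost, accuracy = oracle_adaptive(costs, correct)
largest_accuracy = sum(row[-1] for row in correct) / len(correct)
efficiency_gain = costs[-1] / avg_cost
```

In this toy table the oracle matches the accuracy of always running the largest model while spending a fraction of its cost. The ratio `efficiency_gain` is the kind of quantity an opportunity-size analysis would try to bound.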
CasCIFF: A Cross-Domain Information Fusion Framework Tailored for Cascade Prediction in Social Networks
Zhu, Hongjun, Yuan, Shun, Liu, Xin, Chen, Kuo, Jia, Chaolong, Qian, Ying
Existing approaches for information cascade prediction fall into three main categories: feature-driven methods, point process-based methods, and deep learning-based methods. Among them, deep learning-based methods, characterized by their superior learning and representation capabilities, mitigate the shortcomings inherent in the other methods. However, current deep learning methods still face several persistent challenges. In particular, accurate representation of user attributes remains problematic due to factors such as fake followers and complex network configurations. Previous algorithms that focus on the sequential order of user activations often neglect the rich insights offered by activation timing. Furthermore, these techniques often fail to holistically integrate temporal and structural aspects, thus missing the nuanced propagation trends inherent in information cascades. To address these issues, we propose the Cross-Domain Information Fusion Framework (CasCIFF), which is tailored for information cascade prediction. This framework exploits multi-hop neighborhood information to make user embeddings robust. When embedding cascades, the framework intentionally incorporates timestamps, endowing it with the ability to capture evolving patterns of information diffusion. In particular, CasCIFF seamlessly integrates the tasks of user classification and cascade prediction into a consolidated framework, thereby allowing the extraction of common features that prove useful for all tasks, a strategy anchored in the principles of multi-task learning.
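The multi-task strategy mentioned above can be sketched as a weighted sum of two losses. The abstract does not state the loss functions or the weighting, so the squared log-scale error for cascade size, the cross-entropy for user classification, and the `task_weight` parameter are all assumptions made for illustration.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])

def multitask_loss(pred_log_size, true_size, user_probs, user_labels, task_weight=0.5):
    """Joint loss: cascade-size regression plus weighted user classification.

    pred_log_size: predicted log2(cascade size + 1).
    true_size:     observed cascade size.
    user_probs:    per-user class probability lists.
    user_labels:   per-user true class indices.
    task_weight:   hypothetical weight on the auxiliary classification task.
    """
    cascade = (pred_log_size - math.log2(true_size + 1)) ** 2
    user = sum(
        cross_entropy(p, y) for p, y in zip(user_probs, user_labels)
    ) / len(user_labels)
    return cascade + task_weight * user

# Toy example: a perfect prediction versus one that errs on both tasks.
perfect = multitask_loss(3.0, 7, [[1.0, 0.0]], [0])
imperfect = multitask_loss(2.0, 7, [[0.6, 0.4]], [0])
```

Because both terms are computed from the same shared representation during training, gradients from the user classification task can shape the features used for cascade prediction, which is the usual motivation for this kind of joint objective.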