nmr
MolSpectLLM: A Molecular Foundation Model Bridging Spectroscopy, Molecule Elucidation, and 3D Structure Generation
Shen, Shuaike, Xie, Jiaqing, Yang, Zhuo, Zhang, Antong, Sun, Shuzhou, Gao, Ben, Fu, Tianfan, Qi, Biqing, Li, Yuqiang
Recent advances in molecular foundation models have shown impressive performance in molecular property prediction and de novo molecular design, with promising applications in areas such as drug discovery and reaction prediction. Nevertheless, most existing approaches rely exclusively on SMILES representations and overlook both experimental spectra and 3D structural information, two indispensable sources for capturing molecular behavior in real-world scenarios. This limitation reduces their effectiveness in tasks where stereochemistry, spatial conformation, and experimental validation are critical. To overcome these challenges, we propose MolSpectLLM, a molecular foundation model pretrained on Qwen2.5-7B that unifies experimental spectroscopy with molecular 3D structure. By explicitly modeling molecular spectra, MolSpectLLM achieves state-of-the-art performance on spectrum-related tasks, with an average accuracy of 0.53 across NMR, IR, and MS benchmarks. MolSpectLLM also shows strong performance on the spectra analysis task, obtaining 15.5% sequence accuracy and 41.7% token accuracy on Spectra-to-SMILES, substantially outperforming large general-purpose LLMs. More importantly, MolSpectLLM not only achieves strong performance on molecular elucidation tasks, but also generates accurate 3D molecular structures directly from SMILES or spectral inputs, bridging spectral analysis, molecular elucidation, and molecular design. Code is available at https://github.com/Eurekashen/MolSpectLLM.
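The abstract reports both a sequence accuracy and a token accuracy for Spectra-to-SMILES without defining them. The sketch below shows one common reading, offered only as an assumption: sequence accuracy as the fraction of exact string matches, and token accuracy as the fraction of target positions predicted correctly, with single characters standing in for tokens. The function names and the character-level tokenization are hypothetical and may differ from the paper's evaluation code.

```python
from typing import List


def sequence_accuracy(preds: List[str], targets: List[str]) -> float:
    """Fraction of predictions that match the target string exactly."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def token_accuracy(preds: List[str], targets: List[str]) -> float:
    """Fraction of target positions whose token is predicted correctly.

    Assumption: each character is one token and positions are compared
    left to right; target positions with no predicted token count as wrong.
    """
    correct = 0
    total = 0
    for p, t in zip(preds, targets):
        total += len(t)
        correct += sum(a == b for a, b in zip(p, t))
    return correct / total


if __name__ == "__main__":
    preds = ["CCO", "CCN", "C=O"]
    targets = ["CCO", "CCO", "CO"]
    print(sequence_accuracy(preds, targets))
    print(token_accuracy(preds, targets))
```

Under this reading, token accuracy is always at least as high as sequence accuracy, which is consistent with the gap between the two reported figures.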
AI-enabled prediction of NMR spectroscopy: Deducing 2-D NMR of carbohydrate
Li, Yunrui, Xu, Hao, Hong, Pengyu
In the dynamic field of nuclear magnetic resonance (NMR) spectroscopy, artificial intelligence (AI) has ushered in a transformative era for molecular studies. AI-driven NMR prediction, powered by advanced machine learning and predictive algorithms, has fundamentally reshaped the interpretation of NMR spectra. This innovation empowers us to forecast spectral patterns swiftly and accurately across a broad spectrum of molecular structures. Furthermore, the advent of generative modeling offers a groundbreaking approach, making it feasible to predict 2D NMR spectra from chemical language (such as SMILES or IUPAC names). Our method mirrors the multifaceted nature of NMR imaging experiments, producing 2D NMR spectra for the same molecule under different conditions, such as solvents and temperatures. Our methodology is versatile, catering to monosaccharide-derived small molecules, oligosaccharides, and large polysaccharides. A deeper exploration of the discrepancies in these predictions can provide insights into how functional groups, repeating units, and modifications of the monomers influence the outcomes. Given the complexity of generating 2D NMR spectra, our objective is to fully leverage the potential of AI to enhance the precision, efficiency, and comprehensibility of NMR spectral analysis, ultimately advancing both the field of NMR spectroscopy and the broader realm of molecular research.
Generative Active Learning for the Search of Small-molecule Protein Binders
Korablyov, Maksym, Liu, Cheng-Hao, Jain, Moksh, van der Sloot, Almer M., Jolicoeur, Eric, Ruediger, Edward, Nica, Andrei Cristian, Bengio, Emmanuel, Lapchevskyi, Kostiantyn, St-Cyr, Daniel, Schuetz, Doris Alexandra, Butoi, Victor Ion, Rector-Brooks, Jarrid, Blackburn, Simon, Feng, Leo, Nekoei, Hadi, Gottipati, SaiKrishna, Vijayan, Priyesh, Gupta, Prateek, Rampášek, Ladislav, Avancha, Sasikanth, Bacon, Pierre-Luc, Hamilton, William L., Paige, Brooks, Misra, Sanchit, Jastrzebski, Stanislaw Kamil, Kaul, Bharat, Precup, Doina, Hernández-Lobato, José Miguel, Segler, Marwin, Bronstein, Michael, Marinier, Anne, Tyers, Mike, Bengio, Yoshua
Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules that exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and molecules designed de novo by LambdaZero reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
Top Things You Should Know About Numerai (NMR)
Numerai is a machine learning stock market prediction platform seeking to build the world's largest hedge fund. The project continuously runs "the hardest data science tournament on the planet" with the goal of crowdsourcing an excellent financial model for predicting the stock market, among other things. Now, before we dive in, the following piece is similar to my latest articles on Hegic (HEGIC), Ocean Protocol (OCEAN), and Quantstamp (QSP), so if you haven't already seen those, be sure to check them out as well. Numerai is a unique project that's tackling a complicated data science problem by crowdsourcing data scientists who are provided with clean and regularized stock market data that has been encrypted and obfuscated so it can be given out for free. Users (data scientists) who sign up with Numerai can download their cleaned data to create models that predict stock market movements.
Reinforcement Learning with Non-Markovian Rewards
The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is that the rewards depend on the last state and action only. Yet, many real-world rewards are non-Markovian. For example, a reward for bringing coffee only if it was requested earlier and not yet served is non-Markovian if the state only records current requests and deliveries. Past work considered the problem of modeling and solving MDPs with non-Markovian rewards (NMR), but we know of no principled approaches for RL with NMR. Here, we address the problem of policy learning from experience with such rewards. We describe and empirically evaluate four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR. We also prove that some of these variants converge to an optimal policy in the limit.
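The coffee example can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' method: the event labels and the two-state automaton are hypothetical, and the automaton is written by hand, whereas the abstract describes learning such automata from experience. It shows that the same "deliver" event earns different rewards depending on history, and that adding one automaton state to the agent's state is enough to make the reward depend only on the current (augmented) state and event.

```python
from typing import List

# Hypothetical event labels for the coffee example; not taken from the paper.
REQUEST, DELIVER, NOOP = "request", "deliver", "noop"


def history_reward(events: List[str]) -> float:
    """Reward earned by the final event, computed by replaying the full history."""
    pending = False
    reward = 0.0
    for event in events:
        reward = 0.0
        if event == REQUEST:
            pending = True
        elif event == DELIVER and pending:
            reward = 1.0
            pending = False
    return reward


class CoffeeAutomaton:
    """Two-state automaton: 0 = no open request, 1 = request pending."""

    def __init__(self) -> None:
        self.state = 0

    def step(self, event: str) -> float:
        """Advance on one event and return the reward for that event."""
        if self.state == 0 and event == REQUEST:
            self.state = 1
        elif self.state == 1 and event == DELIVER:
            self.state = 0
            return 1.0
        return 0.0


def automaton_rewards(events: List[str]) -> List[float]:
    """Per-step rewards using only the automaton state and the current event."""
    automaton = CoffeeAutomaton()
    return [automaton.step(event) for event in events]


if __name__ == "__main__":
    trace = [DELIVER, REQUEST, NOOP, DELIVER, DELIVER]
    print(automaton_rewards(trace))
```

A learner such as Q-learning could then index its value table by the pair (environment state, automaton state), so that the reward is Markovian in the augmented state.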
Accelerated Nuclear Magnetic Resonance Spectroscopy with Deep Learning
Qu, Xiaobo, Huang, Yihui, Lu, Hengfa, Qiu, Tianyu, Guo, Di, Orekhov, Vladislav, Chen, Zhong
Nuclear magnetic resonance (NMR) spectroscopy serves as an indispensable tool in chemistry and biology but often suffers from long experimental times. We present a proof of concept of harnessing deep learning and neural networks for high-quality, reliable, and very fast NMR spectra reconstruction from limited experimental data. We show that the neural network can be trained using solely synthetic NMR signals, which lifts the prohibitive demand for a large volume of realistic training data usually required in the deep learning approach.
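The idea of training on synthetic signals can be sketched as follows. This is a minimal illustration, assuming the common textbook model of a time-domain NMR signal as a sum of exponentially damped complex sinusoids and assuming random undersampling as the source of "limited experimental data"; the abstract confirms neither detail. All function names, parameter ranges, and the sampling scheme are hypothetical. The sketch only produces (undersampled input, fully sampled target) pairs and contains no network.

```python
import cmath
import math
import random
from typing import List, Tuple

Peak = Tuple[float, float, float]  # (amplitude, frequency in cycles/sample, decay rate)


def synthetic_fid(n_points: int, peaks: List[Peak]) -> List[complex]:
    """Sum of damped complex exponentials sampled at n = 0 .. n_points - 1."""
    return [
        sum(a * cmath.exp((2j * math.pi * f - r) * n) for a, f, r in peaks)
        for n in range(n_points)
    ]


def undersample(
    fid: List[complex], keep_fraction: float, rng: random.Random
) -> Tuple[List[complex], List[bool]]:
    """Zero out a random subset of points; return the masked signal and the mask."""
    n_keep = max(1, round(len(fid) * keep_fraction))
    kept = set(rng.sample(range(len(fid)), n_keep))
    mask = [i in kept for i in range(len(fid))]
    sampled = [x if m else 0j for x, m in zip(fid, mask)]
    return sampled, mask


def make_training_pair(
    n_points: int = 64, n_peaks: int = 3, keep_fraction: float = 0.25, seed: int = 0
) -> Tuple[List[complex], List[complex], List[bool]]:
    """Draw random peak parameters and return (input, target, mask)."""
    rng = random.Random(seed)
    peaks = [
        (rng.uniform(0.1, 1.0), rng.uniform(-0.5, 0.5), rng.uniform(0.01, 0.1))
        for _ in range(n_peaks)
    ]
    full = synthetic_fid(n_points, peaks)
    sampled, mask = undersample(full, keep_fraction, rng)
    return sampled, full, mask


if __name__ == "__main__":
    sampled, full, mask = make_training_pair()
    print(len(full), sum(mask))
```

Because the generator needs only random peak parameters, an arbitrary number of such pairs can be produced without any measured spectra, which is the property the abstract highlights.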
Using Frequent Pattern Mining To Identify Behaviors In A Naked Mole Rat Colony
Imberman, Susan P. (College of Staten Island, Graduate Center, City University of New York) | Kress, Michael E. (College of Staten Island, Graduate Center, City University of New York) | McCloskey, Dan P. (College of Staten Island, CSI/IBR Center for Developmental Neuroscience)
Animal behavior analysis has, in the past, taken a very low-tech approach, with direct observer surveillance and automated video surveillance as the norm. These methods are insufficient when one wants to study interactions between large numbers of animals in their housing environment. In this paper we use a housing environment that has been equipped with a system of RFID sensors. RFID transponders were implanted into the study animal, the naked mole rat. The resulting data were analyzed using principal component analysis and frequent pattern mining. Results showed that these methods can distinguish time periods of high behavioral activity from those of low activity, and can identify which groups of animals interacted with one another.
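Frequent pattern mining on this kind of data can be illustrated with a short sketch. It assumes, hypothetically, that each time window yields a "transaction" listing the animal IDs detected together, and it counts small itemsets by brute force rather than with a pruned algorithm such as Apriori. The abstract does not say which mining algorithm or data layout the authors used, so the function name, the window format, and the animal IDs are all illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def frequent_itemsets(
    transactions: Iterable[Set[str]], min_support: int, max_size: int = 3
) -> Dict[Tuple[str, ...], int]:
    """Count every itemset of up to max_size items; keep those seen often enough.

    Support is the number of transactions that contain the itemset.
    """
    counts: Counter = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical co-detection windows: animal IDs seen at the same sensor.
    windows = [
        {"A", "B", "C"},
        {"A", "B"},
        {"A", "C"},
        {"B", "C"},
        {"A", "B", "C"},
        {"D"},
    ]
    for itemset, support in sorted(frequent_itemsets(windows, min_support=3).items()):
        print(itemset, support)
```

With these made-up windows, every pair among A, B, and C meets the support threshold of 3, while the triple (A, B, C) and the lone animal D do not, which is the kind of grouping signal the abstract describes.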