Li, Zhiyi
Orb: A Fast, Scalable Neural Network Potential
Neumann, Mark, Gin, James, Rhodes, Benjamin, Bennett, Steven, Li, Zhiyi, Choubisa, Hitarth, Hussey, Arthur, Godwin, Jonathan
The design of new functional materials has been a critical part of emerging technologies over the past century. Advancements in energy storage, drug delivery, solar energy, filtration, carbon capture and semiconductors have accelerated due to the discovery of entire classes of materials with application-specific properties, such as perovskites and metal-organic frameworks (MOFs). However, ab initio computational methods [2] for designing new inorganic materials are slow and scale poorly to realistically sized systems. New methods using deep learning offer a way to achieve ab initio accuracy with dramatically increased speed and scalability. In recent years, deep learning methods have demonstrated their ability to approximate extremely complex natural distributions across a diverse range of application areas including vision, biology and spatial processing, by focusing on architectures that are embarrassingly parallel and can be run efficiently on modern hardware [46, 7], despite lacking architectural biases that would suit the target domain.
Comparative Analysis of Extrinsic Factors for NER in French
Yang, Grace, Li, Zhiyi, Liu, Yadong, Park, Jungyeul
Named entity recognition (NER) is a crucial task that aims to identify structured information in text that is often replete with complex, technical terms and a high degree of variability. Accurate and reliable NER can facilitate the extraction and analysis of important information. However, NER for languages other than English is challenging due to limited data availability, as annotating data requires high expertise, time, and expense. In this paper, working with this limited data, we explore various factors, including model structure, corpus annotation scheme, and data augmentation techniques, to improve the performance of a NER model for French. Our experiments demonstrate that these approaches can significantly improve the model's F1 score from the original CRF baseline of 62.41 to 79.39. Our findings suggest that considering different extrinsic factors and combining these techniques is a promising approach for improving NER performance when data is limited.
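For orientation, a minimal sketch of the kind of CRF sequence-labeling baseline referenced above, using the sklearn-crfsuite library. This is illustrative only; the paper's actual feature set and toolkit may differ, and the toy data below is invented for the example.

```python
# Illustrative CRF baseline for NER with hand-crafted token features.
# Not the paper's implementation; feature templates are a common generic set.
import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

def featurize(sentences):
    return [[token_features(s, i) for i in range(len(s))] for s in sentences]

# Toy BIO-tagged data; real training would use the annotated French corpus.
train_sents = [["Emmanuel", "Macron", "visite", "Lyon", "."]]
train_tags = [["B-PER", "I-PER", "O", "B-LOC", "O"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(featurize(train_sents), train_tags)
print(crf.predict(featurize(train_sents)))
```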
Statistical Properties of Robust Satisficing
Li, Zhiyi, Xu, Yunbei, Zhan, Ruohan
The Robust Satisficing (RS) model is an emerging approach to robust optimization, offering streamlined procedures and robust generalization across various applications. However, the statistical theory of RS remains unexplored in the literature. This paper fills this gap by comprehensively analyzing the theoretical properties of the RS model. Notably, the RS structure offers a more straightforward path to deriving statistical guarantees than the seminal Distributionally Robust Optimization (DRO) framework, resulting in a richer set of results. In particular, we establish two-sided confidence intervals for the optimal loss without the need to solve a minimax optimization problem explicitly. We further provide finite-sample generalization error bounds for the RS optimizer. Importantly, our results extend to scenarios involving distribution shifts, where discrepancies exist between the sampling and target distributions. Our numerical experiments show that the RS model consistently outperforms the baseline empirical risk minimization in small-sample regimes and under distribution shifts. Furthermore, compared to the DRO model, the RS model exhibits lower sensitivity to hyperparameter tuning, highlighting its practicality for robustness considerations.
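As a point of reference, the RS problem is commonly posed in the literature as finding the smallest "fragility" level under which excess expected loss grows at most linearly in the distance from the empirical distribution. The schematic below uses generic notation (loss ℓ, target τ, distance Δ, empirical distribution P̂_n), which may differ from this paper's own symbols:

```latex
% Schematic Robust Satisficing (RS) formulation, generic notation:
% find the smallest fragility k such that, for every distribution Q,
% the excess expected loss is bounded by k times Q's distance from
% the empirical distribution \hat{P}_n.
\begin{aligned}
\min_{x \in \mathcal{X},\; k \ge 0} \quad & k \\
\text{s.t.} \quad & \mathbb{E}_{Q}\!\left[\ell(x,\xi)\right] - \tau
  \;\le\; k\,\Delta\!\left(Q, \hat{P}_n\right)
  \quad \text{for all distributions } Q.
\end{aligned}
```

Here τ is an acceptable target loss and Δ is a distribution distance (e.g., Wasserstein). Unlike DRO, no ambiguity-set radius is fixed in advance; the target τ plays that role, which is consistent with the paper's observation of lower hyperparameter sensitivity.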
Feynman Diagrams as Computational Graphs
Hou, Pengcheng, Wang, Tao, Cerkoney, Daniel, Cai, Xiansheng, Li, Zhiyi, Deng, Youjin, Wang, Lei, Chen, Kun
We propose a computational graph representation of high-order Feynman diagrams in Quantum Field Theory (QFT), applicable to any combination of spatial, temporal, momentum, and frequency domains. Utilizing the Dyson-Schwinger and parquet equations, our approach effectively organizes these diagrams into a fractal structure of tensor operations, significantly reducing computational redundancy. This approach not only streamlines the evaluation of complex diagrams but also facilitates an efficient implementation of the field-theoretic renormalization scheme, crucial for enhancing perturbative QFT calculations. Key to this advancement is the integration of Taylor-mode automatic differentiation, a technique employed in machine learning packages to compute higher-order derivatives efficiently on computational graphs. To operationalize these concepts, we develop a Feynman diagram compiler that optimizes diagrams for various computational platforms, utilizing machine learning frameworks. Demonstrating this methodology's effectiveness, we apply it to the three-dimensional uniform electron gas problem, achieving unprecedented accuracy in calculating the quasiparticle effective mass at metal density. Our work demonstrates the synergy between QFT and machine learning, establishing a new avenue for applying AI techniques to complex quantum many-body problems.
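The Taylor-mode automatic differentiation mentioned above is exposed by mainstream frameworks; as a minimal illustration of the technique (not the paper's Feynman diagram compiler), JAX provides it via jax.experimental.jet:

```python
# Minimal sketch of Taylor-mode automatic differentiation using JAX's
# experimental `jet` primitive: a whole truncated Taylor series is
# propagated through the computational graph in a single pass.
import jax.numpy as jnp
from jax.experimental.jet import jet

def f(x):
    return jnp.sin(x) * jnp.exp(x)

x0 = 1.0
# Input path x(t) = x0 + 1*t (higher path coefficients zero); the output
# series holds the Taylor coefficients of f(x(t)) at t = 0, order by order.
primal_out, series_out = jet(f, (x0,), ([1.0, 0.0, 0.0],))

# Coefficients relate to derivatives via factorials:
# series_out = [f'(x0)/1!, f''(x0)/2!, f'''(x0)/3!]
print(primal_out)   # f(x0)
print(series_out)
```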
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Beaini, Dominique, Huang, Shenyang, Cunha, Joao Alex, Li, Zhiyi, Moisescu-Pareja, Gabriela, Dymov, Oleksandr, Maddrell-Mander, Samuel, McLean, Callum, Wenkel, Frederik, Müller, Luis, Mohamud, Jama Hussein, Parviz, Ali, Craig, Michael, Koziarski, Michał, Lu, Jiarui, Zhu, Zhaocheng, Gabellini, Cristian, Klaser, Kerstin, Dean, Josef, Wognum, Cas, Sypetkowski, Maciej, Rabusseau, Guillaume, Rabbany, Reihaneh, Tang, Jian, Morris, Christopher, Koutis, Ioannis, Ravanelli, Mirco, Wolf, Guy, Tossou, Prudencio, Mary, Hadrien, Bois, Therence, Fitzgibbon, Andrew, Banaszewski, Błażej, Martin, Chad, Masters, Dominic
Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated and hence typically small, the lack of datasets with labeled features, and of codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library, which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point for multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets improves when we also train on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks.
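With over 3000 sparsely defined tasks, most molecules carry labels for only a fraction of the tasks, so multi-task training must mask out missing targets. A minimal sketch of such a masked loss in PyTorch (generic illustration, not Graphium's actual API):

```python
# Hypothetical masked multi-task regression loss for sparsely labeled data:
# missing task labels are encoded as NaN and excluded from the average.
import torch
import torch.nn.functional as F

def masked_multitask_loss(preds: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # preds, labels: (batch, num_tasks); NaN in `labels` marks a missing label.
    mask = ~torch.isnan(labels)
    safe_labels = torch.where(mask, labels, torch.zeros_like(labels))
    per_element = F.mse_loss(preds, safe_labels, reduction="none")
    # Average only over the labels that actually exist.
    return (per_element * mask).sum() / mask.sum().clamp(min=1)

preds = torch.randn(3, 3)
labels = torch.tensor([[0.5, float("nan"), 1.0],
                       [float("nan"), 0.2, float("nan")],
                       [1.0, 1.0, 1.0]])
loss = masked_multitask_loss(preds, labels)
```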
Extrinsic Factors Affecting the Accuracy of Biomedical NER
Li, Zhiyi, Zhang, Shengjie, Song, Yujie, Park, Jungyeul
Biomedical named entity recognition (NER) is a critical task that aims to identify structured information in clinical text, which is often replete with complex, technical terms and a high degree of variability. Accurate and reliable NER can facilitate the extraction and analysis of important biomedical information, which can be used to improve downstream applications including the healthcare system. However, NER in the biomedical domain is challenging due to limited data availability, as annotating its data requires high expertise, time, and expense. In this paper, working with this limited data, we explore various extrinsic factors, including the corpus annotation scheme, data augmentation techniques, semi-supervised learning and Brill transformation, to improve the performance of a NER model on a clinical text dataset (i2b2 2012; Sun et al., 2013). Our experiments demonstrate that these approaches can significantly improve the model's F1 score from the original 73.74 to 77.55. Our findings suggest that considering different extrinsic factors and combining these techniques is a promising approach for improving NER performance in the biomedical domain where data is limited.
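One widely used data augmentation technique in this low-resource NER setting is mention replacement: swapping each entity mention for another mention of the same type drawn from the training data. A minimal sketch under BIO tagging (illustrative only; the paper's exact augmentation recipe may differ, and the mention bank below is invented):

```python
# Illustrative mention-replacement augmentation for BIO-tagged NER data.
import random

def mention_replacement(tokens, tags, mention_bank):
    # mention_bank: {entity_type: [list of replacement token sequences]}
    out_tokens, out_tags, i = [], [], 0
    while i < len(tokens):
        if tags[i].startswith("B-"):
            etype = tags[i][2:]
            j = i + 1
            while j < len(tokens) and tags[j] == f"I-{etype}":
                j += 1
            # Swap the mention for a random same-type mention (or keep it).
            new = random.choice(mention_bank.get(etype, [tokens[i:j]]))
            out_tokens.extend(new)
            out_tags.extend([f"B-{etype}"] + [f"I-{etype}"] * (len(new) - 1))
            i = j
        else:
            out_tokens.append(tokens[i])
            out_tags.append(tags[i])
            i += 1
    return out_tokens, out_tags

tokens = ["The", "patient", "developed", "severe", "sepsis", "."]
tags = ["O", "O", "O", "B-PROBLEM", "I-PROBLEM", "O"]
bank = {"PROBLEM": [["acute", "renal", "failure"], ["pneumonia"]]}
new_tokens, new_tags = mention_replacement(tokens, tags, bank)
```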
GPS++: Reviving the Art of Message Passing for Molecular Property Prediction
Masters, Dominic, Dean, Josef, Klaser, Kerstin, Li, Zhiyi, Maddrell-Mander, Sam, Sanders, Adam, Helal, Hatem, Beker, Deniz, Fitzgibbon, Andrew, Huang, Shenyang, Rampášek, Ladislav, Beaini, Dominique
We present GPS++, a hybrid Message Passing Neural Network / Graph Transformer model for molecular property prediction. Our model integrates a well-tuned local message passing component and biased global attention with other key ideas from the prior literature to achieve state-of-the-art results on the large-scale molecular dataset PCQM4Mv2. Through a thorough ablation study, we highlight the impact of individual components and find that nearly all of the model's performance can be maintained without any use of global self-attention, showing that message passing is still a competitive approach for 3D molecular property prediction despite the recent dominance of graph transformers. We also find that our approach is significantly more accurate than prior art when 3D positional information is not available.
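To make the hybrid design concrete, a GPS-style layer combines a local message passing update with global self-attention over all atoms. The sketch below is a generic PyTorch rendering of that pattern (hypothetical names, attention bias omitted for brevity; not the GPS++ source):

```python
# Generic hybrid MPNN + global-attention layer in the spirit of GPS-style
# models. Illustrative only; GPS++ itself differs in details.
import torch
import torch.nn as nn

class HybridLayer(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.msg_mlp = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, h, edge_index, e):
        # h: (num_nodes, dim); edge_index: (2, num_edges); e: (num_edges, dim)
        src, dst = edge_index
        # Local message passing: messages depend on both endpoints and the edge.
        msg = self.msg_mlp(torch.cat([h[src], h[dst], e], dim=-1))
        local = torch.zeros_like(h).index_add_(0, dst, msg)
        # Global self-attention over all nodes (single graph for brevity).
        glob, _ = self.attn(h.unsqueeze(0), h.unsqueeze(0), h.unsqueeze(0))
        return h + self.out(torch.cat([local, glob.squeeze(0)], dim=-1))

h = torch.randn(5, 32)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
e = torch.randn(3, 32)
out = HybridLayer(32)(h, edge_index, e)
```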
PopSparse: Accelerated block sparse matrix multiplication on IPU
Li, Zhiyi, Orr, Douglas, Ohan, Valeriu, Da costa, Godfrey, Murray, Tom, Sanders, Adam, Beker, Deniz, Masters, Dominic
Reducing the computational cost of running large-scale neural networks using sparsity has attracted great attention in the deep learning community. While much success has been achieved in reducing FLOP and parameter counts while maintaining acceptable task performance, achieving actual speed improvements has typically been much more difficult, particularly on general-purpose accelerators (GPAs) such as NVIDIA GPUs using low-precision number formats. In this work we introduce PopSparse, a library that enables fast sparse operations on Graphcore IPUs by leveraging both the unique hardware characteristics of IPUs and any block structure defined in the data. We target two different types of sparsity: static, where the sparsity pattern is fixed at compile time; and dynamic, where it can change each time the model is run. Results indicate that the PopSparse implementations are faster than dense matrix multiplications on IPU at a range of sparsity levels with large matrix size and block size. Furthermore, static sparsity in general outperforms dynamic sparsity. While previous work on GPAs has shown speedups only for very high sparsity (typically 99% and above), the present work demonstrates that our static sparse implementation outperforms equivalent dense calculations in FP16 at lower sparsity (around 90%). IPU code is available to view and run at ipu.dev/sparsity-benchmarks; GPU code will be made available shortly.
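The core operation accelerated here is a block-sparse by dense matrix multiplication, where only a subset of fixed-size blocks of the sparse operand is stored. A reference NumPy sketch of the semantics (purely illustrative; the actual IPU kernels are, of course, very different):

```python
# Reference semantics of a block-sparse x dense matmul: only nonzero
# bs x bs blocks of the sparse left operand are stored, indexed by their
# (block_row, block_col) coordinates. Illustrative NumPy only.
import numpy as np

def block_sparse_matmul(blocks, coords, dense, num_block_rows, bs):
    # blocks: (nnz_blocks, bs, bs); coords: list of (block_row, block_col)
    # dense: (K, N) with K = num_block_cols * bs
    out = np.zeros((num_block_rows * bs, dense.shape[1]), dtype=dense.dtype)
    for (bi, bj), blk in zip(coords, blocks):
        out[bi * bs:(bi + 1) * bs] += blk @ dense[bj * bs:(bj + 1) * bs]
    return out

# ~90% block sparsity: 2 stored blocks out of a 4 x 5 block grid.
bs, m_blocks, k_blocks, n = 16, 4, 5, 8
blocks = np.random.randn(2, bs, bs)
coords = [(0, 1), (3, 4)]
dense = np.random.randn(k_blocks * bs, n)
result = block_sparse_matmul(blocks, coords, dense, m_blocks, bs)
```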
GPS++: An Optimised Hybrid MPNN/Transformer for Molecular Property Prediction
Masters, Dominic, Dean, Josef, Klaser, Kerstin, Li, Zhiyi, Maddrell-Mander, Sam, Sanders, Adam, Helal, Hatem, Beker, Deniz, Rampášek, Ladislav, Beaini, Dominique
This technical report presents GPS++, the first-place solution to the Open Graph Benchmark Large-Scale Challenge (OGB-LSC 2022) for the PCQM4Mv2 molecular property prediction task. Our approach implements several key principles from the prior literature. At its core, our GPS++ method is a hybrid MPNN/Transformer model that incorporates 3D atom positions and an auxiliary denoising task. The effectiveness of GPS++ is demonstrated by achieving 0.0719 mean absolute error on the independent test-challenge PCQM4Mv2 split. Thanks to Graphcore IPU acceleration, GPS++ scales to deep architectures (16 layers), training at 3 minutes per epoch, and to a large ensemble (112 models), completing the final predictions in 1 hour 32 minutes, well under the allocated 4-hour inference budget. Our implementation is publicly available at: https://github.com/graphcore/ogb-lsc-pcqm4mv2.