Learning Superconductivity from Ordered and Disordered Material Structures

Pin Chen

Neural Information Processing Systems

However, some critical aspects of superconductivity, such as how it relates to materials' chemical and structural features, are still not understood. Recent successes of data-driven approaches in materials science have inspired researchers to study this relationship with such methods, but a corresponding dataset is still lacking.

MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models

Neural Information Processing Systems

Experiments on pretraining 410M and 1B models on the C4 dataset demonstrate that MATES significantly outperforms random data selection across an extensive set of downstream tasks. It doubles the gains achieved by the state-of-the-art data selection approach that leverages larger reference models, and it halves the total FLOPs required to reach a given level of performance. Further analyses validate the effectiveness of the locally probed oracle data influence and of its approximation with data influence models. Our code is open-sourced at https://github.com/cxcscmu/MA


Learning Invariant Molecular Representation in Latent Discrete Space

Xiang Zhuang

Neural Information Processing Systems

Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when data for training and testing originate from different environments.