Smith, John R.
Accelerating Material Design with the Generative Toolkit for Scientific Discovery
Manica, Matteo, Born, Jannis, Cadow, Joris, Christofidellis, Dimitrios, Dave, Ashish, Clarke, Dean, Teukam, Yves Gaetan Nana, Giannone, Giorgio, Hoffman, Samuel C., Buchan, Matthew, Chenthamarakshan, Vijil, Donovan, Timothy, Hsu, Hsiang Han, Zipoli, Federico, Schilter, Oliver, Kishimoto, Akihiro, Hamada, Lisa, Padhi, Inkit, Wehden, Karl, McHugh, Lauren, Khrabrov, Alexy, Das, Payel, Takeda, Seiji, Smith, John R.
The rapid technological progress of the last centuries has been largely fueled by the success of the scientific method. However, in some of the most important fields, such as material or drug discovery, productivity has been decreasing dramatically (Smietana et al., 2016): today it can take almost a decade to discover a new material, at a cost upwards of $10-$100 million. One of the most daunting challenges in materials discovery is hypothesis generation. The reservoir of natural products and their derivatives has been largely emptied (Atanasov et al., 2021), and bottom-up, human-driven hypothesis generation has proven extremely challenging for identifying and selecting novel, useful candidates in search spaces of overwhelming size, e.g., the chemical space for drug-like molecules is estimated to contain > 10
Low-Rank Similarity Metric Learning in High Dimensions
Liu, Wei (IBM T. J. Watson Research Center) | Mu, Cun (Columbia University) | Ji, Rongrong (Xiamen University) | Ma, Shiqian (The Chinese University of Hong Kong) | Smith, John R. (IBM T. J. Watson Research Center) | Chang, Shih-Fu (Columbia University)
Metric learning has become a widely used tool in machine learning. To curb the costs incurred by increasing dimensionality, low-rank metric learning has emerged as a more economical alternative in both storage and computation. However, existing low-rank metric learning algorithms usually adopt nonconvex objectives, and are hence sensitive to the choice of a heuristic low-rank basis. In this paper, we propose a novel low-rank metric learning algorithm to yield bilinear similarity functions. This algorithm scales linearly with input dimensionality in both space and time, and is therefore applicable to high-dimensional data domains. A convex objective free of heuristics is formulated by leveraging trace norm regularization to promote low-rankness. Crucially, we prove that all globally optimal metric solutions must retain a certain low-rank structure, which enables our algorithm to decompose the high-dimensional learning task into two steps: an SVD-based projection and a metric learning problem with reduced dimensionality. The latter step can be tackled efficiently by employing a linearized Alternating Direction Method of Multipliers. The efficacy of the proposed algorithm is demonstrated through experiments performed on four benchmark datasets with tens of thousands of dimensions.
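The two-step decomposition described in the abstract lends itself to a compact sketch. The following is a minimal illustration, not the authors' implementation: it keeps the SVD-based projection of step one but replaces the linearized ADMM solver of step two with a plain proximal-gradient loop on a hinge loss with nuclear-norm regularization. The function name, the choice of loss, and all hyperparameters (r, lam, eta, iters) are assumptions made for this sketch.

```python
import numpy as np

def low_rank_similarity_sketch(X, pairs, labels, r=50, lam=0.1, eta=0.01, iters=200):
    """Illustrative two-step scheme inspired by the paper's decomposition:
    (1) SVD-based projection of d-dimensional data onto r dimensions,
    (2) learning a bilinear similarity matrix M in the reduced space.
    Proximal gradient on a hinge loss with a trace-norm penalty stands in
    for the paper's linearized ADMM solver (an assumption of this sketch).
    """
    # Step 1: SVD-based projection. Columns of V span the top-r right
    # singular subspace of the data matrix X (n samples x d features).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:r].T                      # d x r projection basis
    Z = X @ V                         # n x r reduced representations

    # Step 2: learn M (r x r) so that s(x_i, x_j) = z_i^T M z_j is large
    # for similar pairs (label +1) and small for dissimilar pairs (-1).
    M = np.eye(r)
    for _ in range(iters):
        grad = np.zeros_like(M)
        for (i, j), y in zip(pairs, labels):
            s = Z[i] @ M @ Z[j]
            if 1 - y * s > 0:         # hinge loss is active for this pair
                grad -= y * np.outer(Z[i], Z[j])
        M -= eta * grad / len(pairs)
        # Proximal step for the trace-norm penalty: soft-threshold the
        # singular values of M to promote a low-rank solution.
        U, sv, Wt = np.linalg.svd(M)
        M = U @ np.diag(np.maximum(sv - eta * lam, 0.0)) @ Wt

    # The full-dimensional similarity is s(x, x') = x^T (V M V^T) x',
    # so only V (d x r) and M (r x r) need to be stored.
    return V, M
```

Because the metric is only ever materialized as V M V^T, storage and per-pair evaluation stay linear in the input dimensionality d, which mirrors the scaling claim in the abstract.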