Machine Learning Reveals Composition Dependent Thermal Stability in Halide Perovskites

Hering, Abigail R., Dubey, Mansha, Hosseini, Elahe, Srivastava, Meghna, An, Yu, Correa-Baena, Juan-Pablo, Homayoun, Houman, Leite, Marina S.

arXiv.org Artificial Intelligence 

The whiskers extend to 4× the interquartile range (IQR; Supplementary Figure 1), a conservative threshold that ensures only the most extreme variations in PL are classified as outliers (denoted by diamond symbols). Outliers in the PL property distributions may indicate experimental errors, sample inconsistencies, or data processing anomalies; they are therefore removed from the ML analysis.

Data Visualization: PCA orthogonally transforms the original variables into a new set of linearly uncorrelated variables termed principal components (PCs). The first PC captures the maximum variance present in the data, and each subsequent component has the highest variance possible under the constraint of being orthogonal to the preceding ones. The methodology involves standardizing the dataset, calculating the covariance matrix, and then extracting the eigenvalues and eigenvectors of this matrix, which, in turn, give the magnitude (variance) and direction of each new axis, respectively. Projecting the original data onto these new axes reduces the dimensionality of the dataset. Supplementary Figure 1A illustrates the distribution of the samples in the space defined by the PCs, with each point representing a single sample's location within this new coordinate system. Here, the colors indicate the value of each PL property, offering visual insight into how these properties correlate with the PCs.
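The two steps described above, IQR-based outlier removal and PCA by eigendecomposition of the covariance matrix, can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' pipeline. The function names, the synthetic data, and the reading of "4x the IQR" as fences at Q1 - 4·IQR and Q3 + 4·IQR are all assumptions made for the example.

```python
import numpy as np


def remove_iqr_outliers(X, k=4.0):
    """Keep rows whose every feature lies within [Q1 - k*IQR, Q3 + k*IQR].

    k=4.0 mirrors the 4x IQR whisker length in the text; measuring the
    fences from the quartiles is an assumption of this sketch.
    """
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    mask = np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)
    return X[mask], mask


def pca(X, n_components=2):
    """PCA via eigendecomposition of the covariance of standardized data."""
    # Standardize each feature to zero mean and unit (sample) variance.
    # Assumes no feature is constant, otherwise the division is undefined.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix of the standardized features (columns = variables).
    cov = np.cov(Z, rowvar=False)
    # eigh is appropriate for symmetric matrices; it returns ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project the samples onto the leading principal axes.
    scores = Z @ eigvecs[:, :n_components]
    return scores, eigvals, eigvecs


# Synthetic stand-in for a table of PL properties (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[0, 0] = 50.0  # inject one extreme value to be flagged as an outlier

X_clean, mask = remove_iqr_outliers(X)
scores, eigvals, eigvecs = pca(X_clean)
explained = eigvals / eigvals.sum()  # fraction of variance per PC
```

Here `scores` holds each sample's coordinates along the first two PCs, which is what a scatter plot such as Supplementary Figure 1A would display, and `explained` gives the share of total variance captured by each component.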