 screening


Supplement WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking Table of Contents

Neural Information Processing Systems

If taking a closer look at the MedDRA classification on the system organ level on its website, we can find a claim of "System Organ Classes (SOCs) which are groupings by aetiology (e.g. However, as claimed in the original paper, "It should be noted that we did not perform any preprocessing of our datasets, such as Tab. These datasets appear in MoleculeNet as well. As mentioned in the introduction in the main paper, there are also issues with inconsistent representations and undefined stereochemistry. We list an example for each in Figure 1 and Figure 1.


H-nobs: Achieving Certified Fairness and Robustness in Distributed Learning on Heterogeneous Datasets

Neural Information Processing Systems

Fairness and robustness are two important goals in the design of modern distributed learning systems. Despite a few prior works attempting to achieve both fairness and robustness, some key aspects of this direction remain underexplored. In this paper, we try to answer three largely unnoticed and unaddressed questions that are of paramount significance to this topic: (i) What makes jointly satisfying fairness and robustness difficult?


ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery

Neural Information Processing Systems

In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds.




AI-assisted mammograms cut risk of developing aggressive breast cancer

New Scientist

People who are screened for breast cancer by AI-supported radiologists are less likely to develop aggressive cancers before their next screening round than those who are screened by radiologists alone, raising hopes that AI-assisted screening could save lives. "This is the first randomised controlled trial on the use of AI in mammography screening," says Kristina Lång at Lund University in Sweden. The AI-supported approach involves using the software - which has been trained on more than 200,000 mammography scans from 10 countries - to rank the likelihood of cancer being present in mammograms on a scale of 1 to 10, based on visual patterns in the scans. The scans receiving a score of 1 to 9 are then assessed by one experienced radiologist, while scans receiving a score of 10 - indicating cancer is most likely to be present - are assessed by two experienced radiologists. An earlier study found that this approach could detect 29 per cent more cancers than standard screening, where each mammogram is assessed by two radiologists, without increasing the rate of false detections - where a growth is flagged but follow-up tests reveal it isn't actually there or wouldn't go on to cause problems.


Finger-prick diabetes blood test could be early warning for children

BBC News

All UK children could be offered screening for type 1 diabetes using a simple finger-prick blood test, say researchers who have been running a large study. Currently, many young people go undiagnosed and risk developing a life-threatening complication called diabetic ketoacidosis that needs urgent hospital treatment. Identifying diabetes earlier could help avoid this and mean treatments to control problematic blood sugar levels can be given sooner. Some 17,000 children aged three to 13 have already been checked as part of the ELSA (Early Surveillance for Autoimmune diabetes) study, funded by diabetes charities. Imogen, who is 12 and from the West Midlands, is one of those found to have diabetes thanks to the screening.



Fine-Tuning ChemBERTa for Predicting Inhibitory Activity Against TDP1 Using Deep Learning

Zeng, Baichuan

arXiv.org Artificial Intelligence

Predicting the inhibitory potency of small molecules against Tyrosyl-DNA Phosphodiesterase 1 (TDP1), a key target in overcoming cancer chemoresistance, remains a critical challenge in early drug discovery. We present a deep learning framework for the quantitative regression of pIC50 values from molecular Simplified Molecular Input Line Entry System (SMILES) strings using fine-tuned variants of ChemBERTa, a pre-trained chemical language model. Leveraging a large-scale consensus dataset of 177,092 compounds, we systematically evaluate two pre-training strategies, Masked Language Modeling (MLM) and Masked Token Regression (MTR), under stratified data splits and sample weighting to address severe activity imbalance, in which only 2.1% of compounds are active. Our approach outperforms the Random Predictor baseline in both regression accuracy and virtual screening utility and is competitive with Random Forest, achieving an enrichment factor (EF@1%) of 17.4 and a Precision@1% of 37.4 among top-ranked predictions. The resulting model, validated through rigorous ablation and hyperparameter studies, provides a robust, ready-to-deploy tool for prioritizing TDP1 inhibitors for experimental testing. By enabling accurate, 3D-structure-free pIC50 prediction directly from SMILES, this work demonstrates the transformative potential of chemical transformers in accelerating target-specific drug discovery.
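
The two screening metrics quoted above can be illustrated with a minimal sketch. It assumes the common definitions: Precision@1% is the fraction of actives among the top 1% of compounds ranked by predicted score, and EF@1% is that fraction divided by the overall fraction of actives. The function name, the binary activity labels, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def top_fraction_metrics(y_true, y_score, fraction=0.01):
    """Return (precision, enrichment factor) for the top-ranked `fraction`.

    y_true  : binary activity labels (1 = active), an assumed labelling
    y_score : predicted scores, higher means more likely active
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    # Number of compounds in the top fraction (at least one).
    n_top = max(1, int(round(fraction * len(y_true))))
    # Rank compounds from highest to lowest predicted score.
    order = np.argsort(-y_score, kind="stable")
    top_labels = y_true[order[:n_top]]
    precision = top_labels.mean()   # hit rate inside the top fraction
    base_rate = y_true.mean()       # hit rate in the whole library
    return float(precision), float(precision / base_rate)

# Toy library: 1,000 compounds, 20 actives (2%).
labels = np.zeros(1000, dtype=int)
labels[:20] = 1
scores = np.zeros(1000)
scores[4:20] = 0.1    # 16 actives ranked low
scores[:4] = 1.0      # 4 actives ranked at the very top
scores[20:26] = 0.9   # 6 inactives ranked just below them

precision, ef = top_fraction_metrics(labels, scores, fraction=0.01)
print(f"Precision@1% = {precision:.2f}, EF@1% = {ef:.1f}")
```

In this toy case the top 10 compounds contain 4 actives, so the hit rate is 0.4 against a base rate of 0.02. Under these definitions a random ranking has an expected EF of 1.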


Lattice-to-total thermal conductivity ratio: a phonon-glass electron-crystal descriptor for data-driven thermoelectric design

Sun, Yifan, Li, Zhi, Imamura, Tetsuya, Ohishi, Yuji, Wolverton, Chris, Kurosaki, Ken

arXiv.org Artificial Intelligence

Thermoelectrics (TEs) are promising candidates for energy harvesting, with performance quantified by the figure of merit, $ZT$. To accelerate the discovery of high-$ZT$ materials, efforts have focused on identifying compounds with low thermal conductivity $κ$. Using a curated dataset of 71,913 entries, we show that high-$ZT$ materials reside not only in the low-$κ$ regime but also cluster near a lattice-to-total thermal conductivity ratio ($κ_\mathrm{L}/κ$) of approximately 0.5, consistent with the phonon-glass electron-crystal (PGEC) design concept. Building on this insight, we construct a framework consisting of two machine learning models for the lattice and electronic components of thermal conductivity that jointly provide both $κ$ and $κ_\mathrm{L}/κ$ for screening and guiding the optimization of TE materials. Among 104,567 compounds screened, our models identify 2,522 ultralow-$κ$ candidates. Follow-up case studies demonstrate that this framework can reliably provide optimization strategies by suggesting new dopants and alloys that shift pristine materials toward the regime where $κ_\mathrm{L}/κ$ approaches 0.5. Ultimately, by integrating rapid screening with PGEC-guided optimization, our data-driven framework effectively bridges the critical gap between materials discovery and performance enhancement.
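
As a rough illustration of how two component predictions could be combined into the descriptor, the sketch below assumes the total thermal conductivity is the sum of the lattice and electronic parts, so that the ratio is the lattice part divided by that sum. The function names, the `kappa_max` cutoff, and the tolerance around 0.5 are hypothetical placeholders; the abstract does not state the thresholds the authors used.

```python
def thermal_summary(kappa_lattice, kappa_electronic):
    """Combine two component predictions (W/m/K) into total kappa and the ratio.

    Assumes kappa = kappa_lattice + kappa_electronic (other contributions
    such as bipolar transport are ignored in this sketch).
    """
    kappa_total = kappa_lattice + kappa_electronic
    ratio = kappa_lattice / kappa_total
    return kappa_total, ratio

def is_candidate(kappa_lattice, kappa_electronic,
                 kappa_max=1.0, ratio_target=0.5, ratio_tol=0.1):
    """Flag a compound with low total kappa and a ratio near the target.

    kappa_max and ratio_tol are illustrative values, not from the paper.
    """
    kappa_total, ratio = thermal_summary(kappa_lattice, kappa_electronic)
    return kappa_total <= kappa_max and abs(ratio - ratio_target) <= ratio_tol

# Hypothetical predictions for two compounds (W/m/K).
print(thermal_summary(0.4, 0.4))   # balanced lattice and electronic parts
print(is_candidate(0.4, 0.4))      # low kappa, ratio at the target
print(is_candidate(0.1, 2.0))      # electronic part dominates, kappa too high
```

Keeping the two components separate makes it possible to tell whether a compound misses the target because its lattice part or its electronic part is too large, which is the kind of information that dopant or alloy suggestions would need.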