Shah, Prashant
Federated Discrete Denoising Diffusion Model for Molecular Generation with OpenFL
Ta, Kevin, Foley, Patrick, Thieme, Mattson, Pandey, Abhishek, Shah, Prashant
Generating unique molecules with biochemically desired properties to serve as viable drug candidates is a difficult task that requires specialized domain expertise. In recent years, diffusion models have shown promising results in accelerating the drug design process through AI-driven molecular generation. However, training these models requires massive amounts of data, which are often isolated in proprietary silos. OpenFL is a federated learning framework that enables privacy-preserving collaborative training across these decentralized data sites. In this work, we present a federated discrete denoising diffusion model that was trained using OpenFL. The federated model achieves comparable performance with a model trained on centralized data when evaluating the uniqueness and validity of the generated molecules. This demonstrates the utility of federated learning in the drug design process. OpenFL is available at: https://github.com/securefederatedai/openfl
Privacy-Enhancing Collaborative Information Sharing through Federated Learning -- A Case of the Insurance Industry
Dong, Panyi, Quan, Zhiyu, Edwards, Brandon, Wang, Shih-han, Feng, Runhuan, Wang, Tianyang, Foley, Patrick, Shah, Prashant
The report demonstrates the benefits (in terms of improved claims loss modeling) of harnessing the value of Federated Learning (FL) to learn a single model across multiple insurance industry datasets without requiring the datasets themselves to be shared from one company to another. The application of FL addresses two of the most pressing concerns: limited data volume and data variety, which are caused by privacy concerns, the rarity of claim events, the lack of informative rating factors, etc.. During each round of FL, collaborators compute improvements on the model using their local private data, and these insights are combined to update a global model. Such aggregation of insights allows for an increase to the effectiveness in forecasting claims losses compared to models individually trained at each collaborator. Critically, this approach enables machine learning collaboration without the need for raw data to leave the compute infrastructure of each respective data owner. Additionally, the open-source framework, OpenFL, that is used in our experiments is designed so that it can be run using confidential computing as well as with additional algorithmic protections against leakage of information via the shared model updates. In such a way, FL is implemented as a privacy-enhancing collaborative learning technique that addresses the challenges posed by the sensitivity and privacy of data in traditional machine learning solutions. This paper's application of FL can also be expanded to other areas including fraud detection, catastrophe modeling, etc., that have a similar need to incorporate data privacy into machine learning collaborations. Our framework and empirical results provide a foundation for future collaborations among insurers, regulators, academic researchers, and InsurTech experts.
GaNDLF: A Generally Nuanced Deep Learning Framework for Scalable End-to-End Clinical Workflows in Medical Imaging
Pati, Sarthak, Thakur, Siddhesh P., Hamamcı, İbrahim Ethem, Baid, Ujjwal, Baheti, Bhakti, Bhalerao, Megh, Güley, Orhun, Mouchtaris, Sofia, Lang, David, Thermos, Spyridon, Gotkowski, Karol, González, Camila, Grenko, Caleb, Getka, Alexander, Edwards, Brandon, Sheller, Micah, Wu, Junwen, Karkada, Deepthi, Panchumarthy, Ravi, Ahluwalia, Vinayak, Zou, Chunrui, Bashyam, Vishnu, Li, Yuemeng, Haghighi, Babak, Chitalia, Rhea, Abousamra, Shahira, Kurc, Tahsin M., Gastounioti, Aimilia, Er, Sezgin, Bergman, Mark, Saltz, Joel H., Fan, Yong, Shah, Prashant, Mukhopadhyay, Anirban, Tsaftaris, Sotirios A., Menze, Bjoern, Davatzikos, Christos, Kontos, Despina, Karargyris, Alexandros, Umeton, Renato, Mattson, Peter, Bakas, Spyridon
Deep Learning (DL) has the potential to optimize machine learning in both the scientific and clinical communities. However, greater expertise is required to develop DL algorithms, and the variability of implementations hinders their reproducibility, translation, and deployment. Here we present the community-driven Generally Nuanced Deep Learning Framework (GaNDLF), with the goal of lowering these barriers. GaNDLF makes the mechanism of DL development, training, and inference more stable, reproducible, interpretable, and scalable, without requiring an extensive technical background. GaNDLF aims to provide an end-to-end solution for all DL-related tasks in computational precision medicine. We demonstrate the ability of GaNDLF to analyze both radiology and histology images, with built-in support for k-fold cross-validation, data augmentation, multiple modalities and output classes. Our quantitative performance evaluation on numerous use cases, anatomies, and computational tasks supports GaNDLF as a robust application framework for deployment in clinical workflows.
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Pati, Sarthak, Baid, Ujjwal, Edwards, Brandon, Sheller, Micah, Wang, Shih-Han, Reina, G Anthony, Foley, Patrick, Gruzdev, Alexey, Karkada, Deepthi, Davatzikos, Christos, Sako, Chiharu, Ghodasara, Satyam, Bilello, Michel, Mohan, Suyash, Vollmuth, Philipp, Brugnara, Gianluca, Preetha, Chandrakanth J, Sahm, Felix, Maier-Hein, Klaus, Zenk, Maximilian, Bendszus, Martin, Wick, Wolfgang, Calabrese, Evan, Rudie, Jeffrey, Villanueva-Meyer, Javier, Cha, Soonmee, Ingalhalikar, Madhura, Jadhav, Manali, Pandey, Umang, Saini, Jitender, Garrett, John, Larson, Matthew, Jeraj, Robert, Currie, Stuart, Frood, Russell, Fatania, Kavi, Huang, Raymond Y, Chang, Ken, Balana, Carmen, Capellades, Jaume, Puig, Josep, Trenkler, Johannes, Pichler, Josef, Necker, Georg, Haunschmidt, Andreas, Meckel, Stephan, Shukla, Gaurav, Liem, Spencer, Alexander, Gregory S, Lombardo, Joseph, Palmer, Joshua D, Flanders, Adam E, Dicker, Adam P, Sair, Haris I, Jones, Craig K, Venkataraman, Archana, Jiang, Meirui, So, Tiffany Y, Chen, Cheng, Heng, Pheng Ann, Dou, Qi, Kozubek, Michal, Lux, Filip, Michálek, Jan, Matula, Petr, Keřkovský, Miloš, Kopřivová, Tereza, Dostál, Marek, Vybíhal, Václav, Vogelbaum, Michael A, Mitchell, J Ross, Farinhas, Joaquim, Maldjian, Joseph A, Yogananda, Chandan Ganesh Bangalore, Pinho, Marco C, Reddy, Divya, Holcomb, James, Wagner, Benjamin C, Ellingson, Benjamin M, Cloughesy, Timothy F, Raymond, Catalina, Oughourlian, Talia, Hagiwara, Akifumi, Wang, Chencai, To, Minh-Son, Bhardwaj, Sargam, Chong, Chee, Agzarian, Marc, Falcão, Alexandre Xavier, Martins, Samuel B, Teixeira, Bernardo C A, Sprenger, Flávia, Menotti, David, Lucio, Diego R, LaMontagne, Pamela, Marcus, Daniel, Wiestler, Benedikt, Kofler, Florian, Ezhov, Ivan, Metz, Marie, Jain, Rajan, Lee, Matthew, Lui, Yvonne W, McKinley, Richard, Slotboom, Johannes, Radojewski, Piotr, Meier, Raphael, Wiest, Roland, Murcia, Derrick, Fu, Eric, Haas, Rourke, Thompson, John, Ormond, David Ryan, Badve, Chaitra, Sloan, Andrew E, Vadmal, Vachan, Waite, Kristin, Colen, Rivka R, Pei, Linmin, Ak, Murat, Srinivasan, Ashok, Bapuraj, J Rajiv, Rao, Arvind, Wang, Nicholas, Yoshiaki, Ota, Moritani, Toshio, Turk, Sevcan, Lee, Joonsang, Prabhudesai, Snehal, Morón, Fanny, Mandel, Jacob, Kamnitsas, Konstantinos, Glocker, Ben, Dixon, Luke V M, Williams, Matthew, Zampakis, Peter, Panagiotopoulos, Vasileios, Tsiganos, Panagiotis, Alexiou, Sotiris, Haliassos, Ilias, Zacharaki, Evangelia I, Moustakas, Konstantinos, Kalogeropoulou, Christina, Kardamakis, Dimitrios M, Choi, Yoon Seong, Lee, Seung-Koo, Chang, Jong Hee, Ahn, Sung Soo, Luo, Bing, Poisson, Laila, Wen, Ning, Tiwari, Pallavi, Verma, Ruchika, Bareja, Rohan, Yadav, Ipsa, Chen, Jonathan, Kumar, Neeraj, Smits, Marion, van der Voort, Sebastian R, Alafandi, Ahmed, Incekara, Fatih, Wijnenga, Maarten MJ, Kapsas, Georgios, Gahrmann, Renske, Schouten, Joost W, Dubbink, Hendrikus J, Vincent, Arnaud JPE, Bent, Martin J van den, French, Pim J, Klein, Stefan, Yuan, Yading, Sharma, Sonam, Tseng, Tzu-Chi, Adabi, Saba, Niclou, Simone P, Keunen, Olivier, Hau, Ann-Christin, Vallières, Martin, Fortin, David, Lepage, Martin, Landman, Bennett, Ramadass, Karthik, Xu, Kaiwen, Chotai, Silky, Chambless, Lola B, Mistry, Akshitkumar, Thompson, Reid C, Gusev, Yuriy, Bhuvaneshwar, Krithika, Sayah, Anousheh, Bencheqroun, Camelia, Belouali, Anas, Madhavan, Subha, Booth, Thomas C, Chelliah, Alysha, Modat, Marc, Shuaib, Haris, Dragos, Carmen, Abayazeed, Aly, Kolodziej, Kenneth, Hill, Michael, Abbassy, Ahmed, Gamal, Shady, Mekhaimar, Mahmoud, Qayati, Mohamed, Reyes, Mauricio, Park, Ji Eun, Yun, Jihye, Kim, Ho Sung, Mahajan, Abhishek, Muzi, Mark, Benson, Sean, Beets-Tan, Regina G H, Teuwen, Jonas, Herrera-Trujillo, Alejandro, Trujillo, Maria, Escobar, William, Abello, Ana, Bernal, Jose, Gómez, Jhon, Choi, Joseph, Baek, Stephen, Kim, Yusung, Ismael, Heba, Allen, Bryan, Buatti, John M, Kotrotsou, Aikaterini, Li, Hongwei, Weiss, Tobias, Weller, Michael, Bink, Andrea, Pouymayou, Bertrand, Shaykh, Hassan F, Saltz, Joel, Prasanna, Prateek, Shrestha, Sampurna, Mani, Kartik M, Payne, David, Kurc, Tahsin, Pelaez, Enrique, Franco-Maldonado, Heydy, Loayza, Francis, Quevedo, Sebastian, Guevara, Pamela, Torche, Esteban, Mendoza, Cristobal, Vera, Franco, Ríos, Elvis, López, Eduardo, Velastin, Sergio A, Ogbole, Godwin, Oyekunle, Dotun, Odafe-Oyibotha, Olubunmi, Osobu, Babatunde, Shu'aibu, Mustapha, Dorcas, Adeleye, Soneye, Mayowa, Dako, Farouk, Simpson, Amber L, Hamghalam, Mohammad, Peoples, Jacob J, Hu, Ricky, Tran, Anh, Cutler, Danielle, Moraes, Fabio Y, Boss, Michael A, Gimpel, James, Veettil, Deepak Kattil, Schmidt, Kendall, Bialecki, Brian, Marella, Sailaja, Price, Cynthia, Cimino, Lisa, Apgar, Charles, Shah, Prashant, Menze, Bjoern, Barnholtz-Sloan, Jill S, Martin, Jason, Bakas, Spyridon
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25, 256 MRI scans from 6, 314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.