Yang, Lin
Advancing Multimodal Medical Capabilities of Gemini
Yang, Lin, Xu, Shawn, Sellergren, Andrew, Kohlberger, Timo, Zhou, Yuchen, Ktena, Ira, Kiraly, Atilla, Ahmed, Faruk, Hormozdiari, Farhad, Jaroensri, Tiam, Wang, Eric, Wulczyn, Ellery, Jamil, Fayaz, Guidroz, Theo, Lau, Chuck, Qiao, Siyuan, Liu, Yun, Goel, Akshay, Park, Kendall, Agharwal, Arnav, George, Nick, Wang, Yang, Tanno, Ryutaro, Barrett, David G. T., Weng, Wei-Hung, Mahdavi, S. Sara, Saab, Khaled, Tu, Tao, Kalidindi, Sreenivasa Raju, Etemadi, Mozziyar, Cuadros, Jorge, Sorensen, Gregory, Matias, Yossi, Chou, Katherine, Corrado, Greg, Barral, Joelle, Shetty, Shravya, Fleet, David, Eslami, S. M. Ali, Tse, Daniel, Prabhakara, Shruthi, McLean, Cory, Steiner, Dave, Pilgrim, Rory, Kelly, Christopher, Azizi, Shekoofeh, Golden, Daniel
Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.
Achieving Near-Optimal Regret for Bandit Algorithms with Uniform Last-Iterate Guarantee
Liu, Junyan, Li, Yunfan, Yang, Lin
Existing performance measures for bandit algorithms, such as regret, PAC bounds, or uniform-PAC (Dann et al., 2017), typically evaluate cumulative performance while allowing an arbitrarily bad arm to be played at any finite time t. Such behavior can be highly detrimental in high-stakes applications. This paper introduces a stronger performance measure, the uniform last-iterate (ULI) guarantee, which captures both the cumulative and the instantaneous performance of bandit algorithms. Specifically, ULI characterizes instantaneous performance by ensuring that the per-round regret of the played arm is bounded by a function that decreases monotonically in the round t (for large t), preventing revisits to bad arms once sufficient samples are available. We demonstrate that a near-optimal ULI guarantee directly implies near-optimal cumulative performance across the aforementioned performance measures. To examine the achievability of ULI in the finite-arm setting, we first provide two positive results: some elimination-based algorithms, and high-probability adversarial algorithms with stronger analysis or additional designs, can attain near-optimal ULI guarantees. We then provide a negative result, indicating that optimistic algorithms cannot achieve a near-optimal ULI guarantee. Finally, we propose an efficient algorithm for linear bandits with infinitely many arms, which achieves the ULI guarantee given access to an optimization oracle.
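To make the notion of an elimination-based algorithm concrete, the following is a minimal sketch of textbook successive elimination on a simulated Bernoulli bandit. It is not the algorithm analyzed in the paper; the function name, the confidence radius, and the parameters `delta` and `seed` are assumptions chosen for illustration.

```python
import math
import random


def successive_elimination(means, horizon, delta=0.05, seed=0):
    """Simulate successive elimination on a Bernoulli bandit.

    `means` holds the true arm means; they are hidden from the learner
    and used only to draw rewards. Returns (played_arms, surviving_arms).
    The Hoeffding-style radius below is an illustrative choice.
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))   # arms that have not been eliminated
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # total reward per arm
    played = []

    def radius(arm):
        n = counts[arm]
        return math.sqrt(math.log(4.0 * k * n * n / delta) / (2.0 * n))

    while len(played) < horizon:
        # One phase: pull every surviving arm once.
        for arm in active:
            if len(played) >= horizon:
                break
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            played.append(arm)
        if len(played) >= horizon:
            break
        # Drop arms whose upper bound falls below the best lower bound.
        best_lcb = max(sums[a] / counts[a] - radius(a) for a in active)
        active = [a for a in active
                  if sums[a] / counts[a] + radius(a) >= best_lcb]
    return played, active


played, active = successive_elimination([0.9, 0.5, 0.1], horizon=3000)
print(len(played), active)
```

In this sketch an arm that has been dropped is never played again, which is the intuition behind bounding the regret of every late round rather than only the cumulative total.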
On the Model-Misspecification in Reinforcement Learning
Li, Yunfan, Yang, Lin
The success of reinforcement learning (RL) crucially depends on effective function approximation when dealing with complex ground-truth models. Existing sample-efficient RL algorithms primarily employ three approaches to function approximation: policy-based, value-based, and model-based methods. However, in the face of model misspecification (a disparity between the ground-truth and optimal function approximators), it has been shown that policy-based approaches can be robust even when the policy function approximation is under a large locally-bounded misspecification error, with which the function class may exhibit an $\Omega(1)$ approximation error in specific states and actions, but remains small on average within a policy-induced state distribution. Yet it remains an open question whether similar robustness can be achieved with value-based and model-based approaches, especially with general function approximation. To bridge this gap, in this paper we present a unified theoretical framework for addressing model misspecification in RL. We demonstrate that, through meticulous algorithm design and sophisticated analysis, value-based and model-based methods employing general function approximation can achieve robustness under local misspecification error bounds. In particular, they can attain a regret bound of $\widetilde{O}\left(\text{poly}(d H)(\sqrt{K} + K\zeta) \right)$, where $d$ represents the complexity of the function class, $H$ is the episode length, $K$ is the total number of episodes, and $\zeta$ denotes the local bound for misspecification error. Furthermore, we propose an algorithmic framework that can achieve the same order of regret bound without prior knowledge of $\zeta$, thereby enhancing its practical applicability.
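As a short worked reading of the regret bound above, the following compares its two terms. It uses only the symbols $K$ and $\zeta$ from the abstract; the numerical value of $K$ is an illustrative assumption.

```latex
% The misspecification term is dominated by the statistical term exactly when
\[
  K\zeta \le \sqrt{K}
  \quad\Longleftrightarrow\quad
  \zeta \le \frac{1}{\sqrt{K}},
\]
% in which case
\[
  \sqrt{K} + K\zeta \le 2\sqrt{K}.
\]
% Example: for K = 10^4 episodes, this requires \zeta \le 10^{-2}.
% For larger \zeta, the K\zeta term dominates and the bound grows linearly in K.
```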
MI-Gen: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images
Chen, Pingyi, Li, Honglin, Zhu, Chenglu, Zheng, Sunyi, Yang, Lin
Whole slide images (WSIs) are the foundation of digital pathology for the diagnosis and treatment of carcinomas. Writing pathology reports is laborious and error-prone for inexperienced pathologists. To reduce the workload and improve clinical automation, we investigate how to generate pathology reports given whole slide images. On the data end, we curated the largest WSI-text dataset (TCGA-PathoText). Specifically, we collected nearly 10,000 high-quality WSI-text pairs for visual-language models by recognizing and cleaning pathology reports that narrate diagnostic slides in TCGA. On the model end, we propose the multiple instance generative model (MI-Gen), which can produce pathology reports for gigapixel WSIs. We benchmark our model on the largest subset of TCGA-PathoText. Experimental results show that our model can generate pathology reports that contain multiple clinical clues. Furthermore, WSI-text prediction can be seen as an approach to visual-language pre-training, which enables our model to be transferred to downstream diagnostic tasks such as carcinoma grading and phenotyping. We observe that simple semantic extraction from the pathology reports can achieve the best performance (an F1 score of 0.838) on BRCA subtyping without adding extra parameters or tricky fine-tuning. Our collected dataset and related code will be made publicly available.
Efficient Robust Bayesian Optimization for Arbitrary Uncertain Inputs
Yang, Lin, Lyu, Junlong, Lyu, Wenlong, Chen, Zhitang
Bayesian Optimization (BO) is a sample-efficient optimization algorithm widely employed across various applications. In some challenging BO tasks, input uncertainty arises due to the inevitable randomness in the optimization process, such as machining errors, execution noise, or contextual variability. This uncertainty causes the input to deviate from its intended value before evaluation, resulting in significant performance fluctuations in the final result. In this paper, we introduce a novel robust Bayesian Optimization algorithm, AIRBO, which can effectively identify a robust optimum that performs consistently well under arbitrary input uncertainty. Our method directly models the uncertain inputs of arbitrary distributions by empowering the Gaussian Process with the Maximum Mean Discrepancy (MMD) and further accelerates the posterior inference via Nyström approximation. A rigorous theoretical regret bound is established under MMD estimation error, and extensive experiments on synthetic functions and real problems demonstrate that our approach can handle various input uncertainties and achieve state-of-the-art performance.
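For readers unfamiliar with MMD, the following is a minimal sketch of the standard biased sample estimate of the squared MMD between two sets of samples under a Gaussian (RBF) kernel. It illustrates the discrepancy measure only; how AIRBO incorporates MMD into the Gaussian Process is not described here, and the function names and the `lengthscale` parameter are assumptions for illustration.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def mmd_squared(xs, ys, lengthscale=1.0):
    """Biased (V-statistic) estimate of the squared MMD.

    xs and ys are non-empty lists of samples drawn from the two
    distributions being compared.
    """
    def mean_kernel(a_samples, b_samples):
        total = sum(rbf(a, b, lengthscale)
                    for a in a_samples for b in b_samples)
        return total / (len(a_samples) * len(b_samples))

    return (mean_kernel(xs, xs) + mean_kernel(ys, ys)
            - 2.0 * mean_kernel(xs, ys))


near = [(0.0,), (1.0,)]
far = [(5.0,), (6.0,)]
print(mmd_squared(near, near), mmd_squared(near, far))
```

The estimate is zero when both sample sets are identical and grows as the two sets move apart, which is what makes it usable as a distance between uncertain inputs.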
Biomedical image analysis competitions: The state of current participation practice
Eisenmann, Matthias, Reinke, Annika, Weru, Vivienn, Tizabi, Minu Dietlinde, Isensee, Fabian, Adler, Tim J., Godau, Patrick, Cheplygina, Veronika, Kozubek, Michal, Ali, Sharib, Gupta, Anubha, Kybic, Jan, Noble, Alison, de Solórzano, Carlos Ortiz, Pachade, Samiksha, Petitjean, Caroline, Sage, Daniel, Wei, Donglai, Wilden, Elizabeth, Alapatt, Deepak, Andrearczyk, Vincent, Baid, Ujjwal, Bakas, Spyridon, Balu, Niranjan, Bano, Sophia, Bawa, Vivek Singh, Bernal, Jorge, Bodenstedt, Sebastian, Casella, Alessandro, Choi, Jinwook, Commowick, Olivier, Daum, Marie, Depeursinge, Adrien, Dorent, Reuben, Egger, Jan, Eichhorn, Hannah, Engelhardt, Sandy, Ganz, Melanie, Girard, Gabriel, Hansen, Lasse, Heinrich, Mattias, Heller, Nicholas, Hering, Alessa, Huaulmé, Arnaud, Kim, Hyunjeong, Landman, Bennett, Li, Hongwei Bran, Li, Jianning, Ma, Jun, Martel, Anne, Martín-Isla, Carlos, Menze, Bjoern, Nwoye, Chinedu Innocent, Oreiller, Valentin, Padoy, Nicolas, Pati, Sarthak, Payette, Kelly, Sudre, Carole, van Wijnen, Kimberlin, Vardazaryan, Armine, Vercauteren, Tom, Wagner, Martin, Wang, Chuanbo, Yap, Moi Hoon, Yu, Zeyun, Yuan, Chun, Zenk, Maximilian, Zia, Aneeq, Zimmerer, David, Bao, Rina, Choi, Chanyeol, Cohen, Andrew, Dzyubachyk, Oleh, Galdran, Adrian, Gan, Tianyuan, Guo, Tianqi, Gupta, Pradyumna, Haithami, Mahmood, Ho, Edward, Jang, Ikbeom, Li, Zhili, Luo, Zhengbo, Lux, Filip, Makrogiannis, Sokratis, Müller, Dominik, Oh, Young-tack, Pang, Subeen, Pape, Constantin, Polat, Gorkem, Reed, Charlotte Rosalie, Ryu, Kanghyun, Scherr, Tim, Thambawita, Vajira, Wang, Haoyu, Wang, Xinliang, Xu, Kele, Yeh, Hung, Yeo, Doyeob, Yuan, Yixuan, Zeng, Yan, Zhao, Xin, Abbing, Julian, Adam, Jannes, Adluru, Nagesh, Agethen, Niklas, Ahmed, Salman, Khalil, Yasmina Al, Alenyà, Mireia, Alhoniemi, Esa, An, Chengyang, Anwar, Talha, Arega, Tewodros Weldebirhan, Avisdris, Netanell, Aydogan, Dogu Baran, Bai, Yingbin, Calisto, Maria Baldeon, Basaran, Berke Doga, Beetz, Marcel, Bian, Cheng, Bian, Hao, Blansit, Kevin, Bloch, Louise, Bohnsack, Robert, Bosticardo, Sara, Breen, Jack, Brudfors, Mikael, Brüngel, Raphael, Cabezas, Mariano, Cacciola, Alberto, Chen, Zhiwei, Chen, Yucong, Chen, Daniel Tianming, Cho, Minjeong, Choi, Min-Kook, Xie, Chuantao Xie Chuantao, Cobzas, Dana, Cohen-Adad, Julien, Acero, Jorge Corral, Das, Sujit Kumar, de Oliveira, Marcela, Deng, Hanqiu, Dong, Guiming, Doorenbos, Lars, Efird, Cory, Escalera, Sergio, Fan, Di, Serj, Mehdi Fatan, Fenneteau, Alexandre, Fidon, Lucas, Filipiak, Patryk, Finzel, René, Freitas, Nuno R., Friedrich, Christoph M., Fulton, Mitchell, Gaida, Finn, Galati, Francesco, Galazis, Christoforos, Gan, Chang Hee, Gao, Zheyao, Gao, Shengbo, Gazda, Matej, Gerats, Beerend, Getty, Neil, Gibicar, Adam, Gifford, Ryan, Gohil, Sajan, Grammatikopoulou, Maria, Grzech, Daniel, Güley, Orhun, Günnemann, Timo, Guo, Chunxu, Guy, Sylvain, Ha, Heonjin, Han, Luyi, Han, Il Song, Hatamizadeh, Ali, He, Tian, Heo, Jimin, Hitziger, Sebastian, Hong, SeulGi, Hong, SeungBum, Huang, Rian, Huang, Ziyan, Huellebrand, Markus, Huschauer, Stephan, Hussain, Mustaffa, Inubushi, Tomoo, Polat, Ece Isik, Jafaritadi, Mojtaba, Jeong, SeongHun, Jian, Bailiang, Jiang, Yuanhong, Jiang, Zhifan, Jin, Yueming, Joshi, Smriti, Kadkhodamohammadi, Abdolrahim, Kamraoui, Reda Abdellah, Kang, Inha, Kang, Junghwa, Karimi, Davood, Khademi, April, Khan, Muhammad Irfan, Khan, Suleiman A., Khantwal, Rishab, Kim, Kwang-Ju, Kline, Timothy, Kondo, Satoshi, Kontio, Elina, Krenzer, Adrian, Kroviakov, Artem, Kuijf, Hugo, Kumar, Satyadwyoom, La Rosa, Francesco, Lad, Abhi, Lee, Doohee, Lee, Minho, Lena, Chiara, Li, Hao, Li, Ling, Li, Xingyu, Liao, Fuyuan, Liao, KuanLun, Oliveira, Arlindo Limede, Lin, Chaonan, Lin, Shan, Linardos, Akis, Linguraru, Marius George, Liu, Han, Liu, Tao, Liu, Di, Liu, Yanling, Lourenço-Silva, João, Lu, Jingpei, Lu, Jiangshan, Luengo, Imanol, Lund, Christina B., Luu, Huan Minh, Lv, Yi, Lv, Yi, Macar, Uzay, Maechler, Leon, L., Sina Mansour, Marshall, Kenji, Mazher, Moona, McKinley, Richard, Medela, Alfonso, Meissen, Felix, Meng, Mingyuan, Miller, Dylan, Mirjahanmardi, Seyed Hossein, Mishra, Arnab, Mitha, Samir, Mohy-ud-Din, Hassan, Mok, Tony Chi Wing, Murugesan, Gowtham Krishnan, Karthik, Enamundram Naga, Nalawade, Sahil, Nalepa, Jakub, Naser, Mohamed, Nateghi, Ramin, Naveed, Hammad, Nguyen, Quang-Minh, Quoc, Cuong Nguyen, Nichyporuk, Brennan, Oliveira, Bruno, Owen, David, Pal, Jimut Bahan, Pan, Junwen, Pan, Wentao, Pang, Winnie, Park, Bogyu, Pawar, Vivek, Pawar, Kamlesh, Peven, Michael, Philipp, Lena, Pieciak, Tomasz, Plotka, Szymon, Plutat, Marcel, Pourakpour, Fattaneh, Preložnik, Domen, Punithakumar, Kumaradevan, Qayyum, Abdul, Queirós, Sandro, Rahmim, Arman, Razavi, Salar, Ren, Jintao, Rezaei, Mina, Rico, Jonathan Adam, Rieu, ZunHyan, Rink, Markus, Roth, Johannes, Ruiz-Gonzalez, Yusely, Saeed, Numan, Saha, Anindo, Salem, Mostafa, Sanchez-Matilla, Ricardo, Schilling, Kurt, Shao, Wei, Shen, Zhiqiang, Shi, Ruize, Shi, Pengcheng, Sobotka, Daniel, Soulier, Théodore, Fadida, Bella Specktor, Stoyanov, Danail, Mun, Timothy Sum Hon, Sun, Xiaowu, Tao, Rong, Thaler, Franz, Théberge, Antoine, Thielke, Felix, Torres, Helena, Wahid, Kareem A., Wang, Jiacheng, Wang, YiFei, Wang, Wei, Wang, Xiong, Wen, Jianhui, Wen, Ning, Wodzinski, Marek, Wu, Ye, Xia, Fangfang, Xiang, Tianqi, Xiaofei, Chen, Xu, Lizhan, Xue, Tingting, Yang, Yuxuan, Yang, Lin, Yao, Kai, Yao, Huifeng, Yazdani, Amirsaeed, Yip, Michael, Yoo, Hwanseung, Yousefirizi, Fereshteh, Yu, Shunkai, Yu, Lei, Zamora, Jonathan, Zeineldin, Ramy Ashraf, Zeng, Dewen, Zhang, Jianpeng, Zhang, Bokai, Zhang, Jiapeng, Zhang, Fan, Zhang, Huahong, Zhao, Zhongchen, Zhao, Zixuan, Zhao, Jiachen, Zhao, Can, Zheng, Qingshuo, Zhi, Yuheng, Zhou, Ziqi, Zou, Baosheng, Maier-Hein, Klaus, Jäger, Paul F., Kopp-Schneider, Annette, Maier-Hein, Lena
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
On-Demand Communication for Asynchronous Multi-Agent Bandits
Chen, Yu-Zhen Janice, Yang, Lin, Wang, Xuchuang, Liu, Xutong, Hajiesmaili, Mohammad, Lui, John C. S., Towsley, Don
This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously -- agent pull times and rates are unknown, irregular, and heterogeneous -- and face the same instance of a K-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into the natural extensions of UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.
Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection
Yang, Lin, Chen, Junjie, Gong, Zhihao, Gao, Shutao, Zhang, Hongyu, Kang, Yue, Li, Huaan
The rapid growth of deep learning (DL) has spurred interest in enhancing log-based anomaly detection. This approach aims to extract meaning from log events (log message templates) and develop advanced DL models for anomaly detection. However, these DL methods face challenges like heavy reliance on training data, labels, and computational resources due to model complexity. In contrast, traditional machine learning and data mining techniques are less data-dependent and more efficient but less effective than DL. To make log-based anomaly detection more practical, the goal is to enhance traditional techniques to match DL's effectiveness. Previous research in a different domain (linking questions on Stack Overflow) suggests that optimized traditional techniques can rival state-of-the-art DL methods. Drawing inspiration from this concept, we conducted an empirical study. We optimized the unsupervised PCA (Principal Component Analysis), a traditional technique, by incorporating lightweight semantic-based log representation. This addresses the issue of unseen log events in training data, enhancing log representation. Our study compared seven log-based anomaly detection methods, including four DL-based, two traditional, and the optimized PCA technique, using public and industrial datasets. Results indicate that the optimized unsupervised PCA technique achieves similar effectiveness to advanced supervised/semi-supervised DL methods while being more stable with limited training data and resource-efficient. This demonstrates the adaptability and strength of traditional techniques through small yet impactful adaptations.
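The core idea of PCA-based anomaly detection can be sketched as follows: fit a principal direction on normal data, then score a new vector by how much of it falls outside that direction. This is a minimal illustration only; it keeps a single component found by power iteration and uses hypothetical two-dimensional feature vectors, and it does not reproduce the semantic log representation or the settings used in the study.

```python
import math


def fit_top_component(rows, iters=200):
    """Return (mean, v): the feature mean and the top principal direction.

    `rows` is a list of equal-length numeric vectors, e.g. hypothetical
    per-sequence log features. Power iteration on the covariance matrix
    is used to keep the sketch dependency-free.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def anomaly_score(row, mean, v):
    """Squared norm of the part of `row` not explained by direction v."""
    centered = [x - m for x, m in zip(row, mean)]
    proj = sum(a * b for a, b in zip(centered, v))
    residual = [a - proj * b for a, b in zip(centered, v)]
    return sum(x * x for x in residual)


normal = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
mean, v = fit_top_component(normal)
print(anomaly_score([5.0, 10.0], mean, v), anomaly_score([5.0, 0.0], mean, v))
```

A vector that follows the pattern of the training data gets a score near zero, while one that breaks the pattern gets a large score; in practice a threshold on this score would be chosen from held-out data.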
Exploring Unsupervised Cell Recognition with Prior Self-activation Maps
Chen, Pingyi, Zhu, Chenglu, Shui, Zhongyi, Cai, Jiatong, Zheng, Sunyi, Zhang, Shichuan, Yang, Lin
The success of supervised deep learning models on cell recognition tasks relies on detailed annotations. Many previous works have managed to reduce the dependency on labels. However, considering the large number of cells contained in a patch, costly and inefficient labeling is still inevitable. To this end, we explored label-free methods for cell recognition. Prior self-activation maps (PSMs) are proposed to generate pseudo masks as training targets. Specifically, an activation network is trained with self-supervised learning, and the gradient information in the shallow layers of the network is aggregated to generate prior self-activation maps. A semantic clustering module is then introduced to transform PSMs into pixel-level semantic pseudo masks for downstream tasks. We evaluated our method on two histological datasets: MoNuSeg (cell segmentation) and BCData (multi-class cell detection). Compared with other fully-supervised and weakly-supervised methods, our method can achieve competitive performance without any manual annotations. Our simple but effective framework can also achieve multi-class cell detection, which cannot be done by existing unsupervised methods. The results show the potential of PSMs, which might inspire other research to deal with the hunger for labels in the medical domain.
Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs
Yang, Lin, Wang, Xuchuang, Hajiesmaili, Mohammad, Zhang, Lijun, Lui, John C. S., Towsley, Don
Recently, there has been extensive study of cooperative multi-agent multi-armed bandits, where a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms with optimal group and individual regrets and low communication between agents. Prior work has tackled this problem using two paradigms: leader-follower and fully distributed algorithms. Prior algorithms in both paradigms achieve the optimal group regret. The leader-follower algorithms achieve constant communication costs but fail to achieve optimal individual regrets. The state-of-the-art fully distributed algorithms achieve optimal individual regrets but fail to achieve constant communication costs. This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits. Our algorithm achieves the best of both paradigms: optimal individual regret and constant communication costs.