Ahmed, Faruk
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
Ramanathan, Vishwesh, Xu, Tony, Pati, Pushpak, Ahmed, Faruk, Goubran, Maged, Martel, Anne L.
Prediction tasks in digital pathology are challenging due to the massive size of whole-slide images (WSIs) and the weak nature of training signals. Advances in computing, data availability, and self-supervised learning (SSL) have paved the way for slide-level foundation models (SLFMs) that can improve prediction tasks in low-data regimes. However, working with these models is challenging, with issues such as catastrophic forgetting during fine-tuning and under-utilization of shared information between tasks and modalities. To overcome these two challenges, we propose ModalTune, a novel fine-tuning framework that introduces the Modal Adapter to integrate new modalities without modifying SLFM weights. Additionally, we use large language models (LLMs) to encode labels as text, capturing semantic relationships and enhancing generalization across multiple tasks and cancer types in a single training recipe. ModalTune achieves state-of-the-art (SOTA) results against both uni-modal and multi-modal models across four cancer types, jointly improving survival and cancer subtype prediction while remaining competitive in pan-cancer settings. We further show that ModalTune is highly generalizable to two out-of-distribution (OOD) datasets. To our knowledge, this is the first unified fine-tuning framework for multi-modal, multi-task, and pan-cancer modeling in digital pathology.
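The Modal Adapter idea, keeping the SLFM frozen while training only a small fusion module over its embeddings, can be illustrated with a minimal JAX sketch. All dimensions, names, and the fusion architecture below are hypothetical stand-ins for illustration, not ModalTune's actual implementation.

```python
import jax
import jax.numpy as jnp

# Hypothetical dimensions; the real ModalTune configuration is not assumed here.
SLIDE_DIM, OMICS_DIM, HID = 768, 256, 512

def init_adapter(key):
    """Initialize only the adapter parameters; the SLFM itself stays frozen."""
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        "w_slide": jax.random.normal(k1, (SLIDE_DIM, HID)) * 0.02,
        "w_omics": jax.random.normal(k2, (OMICS_DIM, HID)) * 0.02,
        "w_fuse": jax.random.normal(k3, (2 * HID, HID)) * 0.02,
    }

def modal_adapter(params, slide_emb, omics_emb):
    """Fuse a frozen slide-level embedding with a new modality.

    Gradients flow only through `params`, so the SLFM weights stay
    untouched, which is the property the abstract attributes to the
    Modal Adapter.
    """
    h_slide = jax.nn.gelu(slide_emb @ params["w_slide"])
    h_omics = jax.nn.gelu(omics_emb @ params["w_omics"])
    return jnp.concatenate([h_slide, h_omics], axis=-1) @ params["w_fuse"]

slide_emb = jnp.zeros((4, SLIDE_DIM))   # stand-in for frozen SLFM outputs
omics_emb = jnp.zeros((4, OMICS_DIM))   # stand-in for the added modality
fused = modal_adapter(init_adapter(jax.random.PRNGKey(0)), slide_emb, omics_emb)
print(fused.shape)  # (4, 512)
```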
PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation
Ahmed, Faruk, Yang, Lin, Jaroensri, Tiam, Sellergren, Andrew, Matias, Yossi, Hassidim, Avinatan, Corrado, Greg S., Webster, Dale R., Shetty, Shravya, Prabhakara, Shruthi, Liu, Yun, Golden, Daniel, Wulczyn, Ellery, Steiner, David F.
Recent vision-language models in digital histopathology have predominantly been designed to generate text describing individual regions of interest extracted from a single digitized histopathology image, or whole slide image (WSI). An emerging line of research approaches the more practical clinical use case of slide-level text generation (Ahmed et al., 2024; Chen et al., 2024). However, in the typical clinical use case, there can be multiple biological tissue parts associated with a case, with each part having multiple slides. Pathologists write up a report summarizing their part-level diagnostic findings by microscopically reviewing each of the available slides per part and integrating information across these slides. This many-to-one relationship of slides to clinical descriptions is a recognized challenge for vision-language modeling in this space (Ahmed et al., 2024). The common approach taken in recent literature is to restrict modeling and analysis to single-slide cases, or to manually identify a single slide within a case or part that is most representative of the clinical findings in reports (Ahmed et al., 2024; Chen et al., 2024; Guo et al., 2024; Shaikovski et al., 2024; Xu et al., 2024; Zhou et al., 2024). This strategy of selecting representative slides was also adopted in constructing one of the most widely used histopathology datasets, TCGA (Cooper et al., 2018).
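To make the many-to-one structure concrete: a part-level representation must be built from a variable number of slide-level embeddings. The pooling sketch below is a generic illustration of that aggregation step, not PolyPath's actual method; all names and shapes are hypothetical.

```python
import jax
import jax.numpy as jnp

def aggregate_part(slide_embs: jnp.ndarray) -> jnp.ndarray:
    """Collapse a variable number of slide embeddings, shape (n_slides, dim),
    into a single part-level vector; mean pooling is the simplest choice."""
    return slide_embs.mean(axis=0)

# A case with two parts: part A has 3 slides, part B has 1 slide.
part_a = jax.random.normal(jax.random.PRNGKey(0), (3, 768))
part_b = jax.random.normal(jax.random.PRNGKey(1), (1, 768))
case_repr = [aggregate_part(p) for p in (part_a, part_b)]  # one vector per part
print([v.shape for v in case_repr])  # [(768,), (768,)]
```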
Health AI Developer Foundations
Kiraly, Atilla P., Baur, Sebastien, Philbrick, Kenneth, Mahvar, Fereshteh, Yatziv, Liron, Chen, Tiffany, Sterling, Bram, George, Nick, Jamil, Fayaz, Tang, Jing, Bailey, Kai, Ahmed, Faruk, Goel, Akshay, Ward, Abbi, Yang, Lin, Sellergren, Andrew, Matias, Yossi, Hassidim, Avinatan, Shetty, Shravya, Golden, Daniel, Azizi, Shekoofeh, Steiner, David F., Liu, Yun, Thelin, Tim, Pilgrim, Rory, Kirmizibayrak, Can
Robust medical Machine Learning (ML) models have the potential to revolutionize healthcare by accelerating clinical research, improving workflows and outcomes, and producing novel insights or capabilities. Developing such ML models from scratch is cost prohibitive and requires substantial compute, data, and time (e.g., expert labeling). To address these challenges, we introduce Health AI Developer Foundations (HAI-DEF), a suite of pre-trained, domain-specific foundation models, tools, and recipes to accelerate building ML for health applications. The models cover various modalities and domains, including radiology (X-rays and computed tomography), histopathology, dermatological imaging, and audio. These models provide domain specific embeddings that facilitate AI development with less labeled data, shorter training times, and reduced computational costs compared to traditional approaches. In addition, we utilize a common interface and style across these models, and prioritize usability to enable developers to integrate HAI-DEF efficiently. We present model evaluations across various tasks and conclude with a discussion of their application and evaluation, covering the importance of ensuring efficacy, fairness, and equity. Finally, while HAI-DEF and specifically the foundation models lower the barrier to entry for ML in healthcare, we emphasize the importance of validation with problem- and population-specific data for each desired usage setting. This technical report will be updated over time as more modalities and features are added.
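As an example of the intended workflow, a lightweight probe can be trained on frozen foundation-model embeddings rather than training a model end-to-end. The sketch below assumes embeddings have already been exported; the shapes and random stand-in data are purely illustrative, and this is not HAI-DEF API code.

```python
import jax
import jax.numpy as jnp

# Stand-ins for embeddings exported by a foundation model; in practice these
# would be precomputed and loaded from disk (shapes here are illustrative).
key = jax.random.PRNGKey(0)
emb = jax.random.normal(key, (256, 384))                      # N x embedding_dim
labels = jax.random.bernoulli(key, 0.5, (256,)).astype(jnp.float32)

def loss_fn(w, x, y):
    """Numerically stable binary cross-entropy of a linear probe on
    frozen embeddings."""
    logits = x @ w
    return jnp.mean(jnp.maximum(logits, 0) - logits * y
                    + jnp.log1p(jnp.exp(-jnp.abs(logits))))

w = jnp.zeros(emb.shape[1])
grad_fn = jax.jit(jax.grad(loss_fn))
for _ in range(200):                                          # plain gradient descent
    w -= 0.1 * grad_fn(w, emb, labels)
print(float(loss_fn(w, emb, labels)))
```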
PathAlign: A vision-language model for whole slide images in histopathology
Ahmed, Faruk, Sellergren, Andrew, Yang, Lin, Xu, Shawn, Babenko, Boris, Ward, Abbi, Olson, Niels, Mohtashamian, Arash, Matias, Yossi, Corrado, Greg S., Duong, Quang, Webster, Dale R., Shetty, Shravya, Golden, Daniel, Liu, Yun, Steiner, David F., Wulczyn, Ellery
Microscopic interpretation of histopathology images underlies many important diagnostic and treatment decisions. While advances in vision-language modeling raise new opportunities for analysis of such images, the gigapixel-scale size of whole slide images (WSIs) introduces unique challenges. Additionally, pathology reports simultaneously highlight key findings from small regions while also aggregating interpretation across multiple slides, often making it difficult to create robust image-text pairs. As such, pathology reports remain a largely untapped source of supervision in computational pathology, with most efforts relying on region-of-interest annotations or self-supervision at the patch-level. In this work, we develop a vision-language model based on the BLIP-2 framework using WSIs paired with curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest, as well as integration of the WSI encoder with a frozen large language model (LLM) for WSI-based generative text capabilities such as report generation or AI-in-the-loop interactions. We utilize a de-identified dataset of over 350,000 WSIs and diagnostic text pairs, spanning a wide range of diagnoses, procedure types, and tissue types. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization (slide-level triaging). Model-generated text for WSIs was rated by pathologists as accurate, without clinically significant error or omission, for 78% of WSIs on average. This work demonstrates exciting potential capabilities for language-aligned WSI embeddings.
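A shared image-text embedding space makes retrieval a nearest-neighbor search. Below is a minimal sketch of cosine-similarity retrieval over such embeddings; the embedding dimension and random placeholder vectors are assumptions for illustration, not PathAlign's actual interface.

```python
import jax
import jax.numpy as jnp

def retrieve(query_emb, candidate_embs, k=5):
    """Rank candidates by cosine similarity in the shared image-text space."""
    q = query_emb / jnp.linalg.norm(query_emb)
    c = candidate_embs / jnp.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per candidate
    top = jnp.argsort(-scores)[:k]
    return top, scores[top]

# Placeholder vectors; a real query would be a text embedding and the
# candidates WSI embeddings (or vice versa for image-to-text retrieval).
text_emb = jax.random.normal(jax.random.PRNGKey(0), (512,))
wsi_embs = jax.random.normal(jax.random.PRNGKey(1), (1000, 512))
idx, sims = retrieve(text_emb, wsi_embs)
print(idx, sims)
```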
Advancing Multimodal Medical Capabilities of Gemini
Yang, Lin, Xu, Shawn, Sellergren, Andrew, Kohlberger, Timo, Zhou, Yuchen, Ktena, Ira, Kiraly, Atilla, Ahmed, Faruk, Hormozdiari, Farhad, Jaroensri, Tiam, Wang, Eric, Wulczyn, Ellery, Jamil, Fayaz, Guidroz, Theo, Lau, Chuck, Qiao, Siyuan, Liu, Yun, Goel, Akshay, Park, Kendall, Agharwal, Arnav, George, Nick, Wang, Yang, Tanno, Ryutaro, Barrett, David G. T., Weng, Wei-Hung, Mahdavi, S. Sara, Saab, Khaled, Tu, Tao, Kalidindi, Sreenivasa Raju, Etemadi, Mozziyar, Cuadros, Jorge, Sorensen, Gregory, Matias, Yossi, Chou, Katherine, Corrado, Greg, Barral, Joelle, Shetty, Shravya, Fleet, David, Eslami, S. M. Ali, Tse, Daniel, Prabhakara, Shruthi, McLean, Cory, Steiner, Dave, Pilgrim, Rory, Kelly, Christopher, Azizi, Shekoofeh, Golden, Daniel
Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.
A Comprehensive Literature Review on Sweet Orange Leaf Diseases
Emon, Yousuf Rayhan, Rabbani, Md Golam, Ahad, Dr. Md. Taimur, Ahmed, Faruk
Sweet orange leaf diseases pose a significant threat to agricultural productivity, degrading fruit quality in the citrus industry. The advent of machine learning has made automated disease detection possible, and early detection and diagnosis are necessary for effective leaf management. Automated systems for predicting sweet orange leaf disease have already been developed using different image-processing techniques. This comprehensive literature review systematically covers leaf diseases and the machine learning methodologies applied to detecting damaged leaves via image classification. It examines the benefits and limitations of different machine learning models, including Vision Transformer (ViT), Convolutional Neural Network (CNN), CNN with SoftMax and RBF SVM, Hybrid CNN-SVM, HLB-ConvMLP, EfficientNet-b0, YOLOv5, YOLOv7, and Deep CNN. These machine learning models were tested on various datasets and detected the diseases. The review compares the performance of the models using the accuracy, precision, recall, and related metrics reported in the existing studies.
Machine Learning-Based Tea Leaf Disease Detection: A Comprehensive Review
Ahmed, Faruk, Ahad, Md. Taimur, Emon, Yousuf Rayhan
Tea leaf diseases are a major challenge to agricultural productivity, with far-reaching implications for yield and quality in the tea industry. The rise of machine learning has enabled the development of innovative approaches to combat these diseases. Early detection and diagnosis are crucial for effective crop management. For predicting tea leaf disease, several automated systems have already been developed using different image processing techniques. This paper delivers a systematic review of the literature on machine learning methodologies applied to diagnose tea leaf disease via image classification. It thoroughly evaluates the strengths and constraints of various Vision Transformer models, including Inception Convolutional Vision Transformer (ICVT), GreenViT, PlantXViT, PlantViT, MSCVT, Transfer Learning Model & Vision Transformer (TLMViT), IterationViT, IEM-ViT. Moreover, this paper also reviews models like Dense Convolutional Network (DenseNet), Residual Neural Network (ResNet)-50V2, YOLOv5, YOLOv7, Convolutional Neural Network (CNN), Deep CNN, Non-dominated Sorting Genetic Algorithm (NSGA-II), MobileNetv2, and Lesion-Aware Visual Transformer. These machine-learning models have been tested on various datasets, demonstrating their real-world applicability. This review study not only highlights current progress in the field but also provides valuable insights for future research directions in the machine learning-based detection and classification of tea leaf diseases.
Integrating Categorical Semantics into Unsupervised Domain Translation
Lavoie-Marchildon, Samuel, Ahmed, Faruk, Courville, Aaron
While unsupervised domain translation (UDT) has seen a lot of success recently, we argue that allowing its translation to be mediated via categorical semantic features could enable wider applicability. In particular, we argue that categorical semantics are important when translating between domains with multiple object categories possessing distinctive styles, or even between domains that are simply too different but still share high-level semantics. We propose a method to learn, in an unsupervised manner, categorical semantic features (such as object labels) that are invariant of the source and target domains. We show that conditioning the style of an unsupervised domain translation method on the learned categorical semantics leads to considerably better preservation of high-level features on tasks such as MNIST$\leftrightarrow$SVHN and to more realistic stylization on Sketches$\to$Reals.
Improved Training of Wasserstein GANs
Gulrajani, Ishaan, Ahmed, Faruk, Arjovsky, Martin, Dumoulin, Vincent, Courville, Aaron
Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.
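The proposed penalty replaces weight clipping by regularizing the critic's gradient norm toward 1 on interpolates between real and generated samples, $\lambda\,\mathbb{E}_{\hat{x}}\big[(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_2 - 1)^2\big]$ with $\lambda = 10$ in the paper. A minimal JAX sketch follows; the toy linear critic and data shapes are illustrative stand-ins, not the paper's architectures.

```python
import jax
import jax.numpy as jnp

def gradient_penalty(critic, params, real, fake, key, lam=10.0):
    """WGAN-GP term: penalize deviation of the critic's input-gradient
    norm from 1, evaluated on random real/fake interpolates."""
    eps_shape = (real.shape[0],) + (1,) * (real.ndim - 1)
    eps = jax.random.uniform(key, eps_shape)
    x_hat = eps * real + (1.0 - eps) * fake
    # Per-example gradient of the critic output w.r.t. its input.
    grads = jax.vmap(jax.grad(lambda x: critic(params, x[None])[0]))(x_hat)
    norms = jnp.sqrt(jnp.sum(grads.reshape(grads.shape[0], -1) ** 2, axis=1))
    return lam * jnp.mean((norms - 1.0) ** 2)

def critic(params, x):
    """Toy linear critic; any differentiable critic works here."""
    return x.reshape(x.shape[0], -1) @ params  # shape (batch,)

key = jax.random.PRNGKey(0)
params = jax.random.normal(key, (784,))
real = jax.random.normal(jax.random.PRNGKey(1), (8, 28, 28))
fake = jax.random.normal(jax.random.PRNGKey(2), (8, 28, 28))
print(float(gradient_penalty(critic, params, real, fake, key)))
```

In a full training loop this term would simply be added to the critic's loss, $\mathbb{E}[D(\tilde{x})] - \mathbb{E}[D(x)]$, in place of weight clipping.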