Dual-Curriculum Contrastive Multi-Instance Learning for Cancer Prognosis Analysis with Whole Slide Images

Neural Information Processing Systems

Multi-instance learning (MIL) has advanced cancer prognosis analysis with whole slide images (WSIs). However, current MIL methods for WSI analysis still confront unique challenges. Previous methods typically generate instance representations via a pre-trained model or a model trained on instances with bag-level annotations, which may not generalize well to the downstream task due to the introduction of excessive label noise and the lack of fine-grained information across multi-magnification WSIs. Additionally, existing methods generally aggregate instance representations into bag representations for prognosis prediction without considering intra-bag redundancy and inter-bag discrimination. To address these issues, we propose a dual-curriculum contrastive MIL method for cancer prognosis analysis with WSIs. The proposed method consists of two curricula, i.e., saliency-guided weakly-supervised instance encoding with cross-scale tiles and contrastive-enhanced soft-bag prognosis inference. Extensive experiments on three public datasets demonstrate that our method outperforms state-of-the-art methods in this field.
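
As a rough illustration of the bag-level aggregation and inter-bag discrimination the abstract refers to, the Python sketch below pools instance features with a small attention network and applies an InfoNCE-style separation term over bag embeddings. This is not the authors' implementation; the attention form and the loss definition (each bag treated as its own class, with no curriculum) are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionMILPooling(nn.Module):
    """Attention-based pooling of patch features into a single bag embedding."""
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, instances):                 # instances: (num_patches, dim)
        weights = torch.softmax(self.attn(instances), dim=0)
        return (weights * instances).sum(dim=0)   # bag embedding: (dim,)

def bag_separation_loss(bags, temperature=0.1):
    """InfoNCE-style uniformity term that pushes different bag embeddings apart.
    The actual method defines positives/negatives via its curriculum; this is a stand-in."""
    z = F.normalize(bags, dim=1)
    logits = z @ z.t() / temperature
    logits.fill_diagonal_(float('-inf'))          # ignore self-similarity
    return torch.logsumexp(logits, dim=1).mean()

pool = AttentionMILPooling()
bags = torch.stack([pool(torch.randn(200, 512)) for _ in range(4)])  # 4 toy WSIs
loss = bag_separation_loss(bags)
```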


BOOM: Beyond Only One Modality KIT's Multimodal Multilingual Lecture Companion

Koneru, Sai, Retkowski, Fabian, Huber, Christian, Hilgert, Lukas, Akti, Seymanur, Ugan, Enes Yavuz, Waibel, Alexander, Niehues, Jan

arXiv.org Artificial Intelligence

The globalization of education and the rapid growth of online learning have made localizing educational content a critical challenge. Lecture materials are inherently multimodal, combining spoken audio with visual slides, which requires systems capable of processing multiple input modalities. To provide an accessible and complete learning experience, translations must preserve all modalities: text for reading, slides for visual understanding, and speech for auditory learning. We present BOOM, a multimodal multilingual lecture companion that jointly translates lecture audio and slides to produce synchronized outputs across three modalities: translated text, localized slides with preserved visual elements, and synthesized speech. This end-to-end approach enables students to access lectures in their native language while aiming to preserve the original content in its entirety. Our experiments demonstrate that slide-aware transcripts also yield cascading benefits for downstream tasks such as summarization and question answering. We release our slide translation code at https://github.com/saikoneru/image-translator and integrate it into Lecture Translator at https://gitlab.kit.edu/kit/isl-ai4lt/lt-middleware/ltpipeline. All released code and models are licensed under the MIT License.
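
The sketch below only illustrates the overall shape of such a cascaded, modality-preserving pipeline. The stage functions (`transcribe`, `translate_text`, `translate_slide`, `synthesize_speech`) are hypothetical stubs standing in for real ASR, MT, slide-translation, and TTS components, not BOOM's actual API.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    audio: bytes          # raw audio for one lecture segment
    slide_image: bytes    # the slide shown during that segment

def transcribe(audio, slide_image):     # stub for slide-aware ASR
    return "source-language transcript"

def translate_text(text, lang):         # stub for machine translation
    return f"[{lang}] {text}"

def translate_slide(image, lang):       # stub for slide text detection + re-rendering
    return image

def synthesize_speech(text, lang):      # stub for TTS
    return b""

def localize(seg: Segment, lang: str) -> dict:
    transcript = transcribe(seg.audio, seg.slide_image)
    text = translate_text(transcript, lang)
    return {
        "text": text,                                      # reading
        "slide": translate_slide(seg.slide_image, lang),   # visual understanding
        "speech": synthesize_speech(text, lang),           # auditory learning
    }

print(localize(Segment(audio=b"", slide_image=b""), "de"))
```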


Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations

Liu, Chengzhi, Yang, Yuzhe, Zhou, Kaiwen, Zhang, Zhen, Fan, Yue, Xie, Yanan, Qi, Peng, Wang, Xin Eric

arXiv.org Artificial Intelligence

The promotion of academic papers has become an important means of enhancing research visibility. However, existing automated methods struggle with limited storytelling, insufficient aesthetic quality, and constrained self-adjustment, making it difficult to achieve efficient and engaging dissemination. At the heart of these challenges is a simple principle: there is no way to improve what you cannot evaluate correctly. To address this, we introduce EvoPresent, a self-improvement agent framework that unifies coherent narratives, aesthetic-aware designs, and realistic presentation delivery via virtual characters. Central to EvoPresent is PresAesth, a multi-task reinforcement learning (RL) aesthetic model that provides reliable aesthetic scoring, defect adjustment, and comparative feedback, enabling iterative self-improvement even under limited aesthetic training data. To systematically evaluate these methods, we introduce the EvoPresent Benchmark, a comprehensive benchmark comprising: Presentation Generation Quality, built on 650 top-tier AI conference papers with multimodal resources (slides, videos, and scripts) to assess both content and design; and Aesthetic Awareness, consisting of 2,000 slide pairs with varying aesthetic levels, supporting joint training and evaluation on scoring, defect adjustment, and comparison. Our findings highlight that (i) high-quality feedback is essential for agent self-improvement, while initial capability alone does not guarantee effective self-correction; (ii) automated generation pipelines exhibit a trade-off between visual design and content construction; and (iii) multi-task RL training shows stronger generalization in aesthetic awareness tasks.
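
A minimal sketch of the generate-score-revise loop implied by the abstract is given below; `generate_slides`, `aesthetic_score`, and `revise` are hypothetical stand-ins for the agent, the PresAesth scorer, and the defect-adjustment step, not EvoPresent's real interfaces.

```python
def generate_slides(paper):                      # stub: agent drafts an initial deck
    return {"paper": paper, "quality": 0.5}

def aesthetic_score(slides):                     # stub for the aesthetic model
    return slides["quality"], "increase contrast; fix crowded layout"

def revise(slides, feedback):                    # stub for feedback-guided defect adjustment
    return {**slides, "quality": min(1.0, slides["quality"] + 0.1)}

def self_improve(paper, max_rounds=5, threshold=0.8):
    slides = generate_slides(paper)
    for _ in range(max_rounds):
        score, feedback = aesthetic_score(slides)
        if score >= threshold:                   # good enough: stop iterating
            break
        slides = revise(slides, feedback)        # self-correction driven by feedback quality
    return slides

print(self_improve("my paper"))
```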



Efficient Whole Slide Pathology VQA via Token Compression

Lyu, Weimin, Hu, Qingqiao, Qi, Kehan, Shi, Zhan, Huang, Wentao, Gupta, Saumya, Chen, Chao

arXiv.org Artificial Intelligence

Whole-slide images (WSIs) in pathology can reach up to 10,000 x 10,000 pixels, posing significant challenges for multimodal large language models (MLLMs) due to long context lengths and high computational demands. Previous methods typically focus on patch-level analysis or slide-level classification using CLIP-based models with multi-instance learning, but they lack the generative capabilities needed for visual question answering (VQA). More recent MLLM-based approaches address VQA by feeding thousands of patch tokens directly into the language model, which leads to excessive resource consumption. To address these limitations, we propose Token Compression Pathology LLaVA (TCP-LLaVA), the first MLLM architecture to perform WSI VQA via token compression. TCP-LLaVA introduces a set of trainable compression tokens that aggregate visual and textual information through a modality compression module, inspired by the [CLS] token mechanism in BERT. Only the compressed tokens are forwarded to the LLM for answer generation, significantly reducing input length and computational cost. Experiments on ten TCGA tumor subtypes show that TCP-LLaVA outperforms existing MLLM baselines in VQA accuracy while reducing training resource consumption by a substantial margin.
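
The PyTorch sketch below shows one plausible form of such a modality-compression module: a small set of trainable compression tokens cross-attends over the patch (and optionally text) tokens, and only the compressed tokens are forwarded to the LLM. The dimensions and the use of `nn.MultiheadAttention` are assumptions for illustration, not the released TCP-LLaVA code.

```python
import torch
import torch.nn as nn

class TokenCompressor(nn.Module):
    def __init__(self, dim=1024, num_compress=64, num_heads=8):
        super().__init__()
        # Learnable queries that play the role of compression tokens.
        self.compress_tokens = nn.Parameter(torch.randn(1, num_compress, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, patch_tokens, text_tokens=None):
        # patch_tokens: (B, N_patches, dim); text tokens can be appended to the key/value set.
        kv = patch_tokens if text_tokens is None else torch.cat([patch_tokens, text_tokens], dim=1)
        q = self.compress_tokens.expand(kv.size(0), -1, -1)
        out, _ = self.cross_attn(q, kv, kv)       # queries = compression tokens
        return self.norm(out)                     # (B, num_compress, dim) -> fed to the LLM

compressor = TokenCompressor()
compressed = compressor(torch.randn(2, 4096, 1024))   # 4096 patch tokens -> 64 tokens
print(compressed.shape)                                # torch.Size([2, 64, 1024])
```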


A Single Detect Focused YOLO Framework for Robust Mitotic Figure Detection

Topuz, Yasemin, Gökcan, M. Taha, Yıldız, Serdar, Varlı, Songül

arXiv.org Artificial Intelligence

Mitotic figure detection is a crucial task in computational pathology, as mitotic activity serves as a strong prognostic marker for tumor aggressiveness. However, domain variability that arises from differences in scanners, tissue types, and staining protocols poses a major challenge to the robustness of automated detection methods. In this study, we introduce SDF-YOLO (Single Detect Focused YOLO), a lightweight yet domain-robust detection framework designed specifically for small, rare targets such as mitotic figures. The model builds on YOLOv11 with task-specific modifications, including a single detection head aligned with mitotic figure scale, coordinate attention to enhance positional sensitivity, and improved cross-channel feature mixing. Experiments were conducted on three datasets that span human and canine tumors: MIDOG++, canine cutaneous mast cell tumor (CCMCT), and canine mammary carcinoma (CMC). When submitted to the preliminary test set of the MIDOG 2025 challenge, SDF-YOLO achieved an average precision (AP) of 0.799, with a precision of 0.758, a recall of 0.775, an F1 score of 0.766, and an FROC-AUC of 5.793, demonstrating both competitive accuracy and computational efficiency. These results indicate that SDF-YOLO provides a reliable and efficient framework for robust mitotic figure detection across diverse domains.
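
For readers unfamiliar with coordinate attention, the sketch below follows the commonly used formulation (directional pooling along height and width, a shared 1x1 bottleneck, and per-axis sigmoid gates); the exact layer used in SDF-YOLO may differ, so treat this as an illustration of the mechanism rather than the authors' module.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                       # pool along width -> (B, C, H, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # pool along height -> (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w                       # position-aware channel reweighting

attn = CoordinateAttention(64)
print(attn(torch.randn(1, 64, 80, 80)).shape)      # torch.Size([1, 64, 80, 80])
```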


The Next Layer: Augmenting Foundation Models with Structure-Preserving and Attention-Guided Learning for Local Patches to Global Context Awareness in Computational Pathology

Waqas, Muhammad, Bandyopadhyay, Rukhmini, Showkatian, Eman, Muneer, Amgad, Zafar, Anas, Alvarez, Frank Rojas, Marin, Maricel Corredor, Li, Wentao, Jaffray, David, Haymaker, Cara, Heymach, John, Vokes, Natalie I, Soto, Luisa Maren Solis, Zhang, Jianjun, Wu, Jia

arXiv.org Machine Learning

Foundation models have recently emerged as powerful feature extractors in computational pathology, yet they typically omit mechanisms for leveraging the global spatial structure of tissues and the local contextual relationships among diagnostically relevant regions -- key elements for understanding the tumor microenvironment. Multiple instance learning (MIL) remains an essential next step after the foundation model, providing a framework to aggregate patch-level features into slide-level predictions. We present EAGLE-Net, a structure-preserving, attention-guided MIL architecture designed to augment prediction and interpretability. EAGLE-Net integrates multi-scale absolute spatial encoding to capture global tissue architecture, a top-K neighborhood-aware loss to focus attention on local microenvironments, and a background suppression loss to minimize false positives. We benchmarked EAGLE-Net on large pan-cancer datasets, including three cancer types for classification (10,260 slides) and seven cancer types for survival prediction (4,172 slides), using three distinct histology foundation backbones (REMEDIES, Uni-V1, Uni2-h). Across tasks, EAGLE-Net achieved up to 3% higher classification accuracy and the top concordance indices in 6 of 7 cancer types, producing smooth, biologically coherent attention maps that aligned with expert annotations and highlighted invasive fronts, necrosis, and immune infiltration. These results position EAGLE-Net as a generalizable, interpretable framework that complements foundation models, enabling improved biomarker discovery, prognostic modeling, and clinical decision support.
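
As a rough illustration of combining an absolute 2-D spatial encoding with attention-guided MIL pooling, the sketch below adds a sinusoidal encoding of patch coordinates to foundation-model features before gated-attention aggregation. The encoding scheme and dimensions are assumptions, and the top-K neighborhood and background-suppression losses are omitted; this is not EAGLE-Net itself.

```python
import torch
import torch.nn as nn

def absolute_2d_encoding(coords, dim):
    """coords: (N, 2) patch (x, y) positions on the slide; returns (N, dim) with dim divisible by 4."""
    half = dim // 2
    freqs = torch.exp(-torch.arange(0, half, 2) / half * torch.log(torch.tensor(10000.0)))
    enc = []
    for axis in range(2):                           # encode x and y separately
        angles = coords[:, axis:axis + 1] * freqs
        enc.append(torch.cat([torch.sin(angles), torch.cos(angles)], dim=1))
    return torch.cat(enc, dim=1)

class GatedAttentionMIL(nn.Module):
    def __init__(self, dim=1024, hidden=256):
        super().__init__()
        self.V = nn.Linear(dim, hidden)
        self.U = nn.Linear(dim, hidden)
        self.w = nn.Linear(hidden, 1)

    def forward(self, feats, coords):
        feats = feats + absolute_2d_encoding(coords, feats.size(1))   # inject global layout
        a = self.w(torch.tanh(self.V(feats)) * torch.sigmoid(self.U(feats)))
        return (torch.softmax(a, dim=0) * feats).sum(dim=0)           # slide-level embedding

mil = GatedAttentionMIL()
slide_embed = mil(torch.randn(500, 1024), torch.rand(500, 2) * 100)
print(slide_embed.shape)                             # torch.Size([1024])
```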


GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning

Raju, S M Taslim Uddin, Islam, Md. Milon, Haque, Md Rezwanul, Altaheri, Hamdi, Karray, Fakhri

arXiv.org Artificial Intelligence

Microscopic assessment of histopathology images is vital for accurate cancer diagnosis and treatment. Whole Slide Image (WSI) classification and captioning have become crucial tasks in computer-aided pathology. However, microscopic WSIs face challenges such as redundant patches and unknown patch positions due to the subjective capture process of pathologists. Moreover, generating automatic pathology captions remains a significant challenge. To address these issues, we introduce a novel GNN-ViTCap framework for classification and caption generation from histopathological microscopic images. First, a visual feature extractor generates patch embeddings. Redundant patches are then removed by dynamically clustering these embeddings using deep embedded clustering and selecting representative patches via a scalar dot-attention mechanism. We build a graph by connecting each node to its nearest neighbors in the similarity matrix and apply a graph neural network to capture both local and global context. The aggregated image embeddings are projected into the language model's input space through a linear layer and combined with caption tokens to fine-tune a large language model. We validate our method on the BreakHis and PatchGastric datasets. GNN-ViTCap achieves an F1 score of 0.934 and an AUC of 0.963 for classification, along with a BLEU-4 score of 0.811 and a METEOR score of 0.569 for captioning. Experimental results demonstrate that GNN-ViTCap outperforms state-of-the-art approaches, offering a reliable and efficient solution for microscopy-based patient diagnosis.
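
The sketch below illustrates the graph-construction step in isolation: a k-nearest-neighbour adjacency built from patch-embedding cosine similarity, followed by one GCN-style message-passing layer. The value of k and the plain GCN layer are assumptions for the example, not GNN-ViTCap's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def knn_adjacency(embeddings, k=8):
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t()                               # cosine similarity matrix
    sim.fill_diagonal_(float('-inf'))             # never pick a node as its own neighbour
    idx = sim.topk(k, dim=1).indices
    adj = torch.zeros_like(sim)
    adj.scatter_(1, idx, 1.0)
    return ((adj + adj.t()) > 0).float()          # symmetrise

class SimpleGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        adj = adj + torch.eye(adj.size(0))        # add self-loops
        deg = adj.sum(dim=1, keepdim=True)
        return F.relu(self.lin(adj @ x / deg))    # mean neighbourhood aggregation + transform

patches = torch.randn(64, 768)                    # representative patch embeddings
context = SimpleGCNLayer(768)(patches, knn_adjacency(patches))
print(context.shape)                              # torch.Size([64, 768])
```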


AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval

Maniyar, Suyash, Trivedi, Vishvesh, Mondal, Ajoy, Mishra, Anand, Jawahar, C. V.

arXiv.org Artificial Intelligence

Lecture slide element detection and retrieval are key problems in slide understanding. Training effective models for these tasks often depends on extensive manual annotation. However, annotating large volumes of lecture slides for supervised training is labor-intensive and requires domain expertise. To address this, we propose a large language model (LLM)-guided synthetic lecture slide generation pipeline, SynLec-SlideGen, which produces high-quality, coherent, and realistic slides. We also create an evaluation benchmark, RealSlide, by manually annotating 1,050 real lecture slides. To assess the utility of our synthetic slides, we perform few-shot transfer learning on real data using models pre-trained on them. Experimental results show that few-shot transfer learning with pretraining on synthetic slides significantly improves performance compared to training only on real data. This demonstrates that synthetic data can effectively compensate for limited labeled lecture slides. The code and resources of our work are publicly available on our project website: https://synslidegen.github.io/.
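
A minimal sketch of the pretrain-on-synthetic, few-shot fine-tune-on-real protocol is shown below, with a toy classifier and random tensors standing in for the slide-element detector and datasets; it only conveys the two-stage training order, not the authors' setup.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 6))  # 6 toy element classes
loss_fn = nn.CrossEntropyLoss()

def train(model, features, labels, epochs, lr):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(features), labels).backward()
        opt.step()

# Stage 1: large synthetic corpus (labels come for free from the generation pipeline).
train(model, torch.randn(5000, 256), torch.randint(0, 6, (5000,)), epochs=20, lr=1e-3)
# Stage 2: few-shot transfer on a handful of manually annotated real slides.
train(model, torch.randn(50, 256), torch.randint(0, 6, (50,)), epochs=10, lr=1e-4)
```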


Leveraging Tumor Heterogeneity: Heterogeneous Graph Representation Learning for Cancer Survival Prediction in Whole Slide Images

Neural Information Processing Systems

Survival prediction is a significant challenge in cancer management. The tumor microenvironment is a highly sophisticated ecosystem consisting of cancer cells, immune cells, endothelial cells, fibroblasts, nerves, and extracellular matrix. However, current methods often neglect the fact that the contribution to prognosis differs across tissue types. In this paper, we propose ProtoSurv, a novel heterogeneous graph model for WSI survival prediction. The learning process of ProtoSurv is not only driven by data but also incorporates pathological domain knowledge, including awareness of tissue heterogeneity, emphasis on prior knowledge of prognosis-related tissues, and the depiction of spatial interactions across multiple tissues.
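
One simple way to make the aggregation "tissue-type aware", sketched below, is to pool patch features separately per tissue type so that each type contributes its own prototype to the survival head rather than averaging all patches together. This is an illustration of the idea with made-up dimensions, not ProtoSurv's heterogeneous-graph architecture.

```python
import torch
import torch.nn as nn

class TissueTypedPooling(nn.Module):
    def __init__(self, dim=512, num_types=6):
        super().__init__()
        self.num_types = num_types
        self.head = nn.Linear(dim * num_types, 1)    # risk score from concatenated prototypes

    def forward(self, node_feats, tissue_type):      # (N, dim) features, (N,) integer type labels
        protos = []
        for t in range(self.num_types):
            mask = tissue_type == t
            # Mean-pool the patches of each tissue type; absent types contribute a zero prototype.
            protos.append(node_feats[mask].mean(dim=0) if mask.any()
                          else node_feats.new_zeros(node_feats.size(1)))
        return self.head(torch.cat(protos))          # scalar survival risk

model = TissueTypedPooling()
risk = model(torch.randn(300, 512), torch.randint(0, 6, (300,)))
print(risk.shape)                                    # torch.Size([1])
```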