AITopics

2511.14302

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Information Technology > Security & Privacy (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.62)

Cecille, Aurélien, Duffner, Stefan, Davoine, Franck, Agier, Rémi, Neveu, Thibault

Towards Sharper Object Boundaries in Self-Supervised Depth Estimation

arXiv.org Artificial Intelligence Nov-19-2025

Monocular depth estimation is a fundamental problem in computer vision with applications in autonomous driving, robotics and augmented reality. Recently, self-supervised learning methods have achieved impressive results by using view synthesis as a supervisory signal, but despite these advances, handling depth discontinuities remains challenging. In most scenes, foreground objects occlude the background, creating depth discontinuities at object boundaries. Conventional models assign a single depth value per pixel, but edge uncertainty often causes depth values to be averaged between foreground and background depths, blurring transitions and introducing artifacts in the point cloud (see Figure 2). To address this, we propose to represent per-pixel depth as a multimodal distribution, explicitly modeling both depths at boundaries, preserving sharp transitions and removing artifacts.
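The averaging effect described in the abstract can be shown with a toy calculation. This is a minimal sketch under stated assumptions, not the authors' model: the two-component mixture, the weights, the depths, and the helper names `mixture_mean` and `dominant_mode` are all hypothetical and chosen only for illustration.

```python
# Illustrative only: a toy boundary pixel, not the paper's model.

def mixture_mean(weights, depths):
    """Expected depth: what a single-value estimate tends toward under uncertainty."""
    return sum(w * d for w, d in zip(weights, depths))


def dominant_mode(weights, depths):
    """Depth of the most probable mixture component."""
    best = max(range(len(weights)), key=lambda i: weights[i])
    return depths[best]


# Hypothetical boundary pixel: 75% likely foreground at 2 m, 25% likely background at 10 m.
weights = [0.75, 0.25]
depths = [2.0, 10.0]

averaged = mixture_mean(weights, depths)  # lies between the two surfaces
sharp = dominant_mode(weights, depths)    # stays on one of the two surfaces

print(f"averaged depth: {averaged} m, dominant-mode depth: {sharp} m")
```

The averaged value falls on neither the foreground nor the background surface, which is the kind of floating point-cloud artifact the abstract mentions; keeping both modes lets the estimate snap to one surface.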

artificial intelligence, inductive learning, machine learning, (14 more...)

2509.15987

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.34)
Automobiles & Trucks (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

arXiv.org Artificial Intelligence Nov-19-2025

Foundation Models in Medical Imaging: A Review and Outlook

van Veldhuizen, Vivien, Botha, Vanessa, Lu, Chunyao, Cesur, Melis Erdal, Lipman, Kevin Groot, de Jong, Edwin D., Horlings, Hugo, Sanchez, Clárisa I., Snoek, Cees G. M., Wessels, Lodewyk, Mann, Ritse, Marcus, Eric, Teuwen, Jonas

Foundation models (FMs) are changing the way medical images are analyzed by learning from large collections of unlabeled data. Instead of relying on manually annotated examples, FMs are pre-trained to learn general-purpose visual features that can later be adapted to specific clinical tasks with little additional supervision. In this review, we examine how FMs are being developed and applied in pathology, radiology, and ophthalmology, drawing on evidence from over 150 studies. We explain the core components of FM pipelines, including model architectures, self-supervised learning methods, and strategies for downstream adaptation. We also review how FMs are being used in each imaging domain and compare design choices across applications. Finally, we discuss key challenges and open questions to guide future research.

large language model, machine learning, natural language, (22 more...)

2506.09095

Country: Europe > Netherlands (0.27)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Yu, Junwei, Darrell, Trevor, Wang, XuDong

UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity

The Segment Anything Model (SAM) family has become a widely adopted vision foundation model, but its ability to control segmentation granularity remains limited. Users often need to refine results manually - by adding more prompts or selecting from pre-generated masks - to achieve the desired level of detail. This process can be ambiguous, as the same prompt may correspond to several plausible masks, and collecting dense annotations across all granularities is prohibitively expensive, making supervised solutions infeasible. To address this limitation, we introduce UnSAMv2, which enables segment anything at any granularity without human annotations. UnSAMv2 extends the divide-and-conquer strategy of UnSAM by discovering abundant mask-granularity pairs and introducing a novel granularity control embedding that enables precise, continuous control over segmentation scale. Remarkably, with only 6K unlabeled images and 0.02% additional parameters, UnSAMv2 substantially enhances SAM-2, achieving segment anything at any granularity across interactive, whole-image, and video segmentation tasks. Evaluated on over 11 benchmarks, UnSAMv2 improves $\text{NoC}_{90}$ (5.69 $\rightarrow$ 4.75), 1-IoU (58.0 $\rightarrow$ 73.1), and $\text{AR}_{1000}$ (49.6 $\rightarrow$ 68.3), showing that small amounts of unlabeled data with a granularity-aware self-supervised learning method can unlock the potential of vision foundation models.

artificial intelligence, machine learning, segmentation, (13 more...)

2511.13714

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.91)

OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation

Herzog, Henry, Bastani, Favyen, Zhang, Yawen, Tseng, Gabriel, Redmon, Joseph, Sablon, Hadrien, Park, Ryan, Morrison, Jacob, Buraczynski, Alexandra, Farley, Karen, Hansen, Joshua, Howe, Andrew, Johnson, Patrick Alan, Otterlee, Mark, Schmitt, Ted, Pitelka, Hunter, Daspit, Stephen, Ratner, Rachel, Wilhelm, Christopher, Wood, Sebastian, Jacobi, Mike, Kerner, Hannah, Shelhamer, Evan, Farhadi, Ali, Krishna, Ranjay, Beukema, Patrick

Earth observation data presents a unique challenge: it is spatial like images, sequential like video or text, and highly multimodal. We present OlmoEarth: a multimodal, spatio-temporal foundation model that employs a novel self-supervised learning formulation, masking strategy, and loss all designed for the Earth observation domain. OlmoEarth achieves state-of-the-art performance compared to 12 other foundation models across a variety of research benchmarks and real-world tasks from external partners. When evaluating embeddings, OlmoEarth achieves the best performance on 15 out of 24 tasks, and with full fine-tuning it is the best on 19 of 29 tasks. We deploy OlmoEarth as the backbone of an end-to-end platform for data collection, labeling, training, and inference of Earth observation models. The OlmoEarth Platform puts frontier foundation models and powerful data management tools into the hands of non-profits and NGOs working to solve the world's biggest problems. OlmoEarth source code, training data, and pre-trained weights are available at https://github.com/allenai/olmoearth_pretrain.

artificial intelligence, machine learning, natural language, (19 more...)

2511.13655

Country:

North America > United States (1.00)
Africa (1.00)

Genre: Research Report (0.82)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Food & Agriculture > Agriculture (1.00)
Energy > Renewable > Geothermal (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Quantum Machine Learning via Contrastive Training

Zhukas, Liudmila A., Zhang, Vivian Ni, Miao, Qiang, Wang, Qingfeng, Cetina, Marko, Kim, Jungsang, Carin, Lawrence, Monroe, Christopher

Quantum machine learning (QML) has attracted growing interest with the rapid parallel advances in large-scale classical machine learning and quantum technologies. Similar to classical machine learning, QML models also face challenges arising from the scarcity of labeled data, particularly as their scale and complexity increase. Here, we introduce self-supervised pretraining of quantum representations that reduces reliance on labeled data by learning invariances from unlabeled examples. We implement this paradigm on a programmable trapped-ion quantum computer, encoding images as quantum states. In situ contrastive pretraining on hardware yields a representation that, when fine-tuned, classifies image families with higher mean test accuracy and lower run-to-run variability than models trained from random initialization. Performance improvement is especially significant in regimes with limited labeled training data. We show that the learned invariances generalize beyond the pretraining image samples. Unlike prior work, our pipeline derives similarity from measured quantum overlaps and executes all training and classification stages on hardware. These results establish a label-efficient route to quantum representation learning, with direct relevance to quantum-native datasets and a clear path to larger classical inputs.
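As a rough illustration of contrastive training with overlap-based similarity, the sketch below simulates state overlaps classically. It rests on several assumptions: the abstract does not state the loss, so an InfoNCE-style objective is swapped in here; the squared overlap is computed numerically rather than measured on hardware; and the names `fidelity`, `info_nce`, and `temperature` are hypothetical.

```python
import math

# Illustrative only: classical simulation with an InfoNCE-style loss (an assumption),
# not the hardware pipeline described in the abstract.


def fidelity(a, b):
    """Squared overlap |<a|b>|^2 between two normalized state vectors."""
    overlap = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(overlap) ** 2


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Negative log-probability of picking the positive among all candidates."""
    sims = [fidelity(anchor, positive)] + [fidelity(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    shift = max(logits)  # subtract the max for numerical stability
    log_norm = shift + math.log(sum(math.exp(l - shift) for l in logits))
    return log_norm - logits[0]


# Toy single-qubit states: |0> and |1> as amplitude lists.
zero = [1 + 0j, 0 + 0j]
one = [0 + 0j, 1 + 0j]

loss_aligned = info_nce(zero, zero, [one])     # positive matches the anchor
loss_misaligned = info_nce(zero, one, [zero])  # positive is orthogonal to the anchor

print(loss_aligned < loss_misaligned)
```

The loss is small when the anchor overlaps strongly with its positive and weakly with negatives, which is the sense in which contrastive pretraining can learn invariances from unlabeled pairs.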

artificial intelligence, machine learning, quantum processor, (16 more...)

2511.13497

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

MaskAnyNet: Rethinking Masked Image Regions as Valuable Information in Supervised Learning

Hong, Jingshan, Hu, Haigen, Zhang, Huihuang, Zhou, Qianwei, Li, Zhao

In supervised learning, traditional image masking faces two key issues: (i) discarded pixels are underutilized, leading to a loss of valuable contextual information; (ii) masking may remove small or critical features, especially in fine-grained tasks. In contrast, masked image modeling (MIM) has demonstrated that masked regions can be reconstructed from partial input, revealing that even incomplete data can exhibit strong contextual consistency with the original image. This highlights the potential of masked regions as sources of semantic diversity. Motivated by this, we revisit the image masking approach, proposing to treat masked content as auxiliary knowledge rather than ignoring it. Based on this, we propose MaskAnyNet, which combines masking with a relearning mechanism to exploit both visible and masked information. It can be easily extended to any model with an additional branch to jointly learn from the recomposed masked region. This approach leverages the semantic diversity of the masked regions to enrich features and preserve fine-grained details. Experiments on CNN and Transformer backbones show consistent gains across multiple benchmarks. Further analysis confirms that the proposed method improves semantic diversity through the reuse of masked content.

artificial intelligence, machine learning, natural language, (16 more...)

2511.1248

Country:

Asia > China (0.28)
Europe > Switzerland (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)