AITopics

Following the successful paradigm shift of large language models, leveraging pre-training on a massive corpus of data and fine-tuning on different downstream tasks, generalist models have made their foray into computer vision. The introduction of Segment Anything Model (SAM) set a milestone on segmentation of natural images, inspiring the design of a multitude of architectures for medical image segmentation. In this survey we offer a comprehensive and in-depth investigation on generalist models for medical image segmentation. We start with an introduction on the fundamentals concepts underpinning their development. Then, we provide a taxonomy on the different declinations of SAM in terms of zero-shot, few-shot, fine-tuning, adapters, on the recent SAM 2, on other innovative models trained on images alone, and others trained on both text and images. We thoroughly analyze their performances at the level of both primary research and best-in-literature, followed by a rigorous comparison with the state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI and physical AI, and clinical translation.

large language model, machine learning, organ segmentation and tumor detection, (21 more...)

doi: 10.1016/j.inffus.2025.103709

2506.10825

Country:

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)
Research Report > New Finding (0.45)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Faster Certified Symmetry Breaking Using Orders With Auxiliary Variables

Anders, Markus, Bogaerts, Bart, Bogø, Benjamin, Gontier, Arthur, Koops, Wietze, McCreesh, Ciaran, Myreen, Magnus O., Nordström, Jakob, Oertel, Andy, Rebola-Pardo, Adrian, Tan, Yong Kiam

Symmetry breaking is a crucial technique in modern combinatorial solving, but it is difficult to be sure it is implemented correctly. The most successful approach to deal with bugs is to make solvers certifying, so that they output not just a solution, but also a mathematical proof of correctness in a standard format, which can then be checked by a formally verified checker. This requires justifying symmetry reasoning within the proof, but developing efficient methods for this has remained a long-standing open challenge. A fully general approach was recently proposed by Bogaerts et al. (2023), but it relies on encoding lexicographic orders with big integers, which quickly becomes infeasible for large symmetries. In this work, we develop a method for instead encoding orders with auxiliary variables. We show that this leads to orders-of-magnitude speed-ups in both theory and practice by running experiments on proof logging and checking for SAT symmetry breaking using the state-of-the-art satsuma symmetry breaker and the VeriPB proof checking toolchain.

artificial intelligence, constraint, logic & formal reasoning, (20 more...)

2511.16637

Country:

Europe > Austria > Vienna (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
(7 more...)

Genre: Research Report (0.49)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.67)

Mukherjee, Amartya, Liu, Jun

Almost Sure Convergence Analysis of Differentially Private Stochastic Gradient Methods

Abstract-- Differentially private stochastic gradient descent (DP-SGD) has become the standard algorithm for training machine learning models with rigorous privacy guarantees. Despite its widespread use, the theoretical understanding of its long-run behavior remains limited: existing analyses typically establish convergence in expectation or with high probability, but do not address the almost sure convergence of single trajectories. In this work, we prove that DP-SGD converges almost surely under standard smoothness assumptions, both in nonconvex and strongly convex settings, provided the step sizes satisfy some standard decaying conditions. Our analysis extends to momentum variants such as the stochastic heavy ball (DP-SHB) and Nesterov's accelerated gradient (DP-NAG), where we show that careful energy constructions yield similar guarantees. These results provide stronger theoretical foundations for differentially private optimization and suggest that, despite privacy-induced distortions, the algorithm remains pathwise stable in both convex and nonconvex regimes.

artificial intelligence, gradient, machine learning, (15 more...)

2511.16587

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Integrating Symbolic Natural Language Understanding and Language Models for Word Sense Disambiguation

Zhao, Kexin, Forbus, Ken

Word sense disambiguation is a fundamental challenge in natural language understanding. Current methods are primarily aimed at coarse-grained representations (e.g. WordNet synsets or FrameNet frames) and require hand-annotated training data to construct. This makes it difficult to automatically disambiguate richer representations (e.g. built on OpenCyc) that are needed for sophisticated inference. We propose a method that uses statistical language models as oracles for disambiguation that does not require any hand-annotation of training data. Instead, the multiple candidate meanings generated by a symbolic NLU system are converted into distinguishable natural language alternatives, which are used to query an LLM to select appropriate interpretations given the linguistic context. The selected meanings are propagated back to the symbolic NLU system. We evaluate our method against human-annotated gold answers to demonstrate its effectiveness.

artificial intelligence, natural language, text processing, (18 more...)

2511.16577

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Illinois > Cook County > Evanston (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization

Zhang, Yi, Liu, Che, Ren, Xiancong, Ni, Hanchu, Zhang, Yingji, Zhang, Shuai, Ding, Zeyuan, Hu, Jiayu, Shan, Haozhe, Qi, Junbo, Bai, Yan, Li, Dengjie, Luo, Jiachen, Wang, Yidong, Dai, Yong, Xu, Zenglin, Shen, Bin, Wang, Qifan, Tang, Jian, Ju, Xiaozhu

Developing a universal and versatile embodied intelligence system presents two primary challenges: the critical embodied data bottleneck, where real-world data is scarce and expensive, and the algorithmic inefficiency of existing methods, which are resource-prohibitive. To address these limitations, we introduce Deliberate Practice Policy Optimization (DPPO), a metacognitive ``Metaloop'' training framework that dynamically alternates between supervised fine-tuning (competence expansion) and reinforcement learning (skill refinement). This enables automatic weakness identification and targeted resource allocation, specifically designed to maximize learning efficiency from sparse, finite data. Theoretically, DPPO can be formalised as a unified preference-learning framework. Empirically, training a vision-language embodied model with DPPO, referred to as Pelican-VL 1.0, yields a 20.3% performance improvement over the base model and surpasses open-source models at the 100B-parameter scale by 10.6%. We are open-sourcing both the models and code, providing the first systematic framework that alleviates the data and resource bottleneck and enables the community to build versatile embodied agents efficiently.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2511.16602

Country: North America > Montserrat (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education > Focused Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Mukherjee, Sayak, Chatterjee, Samrat, Purvine, Emilie, Fujimoto, Ted, Emerson, Tegan

Large Language Model-Based Reward Design for Deep Reinforcement Learning-Driven Autonomous Cyber Defense

Designing rewards for autonomous cyber attack and defense learning agents in a complex, dynamic environment is a challenging task for subject matter experts. We propose a large language model (LLM)-based reward design approach to generate autonomous cyber defense policies in a deep reinforcement learning (DRL)-driven experimental simulation environment. Multiple attack and defense agent personas were crafted, reflecting heterogeneity in agent actions, to generate LLM-guided reward designs where the LLM was first provided with contextual cyber simulation environment information. These reward structures were then utilized within a DRL-driven attack-defense simulation environment to learn an ensemble of cyber defense policies. Our results suggest that LLM-guided reward designs can lead to effective defense strategies against diverse adversarial behaviors.

large language model, machine learning, reinforcement learning, (20 more...)

2511.16483

Country:

Europe > Austria (0.04)
North America > United States > Washington > Benton County > Richland (0.04)

Genre: Research Report > New Finding (0.86)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.89)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

Guo, Ziyu, Zhang, Renrui, Li, Hongyu, Zhang, Manyuan, Chen, Xinyan, Wang, Sifan, Feng, Yan, Pei, Peng, Heng, Pheng-Ann

Recent advances in visual generation have increasingly explored the integration of reasoning capabilities. They incorporate textual reasoning, i.e., think, either before (as pre-planning) or after (as post-refinement) the generation process, yet they lack on-the-fly multimodal interaction during the generation itself. In this preliminary study, we introduce Thinking-while-Generating (TwiG), the first interleaved framework that enables co-evolving textual reasoning throughout the visual generation process. As visual content is progressively generating, textual reasoning is interleaved to both guide upcoming local regions and reflect on previously synthesized ones. This dynamic interplay produces more context-aware and semantically rich visual outputs. To unveil the potential of this framework, we investigate three candidate strategies, zero-shot prompting, supervised fine-tuning (SFT) on our curated TwiG-50K dataset, and reinforcement learning (RL) via a customized TwiG-GRPO strategy, each offering unique insights into the dynamics of interleaved reasoning. We hope this work inspires further research into interleaving textual reasoning for enhanced visual generation. Code will be released at: https://github.com/ZiyuGuo99/Thinking-while-Generating.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2511.16671

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Cazenavette, George, Torralba, Antonio, Sitzmann, Vincent

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods focus on synthesizing datasets that enable training randomly initialized models. In contrast, state-of-the-art vision approaches are increasingly building on large, pre-trained self-supervised models rather than training from scratch. In this paper, we investigate the problem of distilling datasets that enable us to optimally train linear probes on top of such large, pre-trained vision models. We introduce a method of dataset distillation for this task called Linear Gradient Matching that optimizes the synthetic images such that, when passed through a pre-trained feature extractor, they induce gradients in the linear classifier similar to those produced by the real data. Our method yields synthetic data that outperform all real-image baselines and, remarkably, generalize across pre-trained vision models, enabling us, for instance, to train a linear CLIP probe that performs competitively using a dataset distilled via a DINO backbone. Further, we show that our distilled datasets are exceptionally effective for fine-grained classification and provide a valuable tool for model interpretability, predicting, among other things, how similar two models' embedding spaces are under the platonic representation hypothesis or whether a model is sensitive to spurious correlations in adversarial datasets.

artificial intelligence, dataset, machine learning, (14 more...)

2511.16674

Country:

North America > Canada > Newfoundland and Labrador > Labrador (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Taghibakhshi, Ali, Sreenivas, Sharath Turuvekere, Muralidharan, Saurav, Cai, Ruisi, Chochowski, Marcin, Mahabaleshwarkar, Ameya Sunil, Suhara, Yoshi, Olabiyi, Oluwatobi, Korzekwa, Daniel, Patwary, Mostofa, Shoeybi, Mohammad, Kautz, Jan, Catanzaro, Bryan, Aithal, Ashwath, Tajbakhsh, Nima, Molchanov, Pavlo

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

Training a family of large language models targeting multiple scales and deployment objectives is prohibitively expensive, requiring separate training runs for each different size. Recent work on model compression through pruning and knowledge distillation has reduced this cost; however, this process still incurs hundreds of billions of tokens worth of training cost per compressed model. In this paper, we present Nemotron Elastic, a framework for building reasoning-oriented LLMs, including hybrid Mamba-Attention architectures, that embed multiple nested submodels within a single parent model, each optimized for different deployment configurations and budgets. Each of these submodels shares weights with the parent model and can be extracted zero-shot during deployment without additional training or fine-tuning. We enable this functionality through an end-to-end trained router, tightly coupled to a two-stage training curriculum designed specifically for reasoning models. We additionally introduce group-aware SSM elastification that preserves Mamba's structural constraints, heterogeneous MLP elastification, normalized MSE-based layer importance for improved depth selection, and knowledge distillation enabling simultaneous multi-budget optimization. We apply Nemotron Elastic to the Nemotron Nano V2 12B model, simultaneously producing a 9B and a 6B model using only 110B training tokens; this results in over 360x cost reduction compared to training model families from scratch, and around 7x compared to SoTA compression techniques. Each of the nested models performs on par or better than the SoTA in accuracy. Moreover, unlike other compression methods, the nested capability of our approach allows having a many-in-one reasoning model that has constant deployment memory against the number of models in the family.

artificial intelligence, large language model, natural language, (16 more...)

2511.16664

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre:

Research Report (0.50)
Instructional Material (0.34)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

King, Juan C., Amigo, Jose M.

Enhancing Forex Forecasting Accuracy: The Impact of Hybrid Variable Sets in Cognitive Algorithmic Trading Systems

The question whether algorithmic trading systems (ATS) can improve human trading in terms of effectiveness is eliciting an increasingly relevant debate among traders and investors, as well as quantitative studies that address this issue through numerical testing [[9]]. In recent years, the discussion regarding whether algorithmic trading systems (ATS) can surpass human traders in terms of efficiency, consistency, and adaptability has gained significant traction in both academic and professional circles. Empirical evidence indicates that algorithmic strategies tend to exhibit superior performance in volatile or declining markets, whereas human-managed funds may retain a relative advantage during upward market trends due to behavioral and intuitive factors [[2]]. Moreover, large-scale behavioral studies reveal that algorithms largely eliminate well-known cognitive biases such as the disposition effect that continue to affect human traders [[23]]. Complementary research has also emphasized the growing integration of artificial intelligence and machine learning methods in modern ATS, which enhances predictive accuracy and execution speed [[7]]. Nonetheless, experimental findings suggest that algorithmic trading may still be constrained by design limitations, challenging the notion of its absolute superiority over human decision-making [[16]]. These findings collectively indicate that algorithmic and human trading approaches might be best viewed as complementary, each offering unique strengths under different market conditions.

artificial intelligence, deep learning, machine learning, (18 more...)

2511.16657

Country:

North America > United States > New York (0.04)
Europe > Spain (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)