AITopics | generalist

Collaborating Authors

generalist

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

From Generalist to Specialist Representation

Zheng, Yujia, Feng, Fan, Li, Yuke, Xie, Shaoan, Murphy, Kevin, Zhang, Kun

arXiv.org Machine LearningMay-14-2026

Given a generalist model, learning a task-relevant specialist representation is fundamental for downstream applications. Identifiability, the asymptotic guarantee of recovering the ground-truth representation, is critical because it sets the ultimate limit of any model, even with infinite data and computation. We study this problem in a completely nonparametric setting, without relying on interventions, parametric forms, or structural constraints. We first prove that the structure between time steps and tasks is identifiable in a fully unsupervised manner, even when sequences lack strict temporal dependence and may exhibit disconnections, and task assignments can follow arbitrarily complex and interleaving structures. We then prove that, within each time step, the task-relevant latent representation can be disentangled from the irrelevant part under a simple sparsity regularization, without any additional information or parametric constraints. Together, these results establish a hierarchical foundation: task structure is identifiable across time steps, and task-relevant latent representations are identifiable within each step. To our knowledge, each result provides a first general nonparametric identifiability guarantee, and together they mark a step toward provably moving from generalist to specialist models.

artificial intelligence, machine learning, representation, (15 more...)

arXiv.org Machine Learning

2605.12733

Country: Asia (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Self-Improving Vision-Language-Action Models with Data Generation via Residual RL

Xiao, Wenli, Lin, Haotian, Peng, Andy, Xue, Haoru, He, Tairan, Xie, Yuqi, Hu, Fengyuan, Wu, Jimmy, Luo, Zhengyi, Fan, Linxi "Jim", Shi, Guanya, Zhu, Yuke

arXiv.org Artificial IntelligenceNov-4-2025

Supervised fine-tuning (SFT) has become the de facto post-training strategy for large vision-language-action (VLA) models, but its reliance on costly human demonstrations limits scalability and generalization. We propose Probe, Learn, Distill (PLD), a three-stage plug-and-play framework that improves VLAs through residual reinforcement learning (RL) and distribution-aware data collection. In Stage 1, we train lightweight residual actors to probe failure regions of the VLA generalist. In Stage 2, we use a hybrid rollout scheme that aligns collected trajectories with the generalist's deployment distribution while capturing recovery behaviors. In Stage 3, we distill the curated trajectories back into the generalist with standard SFT. PLD achieves near-saturated 99% task success on LIBERO, over 50% gains in SimplerEnv, and 100% success on real-world Franka and YAM arm manipulation tasks. Ablations show that residual probing and distribution-aware replay are key to collecting deployment-aligned data that improves both seen and unseen tasks, offering a scalable path toward self-improving VLA models.

arxiv preprint arxiv, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2511.00091

Genre: Research Report (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Modeling the Economic Impacts of AI Openness Regulation

Qiu, Tori, Laufer, Benjamin, Kleinberg, Jon, Heidari, Hoda

arXiv.org Artificial IntelligenceOct-27-2025

Regulatory frameworks, such as the EU AI Act, encourage openness of general-purpose AI models by offering legal exemptions for "open-source" models. Despite this legislative attention on openness, the definition of open-source foundation models remains ambiguous. This paper models the strategic interactions among the creator of a general-purpose model (the generalist) and the entity that fine-tunes the general-purpose model to a specialized domain or task (the specialist), in response to regulatory requirements on model openness. We present a stylized model of the regulator's choice of an open-source definition to evaluate which AI openness standards will establish appropriate economic incentives for developers. Our results characterize market equilibria -- specifically, upstream model release decisions and downstream fine-tuning efforts -- under various openness regulations and present a range of effective regulatory penalties and open-source thresholds. Overall, we find the model's baseline performance determines when increasing the regulatory penalty vs. the open-source threshold will significantly alter the generalist's release strategy. Our model provides a theoretical foundation for AI governance decisions around openness and enables evaluation and refinement of practical open-source policies.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.14193

Country: Europe (0.28)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.66)

Industry:

Government (1.00)
Law > Statutes (0.68)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Add feedback

MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning

Huang, Wenhui, Chen, Changhe, Qi, Han, Lv, Chen, Du, Yilun, Yang, Heng

arXiv.org Artificial IntelligenceOct-24-2025

Integrating visual-language instructions into visuomotor policies is gaining momentum in robot learning for enhancing open-world generalization. Despite promising advances, existing approaches face two challenges: limited language steerability when no generated reasoning is used as a condition, or significant inference latency when reasoning is incorporated. In this work, we introduce MoTVLA, a mixture-of-transformers (MoT)-based vision-language-action (VLA) model that integrates fast-slow unified reasoning with behavior policy learning. MoTVLA preserves the general intelligence of pre-trained VLMs (serving as the generalist) for tasks such as perception, scene understanding, and semantic planning, while incorporating a domain expert, a second transformer that shares knowledge with the pretrained VLM, to generate domain-specific fast reasoning (e.g., robot motion decomposition), thereby improving policy execution efficiency. By conditioning the action expert on decomposed motion instructions, MoTVLA can learn diverse behaviors and substantially improve language steerability. Extensive evaluations across natural language processing benchmarks, robotic simulation environments, and real-world experiments confirm the superiority of MoTVLA in both fast-slow reasoning and manipulation task performance.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.18337

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Add feedback

A Comprehensive Evaluation of Graph Neural Networks and Physics Informed Learning for Surrogate Modelling of Finite Element Analysis

Singh, Nayan Kumar

arXiv.org Artificial IntelligenceOct-20-2025

Although Finite Element Analysis (FEA) is an integral part of the product design lifecycle, the analysis is computationally expensive, making it unsuitable for many design optimization problems. The deep learning models can be a great solution. However, selecting the architecture that emulates the FEA with great accuracy is a challenge. This paper presents a comprehensive evaluation of graph neural networks (GNNs) and 3D U-Nets as surrogates for FEA of parametric I-beams. We introduce a Physics-Informed Neural Network (PINN) framework, governed by the Navier Cauchy equations, to enforce physical laws. Crucially, we demonstrate that a curriculum learning strategy, pretraining on data followed by physics informed fine tuning, is essential for stabilizing training. Our results show that GNNs fundamentally outperform the U-Net. Even the worst performer among GNNs, the GCN framework, achieved a relative L2 error of 8.7% while the best framework among U Net, U Net with attention mechanism trained on high resolution data, achieved 13.0% score. Among the graph-based architectures, the Message Passing Neural Networks (MPNN) and Graph Transformers achieved the highest accuracy, achieving a relative L2 score of 3.5% and 2.6% respectively. The inclusion of physics fundamental laws (PINN) significantly improved the generalization, reducing error by up to 11.3% on high-signal tasks. While the Graph Transformer is the most accurate model, it is more 37.5% slower during inference when compared to second best model, MPNN PINN. The PINN enhanced MPNN (MPNN PINN) provides the most practical solution. It offers a good compromise between predictive performance, model size, and inference speed.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

2510.1575

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Expert-guided Clinical Text Augmentation via Query-Based Model Collaboration

Cho, Dongkyu, Zhang, Miao, Chunara, Rumi

arXiv.org Artificial IntelligenceSep-29-2025

Data augmentation is a widely used strategy to improve model robustness and generalization by enriching training datasets with synthetic examples. While large language models (LLMs) have demonstrated strong generative capabilities for this purpose, their applications in high-stakes domains like healthcare present unique challenges due to the risk of generating clinically incorrect or misleading information. In this work, we propose a novel query-based model collaboration framework that integrates expert-level domain knowledge to guide the augmentation process to preserve critical medical information. Experiments on clinical prediction tasks demonstrate that our lightweight collaboration-based approach consistently outperforms existing LLM augmentation methods while improving safety through reduced factual errors. This framework addresses the gap between LLM augmentation potential and the safety requirements of specialized domains.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.2153

Country: North America > Mexico (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

CLARIFY: A Specialist-Generalist Framework for Accurate and Lightweight Dermatological Visual Question Answering

Saha, Aranya, Khan, Tanvir Ahmed, Swapnil, Ismam Nur, Haque, Mohammad Ariful

arXiv.org Artificial IntelligenceAug-27-2025

--Vision-language models (VLMs) have shown significant potential for medical tasks; however, their general-purpose nature can limit specialized diagnostic accuracy, and their large size poses substantial inference costs for real-world clinical deployment. T o address these challenges, we introduce CLARIFY, a Specialist-Generalist framework for dermatological visual question answering (VQA). CLARIFY combines two components: (i) a lightweight, domain-trained image classifier (the Specialist) that provides fast and highly accurate diagnostic predictions, and (ii) a powerful yet compressed conversational VLM (the Generalist) that generates natural language explanations to user query. This synergy is further enhanced by a knowledge graph-based retrieval module, which grounds the Generalist's responses in factual dermatological knowledge, ensuring both accuracy and reliability. This hierarchical design not only reduces diagnostic errors but also significantly improves computational efficiency. Experiments on our curated multimodal dermatology dataset demonstrate that CLARIFY achieves an 18% improvement in diagnostic accuracy over the strongest baseline--a fine-tuned, uncompressed single-line VLM--while reducing the average VRAM requirement and latency by at least 20% and 5% respectively. These results indicate that a Specialist-Generalist system provides a practical and powerful paradigm for building lightweight, trustworthy, and clinically viable AI systems. ISION language models (VLMs) like LLaV A [1] and Qwen-VL [2] have demonstrated a remarkable ability to interpret and reason about joint visual and textual data [3]. Their potential in medicine is vast, with promising applications in tasks ranging from radiological report generation to comprehensive clinical decision support [4], [5]. However, translating this potential into reliable clinical tools faces some critical hurdles.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2508.1843

Country:

North America (0.28)
Asia > Bangladesh (0.15)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Dermatology (1.00)
Health & Medicine > Diagnostic Medicine (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.84)

Add feedback

Neural network task specialization via domain constraining

Malashin, Roman, Ilyukhin, Daniil

arXiv.org Artificial IntelligenceApr-29-2025

This paper introduces a concept of neural network specialization via task-specific domain constraining, aimed at enhancing network performance on data subspace in which the network operates. The study presents experiments on training specialists for image classification and object detection tasks. The results demonstrate that specialization can enhance a generalist's accuracy even without additional data or changing training regimes: solely by constraining class label space in which the network performs. Theoretical and experimental analyses indicate that effective specialization requires modifying traditional fine-tuning methods and constraining data space to semantically coherent subsets. The specialist extraction phase before tuning the network is proposed for maximal performance gains. We also provide analysis of the evolution of the feature space during specialization. This study paves way to future research for developing more advanced dynamically configurable image analysis systems, where computations depend on the specific input. Additionally, the proposed methods can help improve system performance in scenarios where certain data domains should be excluded from consideration of the generalist network.

artificial intelligence, machine learning, specialization, (15 more...)

arXiv.org Artificial Intelligence

2504.19592

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

The Backfiring Effect of Weak AI Safety Regulation

Laufer, Benjamin, Kleinberg, Jon, Heidari, Hoda

arXiv.org Artificial IntelligenceMar-26-2025

Recent policy proposals aim to improve the safety of general-purpose AI, but there is little understanding of the efficacy of different regulatory approaches to AI safety. We present a strategic model that explores the interactions between the regulator, the general-purpose AI technology creators, and domain specialists--those who adapt the AI for specific applications. Our analysis examines how different regulatory measures, targeting different parts of the development chain, affect the outcome of the development process. In particular, we assume AI technology is described by two key attributes: safety and performance. The regulator first sets a minimum safety standard that applies to one or both players, with strict penalties for non-compliance. The general-purpose creator then develops the technology, establishing its initial safety and performance levels. Next, domain specialists refine the AI for their specific use cases, and the resulting revenue is distributed between the specialist and generalist through an ex-ante bargaining process. Our analysis of this game reveals two key insights: First, weak safety regulation imposed only on the domain specialists can backfire. While it might seem logical to regulate use cases (as opposed to the general-purpose technology), our analysis shows that weak regulations targeting domain specialists alone can unintentionally reduce safety. This effect persists across a wide range of settings. Second, in sharp contrast to the previous finding, we observe that stronger, well-placed regulation can in fact benefit all players subjected to it. When regulators impose appropriate safety standards on both AI creators and domain specialists, the regulation functions as a commitment mechanism, leading to safety and performance gains, surpassing what is achieved under no regulation or regulating one player only.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.20848

Country:

North America > United States > California (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Law (1.00)
Government (1.00)
Leisure & Entertainment > Games (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Add feedback

M3TR: Generalist HD Map Construction with Variable Map Priors

Immel, Fabian, Fehler, Richard, Bieder, Frank, Pauls, Jan-Hendrik, Stiller, Christoph

arXiv.org Artificial IntelligenceDec-10-2024

Autonomous vehicles require road information for their operation, usually in form of HD maps. Since offline maps eventually become outdated or may only be partially available, online HD map construction methods have been proposed to infer map information from live sensor data. A key issue remains how to exploit such partial or outdated map information as a prior. We introduce M3TR (Multi-Masking Map Transformer), a generalist approach for HD map construction both with and without map priors. We address shortcomings in ground truth generation for Argoverse 2 and nuScenes and propose the first realistic scenarios with semantically diverse map priors. Examining various query designs, we use an improved method for integrating prior map elements into a HD map construction model, increasing performance by +4.3 mAP. Finally, we show that training across all prior scenarios yields a single Generalist model, whose performance is on par with previous Expert models that can handle only one specific type of map prior. M3TR thus is the first model capable of leveraging variable map priors, making it suitable for real-world deployment. Code is available at https://github.com/immel-f/m3tr

argoverse 2, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.10316

Country:

Europe > Switzerland (0.04)
North America > United States > Hawaii (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.66)

Add feedback