AITopics | documentation

Collaborating Authors

documentation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

Neural Information Processing SystemsDec-27-2025, 08:29:36 GMT

Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating models on business process management (BPM) tasks. BPM is the practice of documenting, measuring, improving, and automating enterprise workflows. However, research has focused almost exclusively on one task -- full end-to-end automation using agents based on multimodal foundation models (FMs) like GPT-4. This focus on automation ignores the reality of how most BPM tools are applied today -- simply documenting the relevant workflow takes 60% of the time of the typical process optimization project. To address this gap we present WONDERBREAD, the first benchmark for evaluating multimodal FMs on BPM tasks beyond automation. Our contributions are: (1) a dataset containing 2928 documented workflow demonstrations; (2) 6 novel BPM tasks sourced from real-world applications ranging from workflow documentation to knowledge transfer to process improvement; and (3) an automated evaluation harness. Our benchmark shows that while state-of-the-art FMs can automatically generate documentation (e.g.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Add feedback

The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track

Neural Information Processing SystemsDec-26-2025, 04:58:06 GMT

Data curation is a field with origins in librarianship and archives, whose scholarship and thinking on data issues go back centuries, if not millennia. The field of machine learning is increasingly observing the importance of data curation to the advancement of both applications and fundamental understanding of machine learning models -- evidenced not least by the creation of the Datasets and Benchmarks track itself. This work provides an analysis of recent dataset development practices at NeurIPS through the lens of data curation. We present an evaluation framework for dataset documentation, consisting of a rubric and toolkit developed through a thorough literature review of data curation principles. We use the framework to systematically assess the strengths and weaknesses in current dataset development practices of 60 datasets published in the NeurIPS Datasets and Benchmarks track from 2021-2023.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Genre: Overview (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.97)

Add feedback

Institutional AI Sovereignty Through Gateway Architecture: Implementation Report from Fontys ICT

Huijts, Ruud, Suilen, Koen

arXiv.org Artificial IntelligenceDec-11-2025

To counter fragmented, high-risk adoption of commercial AI tools, we built and ran an institutional AI platform in a six-month, 300-user pilot, showing that a university of applied sciences can offer advanced AI with fair access, transparent risks, controlled costs, and alignment with European law. Commercial AI subscriptions create unequal access and compliance risks through opaque processing and non-EU hosting, yet banning them is neither realistic nor useful. Institutions need a way to provide powerful AI in a sovereign, accountable form. Our solution is a governed gateway platform with three layers: a ChatGPT-style frontend linked to institutional identity that makes model choice explicit; a gateway core enforcing policy, controlling access and budgets, and routing traffic to EU infrastructure by default; and a provider layer wrapping commercial and open-source models in institutional model cards that consolidate vendor documentation into one governance interface. The pilot ran reliably with no privacy incidents and strong adoption, enabling EU-default routing, managed spending, and transparent model choices. Only the gateway pattern combines model diversity and rapid innovation with institutional control. The central insight: AI is not a support function but strategy, demanding dedicated leadership. Sustainable operation requires governance beyond traditional boundaries. We recommend establishing a formal AI Officer role combining technical literacy, governance authority, and educational responsibility. Without it, AI decisions stay ad-hoc and institutional exposure grows. With it, higher-education institutions can realistically operate their own multi-provider AI platform, provided they govern AI as seriously as they teach it.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2512.08978

Country:

Europe > Netherlands (0.04)
Europe > Germany (0.04)
North America > United States > Virginia (0.04)
(5 more...)

Genre:

Research Report (0.64)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Energy (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Auto-BenchmarkCard: Automated Synthesis of Benchmark Documentation

Hofmann, Aris, Vejsbjerg, Inge, Salwala, Dhaval, Daly, Elizabeth M.

arXiv.org Artificial IntelligenceDec-11-2025

We present Auto-BenchmarkCard, a workflow for generating validated descriptions of AI benchmarks. Benchmark documentation is often incomplete or inconsistent, making it difficult to interpret and compare benchmarks across tasks or domains. Auto-BenchmarkCard addresses this gap by combining multi-agent data extraction from heterogeneous sources (e.g., Hugging Face, Unitxt, academic papers) with LLM-driven synthesis. A validation phase evaluates factual accuracy through atomic entailment scoring using the FactReasoner tool. This workflow has the potential to promote transparency, comparability, and reusability in AI benchmark reporting, enabling researchers and practitioners to better navigate and evaluate benchmark choices.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2512.09577

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
Europe > Germany (0.05)

Genre: Workflow (0.73)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.50)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.39)

Add feedback

Systematic Framework of Application Methods for Large Language Models in Language Sciences

Sun, Kun, Wang, Rong

arXiv.org Artificial IntelligenceDec-11-2025

Large Language Models (LLMs) are transforming language sciences. However, their widespread deployment currently suffers from methodological fragmentation and a lack of systematic soundness. This study proposes two comprehensive methodological frameworks designed to guide the strategic and responsible application of LLMs in language sciences. The first method-selection framework defines and systematizes three distinct, complementary approaches, each linked to a specific research goal: (1) prompt-based interaction with general-use models for exploratory analysis and hypothesis generation; (2) fine-tuning of open-source models for confirmatory, theory-driven investigation and high-quality data generation; and (3) extraction of contextualized embeddings for further quantitative analysis and probing of model internal mechanisms. We detail the technical implementation and inherent trade-offs of each method, supported by empirical case studies. Based on the method-selection framework, the second systematic framework proposed provides constructed configurations that guide the practical implementation of multi-stage research pipelines based on these approaches. We then conducted a series of empirical experiments to validate our proposed framework, employing retrospective analysis, prospective application, and an expert evaluation survey. By enforcing the strategic alignment of research questions with the appropriate LLM methodology, the frameworks enable a critical paradigm shift in language science research. We believe that this system is fundamental for ensuring reproducibility, facilitating the critical evaluation of LLM mechanisms, and providing the structure necessary to move traditional linguistics from ad-hoc utility to verifiable, robust science.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.09552

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > Bulgaria (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.35)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Capability-Driven Skill Generation with LLMs: A RAG-Based Approach for Reusing Existing Libraries and Interfaces

da Silva, Luis Miguel Vieira, Köcher, Aljosha, König, Nicolas, Gehlhoff, Felix, Fay, Alexander

arXiv.org Artificial IntelligenceDec-10-2025

Modern automation systems increasingly rely on modular architectures, with capabilities and skills as one solution approach. Capabilities define the functions of resources in a machine-readable form and skills provide the concrete implementations that realize those capabilities. However, the development of a skill implementation conforming to a corresponding capability remains a time-consuming and challenging task. In this paper, we present a method that treats capabilities as contracts for skill implementations and leverages large language models to generate executable code based on natural language user input. A key feature of our approach is the integration of existing software libraries and interface technologies, enabling the generation of skill implementations across different target languages. We introduce a framework that allows users to incorporate their own libraries and resource interfaces into the code generation process through a retrieval-augmented generation architecture. The proposed method is evaluated using an autonomous mobile robot controlled via Python and ROS 2, demonstrating the feasibility and flexibility of the approach.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ETFA65518.2025.11205724

2505.03295

Country: Europe > Germany > Hamburg (0.04)

Genre:

Research Report (1.00)
Workflow (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

An Empirical Framework for Evaluating Semantic Preservation Using Hugging Face

Jia, Nan, Raja, Anita, Khatchadourian, Raffi

arXiv.org Artificial IntelligenceDec-10-2025

As machine learning (ML) becomes an integral part of high-autonomy systems, it is critical to ensure the trustworthiness of learning-enabled software systems (LESS). Yet, the nondeterministic and run-time-defined semantics of ML complicate traditional software refactoring. We define semantic preservation in LESS as the property that optimizations of intelligent components do not alter the system's overall functional behavior. This paper introduces an empirical framework to evaluate semantic preservation in LESS by mining model evolution data from HuggingFace. We extract commit histories, $\textit{Model Cards}$, and performance metrics from a large number of models. To establish baselines, we conducted case studies in three domains, tracing performance changes across versions. Our analysis demonstrates how $\textit{semantic drift}$ can be detected via evaluation metrics across commits and reveals common refactoring patterns based on commit message analysis. Although API constraints limited the possibility of estimating a full-scale threshold, our pipeline offers a foundation for defining community-accepted boundaries for semantic preservation. Our contributions include: (1) a large-scale dataset of ML model evolution, curated from 1.7 million Hugging Face entries via a reproducible pipeline using the native HF hub API, (2) a practical pipeline for the evaluation of semantic preservation for a subset of 536 models and 4000+ metrics and (3) empirical case studies illustrating semantic drift in practice. Together, these contributions advance the foundations for more maintainable and trustworthy ML systems.

artificial intelligence, machine learning, semantic preservation, (18 more...)

arXiv.org Artificial Intelligence

2512.07983

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The SMART+ Framework for AI Systems

Kandikatla, Laxmiraju, Radeljic, Branislav

arXiv.org Artificial IntelligenceDec-10-2025

Artificial Intelligence (AI) systems are now an integral part of multiple industries. In clinical research, AI supports automated adverse event detection in clinical trials, patient eligibility screening for protocol enrollment, and data quality validation. Beyond healthcare, AI is transforming finance through real-time fraud detection, automated loan risk assessment, and algorithmic decision-making. Similarly, in manufacturing, AI enables predictive maintenance to reduce equipment downtime, enhances quality control through computer-vision inspection, and optimizes production workflows using real-time operational data. While these technologies enhance operational efficiency, they introduce new challenges regarding safety, accountability, and regulatory compliance. To address these concerns, we introduce the SMART+ Framework - a structured model built on the pillars of Safety, Monitoring, Accountability, Reliability, and Transparency, and further enhanced with Privacy & Security, Data Governance, Fairness & Bias, and Guardrails. SMART+ offers a practical, comprehensive approach to evaluating and governing AI systems across industries. This framework aligns with evolving mechanisms and regulatory guidance to integrate operational safeguards, oversight procedures, and strengthened privacy and governance controls. SMART+ demonstrates risk mitigation, trust-building, and compliance readiness. By enabling responsible AI adoption and ensuring auditability, SMART+ provides a robust foundation for effective AI governance in clinical research.

ai system, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2512.08592

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > New Jersey > Middlesex County > Edison (0.04)
Africa > Zambia > Southern Province > Choma (0.04)

Genre:

Research Report > Experimental Study (0.88)
Research Report > New Finding (0.74)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.86)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback

GENIUS: An Agentic AI Framework for Autonomous Design and Execution of Simulation Protocols

Soleymanibrojeni, Mohammad, Aydin, Roland, Guedes-Sobrinho, Diego, Dias, Alexandre C., Piotrowski, Maurício J., Wenzel, Wolfgang, Rêgo, Celso Ricardo Caldeira

arXiv.org Artificial IntelligenceDec-9-2025

Computational simulations have revolutionized materials design, accelerating innovation by allowing researchers to explore material properties and their behaviors virtually before experimental validation[1-4]. This shift has led to significant breakthroughs that range from energy storage[5, 6] to pharmaceutical development[7, 8]. However, a persistent challenge undermines this potential: the technical barriers to effective simulation setup disproportionately burden researchers, particularly those whose expertise lies in experimental rather than computational domains. When scientists identify a promising new compound, understanding its fundamental properties often requires computational validation. Y et, even seemingly straightforward simulations frequently lead to lengthy technical challenges. Even experienced computational scientists (physicists, chemists, engineers) find themselves diverted from scientific inquiry toward navigating complex programming challenges, engaging in trial-and-error attempts, and struggling with computational setup details rather than focusing on the scientific questions[9]. Integrated Computational Materials Engineering (ICME) has emerged as a robust framework to accelerate materials development by synergizing experimental data, simulations, and theoretical models across multiple scales.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2512.06404

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
South America > Brazil > Federal District > Brasília (0.04)
South America > Brazil > Paraná > Curitiba (0.04)
North America > Montserrat (0.04)

Genre: Research Report (0.82)

Industry:

Materials (0.66)
Energy > Energy Storage (0.48)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Learning to Code with Context: A Study-Based Approach

Borghoff, Uwe M., Minas, Mark, Schopp, Jannis

arXiv.org Artificial IntelligenceDec-8-2025

The rapid emergence of generative AI tools is transforming the way software is developed. Consequently, software engineering education must adapt to ensure that students not only learn traditional development methods but also understand how to meaningfully and responsibly use these new technologies. In particular, project-based courses offer an effective environment to explore and evaluate the integration of AI assistance into real-world development practices. This paper presents our approach and a user study conducted within a university programming project in which students collaboratively developed computer games. The study investigates how participants used generative AI tools throughout different phases of the software development process, identifies the types of tasks where such tools were most effective, and analyzes the challenges students encountered. Building on these insights, we further examine a repository-aware, locally deployed large language model (LLM) assistant designed to provide project-contextualized support. The system employs Retrieval-Augmented Generation (RAG) to ground responses in relevant documentation and source code, enabling qualitative analysis of model behavior, parameter sensitivity, and common failure modes. The findings deepen our understanding of context-aware AI support in educational software projects and inform future integration of AI-based assistance into software engineering curricula.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.05242

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Overview (0.92)
(2 more...)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.69)

Add feedback