AITopics

2508.17184

Country: Europe > United Kingdom (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bedrick, Steven, Doğruöz, A. Seza, Nisioi, Sergiu

A Typology of Synthetic Datasets for Dialogue Processing in Clinical Contexts

arXiv.org Artificial IntelligenceNov-20-2025

Synthetic data sets are used across linguistic domains and NLP tasks, particularly in scenarios where authentic data is limited (or even non-existent). One such domain is that of clinical (healthcare) contexts, where there exist significant and long-standing challenges (e.g., privacy, anonymization, and data governance) which have led to the development of an increasing number of synthetic datasets. One increasingly important category of clinical dataset is that of clinical dialogues which are especially sensitive and difficult to collect, and as such are commonly synthesized. While such synthetic datasets have been shown to be sufficient in some situations, little theory exists to inform how they may be best used and generalized to new applications. In this paper, we provide an overview of how synthetic datasets are created, evaluated and being used for dialogue related tasks in the medical domain. Additionally, we propose a novel typology for use in classifying types and degrees of data synthesis, to facilitate comparison and evaluation.

data mining, large language model, machine learning, (22 more...)

2505.03025

Country:

North America > United States (1.00)
Asia (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)

Genre: Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Providers & Services (0.67)
Health & Medicine > Consumer Health (0.67)
Health & Medicine > Health Care Technology > Medical Record (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
(4 more...)

Neural Information Processing SystemsNov-19-2025, 22:07:59 GMT

9547b09b722f2948ff3ddb5d86002bc0-Paper-Datasets_and_Benchmarks_Track.pdf

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
South America > Paraguay > Asunción > Asunción (0.04)
Europe > Austria (0.04)
(13 more...)

Genre:

Research Report (0.67)
Questionnaire & Opinion Survey (0.49)
Overview (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(4 more...)

Neural Information Processing SystemsNov-19-2025, 17:47:35 GMT

b693a240cf1009bff9fa4422141c9392-Paper-Conference.pdf

artificial intelligence, calibration, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Europe > France (0.04)

Genre:

Research Report > New Finding (0.68)
Overview (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Vision (0.93)

Neural Information Processing SystemsNov-19-2025, 16:42:33 GMT

Ethical Considerations for Responsible Data Curation

Equal contribution; authors are listed in random order.

computer vision, data mining, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Oceania > Australia (0.28)
Asia > China (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(19 more...)

Genre:

Research Report (0.93)
Overview (0.67)
Personal > Interview (0.48)

Industry:

Media (1.00)
Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Security & Privacy (1.00)
(5 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
(8 more...)

SparseST: Exploiting Data Sparsity in Spatiotemporal Modeling and Prediction

Wu, Junfeng, Benmeziane, Hadjer, Maghraoui, Kaoutar El, Liu, Liu, Wang, Yinan

Spatiotemporal data mining (STDM) has a wide range of applications in various complex physical systems (CPS), i.e., transportation, manufacturing, healthcare, etc. Among all the proposed methods, the Convolutional Long Short-Term Memory (ConvLSTM) has proved to be generalizable and extendable in different applications and has multiple variants achieving state-of-the-art performance in various STDM applications. However, ConvLSTM and its variants are computationally expensive, which makes them inapplicable in edge devices with limited computational resources. With the emerging need for edge computing in CPS, efficient AI is essential to reduce the computational cost while preserving the model performance. Common methods of efficient AI are developed to reduce redundancy in model capacity (i.e., model pruning, compression, etc.). However, spatiotemporal data mining naturally requires extensive model capacity, as the embedded dependencies in spatiotemporal data are complex and hard to capture, which limits the model redundancy. Instead, there is a fairly high level of data and feature redundancy that introduces an unnecessary computational burden, which has been largely overlooked in existing research. Therefore, we developed a novel framework SparseST, that pioneered in exploiting data sparsity to develop an efficient spatiotemporal model. In addition, we explore and approximate the Pareto front between model performance and computational efficiency by designing a multi-objective composite loss function, which provides a practical guide for practitioners to adjust the model according to computational resource constraints and the performance requirements of downstream tasks.

data mining, machine learning, sparse convolution, (21 more...)

2511.14753

Genre:

Research Report (0.64)
Workflow (0.46)
Overview (0.46)

Industry: Transportation (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(2 more...)

Bias in, Bias out: Annotation Bias in Multilingual Large Language Models

Cui, Xia, Huang, Ziyi, Adel, Naeemeh

Annotation bias in NLP datasets remains a major challenge for developing multilingual Large Language Models (LLMs), particularly in culturally diverse settings. Bias from task framing, annotator subjectivity, and cultural mismatches can distort model outputs and exacerbate social harms. We propose a comprehensive framework for understanding annotation bias, distinguishing among instruction bias, annotator bias, and contextual and cultural bias. We review detection methods (including inter-annotator agreement, model disagreement, and metadata analysis) and highlight emerging techniques such as multilingual model divergence and cultural inference. We further outline proactive and reactive mitigation strategies, including diverse annotator recruitment, iterative guideline refinement, and post-hoc model adjustments. Our contributions include: (1) a typology of annotation bias; (2) a synthesis of detection metrics; (3) an ensemble-based bias mitigation approach adapted for multilingual settings, and (4) an ethical analysis of annotation processes. Together, these insights aim to inform more equitable and culturally grounded annotation pipelines for LLMs.

computational linguistic, large language model, machine learning, (16 more...)

2511.14662

Country:

Europe (1.00)
Asia > Middle East > UAE (0.28)
North America > United States > Minnesota (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Etumi, Mahdi, Taylor, Hazel M., Farrell, Marie

Towards A Catalogue of Requirement Patterns for Space Robotic Missions

In the development of safety and mission-critical systems, including autonomous space robotic missions, complex behaviour is captured during the requirements elicitation phase. Requirements are typically expressed using natural language which is ambiguous and not amenable to formal verification methods that can provide robust guarantees of system behaviour. To support the definition of formal requirements, specification patterns provide reusable, logic-based templates. A suite of robotic specification patterns, along with their formalisation in NASA's Formal Requirements Elicitation Tool (FRET) already exists. These pre-existing requirement patterns are domain agnostic and, in this paper we explore their applicability for space missions. To achieve this we carried out a literature review of existing space missions and formalised their requirements using FRET, contributing a corpus of space mission requirements. We categorised these requirements using pre-existing specification patterns which demonstrated their applicability in space missions. However, not all of the requirements that we formalised corresponded to an existing pattern so we have contributed 5 new requirement specification patterns as well as several variants of the existing and new patterns. We also conducted an expert evaluation of the new patterns, highlighting their benefits and limitations.

artificial intelligence, astrobee, requirement, (17 more...)

doi: 10.4204/EPTCS.436.12

2511.14438

Country: North America > United States (1.00)

Genre:

Research Report (0.64)
Overview (0.48)

Industry:

Transportation > Air (1.00)
Government > Space Agency (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Watchdogs and Oracles: Runtime Verification Meets Large Language Models for Autonomous Systems

Ferrando, Angelo

Assuring the safety and trustworthiness of autonomous systems is particularly difficult when learning-enabled components and open environments are involved. Formal methods provide strong guarantees but depend on complete models and static assumptions. Runtime verification (RV) complements them by monitoring executions at run time and, in its predictive variants, by anticipating potential violations. Large language models (LLMs), meanwhile, excel at translating natural language into formal artefacts and recognising patterns in data, yet they remain error-prone and lack formal guarantees. This vision paper argues for a symbiotic integration of RV and LLMs. RV can serve as a guardrail for LLM-driven autonomy, while LLMs can extend RV by assisting specification capture, supporting anticipatory reasoning, and helping to handle uncertainty. We outline how this mutual reinforcement differs from existing surveys and roadmaps, discuss challenges and certification implications, and identify future research directions towards dependable autonomy.

artificial intelligence, large language model, natural language, (17 more...)

doi: 10.4204/EPTCS.436.8

2511.14435

Country: Europe > Austria (0.46)

Genre:

Research Report (0.70)
Overview (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature

Ding, Hang, Zhao, Yilun, Hu, Tiansheng, Patwardhan, Manasi, Cohan, Arman

The accelerating growth of scientific publications has intensified the need for scalable, trustworthy systems to synthesize knowledge across diverse literature. While recent retrieval-augmented generation (RAG) methods have improved access to scientific information, they often overlook citation graph structure, adapt poorly to complex queries, and yield fragmented, hard-to-verify syntheses. We introduce SciRAG, an open-source framework for scientific literature exploration that addresses these gaps through three key innovations: (1) adaptive retrieval that flexibly alternates between sequential and parallel evidence gathering; (2) citation-aware symbolic reasoning that leverages citation graphs to organize and filter supporting documents; and (3) outline-guided synthesis that plans, critiques, and refines answers to ensure coherence and transparent attribution. Extensive experiments across multiple benchmarks such as QASA and ScholarQA demonstrate that SciRAG outperforms prior systems in factual accuracy and synthesis quality, establishing a new foundation for reliable, large-scale scientific knowledge aggregation.

large language model, machine learning, natural language, (22 more...)

2511.14362

Country: Asia > China (0.28)

Genre:

Research Report (1.00)
Overview (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)