natural language generation
- North America > United States > Utah > Utah County > Provo (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- (3 more...)
- Research Report > Promising Solution (0.46)
- Research Report > New Finding (0.46)
- Government (0.67)
- Media > News (0.46)
Exploring the Influence of Relevant Knowledge for Natural Language Generation Interpretability
Martínez-Murillo, Iván, Moreda, Paloma, Lloret, Elena
This paper explores the influence of external knowledge integration in Natural Language Generation (NLG), focusing on a commonsense generation task. We extend the CommonGen dataset by creating KITGI, a benchmark that pairs input concept sets with retrieved semantic relations from ConceptNet and includes manually annotated outputs. Using the T5-Large model, we compare sentence generation under two conditions: with full external knowledge and with filtered knowledge where highly relevant relations were deliberately removed. Our interpretability benchmark follows a three-stage method: (1) identifying and removing key knowledge, (2) regenerating sentences, and (3) manually assessing outputs for commonsense plausibility and concept coverage. Results show that sentences generated with full knowledge achieved 91% correctness across both criteria, while filtering reduced performance drastically to 6%. These findings demonstrate that relevant external knowledge is critical for maintaining both coherence and concept coverage in NLG. This work highlights the importance of designing interpretable, knowledge-enhanced NLG systems and calls for evaluation frameworks that capture the underlying reasoning beyond surface-level metrics.
- Europe > Spain > Valencian Community > Alicante Province > Alicante (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Colorado (0.04)
- (5 more...)
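The three-stage interpretability method above starts by deliberately removing the most relevant retrieved relations. A minimal sketch of that first stage, assuming relations carry a numeric `relevance` field and using an illustrative 0.8 cutoff (the paper identifies "highly relevant" relations manually, not by threshold):

```python
def filter_knowledge(relations, threshold=0.8):
    # Stage (1) of the three-stage probe: strip out the most relevant
    # ConceptNet relations before regenerating sentences, so the effect of
    # that knowledge on output quality can be measured.
    # `relevance` and the 0.8 threshold are illustrative assumptions.
    kept = [r for r in relations if r["relevance"] < threshold]
    removed = [r for r in relations if r["relevance"] >= threshold]
    return kept, removed
```

The `kept` list feeds the regeneration step (stage 2), while `removed` documents exactly which knowledge the model was denied, which is what stage 3's manual assessment is conditioned on.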
The QCET Taxonomy of Standard Quality Criterion Names and Definitions for the Evaluation of NLP Systems
Belz, Anya, Mille, Simon, Thomson, Craig
Prior work has shown that two NLP evaluation experiments that report results for the same quality criterion name (e.g. Fluency) do not necessarily evaluate the same aspect of quality, and the comparability implied by the name can be misleading. Not knowing when two evaluations are comparable in this sense means we currently lack the ability to draw reliable conclusions about system quality on the basis of multiple, independently conducted evaluations. This in turn hampers the ability of the field to progress scientifically as a whole, a pervasive issue in NLP since its beginning (Sparck Jones, 1981). It is hard to see how the issue of unclear comparability can be fully addressed other than by the creation of a standard set of quality criterion names and definitions that the several hundred quality criterion names actually in use in the field can be mapped to, and grounded in. Taking a strictly descriptive approach, the QCET Quality Criteria for Evaluation Taxonomy derives a standard set of quality criterion names and definitions from three surveys of evaluations reported in NLP, and structures them into a hierarchy where each parent node captures common aspects of its child nodes. We present QCET and the resources it consists of, and discuss its three main uses in (i) establishing comparability of existing evaluations, (ii) guiding the design of new evaluations, and (iii) assessing regulatory compliance.
- Research Report (1.00)
- Overview (0.67)
- Law (1.00)
- Government (1.00)
- Leisure & Entertainment (0.92)
- Health & Medicine > Therapeutic Area (0.46)
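The core use of such a taxonomy is resolving the many criterion names in use down to standard nodes so two evaluations can be checked for comparability. A toy sketch of that lookup; the node names, parents, and aliases below are invented for illustration and are not taken from the actual QCET resources:

```python
# Invented fragment of a criterion-name mapping in the spirit of QCET.
PARENT = {
    "Fluency": "Correctness of Surface Form",
    "Grammaticality": "Correctness of Surface Form",
}
ALIASES = {"fluent": "Fluency", "fluency": "Fluency",
           "well-formedness": "Grammaticality"}

def standard_name(criterion):
    # Map a reported criterion name onto its standard taxonomy node,
    # falling back to the name itself when no mapping is known.
    return ALIASES.get(criterion.strip().lower(), criterion)

def comparable(a, b):
    # Two evaluations are comparable (in this toy sense) when their
    # criterion names resolve to the same standard node.
    return standard_name(a) == standard_name(b)
```

In this sketch "fluent" and "Fluency" are comparable, while "fluency" and "well-formedness" are not, even though both resolve to children of the same parent node.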
Rule-Based Moral Principles for Explaining Uncertainty in Natural Language Generation
As large language models (LLMs) are increasingly used in high-stakes applications, the challenge of explaining uncertainty in natural language generation has become both a technical and moral imperative. Traditional approaches rely on probabilistic methods that are often opaque, difficult to interpret, and misaligned with human expectations of transparency and accountability. In response to these limitations, this paper introduces a novel framework based on rule-based moral principles--simple, human-inspired ethical guidelines--for responding to uncertainty in LLM-generated text. Drawing on insights from experimental moral psychology and virtue ethics, we define a set of symbolic behavioral rules such as precaution, deference, and responsibility to guide system responses under conditions of epistemic or aleatoric uncertainty. These rules are implemented declaratively and are designed to generate adaptive, context-sensitive explanations even in the absence of precise confidence metrics. The moral principles are encoded as symbolic rules within a lightweight Prolog-based engine, where each uncertainty tag (low, medium, high) activates an ethically aligned system action along with an automatically generated, plain-language rationale. We evaluate the framework through scenario-based simulations that benchmark rule coverage, assess fairness implications, and analyze trust calibration. An interpretive explanation module is integrated to reveal both the assigned uncertainty level and its underlying justification in a transparent and accessible way. We illustrate the framework through hypothetical yet plausible use cases in clinical and legal domains, demonstrating how rule-based moral reasoning can enhance user trust, promote fairness, and improve the interpretability of AI-generated language. By offering a lightweight, philosophically grounded alternative to probabilistic uncertainty modeling, our approach paves the way for more ethical, human-aligned, and socially responsible natural language generation.
- North America > United States (0.04)
- North America > Canada > Ontario > Durham Region > Oshawa (0.04)
- Asia > Singapore (0.04)
- Law (1.00)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
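The declarative rule table described above (uncertainty tag activates action plus rationale) is encoded in Prolog in the paper; a minimal Python analogue makes the shape of the idea concrete. The specific rule names, actions, and rationale wordings here are illustrative assumptions, not the paper's actual rules:

```python
# Minimal Python analogue of the paper's Prolog rule table. Each uncertainty
# tag activates one ethically aligned action plus a plain-language rationale.
RULES = {
    "low":    ("answer_directly",
               "Confidence is sufficient, so the system answers plainly."),
    "medium": ("hedge_and_cite_sources",
               "Precaution: under moderate uncertainty, hedge the claim "
               "and point the user to sources."),
    "high":   ("defer_to_human",
               "Deference: under high uncertainty, hand the decision "
               "to a human expert."),
}

def respond(uncertainty_tag):
    # Look up the rule fired by the tag and return both the system action
    # and its automatically attached rationale, as the framework's
    # explanation module would surface them.
    action, rationale = RULES[uncertainty_tag]
    return {"uncertainty": uncertainty_tag,
            "action": action,
            "rationale": rationale}
```

The appeal of the declarative form is visible even in this sketch: adding a rule or auditing the system's ethics means editing a table, not retraining a model.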
An End-to-End System for Culturally-Attuned Driving Feedback using a Dual-Component NLG Engine
Thompson, Iniakpokeikiye Peter, Yi, Dewei, Reiter, Ehud
This paper presents an end-to-end mobile system that delivers culturally-attuned safe driving feedback to drivers in Nigeria, a low-resource environment with significant infrastructural challenges. The core of the system is a novel dual-component Natural Language Generation (NLG) engine that provides both legally-grounded safety tips and persuasive, theory-driven behavioural reports. We describe the complete system architecture, including an automatic trip detection service, on-device behaviour analysis, and a sophisticated NLG pipeline that leverages a two-step reflection process to ensure high-quality feedback. The system also integrates a specialized machine learning model for detecting alcohol-influenced driving, a key local safety issue. The architecture is engineered for robustness against intermittent connectivity and noisy sensor data. A pilot deployment with 90 drivers demonstrates the viability of our approach, and initial results on detected unsafe behaviours are presented. This work provides a framework for applying data-to-text and AI systems to achieve social good.
- Europe > United Kingdom > Scotland > City of Aberdeen > Aberdeen (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- Africa > Nigeria > Osun State > Ile-Ife (0.04)
- North America > United States > Utah > Utah County > Provo (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- (5 more...)
- Research Report > Promising Solution (0.46)
- Research Report > New Finding (0.46)
- Government (0.67)
- Media > News (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Communications (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.74)
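The two-step reflection process in the NLG pipeline above can be sketched as draft-then-revise, where a second pass critiques and improves the first draft. The function interfaces here are assumptions standing in for the paper's actual components:

```python
def generate_feedback(trip_summary, draft_fn, reflect_fn):
    # Two-step reflection sketch: produce a draft feedback message, then
    # ask a second pass to critique and revise it against the trip data.
    # `draft_fn` and `reflect_fn` stand in for the paper's NLG components
    # (their interfaces are illustrative assumptions).
    draft = draft_fn(trip_summary)
    revised = reflect_fn(trip_summary, draft)
    # Fall back to the draft if reflection produces nothing usable, which
    # matters in a deployment engineered for intermittent connectivity.
    return revised if revised is not None else draft
```

The fallback branch reflects the robustness concern the abstract raises: in a low-resource environment the system should degrade to the unrevised draft rather than fail outright.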
My Life in Artificial Intelligence: People, anecdotes, and some lessons learnt
In this very personal workography, I relate my 40-year experiences as a researcher and educator in and around Artificial Intelligence (AI), more specifically Natural Language Processing. I describe how curiosity, and the circumstances of the day, led me to work in both industry and academia, and in various countries, including The Netherlands (Amsterdam, Eindhoven, and Utrecht), the USA (Stanford), England (Brighton), Scotland (Aberdeen), and China (Beijing and Harbin). People and anecdotes play a large role in my story; the history of AI forms its backdrop. I focus on things that might be of interest to (even) younger colleagues, given the choices they face in their own work and life at a time when AI is finally emerging from the shadows.
- Europe > Netherlands > North Holland > Amsterdam (0.25)
- Europe > Netherlands > North Brabant > Eindhoven (0.25)
- Asia > China > Beijing > Beijing (0.24)
- (18 more...)
- Leisure & Entertainment (1.00)
- Education (1.00)
- Health & Medicine (0.92)
- (2 more...)
Natural Language Generation in Healthcare: A Review of Methods and Applications
Lyu, Mengxian, Li, Xiaohan, Chen, Ziyi, Pan, Jinqian, Peng, Cheng, Talankar, Sankalp, Wu, Yonghui
Natural language generation (NLG) is the key technology to achieve generative artificial intelligence (AI). With the breakthroughs in large language models (LLMs), NLG has been widely used in various medical applications, demonstrating the potential to enhance clinical workflows, support clinical decision-making, and improve clinical documentation. Heterogeneous and diverse medical data modalities, such as medical text, images, and knowledge bases, are utilized in NLG. Researchers have proposed many generative models and applied them in a number of healthcare applications. There is a need for a comprehensive review of NLG methods and applications in the medical domain. In this study, we systematically reviewed 113 scientific publications from a total of 3,988 NLG-related articles identified using a literature search, focusing on data modality, model architecture, clinical applications, and evaluation methods. Following PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) guidelines, we categorize key methods, identify clinical applications, and assess their capabilities, limitations, and emerging challenges. This timely review covers the key NLG technologies and medical applications and provides valuable insights for future studies to leverage NLG to transform medical discovery and healthcare.
- North America > United States > Florida > Alachua County > Gainesville (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
- Overview (1.00)
- Research Report > Experimental Study (0.68)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Health Care Technology > Medical Record (0.97)
- Health & Medicine > Nuclear Medicine (0.70)
- (2 more...)
JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry
Afzal, Anum, Mercier, Alexandre, Matthes, Florian
Online platforms are increasingly interested in using Data-to-Text technologies to generate content and help their users. Unfortunately, traditional generative methods often fall into repetitive patterns, resulting in monotonous galleries of texts after only a few iterations. In this paper, we investigate LLM-based data-to-text approaches to automatically generate marketing texts that are of sufficient quality and diverse enough for broad adoption. We leverage language models such as T5, GPT-3.5, GPT-4, and LLaMA-2 in conjunction with fine-tuning, few-shot, and zero-shot approaches to set a baseline for diverse marketing texts. We also introduce JaccDiv, a metric to evaluate the diversity of a set of texts. This research extends its relevance beyond the music industry, proving beneficial in various fields where repetitive automated content generation is prevalent.
- North America > United States (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
- (10 more...)
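The abstract names JaccDiv but does not spell out its formula; a plausible Jaccard-based reading, offered here only as a sketch, scores a set of texts by one minus the mean pairwise token-level Jaccard similarity:

```python
from itertools import combinations

def jaccard(a, b):
    # Token-level Jaccard similarity between two texts.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def jacc_div(texts):
    # Diversity as 1 minus mean pairwise Jaccard similarity: 0.0 means
    # every pair of texts shares exactly the same tokens, 1.0 means no
    # pair overlaps at all. This formulation is an assumption, not the
    # paper's published definition.
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(jaccard(x, y) for x, y in pairs) / len(pairs)
```

Under this reading, a gallery of near-duplicate marketing blurbs scores near 0, which is exactly the "monotonous galleries" failure mode the paper targets.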
Anyprefer: An Agentic Framework for Preference Data Synthesis
Zhou, Yiyang, Wang, Zhaoyang, Wang, Tianle, Xing, Shangyu, Xia, Peng, Li, Bo, Zheng, Kaiyuan, Zhang, Zijian, Chen, Zhaorun, Zheng, Wenhao, Zhang, Xuchao, Bansal, Chetan, Zhang, Weitong, Wei, Ying, Bansal, Mohit, Yao, Huaxiu
High-quality preference data is essential for aligning foundation models with human values through preference learning. However, manual annotation of such data is often time-consuming and costly. Recent methods often adopt a self-rewarding approach, where the target model generates and annotates its own preference data, but this can lead to inaccuracies since the reward model shares weights with the target model, thereby amplifying inherent biases. To address these issues, we propose Anyprefer, a framework designed to synthesize high-quality preference data for aligning the target model. Anyprefer frames the data synthesis process as a cooperative two-player Markov Game in which the target model and the judge model collaborate. Here, a series of external tools are introduced to assist the judge model in accurately rewarding the target model's responses, mitigating biases in the rewarding process. In addition, a feedback mechanism is introduced to optimize prompts for both models, enhancing collaboration and improving data quality. The synthesized data is compiled into a new preference dataset, Anyprefer-V1, consisting of 58K high-quality preference pairs. Extensive experiments show that Anyprefer significantly improves model alignment performance across four main applications, covering 21 datasets, achieving average improvements of 18.55% in five natural language generation datasets, 3.66% in nine vision-language understanding datasets, 30.05% in three medical image analysis datasets, and 16.00% in four visuo-motor control tasks.
- Europe > France (0.04)
- Asia > Middle East > Jordan (0.04)
- Education (0.93)
- Health & Medicine > Diagnostic Medicine > Imaging (0.67)
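One round of the tool-assisted labeling loop described above can be sketched as: the target model proposes candidates, external tools gather evidence, and a separate judge scores each candidate to form a preference pair. All interfaces here are assumptions; the actual framework additionally optimizes both models' prompts through its feedback mechanism:

```python
def synthesize_pair(prompt, target_model, judge, tools):
    # One round of tool-assisted preference labeling (interfaces are
    # illustrative). The judge is distinct from the target model, so its
    # scores do not inherit the target model's biases, and the tool
    # evidence grounds its rewards.
    candidates = [target_model(prompt) for _ in range(2)]
    evidence = [tool(prompt, candidates) for tool in tools]
    scores = [judge(prompt, c, evidence) for c in candidates]
    ranked = sorted(zip(scores, candidates),
                    key=lambda sc: sc[0], reverse=True)
    return {"prompt": prompt,
            "chosen": ranked[0][1],
            "rejected": ranked[-1][1]}
```

Repeating this round over a prompt pool yields chosen/rejected pairs of the kind compiled into a preference dataset for alignment training.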