AITopics | verbalization


verbalization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Do Natural Language Descriptions of Model Activations Convey Privileged Information?

Li, Millicent, Arroyo, Alberto Mario Ceballos, Rogers, Giordano, Saphra, Naomi, Wallace, Byron C.

arXiv.org Artificial Intelligence, Dec-10-2025

Recent interpretability methods have proposed to translate LLM internal representations into natural language descriptions using a second verbalizer LLM. This is intended to illuminate how the target model represents and operates on inputs. But do such activation verbalization approaches actually provide privileged knowledge about the internal workings of the target model, or do they merely convey information about its inputs? We critically evaluate popular verbalization methods across datasets used in prior work and find that they can succeed at benchmarks without any access to target model internals, suggesting that these datasets may not be ideal for evaluating verbalization methods. We then run controlled experiments which reveal that verbalizations often reflect the parametric knowledge of the verbalizer LLM which generated them, rather than the knowledge of the target LLM whose activations are decoded. Taken together, our results indicate a need for targeted benchmarks and experimental controls to rigorously assess whether verbalization methods provide meaningful insights into the operations of LLMs.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.13316

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.67)
Leisure & Entertainment > Games > Computer Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)


Informative Communication of Robot Plans

Persiani, Michele, Hellstrom, Thomas

arXiv.org Artificial Intelligence, Nov-18-2025

When a robot is asked to verbalize its plan, it can do so in many ways. For example, a seemingly natural strategy is incremental, where the robot verbalizes its planned actions in plan order. However, this type of strategy overlooks what is actually informative to communicate, because it does not consider what the user already knows before the explanation. In this paper we propose a verbalization strategy that communicates robot plans informatively, by measuring the information gain of each verbalization against a second-order theory of mind of the user that captures their prior knowledge of the robot. As shown in our experiments, this strategy allows the user to understand the robot's goal much more quickly than strategies such as increasing or decreasing plan order. In addition, our formulation hints at what is informative, and why, when a robot communicates its plan.
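
The idea of ranking verbalizations by information gain can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes a plain Bayesian belief over the robot's goal instead of a second-order theory of mind, and the goals, plan steps, and probabilities are all hypothetical.

```python
import math

def posterior(prior, likelihood):
    """Bayes update: P(goal | step) is proportional to P(step | goal) * P(goal)."""
    unnorm = {g: prior[g] * likelihood.get(g, 0.0) for g in prior}
    total = sum(unnorm.values())
    if total == 0:
        # The step is impossible under every goal; leave the belief unchanged.
        return dict(prior)
    return {g: p / total for g, p in unnorm.items()}

def information_gain(prior, likelihood):
    """KL divergence (in bits) between the updated belief and the prior belief."""
    post = posterior(prior, likelihood)
    return sum(p * math.log2(p / prior[g]) for g, p in post.items() if p > 0)

# Hypothetical belief the user holds about the robot's goal.
prior = {"make_tea": 0.5, "make_coffee": 0.5}

# Hypothetical P(step appears in the plan | goal) for two candidate verbalizations.
steps = {
    "boil water": {"make_tea": 1.0, "make_coffee": 1.0},
    "grind beans": {"make_tea": 0.0, "make_coffee": 1.0},
}

gains = {step: information_gain(prior, lik) for step, lik in steps.items()}
best_step = max(gains, key=gains.get)
```

In this toy setting "boil water" is consistent with both goals and so carries no information, while "grind beans" rules out one goal entirely, so it would be verbalized first regardless of its position in the plan.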

artificial intelligence, planning & scheduling, verbalization, (15 more...)

arXiv.org Artificial Intelligence

2511.13226

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.69)


Multilingual Target-Stance Extraction

Mines, Ethan, Dorr, Bonnie

arXiv.org Artificial Intelligence, Oct-28-2025

Social media enables data-driven analysis of public opinion on contested issues. Target-Stance Extraction (TSE) is the task of identifying the target discussed in a document and the document's stance towards that target. Many works classify stance towards a given target in a multilingual setting, but all prior work in TSE is English-only. This work introduces the first multilingual TSE benchmark, spanning Catalan, Estonian, French, Italian, Mandarin, and Spanish corpora. It manages to extend the original TSE pipeline to a multilingual setting without requiring separate models for each language. Our model pipeline achieves a modest F1 score of 12.78, underscoring the increased difficulty of the multilingual task relative to English-only setups and highlighting target prediction as the primary bottleneck. We are also the first to demonstrate the sensitivity of TSE's F1 score to different target verbalizations. Together these serve as a much-needed baseline for resources, algorithms, and evaluation criteria in multilingual TSE.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.22334

Country:

Asia > Middle East (0.68)
Europe > France (0.68)
North America > United States > California (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Industry: Government > Regional Government > Europe Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.55)


Measuring the Effect of Disfluency in Multilingual Knowledge Probing Benchmarks

Semenov, Kirill, Sennrich, Rico

arXiv.org Artificial Intelligence, Oct-20-2025

For multilingual factual knowledge assessment of LLMs, benchmarks such as MLAMA use template translations that do not take into account the grammatical and semantic information of the named entities inserted in the sentence. This leads to numerous instances of ungrammaticality or wrong wording of the final prompts, which complicates the interpretation of scores, especially for languages that have a rich morphological inventory. In this work, we sample 4 Slavic languages from the MLAMA dataset and compare the knowledge retrieval scores between the initial (templated) MLAMA dataset and its sentence-level translations made by Google Translate and ChatGPT. We observe a significant increase in knowledge retrieval scores, and provide a qualitative analysis of possible reasons behind it. We also conduct an additional analysis of 5 more languages from different families and see similar patterns. Therefore, we encourage the community to control the grammaticality of highly multilingual datasets to obtain higher and more interpretable results; whole-sentence translation with neural MT or LLM systems approximates such control well. The dataset and all related code are published at the GitHub repository: https://github.com/ZurichNLP/Fluent-mLAMA.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.15115

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Neologism Learning for Controllability and Self-Verbalization

Hewitt, John, Tafjord, Oyvind, Geirhos, Robert, Kim, Been

arXiv.org Artificial Intelligence, Oct-10-2025

Humans invent new words when there is a rising demand for a new useful concept (e.g., doomscrolling). We explore and validate a similar idea in our communication with LLMs: introducing new words to better understand and control the models, expanding on the recently introduced neologism learning. This method introduces a new word by adding a new word embedding and training with examples that exhibit the concept with no other changes in model parameters. We show that adding a new word allows for control of concepts such as flattery, incorrect answers, text length, as well as more complex concepts in AxBench. We discover that neologisms can also further our understanding of the model via self-verbalization: models can describe what each new word means to them in natural language, like explaining that a word that represents a concept of incorrect answers means "a lack of complete, coherent, or meaningful answers..." To validate self-verbalizations, we introduce plug-in evaluation: we insert the verbalization into the context of a model and measure whether it controls the target concept. In some self-verbalizations, we find machine-only synonyms: words that seem unrelated to humans but cause similar behavior in machines. Finally, we show how neologism learning can jointly learn multiple concepts in multiple words.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.08506

Country: Europe (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech

Woo, Sang Hoon, Lee, Sehun, Kim, Kang-wook, Kim, Gunhee

arXiv.org Artificial Intelligence, Sep-22-2025

Spoken dialogue systems increasingly employ large language models (LLMs) to leverage their advanced reasoning capabilities. However, direct application of LLMs in spoken communication often yields suboptimal results due to mismatches between optimal textual and verbal delivery. While existing approaches adapt LLMs to produce speech-friendly outputs, their impact on reasoning performance remains underexplored. In this work, we propose Think-Verbalize-Speak, a framework that decouples reasoning from spoken delivery to preserve the full reasoning capacity of LLMs. Central to our method is verbalizing, an intermediate step that translates thoughts into natural, speech-ready text. We also introduce ReVerT, a latency-efficient verbalizer based on incremental and asynchronous summarization. Experiments across multiple benchmarks show that our method enhances speech naturalness and conciseness with minimal impact on reasoning. The project page with the dataset and the source code is available at https://yhytoto12.github.io/TVS-ReVerT

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.16028

Country: Europe (1.00)

Genre:

Research Report (1.00)
Overview (1.00)
Personal (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


VERBA: Verbalizing Model Differences Using Large Language Models

Doda, Shravan, Javaji, Shashidhar Reddy, Zhu, Zining

arXiv.org Artificial Intelligence, Jul-4-2025

In the current machine learning landscape, we face a "model lake" phenomenon: Given a task, there is a proliferation of trained models with similar performances despite different behavior. For model users attempting to navigate and select from the models, documentation comparing model pairs is helpful. However, for every $N$ models there could be $O(N^2)$ pairwise comparisons, a number that makes it prohibitive for model developers to perform the comparisons manually and prepare documentation. To facilitate fine-grained pairwise comparisons among models, we introduced VERBA. Our approach leverages a large language model (LLM) to generate verbalizations of model differences by sampling from the two models. We established a protocol that evaluates the informativeness of the verbalizations via simulation. We also assembled a suite with a diverse set of commonly used machine learning models as a benchmark. For a pair of decision tree models with up to 5% performance difference but 20-25% behavioral differences, VERBA effectively verbalizes their variations with up to 80% overall accuracy. When we included the models' structural information, the verbalization's accuracy further improved to 90%. VERBA opens up new research avenues for improving the transparency and comparability of machine learning models in a post-hoc manner.
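
The quadratic growth in pairwise comparisons mentioned in this abstract can be checked with a small sketch. The model names are hypothetical placeholders, and the code only counts unordered pairs; it does not reproduce any part of VERBA itself.

```python
from itertools import combinations

def model_pairs(model_names):
    """Enumerate every unordered pair of models that would need a comparison."""
    return list(combinations(model_names, 2))

def pair_count(n):
    """Closed form for the number of unordered pairs: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Hypothetical "model lake" of ten models trained for the same task.
models = [f"model_{i}" for i in range(10)]
pairs = model_pairs(models)
```

Ten models already give 45 pairs, and one hundred models give 4950, which is why writing each comparison by hand quickly becomes impractical.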

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2507.02241

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)


Medical Argument Mining: Exploitation of Scarce Data Using NLI Systems

Urruela, Maitane, Martín, Sergio, De la Iglesia, Iker, Barrena, Ander

arXiv.org Artificial Intelligence, Jun-17-2025

In recent years, there has been a growing interest in developing intelligent systems to assist healthcare professionals, particularly in the field of Evidence-Based Medicine (EBM). EBM systems aim to extract pertinent information from unstructured clinical documents and transform it into a structured, machine-readable format, enabling automated analysis. Argument Mining (AM), aligning with EBM, examines the evidence and reasoning clinicians use in clinical cases. This process involves identifying argumentative structures within texts--specifically, finding claims (a point to be proved) and premises (evidence that supports or refutes a claim), and establishing support or attack relations between them. In the clinical context, this process enables the extraction of logical relationships that justify clinical decision-making (Stylianou and Vlahavas, 2021).

large language model, machine learning, relation, (18 more...)

arXiv.org Artificial Intelligence

2506.12823

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)


Evaluating Design Decisions for Dual Encoder-based Entity Disambiguation

Rücker, Susanna, Akbik, Alan

arXiv.org Artificial Intelligence, May-20-2025

Entity disambiguation (ED) is the task of linking mentions in text to corresponding entries in a knowledge base. Dual Encoders address this by embedding mentions and label candidates in a shared embedding space and applying a similarity metric to predict the correct label. In this work, we focus on evaluating key design decisions for Dual Encoder-based ED, such as its loss function, similarity metric, label verbalization format, and negative sampling strategy. We present the resulting model VerbalizED, a document-level Dual Encoder model that includes contextual label verbalizations and efficient hard negative sampling. Additionally, we explore an iterative prediction variant that aims to improve the disambiguation of challenging data points. Comprehensive experiments on AIDA-Yago validate the effectiveness of our approach, offering insights into impactful design choices that result in a new State-of-the-Art system on the ZELDA benchmark.
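
The dual-encoder scoring step described in this abstract can be sketched as follows. This is an illustration only, not the VerbalizED model: the vectors are hand-written stand-ins for encoder outputs, the label verbalizations are hypothetical, and cosine similarity is just one of the similarity metrics such a system might use.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def disambiguate(mention_vec, label_vecs, similarity=cosine):
    """Return the label whose embedding is most similar to the mention embedding."""
    return max(label_vecs, key=lambda label: similarity(mention_vec, label_vecs[label]))

# Hypothetical embedding of the mention "Paris" in a sentence about France.
mention = [0.9, 0.1, 0.0]

# Hypothetical embeddings of verbalized label candidates in the same space.
labels = {
    "Paris (city in France)": [1.0, 0.0, 0.0],
    "Paris (mythological figure)": [0.0, 1.0, 0.0],
    "Paris Hilton": [0.0, 0.0, 1.0],
}

predicted = disambiguate(mention, labels)
```

Passing `similarity=dot` instead of the default swaps the metric without touching the rest of the pipeline, which is the kind of design decision the abstract says it evaluates.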

computational linguistic, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2505.11683

Country:

Asia (1.00)
Europe > United Kingdom > Scotland (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Industry:

Media > Television (0.94)
Government (0.68)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)


RAGONITE: Iterative Retrieval on Induced Databases and Verbalized RDF for Conversational QA over KGs with RAG

Roy, Rishiraj Saha, Hinze, Chris, Schlotthauer, Joel, Naderi, Farzad, Hangya, Viktor, Foltyn, Andreas, Hahn, Luzian, Kuech, Fabian

arXiv.org Artificial Intelligence, Dec-25-2024

Conversational question answering (ConvQA) is a convenient means of searching over RDF knowledge graphs (KGs), where a prevalent approach is to translate natural language questions to SPARQL queries. However, SPARQL has certain shortcomings: (i) it is brittle for complex intents and conversational questions, and (ii) it is not suitable for more abstract needs. Instead, we propose a novel two-pronged system where we fuse: (i) SQL-query results over a database automatically derived from the KG, and (ii) text-search results over verbalizations of KG facts. Our pipeline supports iterative retrieval: when the results of any branch are found to be unsatisfactory, the system can automatically opt for further rounds. We put everything together in a retrieval augmented generation (RAG) setup, where an LLM generates a coherent response from accumulated search results. We demonstrate the superiority of our proposed system over several baselines on a knowledge graph of BMW automobiles.
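
The text-search branch over verbalized KG facts can be pictured with a minimal sketch. It is not the RAGONITE pipeline: the triples, the template-based `verbalize` helper, and the keyword-overlap `search` function are all hypothetical simplifications chosen for illustration.

```python
def verbalize(triple):
    """Turn a (subject, predicate, object) triple into a plain sentence."""
    subject, predicate, obj = triple
    return f"{subject} {predicate.replace('_', ' ')} {obj}."

def search(sentences, query):
    """Rank sentences by how many query words they contain; drop non-matches."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(s.lower().rstrip(".").split())), s) for s in sentences
    ]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [s for score, s in ranked if score > 0]

# Hypothetical triples; a real system would read these from the RDF graph.
triples = [
    ("Model X", "has_body_type", "sedan"),
    ("Model X", "has_top_speed", "250 km/h"),
]

sentences = [verbalize(t) for t in triples]
hits = search(sentences, "top speed")
```

A real system would more plausibly use a learned retriever than word overlap; the sketch only shows why verbalized facts become searchable with ordinary text retrieval.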

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.1769

Genre: Research Report (0.64)

Industry:

Automobiles & Trucks (0.92)
Transportation > Passenger (0.70)
Transportation > Ground > Road (0.70)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
