AITopics | monologue

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

6efcc7fd8efeee29a050a79c843c90e0-Paper-Conference.pdf

Neural Information Processing Systems | Feb-15-2026, 17:21:01 GMT

large language model, machine learning, programming language, (21 more...)

Neural Information Processing Systems

Country: North America > United States (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (0.92)
Information Technology (0.67)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Training Code Language Models with Comprehensive Semantics Reasoning

Neural Information Processing Systems | Oct-10-2025, 05:34:12 GMT

monologue, oder, reasoning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (0.92)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training

Yao, Yiqun, Li, Xiang, Jiang, Xin, Fang, Xuezhi, Yu, Naitong, Ma, Wenjia, Sun, Aixin, Wang, Yequan

arXiv.org Artificial Intelligence | Sep-12-2025

Full-duplex dialog models aim to listen and speak simultaneously, delivering rapid responses to dynamic user input. Among different solutions to full duplexity, a native solution merges multiple channels in each time step, achieving the lowest latency. However, prevailing designs break the textual monologue sentences apart for word-level alignment with audio streams, which degrades language modeling abilities. To help address this issue, we introduce natural monologues, which are composed of continuous sentences and waiting intervals, mimicking human-like cognitive behavior in dialogs. We find a proper training paradigm to be critical for semantically aligning natural monologues with audio. To this end, we develop a dual training paradigm that alternates the position of the monologues, either leading or trailing the audio, across different training stages. A combination of our natural monologue and dual training strategy is applied in developing FLM-Audio, our 7B spoken dialog chatbot with native full-duplexity. As confirmed by experimental results, FLM-Audio achieves superior response quality and chatting experience while requiring significantly less training data.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.02521

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.48)
Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multimodal Proposal for an AI-Based Tool to Increase Cross-Assessment of Messages

Castro, Alejandro Álvarez, Ordieres-Meré, Joaquín

arXiv.org Artificial Intelligence | Sep-5-2025

Earnings calls represent a uniquely rich and semi-structured source of financial communication, blending scripted managerial commentary with unscripted analyst dialogue. Although recent advances in financial sentiment analysis have integrated multi-modal signals, such as textual content and vocal tone, most systems rely on flat document-level or sentence-level models, failing to capture the layered discourse structure of these interactions. This paper introduces a novel multi-modal framework designed to generate semantically rich and structurally aware embeddings of earnings calls by encoding them as hierarchical discourse trees. Each node, comprising either a monologue or a question-answer pair, is enriched with emotional signals derived from text, audio, and video, as well as structured metadata including coherence scores, topic labels, and answer coverage assessments. A two-stage transformer architecture is proposed: the first stage encodes multi-modal content and discourse metadata at the node level using contrastive learning, while the second synthesizes a global embedding for the entire conference. Experimental results reveal that the resulting embeddings form stable, semantically meaningful representations that reflect affective tone, structural logic, and thematic alignment. Beyond financial reporting, the proposed system generalizes to other high-stakes unscripted communicative domains such as telemedicine, education, and political discourse, offering a robust and explainable approach to multi-modal discourse representation. It also offers practical utility for downstream tasks such as financial forecasting and discourse evaluation.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.03529

Country: Europe > Spain (0.15)

Genre: Financial News (1.00)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks

Wang, Chaoyi, Zheng, Junjie, Chen, Zihao, Xia, Shiyu, Ding, Chaofan, Zhang, Xiaohao, Tao, Xi, He, Xiaoming, Di, Xinhan

arXiv.org Artificial Intelligence | May-6-2025

Movie dubbing has advanced significantly, yet assessing the real-world effectiveness of these models remains challenging. A comprehensive evaluation benchmark is crucial for two key reasons: 1) Existing metrics fail to fully capture the complexities of dialogue, narration, monologue, and actor adaptability in movie dubbing. 2) A practical evaluation system should offer valuable insights for improving movie dubbing quality and advancing film production. To this end, we introduce Talking Adaptive Dubbing Benchmarks (TA-Dubbing), designed to improve film production by adapting to dialogue, narration, monologue, and actors in movie dubbing. TA-Dubbing offers several key advantages: 1) Comprehensive Dimensions: TA-Dubbing covers a variety of dimensions of movie dubbing, incorporating metric evaluations for both movie understanding and speech generation. 2) Versatile Benchmarking: TA-Dubbing is designed to evaluate state-of-the-art movie dubbing models and advanced multi-modal large language models. 3) Full Open-Sourcing: We fully open-source TA-Dubbing at https://github.com/woka-0a/DeepDubber-V1, including all video suites, evaluation methods, and annotations. We also continuously integrate new movie dubbing models into the TA-Dubbing leaderboard at the same repository to drive forward the field of movie dubbing.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.0145

Country:

Asia > China (0.15)
Oceania > Australia (0.14)
Asia > Thailand (0.14)

Genre: Research Report (0.50)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs

Wen, Pengcheng, Ji, Jiaming, Chan, Chi-Min, Dai, Juntao, Hong, Donghai, Yang, Yaodong, Han, Sirui, Guo, Yike

arXiv.org Artificial Intelligence | Mar-17-2025

Large language models (LLMs) have demonstrated enhanced performance through the "Thinking then Responding" paradigm, where models generate internal thoughts before final responses (also known as System 2 thinking). However, existing research lacks a systematic understanding of how thinking patterns affect performance across model sizes. In this work, we conduct a comprehensive analysis of the impact of various thinking types on model performance and introduce ThinkPatterns-21k, a curated dataset comprising 21k instruction-response pairs (QA) collected from existing instruction-following datasets. We augment each pair with five distinct internal thinking patterns: one unstructured thinking pattern (monologue) and four structured variants (decomposition, self-ask, self-debate and self-critic), while keeping the instruction and response unchanged. Through extensive evaluation across different model sizes (3B-32B parameters), we have two key findings: (1) smaller models (<30B parameters) benefit from most structured thinking patterns, while for larger models (32B) structured thinking such as decomposition degrades performance, and (2) unstructured monologue demonstrates broad effectiveness across different model sizes. Finally, we release all of our datasets, checkpoints, and training logs for the diverse thinking patterns to support reproducibility and facilitate further research in this direction.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.12918

Country:

Africa > Tanzania (0.05)
Africa > Kenya (0.05)
Africa > Zimbabwe (0.04)
(11 more...)

Genre: Research Report > New Finding (0.46)

Industry: Consumer Products & Services (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Talking to oneself in CMC: a study of self replies in Wikipedia talk pages

Tanguy, Ludovic, Poudat, Céline, Ho-Dac, Lydia-Mai

arXiv.org Artificial Intelligence | Nov-28-2024

This study proposes a qualitative analysis of self replies in Wikipedia talk pages, more precisely of cases where the first two messages of a discussion are written by the same user. This specific pattern occurs in more than 10% of threads with two or more messages and can be explained by a number of reasons. After a first examination of the lexical specificities of second messages, we propose a seven-category typology and use it to annotate two reference samples (English and French) of 100 threads each. Finally, we analyse and compare the performance of human annotators (who reach a reasonable global efficiency) and instruction-tuned LLMs (which encounter significant difficulties with several categories).

category, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.19007

Country:

North America > United States > Oregon > Lane County > Eugene (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Communications > Social Media (0.77)
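
The self-reply pattern described in the abstract above (the first two messages of a thread written by the same user, counted among threads with two or more messages) can be sketched in a few lines. This is only an illustrative sketch, not the authors' pipeline: the `Message` type, the function names, and the toy threads are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical minimal message record; real talk-page data has more fields.
    author: str
    text: str


def is_self_reply(thread):
    """Return True when the first two messages share the same author."""
    return len(thread) >= 2 and thread[0].author == thread[1].author


def self_reply_rate(threads):
    """Share of self-reply threads among threads with at least two messages."""
    eligible = [t for t in threads if len(t) >= 2]
    if not eligible:
        return 0.0
    return sum(is_self_reply(t) for t in eligible) / len(eligible)


# Toy data (invented for illustration).
threads = [
    [Message("Ana", "Is this source reliable?"), Message("Ana", "Found it, never mind.")],
    [Message("Ana", "Should we merge?"), Message("Ben", "Yes.")],
    [Message("Cy", "Unanswered note.")],
    [Message("Dee", "Typo?"), Message("Eli", "Fixed."), Message("Dee", "Thanks.")],
]
rate = self_reply_rate(threads)
print(f"self-reply rate: {rate:.2f}")
```

Single-message threads are excluded from the denominator, matching the abstract's "threads with two or more messages" framing.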

I or Not I: Unraveling the Linguistic Echoes of Identity in Samuel Beckett's "Not I" Through Natural Language Processing

Pourzarandi, Arezou Zahiri, Jafari, Farshad

arXiv.org Artificial Intelligence | Oct-12-2024

Exploring the depths of Samuel Beckett's "Not I" through advanced natural language processing techniques, this research uncovers the intricate linguistic structures that underpin the text. By analyzing word frequency, detecting emotional sentiments with a BERT-based model, and examining repetitive motifs, we unveil how Beckett's minimalist yet complex language reflects the protagonist's fragmented psyche. Our results demonstrate that recurring themes of time, memory, and existential angst are artfully woven through recursive linguistic patterns and rhythmic repetition. This innovative approach not only deepens our understanding of Beckett's stylistic contributions but also highlights his unique role in modern literature, where language transcends simple communication to explore profound existential questions.

artificial intelligence, natural language, text processing, (19 more...)

arXiv.org Artificial Intelligence

2410.09608

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.54)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
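
Two of the analyses named in the abstract above, word frequency and repetitive motifs, can be approximated with standard-library tools. The sketch below is an assumption-laden illustration, not the authors' method: the tokenizer, the n-gram definition of a "motif", and the placeholder sample text (which is not a quotation from the play) are all invented for the example, and the BERT-based sentiment step is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def repeated_ngrams(tokens, n, min_count=2):
    """Return n-grams that occur at least min_count times, with their counts."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(g): c for g, c in grams.items() if c >= min_count}


# Placeholder text, written for this example only.
sample = "no one there ... no one there ... what then ... no one"
tokens = tokenize(sample)
word_freq = Counter(tokens)
motifs = repeated_ngrams(tokens, 2)

print(word_freq.most_common(3))
print(motifs)
```

Counting repeated bigrams is one simple stand-in for "repetitive motifs"; longer n-grams or a sliding-window measure would be equally defensible choices.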

SemCoder: Training Code Language Models with Comprehensive Semantics

Ding, Yangruibo, Peng, Jinjun, Min, Marcus J., Kaiser, Gail, Yang, Junfeng, Ray, Baishakhi

arXiv.org Artificial Intelligence | Jun-3-2024

Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for thorough semantic understanding for complex tasks like debugging and program repair. We introduce a novel strategy to train Code LLMs with comprehensive semantics, encompassing high-level functional descriptions, local execution effects of individual statements, and overall input/output behavior, thereby linking static code text with dynamic execution states. We begin by collecting PyX, a clean code corpus of fully executable samples with functional descriptions and execution tracing. We propose training Code LLMs to write code and represent and reason about execution behaviors using natural language, mimicking human verbal debugging. This approach led to the development of SemCoder, a Code LLM with only 6.7B parameters, which shows competitive performance with GPT-3.5-turbo on code generation and execution reasoning tasks. SemCoder achieves 81.1% on HumanEval (GPT-3.5-turbo: 76.8%) and 54.5% on CRUXEval-I (GPT-3.5-turbo: 50.3%). We also study the effectiveness of SemCoder's monologue-style execution reasoning compared to concrete scratchpad reasoning, showing that our approach integrates semantics from multiple dimensions more smoothly. Finally, we demonstrate the potential of applying learned semantics to improve Code LLMs' debugging and self-refining capabilities.

code generation, code llm, reasoning, (14 more...)

arXiv.org Artificial Intelligence

2406.01006

Country: North America > United States > New York > New York County > New York City (0.04)

Genre:

Workflow (0.46)
Research Report (0.40)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
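
The abstract above mentions "local execution effects of individual statements" and execution tracing described in natural language. As a rough illustration of what such a trace could look like, the sketch below uses Python's standard `sys.settrace` hook to snapshot local variables before each line runs and then renders the snapshots as plain sentences. It is not SemCoder's data pipeline or its monologue format; `trace_locals`, `narrate`, and `running_sum` are hypothetical names chosen for this example.

```python
import sys


def trace_locals(func, *args):
    """Run func(*args); record (line offset, locals snapshot) before each line executes."""
    events = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow the target function's own frame.
        if frame.f_code is not code:
            return None
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            events.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, events


def narrate(events):
    """Render the recorded states as simple natural-language lines."""
    lines = []
    for offset, state in events:
        shown = ", ".join(f"{name}={value!r}" for name, value in state.items())
        lines.append(f"Before line {offset}: {shown or 'no locals yet'}")
    return lines


def running_sum(values):
    total = 0
    for v in values:
        total += v
    return total


result, events = trace_locals(running_sum, [1, 2, 3])
for line in narrate(events):
    print(line)
```

Snapshots are shallow copies of the frame's locals, so mutable values that change later would need a deep copy to preserve their state at each step.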

Linguistic Changes in Spontaneous Speech for Detecting Parkinsons Disease Using Large Language Models

Crawford, Jonathan

arXiv.org Artificial Intelligence | Apr-7-2024

Parkinson's disease is the second most prevalent neurodegenerative disorder, with over ten million active cases worldwide and one million new diagnoses per year. Detecting and subsequently diagnosing the disease is challenging because of symptom heterogeneity with respect to complexity, as well as the type and timing of phenotypic manifestations. Typically, language impairment can present in the prodromal phase and precede motor symptoms, suggesting that a linguistic-based approach could serve as a diagnostic method for incipient Parkinson's disease. Additionally, improved linguistic models may enhance other approaches through ensemble techniques. The field of large language models is advancing rapidly, presenting the opportunity to explore the use of these new models for detecting Parkinson's disease and to improve on current linguistic approaches with high-dimensional representations of linguistics. We evaluate the application of state-of-the-art large language models to detect Parkinson's disease automatically from spontaneous speech with up to 73% accuracy.

accuracy, language model, parkinson, (14 more...)

arXiv.org Artificial Intelligence

2404.0516

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Greenland (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
