A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity

Pouw, Charlotte, Alishahi, Afra, Zuidema, Willem

arXiv.org Artificial Intelligence

We analyze the syntactic sensitivity of Text-to-Speech (TTS) systems using methods inspired by psycholinguistic research. Specifically, we focus on the generation of intonational phrase boundaries, which can often be predicted by identifying syntactic boundaries within a sentence. We find that TTS systems struggle to accurately generate intonational phrase boundaries in sentences where syntactic boundaries are ambiguous (e.g., garden path sentences or sentences with attachment ambiguity). In these cases, systems need superficial cues such as commas to place boundaries at the correct positions. In contrast, for sentences with simpler syntactic structures, we find that systems do incorporate syntactic cues beyond surface markers. Finally, we finetune models on sentences without commas at the syntactic boundary positions, encouraging them to focus on more subtle linguistic cues. Our findings indicate that this leads to more distinct intonation patterns that better reflect the underlying structure.


When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs

Li, Xiaomin, Yu, Zhou, Zhang, Zhiwei, Chen, Xupeng, Zhang, Ziji, Zhuang, Yingying, Sadagopan, Narayanan, Beniwal, Anurag

arXiv.org Artificial Intelligence

Reasoning-enhanced large language models (RLLMs), whether explicitly trained for reasoning or prompted via chain-of-thought (CoT), have achieved state-of-the-art performance on many complex reasoning tasks. However, we uncover a surprising and previously overlooked phenomenon: explicit CoT reasoning can significantly degrade instruction-following accuracy. Evaluating 15 models on two benchmarks: IFEval (with simple, rule-verifiable constraints) and ComplexBench (with complex, compositional constraints), we consistently observe performance drops when CoT prompting is applied. Through large-scale case studies and an attention-based analysis, we identify common patterns where reasoning either helps (e.g., with formatting or lexical precision) or hurts (e.g., by neglecting simple constraints or introducing unnecessary content). We propose a metric, constraint attention, to quantify model focus during generation and show that CoT reasoning often diverts attention away from instruction-relevant tokens. To mitigate these effects, we introduce and evaluate four strategies: in-context learning, self-reflection, self-selective reasoning, and classifier-selective reasoning. Our results demonstrate that selective reasoning strategies, particularly classifier-selective reasoning, can substantially recover lost performance. To our knowledge, this is the first work to systematically expose reasoning-induced failures in instruction-following and offer practical mitigation strategies.


ConExion: Concept Extraction with Large Language Models

Norouzi, Ebrahim, Hertling, Sven, Sack, Harald

arXiv.org Artificial Intelligence

In this paper, an approach for concept extraction from documents using pre-trained large language models (LLMs) is presented. Compared with conventional methods that extract keyphrases summarizing the important information discussed in a document, our approach tackles a more challenging task of extracting all present concepts related to the specific domain, not just the important ones. Through comprehensive evaluations of two widely used benchmark datasets, we demonstrate that our method improves the F1 score compared to state-of-the-art techniques. Additionally, we explore the potential of using prompts within these models for unsupervised concept extraction. The extracted concepts are intended to support domain coverage evaluation of ontologies and facilitate ontology learning, highlighting the effectiveness of LLMs in concept extraction tasks. Our source code and datasets are publicly available at https://github.com/ISE-FIZKarlsruhe/concept_extraction.


Detecting LLM-Generated Korean Text through Linguistic Feature Analysis

Park, Shinwoo, Kim, Shubin, Kim, Do-Kyung, Han, Yo-Sub

arXiv.org Artificial Intelligence

The rapid advancement of large language models (LLMs) increases the difficulty of distinguishing between human-written and LLM-generated text. Detecting LLM-generated text is crucial for upholding academic integrity, preventing plagiarism, protecting copyrights, and ensuring ethical research practices. Most prior studies on detecting LLM-generated text focus primarily on English text. However, languages with distinct morphological and syntactic characteristics require specialized detection approaches. Their unique structures and usage patterns can hinder the direct application of methods primarily designed for English. Among such languages, we focus on Korean, which has relatively flexible spacing rules, a rich morphological system, and less frequent comma usage compared to English. We introduce KatFish, the first benchmark dataset for detecting LLM-generated Korean text. The dataset consists of text written by humans and generated by four LLMs across three genres. By examining spacing patterns, part-of-speech diversity, and comma usage, we illuminate the linguistic differences between human-written and LLM-generated Korean text. Building on these observations, we propose KatFishNet, a detection method specifically designed for the Korean language. KatFishNet achieves an average of 19.78% higher AUROC compared to the best-performing existing detection method. Our code and data are available at https://github.com/Shinwoo-Park/detecting_llm_generated_korean_text_through_linguistic_analysis.


Algorithm for Semantic Network Generation from Texts of Low Resource Languages Such as Kiswahili

Wanjawa, Barack Wamkaya, Muchemi, Lawrence, Miriti, Evans

arXiv.org Artificial Intelligence

Processing low-resource languages, such as Kiswahili, using machine learning is difficult due to a lack of adequate training data. However, such low-resource languages are still important for human communication and are already in daily use, and their users need practical machine processing tasks such as summarization, disambiguation and even question answering (QA). One method of processing such languages, while bypassing the need for training data, is the use of semantic networks. Some low-resource languages, such as Kiswahili, have a subject-verb-object (SVO) structure, and semantic networks are similarly triples of subject-predicate-object, hence SVO part-of-speech tags can map into a semantic network triple. An algorithm to process raw natural language text and map it into a semantic network is therefore necessary and desirable in structuring low-resource language texts. This algorithm was tested on the Kiswahili QA task with up to 78.6% exact match. Highlights: Languages, both low- and high-resource, are important for communication. Low-resource languages lack the vast data repositories necessary for machine learning. Use of language part-of-speech tags can create meaning from the language. An algorithm can create semantic networks out of the language parts of speech. The semantic network of the language can do practical tasks such as QA.
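The SVO-to-triple mapping described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the text is already tokenized and tagged with a simplified tag set ("N" for noun, "V" for verb), and the function names `svo_to_triple`, `build_network`, and `answer_object` are hypothetical. The example sentence "Juma anasoma kitabu" ("Juma is reading a book") is likewise chosen only for illustration.

```python
from typing import List, Optional, Tuple

Tagged = Tuple[str, str]        # (word, simplified POS tag) -- assumed input format
Triple = Tuple[str, str, str]   # (subject, predicate, object)


def svo_to_triple(tagged: List[Tagged]) -> Optional[Triple]:
    """Map one POS-tagged SVO sentence to a subject-predicate-object triple.

    Simplifying assumption: the first verb is the predicate, the first noun
    before it is the subject, and the first noun after it is the object.
    """
    verb_idx = next((i for i, (_, tag) in enumerate(tagged) if tag == "V"), None)
    if verb_idx is None:
        return None
    subj = next((w for w, tag in tagged[:verb_idx] if tag == "N"), None)
    obj = next((w for w, tag in tagged[verb_idx + 1:] if tag == "N"), None)
    if subj is None or obj is None:
        return None
    return (subj, tagged[verb_idx][0], obj)


def build_network(sentences: List[List[Tagged]]) -> List[Triple]:
    """Collect the triples of all sentences that fit the SVO pattern."""
    triples = (svo_to_triple(s) for s in sentences)
    return [t for t in triples if t is not None]


def answer_object(network: List[Triple], subject: str, predicate: str) -> Optional[str]:
    """Answer a 'subject predicate what?' question by triple lookup."""
    for s, p, o in network:
        if s == subject and p == predicate:
            return o
    return None


# Hypothetical tagged input: "Juma anasoma kitabu" (Juma is reading a book).
sample = [[("Juma", "N"), ("anasoma", "V"), ("kitabu", "N")]]
network = build_network(sample)
```

Once the network is built, a question such as "What is Juma reading?" reduces to `answer_object(network, "Juma", "anasoma")`. Real text would need a proper tagger and handling for sentences that do not fit the simple SVO pattern.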


Reviews: Gradient Descent Can Take Exponential Time to Escape Saddle Points

Neural Information Processing Systems

It has recently been shown that, when all the saddle points of a non-convex function are "strict saddle", then gradient descent with a (reasonable) random initialization converges to a local minimizer with probability one. For a randomly perturbed version of gradient descent, the convergence rate can additionally be shown to be polynomial in the parameters. This article proves that such a convergence rate does not hold for the non-perturbed version: there exist reasonably smooth functions with only strict saddle points, and natural initialization schemes, such that gradient descent requires a number of steps that is exponential in the dimension to find a local minimum. I liked this article very much. It answers a very natural question: gradient descent is an extremely classical and very simple algorithm.
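As a toy illustration of why unperturbed gradient descent can linger near a strict saddle, the sketch below runs plain gradient descent on f(x, y) = 0.5x^2 - 0.5y^2. This function is an assumption chosen for simplicity, not the construction used in the reviewed paper, and it has no local minimizer, so the code only measures how long the iterate takes to leave a neighbourhood of the saddle at the origin. It does not reproduce the exponential lower bound. The name `steps_to_escape` and all parameter values are hypothetical.

```python
def steps_to_escape(y0, x0=0.5, lr=0.1, radius=1.0, max_steps=10_000):
    """Run plain gradient descent on f(x, y) = 0.5*x**2 - 0.5*y**2.

    Returns the number of steps taken before |y| >= radius, i.e. before the
    iterate leaves the neighbourhood of the saddle at the origin, or
    max_steps if it never does.
    """
    x, y = x0, y0
    for step in range(max_steps):
        if abs(y) >= radius:
            return step
        # The gradient of f is (x, -y): x contracts, y expands.
        x -= lr * x
        y -= lr * (-y)
    return max_steps


if __name__ == "__main__":
    # The closer the start is to the x-axis (the stable direction),
    # the longer gradient descent stays near the saddle.
    for y0 in (1e-2, 1e-4, 1e-8, 0.0):
        print(f"y0={y0:g}: {steps_to_escape(y0)} steps")
```

Starting exactly on the x-axis (`y0 = 0.0`), the iterate never escapes in this toy setting, which is the measure-zero case that random initialization avoids.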


Reviews: Improving Online Algorithms via ML Predictions

Neural Information Processing Systems

Please provide some additional clarification on what is to be compared here – in particular, I also stumbled over why (as stated in several corresponding theorems) one ends up with competitive ratios "at most min{robustness-ratio, consistency-ratio}". At first glance, I had thought that if an algorithm does well for good predictions but the robustness bound is bad, the latter should dominate. As this is apparently not the case, I ask for a brief explanation/clarification, preferably already in the introduction, as to why the minimum of the two values yields a bound on the competitive ratio.


Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT

Weber, Irene

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are extensive artificial neural networks trained on vast amounts of textual data to generate coherent continuations of given prompts. The initial training, which is time-consuming and computationally intensive, is typically followed by additional training phases. Fine-tuning with specific tasks and example responses enables LLMs to solve particular types of problems, while Reinforcement Learning with Human Feedback focuses them on delivering high-quality and socially preferred responses. Research has shown that LLMs can not only produce correct natural and formal language texts conveying plausible contents, but are also capable of reasoning, planning, and simulating other forms of intelligent behaviors. Thus, LLMs offer a wide range of potential applications, the extent of which is still not fully explored. Frequently, LLMs are applied for creating and processing texts, for communicating, planning, and computer programming. LLMs require that all tasks and inputs are provided in a textual format. For many applications, LLMs are prompted with freely phrased, natural language text or program code. Yet, they are also capable of processing texts that are structured such that they represent data or formatted documents.


A light-weight and efficient punctuation and word casing prediction model for on-device streaming ASR

You, Jian, Li, Xiangfeng

arXiv.org Artificial Intelligence

Punctuation and word casing prediction are necessary for automatic speech recognition (ASR). With the popularity of on-device end-to-end streaming ASR systems, on-device punctuation and word casing prediction has become a necessity, yet we found little discussion of it. With the emergence of the Transformer, Transformer-based models have been explored for this scenario. However, Transformer-based models are too large for on-device ASR systems. In this paper, we propose a light-weight and efficient model that jointly predicts punctuation and word casing in real time. The model is based on a Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM). Experimental results on the IWSLT2011 test set show that the proposed model obtains a 9% relative improvement in overall F1-score compared to the best non-Transformer model. Compared to a representative Transformer-based model, the proposed model achieves comparable results while being only one-fortieth its size and 2.5 times faster in terms of inference time. It is suitable for on-device streaming ASR systems. Our code is publicly available.


Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention

Li, Andrew, Feng, Xianle, Narang, Siddhant, Peng, Austin, Cai, Tianle, Shah, Raj Sanjay, Varma, Sashank

arXiv.org Artificial Intelligence

When reading temporarily ambiguous garden-path sentences, misinterpretations sometimes linger past the point of disambiguation. This phenomenon has traditionally been studied in psycholinguistic experiments using online measures such as reading times and offline measures such as comprehension questions. Here, we investigate the processing of garden-path sentences and the fate of lingering misinterpretations using four large language models (LLMs): GPT-2, LLaMA-2, Flan-T5, and RoBERTa. The overall goal is to evaluate whether humans and LLMs are aligned in their processing of garden-path sentences and in the lingering misinterpretations past the point of disambiguation, especially when extra-syntactic information (e.g., a comma delimiting a clause boundary) is present to guide processing. We address this goal using 24 garden-path sentences that have optional transitive and reflexive verbs leading to temporary ambiguities. For each sentence, there is a pair of comprehension questions corresponding to the misinterpretation and the correct interpretation. In three experiments, we (1) measure the dynamic semantic interpretations of LLMs using the question-answering task; (2) track whether these models shift their implicit parse tree at the point of disambiguation (or by the end of the sentence); and (3) visualize the model components that attend to disambiguating information when processing the question probes. These experiments show promising alignment between humans and LLMs in the processing of garden-path sentences, especially when extra-syntactic information is available to guide processing.