homophone
A new kid on the block: Distributional semantics predicts the word-specific tone signatures of monosyllabic words in conversational Taiwan Mandarin
Jin, Xiaoyun, Ernestus, Mirjam, Baayen, R. Harald
We present a corpus-based investigation of how the pitch contours of monosyllabic words are realized in spontaneous conversational Mandarin, focusing on the effects of words' meanings. We used generalized additive models to decompose a given observed pitch contour into a set of component pitch contours tied to different control variables and semantic predictors. Even when variables such as word duration, gender, speaker identity, tonal context, vowel height, and utterance position are controlled for, word remains a strong predictor of tonal realization. We present evidence that this word effect is a semantic effect: word sense is shown to be a better predictor than word, and heterographic homophones are shown to have different pitch contours. The strongest evidence for the importance of semantics is that the pitch contours of individual word tokens can be predicted from their contextualized embeddings with an accuracy that substantially exceeds a permutation baseline. For phonetics, distributional semantics is a new kid on the block. Although our findings challenge standard theories of Mandarin tone, they fit well within the theoretical framework of the Discriminative Lexicon Model.
LLMs as Method Actors: A Model for Prompt Engineering and Architecture
We introduce "Method Actors" as a mental model for guiding LLM prompt engineering and prompt architecture. Under this mental model, LLMs should be thought of as actors; prompts as scripts and cues; and LLM responses as performances. We apply this mental model to the task of improving LLM performance at playing Connections, a New York Times word puzzle game that prior research identified as a challenging benchmark for evaluating LLM reasoning. Our experiments with GPT-4o show that a "Method Actors" approach can significantly improve LLM performance over both a vanilla and "Chain of Thoughts" approach. A vanilla approach solves 27% of Connections puzzles in our dataset and a "Chain of Thoughts" approach solves 41% of puzzles, whereas our strongest "Method Actor" approach solves 86% of puzzles. We also test OpenAI's newest model designed specifically for complex reasoning tasks, o1-preview. When asked to solve a puzzle all at once, o1-preview solves 79% of Connections puzzles in our dataset, and when allowed to build puzzle solutions one guess at a time over multiple API calls, o1-preview solves 100% of the puzzles. Incorporating a "Method Actor" prompt architecture increases the percentage of puzzles that o1-preview solves perfectly from 76% to 87%.
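The two architectural ideas in this abstract — prompts as role-casting scripts, and building the solution one guess at a time over multiple calls — can be sketched as follows. The prompt wording, function names, and the stub standing in for a real LLM call are all illustrative assumptions, not the paper's actual prompts.

```python
from typing import Callable, List

# Hypothetical "script": cast the model in a role and cue it to commit to
# exactly one group per performance, rather than solving the puzzle at once.
SCRIPT = (
    "You are a champion solver of the NYT Connections puzzle.\n"
    "Stay in character. Reason about word senses, then commit to\n"
    "exactly ONE group of four words you are most confident about.\n"
)

def solve_one_guess_at_a_time(words: List[str],
                              llm: Callable[[str], List[str]],
                              n_groups: int = 4) -> List[List[str]]:
    """Build the solution over multiple calls, one guessed group per call."""
    remaining = list(words)
    solution = []
    for _ in range(n_groups):
        cue = SCRIPT + "Remaining words: " + ", ".join(remaining)
        guess = llm(cue)                      # one "performance" per call
        solution.append(guess)
        remaining = [w for w in remaining if w not in guess]
    return solution

# Stub "actor" for demonstration: always guesses the first four remaining words.
stub_llm = lambda prompt: prompt.rsplit(": ", 1)[1].split(", ")[:4]

groups = solve_one_guess_at_a_time([f"w{i}" for i in range(16)], stub_llm)
```

Shrinking the candidate pool between calls is what the abstract credits for o1-preview's jump from 79% to 100%: each guess conditions on the groups already removed.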
A Survey on Importance of Homophones Spelling Correction Model for Khmer Authors
Born, Seanghort, May, Madeth, Piau-Toffolon, Claudine, Iksal, Sébastien
Homophones present a significant challenge to authors in any language because they sound alike but differ in meaning and spelling. This issue is particularly pronounced in the Khmer language, which is rich in homophones owing to its complex structure and extensive character set. This research aims to address the difficulties faced by Khmer authors when using homophones in their writing and proposes potential solutions based on an extensive literature review and survey analysis. A survey of 108 Khmer native speakers, including students, employees, and professionals, revealed that many frequently encounter challenges with homophones in their writing, often struggling to choose the correct word based on context. The survey also highlighted the absence of effective tools to address homophone errors in Khmer, which complicates the writing process. Additionally, a review of existing studies on spelling correction in other languages, such as English, Azerbaijani, and Bangla, identified a lack of research focused specifically on homophones, particularly in the Khmer language. In summary, this research highlights the necessity for a specialized tool to address Khmer homophone errors. By bridging current gaps in research and available resources, such a tool would enhance the confidence and accuracy of Khmer authors in their writing, thereby contributing to the enrichment and preservation of the language. Continued efforts in this domain are essential for ensuring that the Khmer language can benefit from advances in technology and linguistics.
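The core of the tool this survey calls for is a lookup from a word to the other spellings that share its pronunciation. A minimal sketch, using a tiny English lexicon and hand-assigned phonetic keys as stand-ins (a Khmer tool would need a real grapheme-to-phoneme mapping for the Khmer script):

```python
from collections import defaultdict

# Illustrative toy lexicon: word -> phonetic key (both are assumptions).
LEXICON = {
    "their": "DHEHR", "there": "DHEHR", "they're": "DHEHR",
    "to": "TUW", "too": "TUW", "two": "TUW",
    "write": "RAYT", "right": "RAYT",
}

# Invert the lexicon: group words that share a pronunciation key.
by_sound = defaultdict(set)
for word, key in LEXICON.items():
    by_sound[key].add(word)

def homophone_candidates(word: str) -> set:
    """Other spellings that share this word's pronunciation key."""
    key = LEXICON.get(word)
    return (by_sound[key] - {word}) if key else set()

print(sorted(homophone_candidates("their")))  # ['there', "they're"]
```

A full corrector would then rank these candidates by context, which is exactly where the surveyed authors report struggling.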
ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations
Xiao, Yunze, Hu, Yujia, Choo, Kenny Tsu Wei, Lee, Roy Ka-wei
Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce ToxiCloakCN, an enhanced dataset derived from ToxiCN, augmented with homophonic substitutions and emoji transformations, to test the robustness of LLMs against these cloaking perturbations. Our findings reveal that existing models significantly underperform in detecting offensive content when these perturbations are applied. We provide an in-depth analysis of how different types of offensive content are affected by these perturbations and explore the alignment between human and model explanations of offensiveness. Our work highlights the urgent need for more advanced techniques in offensive language detection to combat the evolving tactics used to evade detection mechanisms.
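The two cloaking perturbations named in the abstract can be sketched as character-level substitution passes. The tiny substitution tables below are illustrative assumptions; ToxiCloakCN applies such perturbations systematically over the ToxiCN data.

```python
# Homophone substitution: swap a character for one with the same or nearly
# the same pinyin (你/尼 ni, 我/窝 wo, 是/世 shi).  Emoji transformation:
# swap a character for a pictographic stand-in.
HOMOPHONES = {"你": "尼", "我": "窝", "是": "世"}
EMOJI = {"猪": "🐷", "牛": "🐮", "心": "❤"}

def cloak(text: str, table: dict) -> str:
    """Replace every character that has an entry in the substitution table."""
    return "".join(table.get(ch, ch) for ch in text)

original = "你是猪"                               # a mild insult, for illustration
print(cloak(original, HOMOPHONES))                # 尼世猪
print(cloak(original, EMOJI))                     # 你是🐷
print(cloak(cloak(original, HOMOPHONES), EMOJI))  # 尼世🐷
```

The cloaked strings sound the same (or look analogous) to a human reader but no longer match the surface forms a detector was trained on, which is why the abstract reports large drops in detection performance.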
Memory-assisted prompt editing to improve GPT-3 after deployment
Madaan, Aman, Tandon, Niket, Clark, Peter, Yang, Yiming
Large LMs such as GPT-3 are powerful, but can commit mistakes that are obvious to humans. For example, GPT-3 would mistakenly interpret "What word is similar to good?" to mean a homophone, while the user intended a synonym. Our goal is to effectively correct such errors via user interactions with the system but without retraining, which would be prohibitively costly. We pair GPT-3 with a growing memory of recorded cases where the model misunderstood the user's intents, along with user feedback for clarification. Such a memory allows our system to produce enhanced prompts for any new query, based on the user feedback given for error correction on similar cases in the past. On four tasks (two lexical tasks, two advanced ethical reasoning tasks), we show how a (simulated) user can interactively teach a deployed GPT-3, substantially increasing its accuracy on queries involving different kinds of misunderstandings. Our approach is a step towards low-cost utility enhancement for very large pre-trained LMs. Code, data, and instructions for applying MEMPROMPT to a new task are available at https://www.memprompt.com/.
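The memory mechanism described here can be sketched with a simple word-overlap retriever standing in for the paper's similarity lookup; the function names and the overlap threshold are illustrative assumptions, not MEMPROMPT's actual implementation.

```python
# Growing memory of (misunderstood query, user clarification) pairs.
memory = []

def overlap(a: str, b: str) -> int:
    """Crude similarity: number of shared lowercase tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def remember(query: str, clarification: str) -> None:
    memory.append((query, clarification))

def enhanced_prompt(query: str, min_overlap: int = 2) -> str:
    """Prepend the clarification from the most similar past misunderstanding."""
    if memory:
        past, note = max(memory, key=lambda m: overlap(m[0], query))
        if overlap(past, query) >= min_overlap:
            return f"Clarification: {note}\n{query}"
    return query

# The user once clarified that "similar to" asks for a synonym, not a homophone.
remember("What word is similar to good?",
         "'similar to' asks for a synonym, not a homophone.")

# A new query of the same shape now carries that clarification with it.
print(enhanced_prompt("What word is similar to happy?"))
```

Because the correction lives in the prompt rather than the weights, the model is "taught" without any retraining, which is the low-cost enhancement the abstract claims.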
Evaluation of Automated Speech Recognition Systems for Conversational Speech: A Linguistic Perspective
Pasandi, Hannaneh B., Pasandi, Haniyeh B.
Automatic speech recognition (ASR) increasingly encounters informal, free-form input as voice user interfaces and conversational agents such as Alexa and Google Home gain popularity. Conversational speech is both the most difficult and the most ecologically relevant kind of data for speech recognition. In this paper, we take a linguistic perspective and use the French language as a case study of homophone disambiguation. Our contribution aims to provide more insight into human speech transcription accuracy under conditions that reproduce those of state-of-the-art ASR systems, albeit in a more narrowly focused setting. We investigate a case study of the most common errors encountered in the automatic transcription of French.
Is the Chinese Language a Superstition Machine? - Issue 59: Connections
Every year, more than a billion people around the world celebrate Chinese New Year and engage in a subtle linguistic dance with luck. You can think of it as a set of holiday rituals that resemble a courtship. To lure good fortune into their lives, they may decorate their homes and doors with paper cutouts of lucky words or phrases. Those who need a haircut make sure to get one before the New Year, as the word for "hair" (fa) sounds like the word for "prosperity"--and who wants to snip away prosperity, even if it's just a trim? The menu of food served at festive meals often includes fish, because its name (yu) sounds the same as the word for "surplus"; a type of algae known as fat choy because in Cantonese it sounds like "get rich"; and oranges, because in certain regions their name sounds like the word for "luck."