 active voice


Semantic Prosody in Machine Translation: the English-Chinese Case of Passive Structures

arXiv.org Artificial Intelligence

Semantic prosody is a collocational meaning formed through the co-occurrence of a linguistic unit and a consistent series of collocates, which should be treated separately from semantic meaning. Since words that are literal translations of each other may have different semantic prosody, more attention should be paid to this linguistic property to generate accurate translations. However, current machine translation models cannot handle this problem. To bridge the gap, we propose an approach to teach machine translation models about the semantic prosody of a specific structure. We focus on Chinese BEI passives and create a dataset of English-Chinese sentence pairs with the purpose of demonstrating the negative semantic prosody of BEI passives. Then we fine-tune OPUS-MT, NLLB-600M and mBART50 models with our dataset for the English-Chinese translation task. Our results show that fine-tuned MT models are better at using BEI passives to translate unfavourable content and at avoiding them for neutral and favourable content. Also, in NLLB-600M, which is a multilingual model, this knowledge of semantic prosody can be transferred from English-Chinese translation to other language pairs, such as Spanish-Chinese.


CardiffNLP at CLEARS-2025: Prompting Large Language Models for Plain Language and Easy-to-Read Text Rewriting

arXiv.org Artificial Intelligence

This paper details the CardiffNLP team's contribution to the CLEARS shared task on Spanish text adaptation, hosted by IberLEF 2025. The shared task contained two subtasks and the team submitted to both. Our team took an LLM-prompting approach with different prompt variations. While we initially experimented with LLaMA-3.2, we adopted Gemma-3 for our final submission, and landed third place in Subtask 1 and second place in Subtask 2. We detail our numerous prompt variations, examples, and experimental results.


AIs Can Think, But They Don't Know What They Are Doing* – Casey Dorman, Author

#artificialintelligence

In Cixin Liu's "The Dark Forest," the second book in the Chinese sci-fi writer's "The Three-Body Problem" series, an alien says that it is puzzled by the fact that humans do not regard "think" and "say" as synonyms. The aliens' thoughts are immediately discernible to each other, so they do not have a need to "say" anything. For them, speaking and thinking are the same. Humans are different because we cannot read each other's thoughts and may choose not to speak about what we are thinking. So, for humans, speaking and thinking are different, but what about words and thoughts?


ChatGPT, Artificial Intelligence, and Cyber Threat Intelligence: a moment in time - Threat Intelligence Academy

#artificialintelligence

It is safe to say that ChatGPT from OpenAI has created a firestorm of conversation about the applications of artificial intelligence (AI) in knowledge work and scholarship, which includes cyber threat intelligence. Can ChatGPT really replace the thought and knowledge work done by many people? That question is outstanding and I cannot answer it, nor can anyone yet with any certainty. But its application to various topics, including cyber threat intelligence, is in question, and by proxy, so is its impact on those topics. So, let me provide some perspective after 20 years of cyber threat intelligence AND having employed artificial intelligence and machine learning in this space for the last 10 years at least.


On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation

arXiv.org Artificial Intelligence

We present a methodology that explores how sentence structure is reflected in neural representations of machine translation systems. We demonstrate our model-agnostic approach with the Transformer English-German translation model. We analyze neuron-level correlation of activations between paraphrases while discussing the methodology challenges and the need for confound analysis to isolate the effects of shallow cues. We find that similarity between activation patterns can be mostly accounted for by similarity in word choice and sentence length. Following that, we manipulate neuron activations to control the syntactic form of the output. We show this intervention to be somewhat successful, indicating that deep models capture sentence-structure distinctions, despite finding no such indication at the neuron level. To conduct our experiments, we develop a semi-automatic method to generate meaning-preserving minimal pair paraphrases (active-passive voice and adverbial clause-noun phrase) and compile a corpus of such pairs.
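The neuron-level correlation analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each sentence has already been reduced to one activation value per neuron (for example by averaging over tokens), which is an assumption made here and not a detail given in the abstract. The function names `pearson` and `neuron_correlations` and the toy activation values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def neuron_correlations(acts_a, acts_b):
    """Correlate each neuron's activation across paraphrase pairs.

    acts_a[i][j] is the activation of neuron j on the first sentence of pair i,
    acts_b[i][j] is the activation of the same neuron on its paraphrase.
    """
    num_neurons = len(acts_a[0])
    return [
        pearson([row[j] for row in acts_a], [row[j] for row in acts_b])
        for j in range(num_neurons)
    ]


# Toy data (hypothetical): 3 paraphrase pairs, 2 neurons.
active = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5]]
passive = [[0.2, 0.1], [0.5, 0.8], [0.9, 0.4]]
corrs = neuron_correlations(active, passive)
```

A neuron with a correlation near 1 behaves similarly on both members of each pair. As the abstract cautions, such similarity may largely reflect shared word choice and sentence length, so a score like this would need confound analysis before being read as evidence of structural invariance.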


TextGenie - Augmenting your text dataset with just 2 lines of code!

#artificialintelligence

Often, while developing Natural Language Processing models, we find it difficult to find relevant data. Previously, while developing our Intent Classifier, we used the CLINC150 Dataset that had 100 samples for 150 different classes. But what if we needed even more samples? A similar scenario arose when I was working on a contextual assistant with Rasa. While creating the training data from scratch, I'd have to imagine different samples for each intent or ask my friends for some help.


From Note-Level to Chord-Level Neural Network Models for Voice Separation in Symbolic Music

arXiv.org Artificial Intelligence

Music is often experienced as a progression of concurrent streams of notes, or voices. The degree to which this happens depends on the position along a voice-leading continuum, ranging from monophonic, to homophonic, to polyphonic, which complicates the design of automatic voice separation models. We address this continuum by defining voice separation as the task of decomposing music into streams that exhibit both a high degree of external perceptual separation from the other streams and a high degree of internal perceptual consistency. The proposed voice separation task allows for a voice to diverge to multiple voices and also for multiple voices to converge to the same voice. Equipped with this flexible task definition, we manually annotated a corpus of popular music and used it to train neural networks that assign notes to voices either separately for each note in a chord (note-level), or jointly to all notes in a chord (chord-level). The trained neural models greedily assign notes to voices in a left-to-right traversal of the input chord sequence, using a diverse set of perceptually informed input features. When evaluated on the extraction of consecutive within-voice note pairs, both models surpass a strong baseline based on an iterative application of an envelope extraction function, with the chord-level model consistently edging out the note-level model. The two models are also shown to outperform previous approaches on separating the voices in Bach's music.
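The greedy, left-to-right, note-level assignment described in this abstract can be sketched roughly as follows. This is not the authors' model: a simple pitch-proximity rule stands in for the trained neural network, and the sketch does not handle voices that converge or diverge. The function name `assign_voices`, the `max_jump` threshold, and the example chords are all hypothetical.

```python
def assign_voices(chords, max_jump=7):
    """Greedily assign each note to a voice while scanning chords left to right.

    chords: list of chords, each a list of MIDI pitch numbers.
    Returns, per chord, a list of (pitch, voice_id) tuples.
    Stand-in scoring rule (an assumption): pick the free voice whose last
    pitch is closest, or open a new voice if none is within max_jump semitones.
    """
    last_pitch = []  # last_pitch[v] is the most recent pitch sung by voice v
    result = []
    for chord in chords:
        taken = set()  # voices already used within the current chord
        assigned = []
        for pitch in sorted(chord, reverse=True):
            candidates = [
                (abs(pitch - previous), voice)
                for voice, previous in enumerate(last_pitch)
                if voice not in taken and abs(pitch - previous) <= max_jump
            ]
            if candidates:
                _, voice = min(candidates)
            else:
                voice = len(last_pitch)
                last_pitch.append(pitch)
            last_pitch[voice] = pitch
            taken.add(voice)
            assigned.append((pitch, voice))
        result.append(assigned)
    return result


# Two-part example (hypothetical): an upper line rising, a lower line falling.
voices = assign_voices([[72, 60], [74, 59], [76, 57]])
```

Because each decision is final once made, an early mistake propagates through the rest of the sequence. That is the usual trade-off of greedy decoding against a search over joint assignments.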