Media
Cat and Mouse -- Can Fake Text Generation Outpace Detector Systems?
McGlinchey, Andrea, Barclay, Peter J
Large language models (LLMs) can produce convincing'fake text' in domains such as academic writing, product reviews, and political news. Many approaches have been investigated for the detection of artificially generated text. While this may seem to presage an endless'arms race', we note that newer LLMs use ever more parameters, training data, and energy, while relatively simple classifiers demonstrate a good level of detection accuracy with modest resources. To approach the question of whether the models ability to beat the detectors may therefore reach a plateau, we examine the ability of statistical classifiers to identify'fake text' in the style of classical detective fiction. Over a 0.5 version increase, we found that Gemini showed an increased ability to generate deceptive text, while GPT did not. This suggests that reliable detection of fake text may remain feasible even for ever-larger models, though new model architectures may improve their deceptiveness.
Aligning Spoken Dialogue Models from User Interactions
Wu, Anne, Mazaré, Laurent, Zeghidour, Neil, Défossez, Alexandre
We propose a novel preference alignment framework for improving spoken dialogue models on real-time conversations from user interactions. Current preference learning methods primarily focus on text-based language models, and are not directly suited to the complexities of real-time speech interactions, with richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speaker turns.We create a large-scale dataset of more than 150,000 preference pairs from raw multi-turn speech conversations, annotated with AI feedback, to cover preferences over both linguistic content and temporal context variations. We leverage offline alignment methods to finetune a full-duplex autoregressive speech-to-speech model. Extensive experiments demonstrate that feedback on generic conversations can be consistently effective in improving spoken dialogue models to produce more factual, safer and more contextually aligned interactions. We deploy the finetuned model and conduct holistic human evaluations to assess the impact beyond single-turn conversations. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
skLEP: A Slovak General Language Understanding Benchmark
Šuppa, Marek, Ridzik, Andrej, Hládek, Daniel, Javůrek, Tomáš, Ondrejová, Viktória, Sásiková, Kristína, Tamajka, Martin, Šimko, Marián
In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this paper, we also present the first systematic and extensive evaluation of a wide array of Slovak-specific, multilingual, and English pre-trained language models using the skLEP tasks. Finally, we also release the complete benchmark data, an open-source toolkit facilitating both fine-tuning and evaluation of models, and a public leaderboard at https://github.com/slovak-nlp/sklep in the hopes of fostering reproducibility and drive future research in Slovak NLU.
A Hierarchical Deep Learning Approach for Minority Instrument Detection
Sechet, Dylan, Bugiotti, Francesca, Kowalski, Matthieu, d'Hérouville, Edouard, Langiewicz, Filip
Identifying instrument activities within audio excerpts is vital in music information retrieval, with significant implications for music cataloging and discovery. Prior deep learning endeavors in musical instrument recognition have predominantly emphasized instrument classes with ample data availability. Recent studies have demonstrated the applicability of hierarchical classification in detecting instrument activities in orchestral music, even with limited fine-grained annotations at the instrument level. Based on the Hornbostel-Sachs classification, such a hierarchical classification system is evaluated using the MedleyDB dataset, renowned for its diversity and richness concerning various instruments and music genres. This work presents various strategies to integrate hierarchical structures into models and tests a new class of models for hierarchical music prediction. This study showcases more reliable coarse-level instrument detection by bridging the gap between detailed instrument identification and group-level recognition, paving the way for further advancements in this domain.
Enhancing Homophily-Heterophily Separation: Relation-Aware Learning in Heterogeneous Graphs
Zheng, Ziyu, Yang, Yaming, Guan, Ziyu, Zhao, Wei, Lu, Weigang
Real-world networks usually have a property of node heterophily, that is, the connected nodes usually have different features or different labels. This heterophily issue has been extensively studied in homogeneous graphs but remains under-explored in heterogeneous graphs, where there are multiple types of nodes and edges. Capturing node heterophily in heterogeneous graphs is very challenging since both node/edge heterogeneity and node heterophily should be carefully taken into consideration. Existing methods typically convert heterogeneous graphs into homogeneous ones to learn node heterophily, which will inevitably lose the potential heterophily conveyed by heterogeneous relations. To bridge this gap, we propose Relation-Aware Separation of Homophily and Heterophily (RASH), a novel contrastive learning framework that explicitly models high-order semantics of heterogeneous interactions and adaptively separates homophilic and heterophilic patterns. Particularly, RASH introduces dual heterogeneous hypergraphs to encode multi-relational bipartite subgraphs and dynamically constructs homophilic graphs and heterophilic graphs based on relation importance. A multi-relation contrastive loss is designed to align heterogeneous and homophilic/heterophilic views by maximizing mutual information. In this way, RASH simultaneously resolves the challenges of heterogeneity and heterophily in heterogeneous graphs. Extensive experiments on benchmark datasets demonstrate the effectiveness of RASH across various downstream tasks. The code is available at: https://github.com/zhengziyu77/RASH.
"M3GAN 2.0" Is a Victim of Inflation
At least it shows its symptoms clearly: inflammation and swelling. In the first film, Gemma (Allison Williams), a robotics engineer, becomes the guardian to her orphaned niece, Cady (Violet McGraw), and tests a new invention, the titular A.I.-powered robot-doll, on her. Cady grows attached to the responsive doll, which is programmed to protect the child and takes to the mission with a mechanical perfection, slaughtering anyone who expresses hostility--and does so with snarky pride in her absolute power. At its core, though, "M3GAN" (like the sequel, directed by Gerard Johnstone) is a family melodrama centered on Gemma's struggles with parenting and Cady's need to bond--plus the robot's quick embrace of human cruelty. The film's failures are painful because its setup is fruitful.
Google's AI video tool amplifies fears of an increase in misinformation
In both Tehran and Tel Aviv, residents have faced heightened anxiety in recent days as the threat of missile strikes looms over their communities. Alongside the very real concerns for physical safety, there is growing alarm over the role of misinformation, particularly content generated by artificial intelligence, in shaping public perception. GeoConfirmed, an online verification platform, has reported an increase in AI-generated misinformation, including fabricated videos of air strikes that never occurred, both in Iran and Israel. This follows a similar wave of manipulated footage that circulated during recent protests in Los Angeles, which were sparked by a rise in immigration raids in the second-most populous city in the United States. The developments are part of a broader trend of politically charged events being exploited to spread false or misleading narratives.
Meta wins AI copyright lawsuit as US judge rules against authors
However, the ruling offered some hope for American creative professionals who argue that training AI models on their work without permission is illegal. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one." A Meta spokesperson said the company appreciated the decision and called fair use a "vital legal framework" for building "transformative" AI technology. The authors sued Meta in 2023, arguing the company misused pirated versions of their books to train its AI system Llama without permission or compensation. Get set for the working day – we'll point you to all the business news and analysis you need every morning Chhabria expressed sympathy for that argument during a hearing in May, which he reiterated on Wednesday.
SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling
Ahmed, Tawsif, Radonjic, Andrej, Rabby, Gollam
We present Sleeping-DISCO 9M, a large-scale pre-training dataset for music and song. To the best of our knowledge, there are no open-source high-quality dataset representing popular and well-known songs for generative music modeling tasks such as text-music, music-captioning, singing-voice synthesis, melody reconstruction and cross-model retrieval. Past contributions focused on isolated and constrained factors whose core perspective was to create synthetic or re-recorded music corpus (e.g. GTSinger, M4Singer) and arbitrarily large-scale audio datasets (e.g. DISCO-10M and LAIONDISCO-12M) had been another focus for the community. Unfortunately, adoption of these datasets has been below substantial in the generative music community as these datasets fail to reflect real-world music and its flavour. Our dataset changes this narrative and provides a dataset that is constructed using actual popular music and world-renowned artists.
Off-Policy Evaluation and Learning for the Future under Non-Stationarity
Shimizu, Tatsuhiro, Kawamura, Kazuki, Muroi, Takanori, Narita, Yusuke, Tateno, Kei, Udagawa, Takuma, Saito, Yuta
We study the novel problem of future off-policy evaluation (F-OPE) and learning (F-OPL) for estimating and optimizing the future value of policies in non-stationary environments, where distributions vary over time. In e-commerce recommendations, for instance, our goal is often to estimate and optimize the policy value for the upcoming month using data collected by an old policy in the previous month. A critical challenge is that data related to the future environment is not observed in the historical data. Existing methods assume stationarity or depend on restrictive reward-modeling assumptions, leading to significant bias. To address these limitations, we propose a novel estimator named \textit{\textbf{O}ff-\textbf{P}olicy Estimator for the \textbf{F}uture \textbf{V}alue (\textbf{\textit{OPFV}})}, designed for accurately estimating policy values at any future time point. The key feature of OPFV is its ability to leverage the useful structure within time-series data. While future data might not be present in the historical log, we can leverage, for example, seasonal, weekly, or holiday effects that are consistent in both the historical and future data. Our estimator is the first to exploit these time-related structures via a new type of importance weighting, enabling effective F-OPE. Theoretical analysis identifies the conditions under which OPFV becomes low-bias. In addition, we extend our estimator to develop a new policy-gradient method to proactively learn a good future policy using only historical data. Empirical results show that our methods substantially outperform existing methods in estimating and optimizing the future policy value under non-stationarity for various experimental setups.