Media
DAVID MARCUS: Cracker Barrel abandons customers, trading authenticity for corporate slop
People in Pensacola, Florida shared their thoughts on Cracker Barrel's new logo with Fox News Digital. Few things in American life have felt as trapped in the amber of history as Cracker Barrel restaurants, with their recipe of comfort food served up in cozy confines that evoke a bygone era. It's little wonder Americans routinely wait for an hour to get a table after church, or welcome a road-trip diversion when they see the classic logo on a highway sign. Now, the crackerjack whiz-kid marketing team at the iconic eatery's corporate headquarters has decided to forgo all of this, including, possibly, the long lines, judging by public reaction to the changes. This may not exactly be wokeness at work, as we have seen with so many brands such as Target and Bud Light, but it is something similarly lifeless and cold.
TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
Zhang, Yu, Guo, Wenxiang, Pan, Changhao, Yao, Dongyu, Zhu, Zhiyuan, Jiang, Ziyue, Wang, Yuhan, Jin, Tao, Zhao, Zhou
Customizable multilingual zero-shot singing voice synthesis (SVS) has various potential applications in music composition and short video dubbing. However, existing SVS models depend heavily on phoneme and note boundary annotations, which limits their robustness in zero-shot scenarios and produces poor transitions between phonemes and notes. They also lack effective multi-level style control via diverse prompts. To overcome these challenges, we introduce TCSinger 2, a multi-task multilingual zero-shot SVS model with style transfer and style control based on various prompts. TCSinger 2 includes three key modules: 1) the Blurred Boundary Content (BBC) Encoder, which predicts duration, extends the content embedding, and applies masking to the boundaries to enable smooth transitions; 2) the Custom Audio Encoder, which uses contrastive learning to extract aligned representations from singing, speech, and textual prompts; and 3) the Flow-based Custom Transformer, which leverages Cus-MOE with F0 supervision to enhance both the synthesis quality and style modeling of the generated singing voice. Experimental results show that TCSinger 2 outperforms baseline models in both subjective and objective metrics across multiple related tasks. Singing voice samples are available at https://aaronz345.github.io/TCSinger2Demo/.
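The boundary-masking idea can be illustrated with a toy sketch. The abstract does not say how the BBC Encoder applies its masks, so everything below is an assumption made for illustration: durations are given as integer frame counts rather than predicted, content is a list of phoneme labels rather than embeddings, and the names `expand_and_mask`, `blur`, and `<mask>` are hypothetical.

```python
def expand_and_mask(tokens, durations, blur=1, mask="<mask>"):
    """Expand tokens to frame level, then mask frames around internal boundaries.

    Hypothetical illustration only; not the TCSinger 2 implementation.
    """
    frames = []
    boundaries = []
    for token, duration in zip(tokens, durations):
        frames.extend([token] * duration)   # length regulation: repeat per frame
        boundaries.append(len(frames))      # frame index where this token ends
    # The final entry is the end of the sequence, not a transition.
    for b in boundaries[:-1]:
        # Mask `blur` frames on each side of the boundary.
        for k in range(max(0, b - blur), min(len(frames), b + blur)):
            frames[k] = mask
    return frames


if __name__ == "__main__":
    print(expand_and_mask(["a", "b"], [3, 3], blur=1))
```

With the boundary frames hidden, a downstream model would have to infer the transition itself instead of relying on a hard annotated cut, which is the intuition the abstract describes.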
Machine Learning Approaches to Vocal Register Classification in Contemporary Male Pop Music
Kim, Alexander, Botha, Charlotte
For singers of all experience levels, one of the most fun and daunting challenges in learning technical repertoire is navigating placement and vocal register in and around the passaggio (the passage between chest voice and head voice registers). Contemporary Pop and Musical Theater solos increasingly demand strong command through and above the first passaggio, and the use of various timbres and textures to achieve a desired quality. Thus, it can be difficult, even for advanced vocalists, to identify which vocal register within the vocal range a singer is using. This paper presents two methods for classifying vocal registers in an audio signal of male pop music through the end-to-end analysis of textural features of mel-spectrogram images. Additionally, we discuss the practical integration of these models into vocal analysis tools, and introduce concurrently developed software called AVRA, which stands for Automatic Vocal Register Analysis. Our proposed methods achieved consistent classification of vocal register with both Support Vector Machine (SVM) and Convolutional Neural Network (CNN) models, which shows promise for robust classification across a greater range of voice types and genres.
Cequel: Cost-Effective Querying of Large Language Models for Text Clustering
Wang, Hongtao, Zhang, Taiyan, Yang, Renchi, Xu, Jianliang
Text clustering aims to automatically partition a collection of documents into coherent groups based on their linguistic features. In the literature, this task is formulated either as metric clustering over pre-trained text embeddings or as graph clustering based on pairwise similarities derived from an oracle, e.g., a large machine learning model. Recent advances in large language models (LLMs) have significantly improved this field by providing high-quality contextualized embeddings and accurate semantic similarity estimates. However, leveraging LLMs at scale introduces substantial computational and financial costs due to the large number of required API queries or inference calls. To address this issue, we propose Cequel, a cost-effective framework that achieves accurate text clustering under a limited budget of LLM queries. At its core, Cequel constructs must-link and cannot-link constraints by selectively querying LLMs on informative text pairs or triplets, identified via our proposed algorithms, EdgeLLM and TriangleLLM. These constraints are then utilized in a weighted constrained clustering algorithm to form high-quality clusters. Specifically, EdgeLLM and TriangleLLM employ carefully designed greedy selection strategies and prompting techniques to identify and extract informative constraints efficiently. Experiments on multiple benchmark datasets demonstrate that Cequel consistently outperforms existing methods in unsupervised text clustering under the same query budget.
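To make the constraint idea concrete, here is a minimal sketch of collecting must-link and cannot-link constraints from a pairwise oracle under a fixed query budget. It is not EdgeLLM or TriangleLLM: pairs are queried in the order given rather than by a greedy informativeness criterion, the oracle is a plain Python callable standing in for an LLM, and must-links are merged with union-find rather than fed to a weighted constrained clustering algorithm. All function names are hypothetical.

```python
def collect_constraints(pairs, oracle, budget):
    """Query the oracle on at most `budget` pairs and sort the answers."""
    must_link, cannot_link = [], []
    for i, j in pairs[:budget]:
        if oracle(i, j):          # oracle says "same cluster"
            must_link.append((i, j))
        else:                     # oracle says "different clusters"
            cannot_link.append((i, j))
    return must_link, cannot_link


def merge_must_links(n, must_link):
    """Group n items by merging must-link pairs with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in must_link:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
    return [find(x) for x in range(n)]


def violated(labels, cannot_link):
    """Return cannot-link pairs that ended up with the same label."""
    return [(i, j) for i, j in cannot_link if labels[i] == labels[j]]


if __name__ == "__main__":
    from itertools import combinations

    topics = ["sports", "sports", "finance", "finance", "sports"]  # toy ground truth
    pairs = list(combinations(range(len(topics)), 2))
    ml, cl = collect_constraints(pairs, lambda i, j: topics[i] == topics[j], budget=6)
    labels = merge_must_links(len(topics), ml)
    print(ml, cl, labels, violated(labels, cl))
```

In this toy run the budget runs out before the pair (2, 3) is queried, so the two finance documents stay unmerged. That is the kind of waste a selective strategy for choosing informative pairs is meant to reduce.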
CUS-QA: Local-Knowledge-Oriented Open-Ended Question Answering Dataset
Libovický, Jindřich, Helcl, Jindřich, Manea, Andrei, Vico, Gianluca
We introduce CUS-QA, a benchmark for open-ended regional question answering that encompasses both textual and visual modalities. We also provide strong baselines using state-of-the-art large language models (LLMs). Our dataset consists of manually curated questions and answers grounded in Wikipedia, created by native speakers from Czechia, Slovakia, and Ukraine, with accompanying English translations. It includes both purely textual questions and those requiring visual understanding. We evaluate state-of-the-art LLMs through prompting and complement this with human judgments of answer correctness. Using these human evaluations, we analyze the reliability of existing automatic evaluation metrics. Our baseline results show that even the best open-weight LLMs achieve only around 50% accuracy on textual questions and below 30% on visual questions. LLM-based evaluation metrics show strong correlation with human judgment, while traditional string-overlap metrics perform surprisingly well due to the prevalence of named entities in answers.
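As a rough illustration of why string overlap can work when answers are dominated by named entities, the sketch below computes a token-level F1 score between a predicted and a reference answer. The abstract does not name the specific string-overlap metrics evaluated, so this choice is an assumption, as are the function name `token_f1` and the example answers, which are not taken from the dataset.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-level F1 between two answers, using lowercased whitespace tokens."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Two empty answers match; one empty answer does not.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("the Charles Bridge in Prague", "Charles Bridge"))
    print(token_f1("Vltava", "Charles Bridge"))
```

A verbose but correct answer still shares the entity tokens with the reference and scores well above zero, while an answer naming the wrong entity shares none.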
Wired and Business Insider remove articles by AI-generated 'freelancer'
Multiple news organisations have taken down articles written by an alleged freelance journalist that now appear to have been generated by AI. On Thursday, Press Gazette reported that at least six publications, including Wired and Business Insider, have removed articles from their websites in recent months after it was discovered that the stories – written under the name of Margaux Blanchard – were AI-generated. Wired published a story titled "They Fell in Love Playing Minecraft. A few weeks later, the outlet took down the story, stating in an editor's note: "After an additional review of the article … Wired editorial leadership has determined this article does not meet our editorial standards." The story cited a "Jessica Hu", an alleged 34-year-old "ordained officiant based in Chicago" who reportedly "made a name for herself as a 'digital celebrant', specialising in ceremonies across Twitch, Discord and VRChat", according to Press Gazette, which reviewed the Wired article. Neither Press Gazette nor the Guardian was able to verify the identity of Hu. Press Gazette further reported that in April, Business Insider published two essays by Blanchard titled: "Remote work has been the best thing for me as a parent but the worst as a person" and "I had my first kid at 45.