 Media


Enhancing Live Broadcast Engagement: A Multi-modal Approach to Short Video Recommendations Using MMGCN and User Preferences

arXiv.org Artificial Intelligence

The purpose of this paper is to explore a multi-modal approach to enhancing live broadcast engagement by developing a short video recommendation system that incorporates Multi-modal Graph Convolutional Networks (MMGCN) with user preferences. To provide personalized recommendations tailored to individual interests, the proposed system considers user interaction data, video content features, and contextual information. With the aid of a hybrid approach combining collaborative filtering and content-based filtering techniques, the system can capture nuanced relationships between users, video attributes, and engagement patterns. Three datasets are used to evaluate the effectiveness of the system: Kwai, TikTok, and MovieLens. Compared to baseline models, such as DeepFM, Wide & Deep, LightGBM, and XGBoost, the proposed MMGCN-based model shows superior performance. It outperforms all baseline methods in capturing diverse user preferences and making accurate, personalized recommendations, achieving F1 scores of 0.574 on Kwai, 0.506 on TikTok, and 0.197 on MovieLens. We emphasize the importance of multi-modal integration and user-centric approaches in advancing recommender systems, and the role they play in enhancing content discovery and audience interaction on live broadcast platforms.


Discrete Audio Tokens: More Than a Survey!

arXiv.org Artificial Intelligence

Discrete audio tokens are compact representations that aim to preserve perceptual quality, phonetic content, and speaker characteristics while enabling efficient storage and inference, as well as competitive performance across diverse downstream tasks. They provide a practical alternative to continuous features, enabling the integration of speech and audio into modern large language models (LLMs). As interest in token-based audio processing grows, various tokenization methods have emerged, and several surveys have reviewed the latest progress in the field. However, existing studies often focus on specific domains or tasks and lack a unified comparison across various benchmarks. This paper presents a systematic review and benchmark of discrete audio tokenizers, covering three domains: speech, music, and general audio. We propose a taxonomy of tokenization approaches based on encoder-decoder, quantization techniques, training paradigm, streamability, and application domains. We evaluate tokenizers on multiple benchmarks for reconstruction, downstream performance, and acoustic language modeling, and analyze trade-offs through controlled ablation studies. Our findings highlight key limitations, practical considerations, and open challenges, providing insight and guidance for future research in this rapidly evolving area.


AdversariaL attacK sAfety aLIgnment(ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement- Introducing Adversarial Vulnerability Quality Index (AVQI)

arXiv.org Artificial Intelligence

Adversarial threats against LLMs are escalating faster than current defenses can adapt. We expose a critical geometric blind spot in alignment: adversarial prompts exploit latent camouflage, embedding perilously close to the safe representation manifold while encoding unsafe intent, thereby evading surface-level defenses like Direct Preference Optimization (DPO), which remain blind to the latent geometry. We introduce ALKALI, the first rigorously curated adversarial benchmark and the most comprehensive to date, spanning 9,000 prompts across three macro categories, six subtypes, and fifteen attack families. Evaluation of 21 leading LLMs reveals alarmingly high Attack Success Rates (ASRs) across both open- and closed-source models, exposing an underlying vulnerability we term latent camouflage, a structural blind spot where adversarial completions mimic the latent geometry of safe ones. To mitigate this vulnerability, we introduce GRACE - Geometric Representation Aware Contrastive Enhancement, an alignment framework coupling preference learning with latent space regularization. GRACE enforces two constraints: latent separation between safe and adversarial completions, and adversarial cohesion among unsafe and jailbreak behaviors. These operate over layerwise pooled embeddings guided by a learned attention profile, reshaping internal geometry without modifying the base model, and achieve up to 39% ASR reduction. Moreover, we introduce AVQI, a geometry-aware metric that quantifies latent alignment failure via cluster separation and compactness. AVQI reveals when unsafe completions mimic the geometry of safe ones, offering a principled lens into how models internally encode safety. We make the code publicly available at https://anonymous.4open.science/r/alkali-B416/README.md.


A Practical Synthesis of Detecting AI-Generated Textual, Visual, and Audio Content

arXiv.org Artificial Intelligence

Advances in AI-generated content have led to wide adoption of large language models, diffusion-based visual generators, and synthetic audio tools. However, these developments raise critical concerns about misinformation, copyright infringement, security threats, and the erosion of public trust. In this paper, we explore an extensive range of methods designed to detect and mitigate AI-generated textual, visual, and audio content. We begin by discussing motivations and potential impacts associated with AI-based content generation, including real-world risks and ethical dilemmas. We then outline detection techniques spanning observation-based strategies, linguistic and statistical analysis, model-based pipelines, watermarking and fingerprinting, as well as emergent ensemble approaches. We also present new perspectives on robustness, adaptation to rapidly improving generative architectures, and the critical role of human-in-the-loop verification. By surveying state-of-the-art research and highlighting case studies in academic, journalistic, legal, and industrial contexts, this paper aims to inform robust solutions and policymaking. We conclude by discussing open challenges, including adversarial transformations, domain generalization, and ethical concerns, thereby offering a holistic guide for researchers, practitioners, and regulators to preserve content authenticity in the face of increasingly sophisticated AI-generated media.


Beyond the Hook: Predicting Billboard Hot 100 Chart Inclusion with Machine Learning from Streaming, Audio Signals, and Perceptual Features

arXiv.org Artificial Intelligence

The advent of digital streaming platforms has recently revolutionized the landscape of the music industry, with the ensuing digitalization providing structured data collections that open new research avenues for investigating popularity dynamics and mainstream success. The present work explored which determinants hold the strongest predictive influence for a track's inclusion in the Billboard Hot 100 charts, including streaming popularity, measurable audio signal attributes, and probabilistic indicators of human listening. The analysis revealed that popularity was by far the most decisive predictor of Billboard Hot 100 inclusion, with considerable contribution from instrumentalness, valence, duration, and speechiness. Logistic Regression achieved 90.0% accuracy, with very high recall for charting singles (0.986) but lower recall for non-charting ones (0.813), yielding balanced F1-scores around 0.90. Random Forest slightly improved performance to 90.4% accuracy, maintaining near-perfect precision for non-charting singles (0.990) and high recall for charting ones (0.992), with F1-scores up to 0.91. Gradient Boosting (XGBoost) reached 90.3% accuracy, delivering a more balanced trade-off by improving recall for non-charting singles (0.837) while sustaining high recall for charting ones (0.969), resulting in F1-scores comparable to the other models.



Saudi plans for video game hub grow with $55 billion EA deal

The Japan Times

The Esports World Cup 2025 at Boulevard City Arena in Riyadh on Aug. 2. Saudi Arabia is focusing on gaming as part of a national strategy to create tens of thousands of new jobs and diversify the kingdom's economy away from oil. Saudi Arabia is accelerating plans to transform itself into a hub for gamers with its blockbuster deal to take Electronic Arts private. In addition to an existing $5 billion equity stake it is rolling over into the new entity, the kingdom's Public Investment Fund is providing more fresh capital than partners Silver Lake Management and Jared Kushner's Affinity Partners to buy out the other public investors, according to people familiar with the matter. That's made it the largest contributor to the $36 billion in equity being put in to finance the deal, the people said, asking not to be identified discussing non-public information.


15 gorgeous images from the 2025 Bird Photographer of the Year awards

Popular Science

A soaring solar eclipse, a bloodied petrel, and a resilient raven. I photographed this group of king penguins emerging from the ocean on a cloudy summer morning. I laid flat on the shore to capture both the dramatic sky and the reflections in the wet sand. When one of the penguins started trumpeting and pointing its head toward the clouds, an already nice scene turned special and memorable.


Mysterious black pyramid UFO seen flying over US in broad daylight

Daily Mail - Science & tech



UNC professor placed on leave after far-left Redneck Revolt gun club membership exposed

FOX News

The University of North Carolina has placed Asian and Middle Eastern Studies professor Dwayne Dixon on leave after his ties to the far-left gun club Redneck Revolt were exposed.