Generative AI
Balancing Privacy and Efficiency: Music Information Retrieval via Additive Homomorphic Encryption
Wang, William Zerong, Zhao, Dongfang
In the era of generative AI, ensuring the privacy of music data presents unique challenges: unlike static artworks such as images, music data is inherently temporal and multimodal, and it is sampled, transformed, and remixed at an unprecedented scale. These characteristics make its core vector embeddings, i.e., the numerical representations of the music, highly susceptible to being learned, misused, or even stolen by models without accessing the original audio files. Traditional methods like copyright licensing and digital watermarking offer limited protection for these abstract mathematical representations, thus necessitating a stronger, e.g., cryptographic, approach to safeguarding the embeddings themselves. Standard encryption schemes, such as AES, render data unintelligible for computation, making similarity searches over the encrypted embeddings impossible. While Fully Homomorphic Encryption (FHE) provides a plausible solution by allowing arbitrary computations on ciphertexts, its substantial performance overhead remains impractical for large-scale vector similarity searches. Given this trade-off, we propose a more practical approach using Additive Homomorphic Encryption (AHE) for vector similarity search. The primary contributions of this paper are threefold: we analyze threat models unique to music information retrieval systems; we provide a theoretical analysis and propose an efficient AHE-based solution through inner products of music embeddings to deliver privacy-preserving similarity search; and finally, we demonstrate the efficiency and practicality of the proposed approach through empirical evaluation and comparison to FHE schemes on real-world MP3 files.
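To make the idea of an inner product under additive homomorphic encryption concrete, here is a minimal sketch using a toy Paillier cryptosystem. It is not the authors' scheme or implementation: the abstract does not name a specific AHE construction, so Paillier, the helper names (`keygen`, `encrypt`, `decrypt`, `encrypted_inner_product`, `quantize`), the fixed-point scale, and the choice to encrypt the stored embedding while keeping the query in plaintext are all assumptions made for illustration. The primes are far too small for real security.

```python
import math
import secrets

# Toy parameters (assumption): two Mersenne primes, far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p=P, q=Q):
    """Return (public modulus n, private key (lam, mu)) for toy Paillier."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    assert math.gcd(n, lam) == 1
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (negatives are mapped into Z_n) with generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n mod n^2
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Decrypt c and map the result back to a signed integer."""
    lam, mu = priv
    n2 = n * n
    m = ((pow(c, lam, n2) - 1) // n * mu) % n
    return m - n if m > n // 2 else m


def encrypted_inner_product(n, enc_x, y):
    """Compute Enc(<x, y>) from ciphertexts Enc(x_i) and a plaintext integer query y.

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to the
    power k multiplies its plaintext by k.
    """
    n2 = n * n
    acc = 1  # 1 is a (non-randomized) encryption of 0
    for c, yi in zip(enc_x, y):
        acc = (acc * pow(c, yi % n, n2)) % n2
    return acc


def quantize(vec, scale=1000):
    """Fixed-point encode a float embedding as integers (scale is an assumption)."""
    return [round(v * scale) for v in vec]


if __name__ == "__main__":
    n, priv = keygen()
    stored = quantize([0.12, -0.5, 0.33])   # embedding kept encrypted on the server
    query = quantize([0.4, 0.25, -0.1])     # plaintext query embedding
    enc_stored = [encrypt(n, v) for v in stored]
    enc_score = encrypted_inner_product(n, enc_stored, query)
    # Only the key holder can read the score; divide by scale**2 to undo quantization.
    print(decrypt(n, priv, enc_score) / 1000**2)
```

In this sketch the server never sees the stored embedding in the clear, and the similarity score stays encrypted until the key holder decrypts it. A real deployment would need cryptographically sized primes and a vetted library rather than hand-rolled arithmetic.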
Generative AI for Intent-Driven Network Management in 6G: A Case Study on Hierarchical Learning Approach
Habib, Md Arafat, Elsayed, Medhat, Ozcan, Yigit, Iturria-Rivera, Pedro Enrique, Bavand, Majid, Erol-Kantarci, Melike
With the emergence of 6G, mobile networks are becoming increasingly heterogeneous and dynamic, necessitating advanced automation for efficient management. Intent-Driven Networks (IDNs) address this by translating high-level intents into optimization policies. Large Language Models (LLMs) can enhance this process by understanding complex human instructions to enable adaptive, intelligent automation. Given the rapid advancements in Generative AI (GenAI), a comprehensive survey of LLM-based IDN architectures in disaggregated Radio Access Network (RAN) environments is both timely and critical. This article provides such a survey, along with a case study on a hierarchical learning-enabled IDN architecture that integrates GenAI across three key stages: intent processing, intent validation, and intent execution. Unlike most existing approaches that apply GenAI in the form of LLMs for intent processing only, we propose a hierarchical framework that introduces GenAI across all three stages of IDN. To demonstrate the effectiveness of the proposed IDN management architecture, we present a case study based on the latest GenAI architecture named Mamba. The case study shows how the proposed GenAI-driven architecture enhances network performance through intelligent automation, surpassing the performance of the conventional IDN architectures. Sixth-Generation (6G) networks are anticipated to support a diverse set of user requirements and have more complex deployments [1].
Local Diffusion Models and Phases of Data Distributions
Hu, Fangjun, Liu, Guangkuo, Zhang, Yifan, Gao, Xun
As a class of generative artificial intelligence frameworks inspired by statistical physics, diffusion models have shown extraordinary performance in synthesizing complicated data distributions through a denoising process gradually guided by score functions. Real-life data, like images, is often spatially structured in low-dimensional spaces. However, ordinary diffusion models ignore this local structure and learn spatially global score functions, which are often computationally expensive. In this work, we introduce a new perspective on the phases of data distributions, which provides insight into constructing local denoisers with reduced computational costs. We define two distributions as belonging to the same data distribution phase if they can be mutually connected via spatially local operations such as local denoisers. Then, we show that the reverse denoising process consists of an early trivial phase and a late data phase, sandwiching a rapid phase transition where local denoisers must fail. To diagnose such phase transitions, we prove an information-theoretic bound on the fidelity of local denoisers based on conditional mutual information, and conduct numerical experiments on a real-world dataset. This work suggests simpler and more efficient architectures of diffusion models: far from the phase transition point, we can use small local neural networks to compute the score function; global neural networks are only necessary around the narrow time interval of phase transitions. This result also opens up new directions for studying phases of data distributions, advancing the broader science of generative artificial intelligence, and guiding the design of neural networks inspired by physics concepts.
Recommendation with Generative Models
Deldjoo, Yashar, He, Zhankui, McAuley, Julian, Korikov, Anton, Sanner, Scott, Ramisa, Arnau, Vidal, Rene, Sathiamoorthy, Maheswaran, Kasrizadeh, Atoosa, Milano, Silvia, Ricci, Francesco
Generative models are a class of AI models capable of creating new instances of data by learning and sampling from their statistical distributions. In recent years, these models have gained prominence in machine learning due to the development of approaches such as generative adversarial networks (GANs), variational autoencoders (VAEs), and transformer-based architectures such as GPT. These models have applications across various domains, such as image generation, text synthesis, and music composition. In recommender systems, generative models, referred to as Gen-RecSys, improve the accuracy and diversity of recommendations by generating structured outputs, text-based interactions, and multimedia content. By leveraging these capabilities, Gen-RecSys can produce more personalized, engaging, and dynamic user experiences, expanding the role of AI in eCommerce, media, and beyond. Our book goes beyond existing literature by offering a comprehensive understanding of generative models and their applications, with a special focus on deep generative models (DGMs) and their classification. We introduce a taxonomy that categorizes DGMs into three types: ID-driven models, large language models (LLMs), and multimodal models. Each category addresses unique technical and architectural advancements within its respective research area. This taxonomy allows researchers to easily navigate developments in Gen-RecSys across domains such as conversational AI and multimodal content generation. Additionally, we examine the impact and potential risks of generative models, emphasizing the importance of robust evaluation frameworks.
OpenAI Scrambles to Update GPT-5 After Users Revolt
OpenAI's GPT-5 model was meant to be a world-changing upgrade to its wildly popular and precocious chatbot. But for some users, last Thursday's release felt more like a wrenching downgrade, with the new ChatGPT presenting a diluted personality and making surprisingly dumb mistakes. On Friday, OpenAI CEO Sam Altman took to X to say the company would keep the previous model, GPT-4o, running for Plus users. A new feature designed to seamlessly switch between models depending on the complexity of the query had broken on Thursday, Altman said, "and the result was GPT-5 seemed way dumber." He promised to implement fixes to improve GPT-5's performance and the overall user experience.
Sam Altman and the whale
But where is the transition from the BlackBerry keyboard to the touch-screen iPhone? Where is the assisted GPS and the API for location services that enables real-time directions and gives rise to companies like Uber and Grindr and lets me order a taxi for my burrito? Where are the real breakthroughs? In fact, following the release of GPT-5, OpenAI found itself with something of a user revolt on its hands. Customers who missed GPT-4o's personality successfully lobbied the company to bring it back as an option for its Plus users.
WIRED Roundup: Unpacking OpenAI's Government Partnership
On today's episode, our host Zoë Schiffer is joined by WIRED's senior politics writer Jake Lahut to run through five of the most important stories we published this week--from how bitcoin miners have been racing this year to beat the tariffs, to how AI was used to find a missing hiker in the Italian Alps. Then, Zoë and Jake discuss the details around OpenAI's latest partnership with the federal government. Mentioned in this episode: "OpenAI Announces Massive US Government Partnership" by Zoë Schiffer and Will Knight; "Trumpworld Knows Epstein Is a Problem. But They Can't Solve It" by Jake Lahut; "Charter Planes and Bidding Wars: How Bitcoin Miners Raced to Beat Trump's Tariffs" by Joel Khalili; "Google Will Use AI to Guess People's Ages Based on Search History" by Dell Cameron; "US Coast Guard Report on Titan Submersible Implosion Singles Out OceanGate CEO Stockton Rush" by Mark Harris; "A Hiker Was Missing for Nearly a Year--Until an AI System Recognized His Helmet" by Marta Abbà. Get tickets to our live show, happening on September 9th, here. Write to us at uncannyvalley@wired.com.
Is the A.I. Boom Turning Into an A.I. Bubble?
When Jensen Huang, the chief executive of the chipmaker Nvidia, met with Donald Trump in the White House last week, he had reason to be cheerful. Most of Nvidia's chips, which are widely used to train generative artificial-intelligence models, are manufactured in Asia. Earlier this year, it pledged to increase production in the United States, and on Wednesday Trump announced that chip companies that promise to build products in the United States would be exempt from some hefty new tariffs on semiconductors that his Administration is preparing to impose. The next day, Nvidia's stock hit a new all-time high, and its market capitalization reached $4.4 trillion, making it the world's most valuable company, ahead of Microsoft, which is also heavily involved in A.I. Welcome to the A.I. boom, or should I say the A.I. bubble? It has been more than a quarter of a century since the bursting of the great dot-com bubble, during which hundreds of unprofitable internet startups issued stock on the Nasdaq, and the share prices of many tech companies rose into the stratosphere.
Large Language Model Data Generation for Enhanced Intent Recognition in German Speech
Rosin, Theresa Pekarek, Kaplan, Burak Can, Wermter, Stefan
Intent recognition (IR) for speech commands is essential for artificial intelligence (AI) assistant systems; however, most existing approaches are limited to short commands and are predominantly developed for English. This paper addresses these limitations by focusing on IR from speech by elderly German speakers. We propose a novel approach that combines an adapted Whisper ASR model, fine-tuned on elderly German speech (SVC-de), with Transformer-based language models trained on synthetic text datasets generated by three well-known large language models (LLMs): LeoLM, Llama3, and ChatGPT. To evaluate the robustness of our approach, we generate synthetic speech with a text-to-speech model and conduct extensive cross-dataset testing. Our results show that synthetic LLM-generated data significantly boosts classification performance and robustness to different speaking styles and unseen vocabulary. Notably, we find that LeoLM, a smaller, domain-specific 13B LLM, surpasses the much larger ChatGPT (175B) in dataset quality for German intent recognition. Our approach demonstrates that generative AI can effectively bridge data gaps in low-resource domains. We provide detailed documentation of our data generation and training process to ensure transparency and reproducibility.
Pragmatics beyond humans: meaning, communication, and LLMs
The paper reconceptualizes pragmatics not as a subordinate, third dimension of meaning, but as a dynamic interface through which language operates as a socially embedded tool for action. With the emergence of large language models (LLMs) in communicative contexts, this understanding needs to be further refined and methodologically reconsidered. The first section challenges the traditional semiotic trichotomy, arguing that connectionist LLM architectures destabilize established hierarchies of meaning, and proposes the Human-Machine Communication (HMC) framework as a more suitable alternative. The second section examines the tension between human-centred pragmatic theories and the machine-centred nature of LLMs. While traditional, Gricean-inspired pragmatics continues to dominate, it relies on human-specific assumptions ill-suited to predictive systems like LLMs. Probabilistic pragmatics, particularly the Rational Speech Act framework, offers a more compatible teleology by focusing on optimization rather than truth-evaluation. The third section addresses the issue of substitutionalism in three forms - generalizing, linguistic, and communicative - highlighting the anthropomorphic biases that distort LLM evaluation and obscure the role of human communicative subjects. Finally, the paper introduces the concept of context frustration to describe the paradox of increased contextual input paired with a collapse in contextual understanding, emphasizing how users are compelled to co-construct pragmatic conditions both for the model and themselves. These arguments suggest that pragmatic theory may need to be adjusted or expanded to better account for communication involving generative AI.