Generative AI
Exploring the flavor structure of leptons via diffusion models
Nishimura, Satsuki, Otsuka, Hajime, Uchiyama, Haruki
We propose a method to explore the flavor structure of leptons using diffusion models, a form of generative artificial intelligence (generative AI). We consider a simple extension of the Standard Model with the type I seesaw mechanism and train a neural network to generate the neutrino mass matrix. By utilizing transfer learning, the diffusion model generates 10^4 solutions that are consistent with the neutrino mass squared differences and the leptonic mixing angles. The distributions of the CP phases and the sums of neutrino masses, which are not included in the conditional labels but are calculated from the solutions, exhibit non-trivial tendencies. In addition, the effective mass in neutrinoless double beta decay is concentrated near the boundaries of the existing confidence intervals, allowing us to verify the obtained solutions through future experiments. An inverse approach using the diffusion model is expected to facilitate the experimental verification of flavor models from a perspective distinct from conventional analytical methods.
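The type I seesaw mechanism named in the abstract can be illustrated numerically. The sketch below is not the authors' pipeline: it assumes the common convention m_nu = -m_D M_R^{-1} m_D^T, and the Dirac and Majorana matrices are hypothetical random inputs chosen only so that the light masses land near the sub-eV scale. It shows how mass squared differences of the kind used as conditional labels could be computed from one candidate matrix.

```python
import numpy as np


def seesaw_light_mass_matrix(m_dirac, m_majorana):
    """Type I seesaw in an assumed convention: m_nu = -m_D M_R^{-1} m_D^T.

    The approximation holds when the Majorana scale is far above the Dirac scale.
    """
    return -m_dirac @ np.linalg.inv(m_majorana) @ m_dirac.T


def light_masses(m_nu):
    """Physical masses are the singular values of the complex symmetric m_nu."""
    return np.sort(np.linalg.svd(m_nu, compute_uv=False))


# Hypothetical inputs in GeV; they are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
m_dirac = 100.0 * (rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
m_majorana = np.diag([1e14, 2e14, 5e14]).astype(complex)

m_nu = seesaw_light_mass_matrix(m_dirac, m_majorana)
masses_ev = light_masses(m_nu) * 1e9  # convert GeV to eV

# Mass squared differences, labelled here as if the ordering were normal.
dm21 = masses_ev[1] ** 2 - masses_ev[0] ** 2
dm31 = masses_ev[2] ** 2 - masses_ev[0] ** 2
print("masses [eV]:", masses_ev)
print("dm21, dm31 [eV^2]:", dm21, dm31)
```

A generative model in this setting would propose many such matrices and keep those whose derived quantities match the measured values; the random draw here only stands in for one proposal.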
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Modern transportation systems face pressing challenges due to increasing demand, dynamic environments, and heterogeneous information integration. The rapid evolution of Large Language Models (LLMs) offers transformative potential to address these challenges. Extensive knowledge and high-level capabilities derived from pretraining allow LLMs to evolve from their default role as text generators into versatile, knowledge-driven task solvers for intelligent transportation systems. This survey first presents LLM4TR, a novel conceptual framework that systematically categorizes the roles of LLMs in transportation into four synergetic dimensions: information processors, knowledge encoders, component generators, and decision facilitators. Through a unified taxonomy, we systematically elucidate how LLMs bridge fragmented data pipelines, enhance predictive analytics, simulate human-like reasoning, and enable closed-loop interactions across sensing, learning, modeling, and managing tasks in transportation systems. For each role, our review spans diverse applications, from traffic prediction and autonomous driving to safety analytics and urban mobility optimization, highlighting how emergent capabilities of LLMs such as in-context learning and step-by-step reasoning can enhance the operation and management of transportation systems. We further curate practical guidance, including available resources and computational guidelines, to support real-world deployment. By identifying challenges in existing LLM-based solutions, this survey charts a roadmap for advancing LLM-driven transportation research, positioning LLMs as central actors in the next generation of cyber-physical-social mobility ecosystems. Online resources can be found on the project page: https://github.com/tongnie/awesome-llm4tr.
Exploring the Energy Landscape of RBMs: Reciprocal Space Insights into Bosons, Hierarchical Learning and Symmetry Breaking
Toledo-Marin, J. Quetzalcóatl, Maiti, Anindita, Fox, Geoffrey C., Melko, Roger G.
Deep generative models have become ubiquitous due to their ability to learn and sample from complex distributions. Despite the proliferation of various frameworks, the relationships among these models remain largely unexplored, a gap that hinders the development of a unified theory of AI learning. We address two central challenges: clarifying the connections between different deep generative models and deepening our understanding of their learning mechanisms. We focus on Restricted Boltzmann Machines (RBMs), known for their universal approximation capabilities for discrete distributions. By introducing a reciprocal space formulation, we reveal a connection between RBMs, diffusion processes, and coupled Bosons. We show that at initialization, the RBM operates at a saddle point, where the local curvature is determined by the singular values, whose distribution follows the Marcenko-Pastur law and exhibits rotational symmetry. During training, this rotational symmetry is broken due to hierarchical learning, where different degrees of freedom progressively capture features at multiple levels of abstraction. This leads to a symmetry breaking in the energy landscape, reminiscent of Landau theory. This symmetry breaking in the energy landscape is characterized by the singular values and the weight matrix eigenvector matrix. We derive the corresponding free energy in a mean-field approximation. We show that in the limit of infinite size RBM, the reciprocal variables are Gaussian distributed. Our findings indicate that in this regime, there will be some modes for which the diffusion process will not converge to the Boltzmann distribution. To illustrate our results, we trained replicas of RBMs with different hidden layer sizes using the MNIST dataset. Our findings bridge the gap between disparate generative frameworks and also shed light on the processes underpinning learning in generative models.
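The statement that the singular values follow the Marcenko-Pastur law at initialization can be checked numerically. The sketch below assumes a weight matrix with i.i.d. Gaussian entries, which is an assumption about the initialization and not a detail confirmed by the abstract; the layer sizes are hypothetical. It compares the scaled squared singular values with the edges predicted by the Marcenko-Pastur law.

```python
import numpy as np


def marchenko_pastur_edges(gamma, sigma2=1.0):
    """Support edges of the Marcenko-Pastur law for aspect ratio gamma <= 1."""
    lower = sigma2 * (1.0 - np.sqrt(gamma)) ** 2
    upper = sigma2 * (1.0 + np.sqrt(gamma)) ** 2
    return lower, upper


# Hypothetical layer sizes and entry variance for a randomly initialized RBM.
rng = np.random.default_rng(0)
n_visible, n_hidden = 2000, 500
sigma2 = 1.0

weights = rng.normal(scale=np.sqrt(sigma2), size=(n_visible, n_hidden))

# Eigenvalues of W^T W / n_visible are the squared singular values over n_visible.
singular_values = np.linalg.svd(weights, compute_uv=False)
eigenvalues = singular_values ** 2 / n_visible

lower, upper = marchenko_pastur_edges(n_hidden / n_visible, sigma2)
print("empirical range:", eigenvalues.min(), eigenvalues.max())
print("predicted edges:", lower, upper)
```

At finite size the empirical extremes sit slightly inside the predicted edges; the agreement tightens as both dimensions grow at a fixed ratio.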
ChatGPT now speaks even more naturally with fewer interruptions
OpenAI has updated ChatGPT's Advanced Voice Mode feature, promising a more natural conversation experience. The aim is to make the AI-powered assistant more pleasant to talk to and less prone to interrupting you mid-sentence. In a video posted on the OpenAI YouTube channel on Monday, researcher Manuka Stratta showed off the improvements. One of the most common annoyances with voice assistants is that they tend to interrupt you when you pause to think. That's now been fixed here.
Microsoft introduces deep research and analysis tools for Copilot
Microsoft has launched two new "reasoning agents" for Copilot that were designed to analyze vast amounts of work data, including emails, meetings, chats and documents. The first tool called "Researcher" is based on OpenAI's deep research model combined with Copilot's advanced orchestration and deep search capabilities. Researcher was made for "complex, multi-step research" at work. It can take a user's internal work data along with additional information from the web, such as competitive data, emerging trends and the latest market analysis, to create market strategies and comprehensive quarterly reports, among other potential uses. Plus, it can pull data from Salesforce, ServiceNow and other external sources. Meanwhile, the new "Analyst" tool was built to function like a skilled data scientist.
The Download: China's empty data centers, and OpenAI's new practical image generator
Just months ago, China's boom in data center construction was at its height, fueled by both government and private investors. Renting out GPUs to companies that need them for training AI models was once seen as a sure bet. But with the rise of DeepSeek and a sudden change in the economics around AI, the industry is faltering. Prices for GPUs are falling and many newly built facilities are now sitting empty. Read the full story to find out why.
The AI Hype Index: DeepSeek mania, Israel's spying tool, and cheating at chess
That's why we've created the AI Hype Index--a simple, at-a-glance summary of everything you need to know about the state of the industry. While AI models are certainly capable of creating interesting and sometimes entertaining material, their output isn't necessarily useful. Google DeepMind is hoping that its new robotics model could make machines more receptive to verbal commands, paving the way for us to simply speak orders to them aloud. Elsewhere, the Chinese startup Monica has created Manus, which it claims is the very first general AI agent to complete truly useful tasks. And burnt-out coders are allowing AI to take the wheel entirely in a new practice dubbed "vibe coding."
ViLBench: A Suite for Vision-Language Process Reward Modeling
Tu, Haoqin, Feng, Weitao, Chen, Hardy, Liu, Hui, Tang, Xianfeng, Xie, Cihang
Process-supervised reward models serve as fine-grained functions that provide detailed step-wise feedback to model responses, facilitating effective selection of reasoning trajectories for complex tasks. Despite their advantages, the evaluation of PRMs remains less explored, especially in the multimodal domain. To address this gap, this paper first benchmarks current vision large language models (VLLMs) as two types of reward models: output reward models (ORMs) and process reward models (PRMs) on multiple vision-language benchmarks, which reveals that neither ORM nor PRM consistently outperforms across all tasks, and superior VLLMs do not necessarily yield better rewarding performance. To further advance evaluation, we introduce ViLBench, a vision-language benchmark designed to require intensive process reward signals. Notably, OpenAI's GPT-4o with Chain-of-Thought (CoT) achieves only 27.3% accuracy, indicating the benchmark's challenge for current VLLMs. Lastly, we preliminarily showcase a promising pathway towards bridging the gap between general VLLMs and reward models -- by collecting 73.6K vision-language process reward data using an enhanced tree-search algorithm, our 3B model is able to achieve an average improvement of 3.3% over standard CoT and up to 2.5% compared to its untrained counterpart on ViLBench by selecting OpenAI o1's generations. We release the implementations at https://ucsc-vlaa.github.io/ViLBench with our code, model, and data.
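The difference between output and process reward models when selecting among sampled generations can be sketched in a few lines. The code below is a toy illustration, not the ViLBench implementation: the `Candidate` class, the scores, and the choice of `min` as the step aggregator are all hypothetical. Taking the minimum step score is one common aggregation, and the abstract does not say which one the authors use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """One sampled reasoning trajectory with hypothetical reward scores."""
    answer: str
    step_scores: List[float]  # per-step scores, as a PRM would assign
    outcome_score: float      # single final-answer score, as an ORM would assign


def select_with_orm(candidates: List[Candidate]) -> Candidate:
    """Best-of-N selection using only the final-answer score."""
    return max(candidates, key=lambda c: c.outcome_score)


def select_with_prm(
    candidates: List[Candidate],
    aggregate: Callable[[List[float]], float] = min,
) -> Candidate:
    """Best-of-N selection using an aggregate of the step-wise scores."""
    return max(candidates, key=lambda c: aggregate(c.step_scores))


# Made-up scores: candidate A looks good at the end but has one weak step.
candidates = [
    Candidate(answer="A", step_scores=[0.9, 0.2, 0.8], outcome_score=0.85),
    Candidate(answer="B", step_scores=[0.7, 0.8, 0.75], outcome_score=0.70),
]

print("ORM picks:", select_with_orm(candidates).answer)
print("PRM picks:", select_with_prm(candidates).answer)
```

With these made-up numbers the two selectors disagree, which is the kind of case where step-wise feedback changes which trajectory is kept.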
SoK: How Robust is Audio Watermarking in Generative AI models?
Wen, Yizhu, Innuganti, Ashwin, Ramos, Aaron Bien, Guo, Hanqing, Yan, Qiben
Audio watermarking is increasingly used to verify the provenance of AI-generated content, enabling applications such as detecting AI-generated speech, protecting music IP, and defending against voice cloning. To be effective, audio watermarks must resist removal attacks that distort signals to evade detection. While many schemes claim robustness, these claims are typically tested in isolation and against a limited set of attacks. A systematic evaluation against diverse removal attacks is lacking, hindering practical deployment. In this paper, we investigate whether recent watermarking schemes that claim robustness can withstand a broad range of removal attacks. First, we introduce a taxonomy covering 22 audio watermarking schemes. Next, we summarize their underlying technologies and potential vulnerabilities. We then present a large-scale empirical study to assess their robustness. To support this, we build an evaluation framework encompassing 22 types of removal attacks (109 configurations) including signal-level, physical-level, and AI-induced distortions. We reproduce 9 watermarking schemes using open-source code, identify 8 new highly effective attacks, and highlight 11 key findings that expose the fundamental limitations of these methods across 3 public datasets. Our results reveal that none of the surveyed schemes can withstand all tested distortions. This evaluation offers a comprehensive view of how current watermarking methods perform under real-world threats. Our demo and code are available at https://sokaudiowm.github.io/.
Anti Robot Speciesism
De Freitas, Julian, Castelo, Noah, Schmitt, Bernd, Sarvary, Miklos
Date submitted: March 2025. Words: 9,220.
Abstract: Humanoid robots are a form of embodied artificial intelligence (AI) that looks and acts more and more like humans. Powered by generative AI and advances in robotics, humanoid robots can speak and interact with humans rather naturally but are still easily recognizable as robots. But how will we treat humanoids when they seem indistinguishable from humans in appearance and mind? We find a tendency (called "anti-robot" speciesism) to deny such robots humanlike capabilities, driven by motivations to accord members of the human species preferential treatment. Six experiments show that robots are denied humanlike attributes, simply because they are not biological beings and because humans want to avoid feelings of cognitive dissonance when utilizing such robots for unsavory tasks. Thus, people do not rationally attribute capabilities to perfectly humanlike robots but deny them capabilities as it suits them.
Keywords: robots, artificial intelligence, humanoids, speciesism, cognitive dissonance
In recent years, new artificial intelligence (AI) technologies have been introduced into the marketplace that have the potential to radically change people's work and lives. This paper examines how people might react to robots that seem to be "perfectly humanlike". With major companies like Amazon and Nvidia planning mass production of such robots, we are entering an era where the line between human and non-human entities is increasingly blurred. Our findings suggest that the advent of such robots will not lead people to rationally conclude that these robots are as capable as humans in performing some tasks. Rather, people will deny these robots humanlike attributes, driven by their motivation to prioritize their own species and to avoid feelings of cognitive dissonance from utilizing such robots for unsavory tasks.
Aversion to Robots and AI
People are often averse to robots.
Psychological research has explained this effect by arguing that such "almost humanlike" robots appear as aesthetically displeasing, and that they remind people of zombies, death, or disease (Kätsyri et al., 2015; Mori, 1970; Wang et al., 2015). Other psychological explanations focus on how people perceive robot minds, sometimes referred to as the "uncanny valley of mind" (Müller et al., 2021; Stein & Ohler, 2017). These theories suggest that humanoid robots can be unsettling because they remind people of the human ability to experience feelings, even though these robots are not seen as having such capabilities (Gray & Wegner, 2012; Smith et al., 2021).