Generative AI
Revealing higher-order neural representations with generative artificial intelligence
Asrari, Hojjat Azimi, Peters, Megan A. K.
Studies often aim to reveal how neural representations encode aspects of an observer's environment, such as its contents or structure. These are "first-order" representations (FORs), because they're "about" the external world. A less-common target is "higher-order" representations (HORs), which are "about" FORs -- their contents, stability, or uncertainty. HORs of uncertainty appear critically involved in adaptive behaviors including learning under uncertainty, influencing learning rates and internal model updating based on environmental feedback. However, HORs about uncertainty are unlikely to be direct "read-outs" of FOR characteristics, instead reflecting estimation processes which may be lossy, bias-prone, or distortive and which may also incorporate estimates of distributions of uncertainty the observer is likely to experience. While some research has targeted neural representations of "instantaneously" estimated uncertainty, how the brain represents distributions of expected uncertainty remains largely unexplored. Here, we propose a novel reinforcement learning (RL) based generative artificial intelligence (genAI) approach to explore neural representations of uncertainty distributions. We use existing functional magnetic resonance imaging data, where humans learned to 'de-noise' their brain states to achieve target neural patterns, to train denoising diffusion genAI models with RL algorithms to learn noise distributions similar to how humans might learn to do the same. We then explore these models' learned noise-distribution HORs compared to control models trained with traditional backpropagation. Results reveal model-dependent differences in noise distribution representations -- with the RL-based model offering much higher explanatory power for human behavior -- offering an exciting path towards using genAI to explore neural noise-distribution HORs.
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Xu, Chejian, Zhang, Jiawei, Chen, Zhaorun, Xie, Chulin, Kang, Mintong, Potter, Yujin, Wang, Zhun, Yuan, Zhuowen, Xiong, Alexander, Xiong, Zidi, Zhang, Chenhui, Yuan, Lingzhi, Zeng, Yi, Xu, Peiyang, Guo, Chengquan, Zhou, Andy, Tan, Jeffrey Ziwei, Zhao, Xuandong, Pinto, Francesco, Xiang, Zhen, Gai, Yu, Lin, Zinan, Hendrycks, Dan, Li, Bo, Song, Dawn
Multimodal foundation models (MMFMs) play a crucial role in various applications, including autonomous driving, healthcare, and virtual assistants. However, several studies have revealed vulnerabilities in these models, such as generating unsafe content by text-to-image models. Existing benchmarks on multimodal models either predominantly assess the helpfulness of these models, or only focus on limited perspectives such as fairness and privacy. In this paper, we present the first unified platform, MMDT (Multimodal DecodingTrust), designed to provide a comprehensive safety and trustworthiness evaluation for MMFMs. Our platform assesses models from multiple perspectives, including safety, hallucination, fairness/bias, privacy, adversarial robustness, and out-of-distribution (OOD) generalization. We have designed various evaluation scenarios and red teaming algorithms under different tasks for each perspective to generate challenging data, forming a high-quality benchmark. We evaluate a range of multimodal models using MMDT, and our findings reveal a series of vulnerabilities and areas for improvement across these perspectives. This work introduces the first comprehensive and unique safety and trustworthiness evaluation platform for MMFMs, paving the way for developing safer and more reliable MMFMs and systems. Our platform and benchmark are available at https://mmdecodingtrust.github.io/.
Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation
Zarkadas, Ioannis, Tomlinson, Amanda, Cidon, Asaf, Kasikci, Baris, Weisse, Ofir
As models become larger, ML accelerators are a scarce resource whose performance must be continually optimized to improve efficiency. Existing performance analysis tools are coarse grained, and fail to capture model performance at the machine-code level. In addition, these tools often do not provide specific recommendations for optimizations. We present xPU-Shark, a fine-grained methodology for analyzing ML models at the machine-code level that provides actionable optimization ...
These portable mid-level representations are then compiled into the byte-code which runs on the ML accelerator. The development of each of these levels of abstraction requires a huge engineering effort, and inefficiencies introduced at any level can cause performance degradation for the model. The companies that offer generative AI services are often doing so at a massive scale (for example, the infrastructure to provide inference for Microsoft's Bing AI chatbot is estimated to cost $4 billion [57]), meaning that even a small degradation in performance can lead to large capital losses.
Alexa+ is about to send everything you tell it to Amazon
Amazon's Alexa+ service is rolling out on March 28, and with it supposedly comes a more personalized, intuitive, and powerful digital assistant thanks to its underlying generative AI technology. But for the new features to work, the company is asking a lot from its Echo and smart device users--whether or not they choose to use Alexa+ at all. Alexa+ is billed as a major upgrade that includes individual voice recognition through Alexa Voice ID, nuanced calendar scheduling, Ring home security system integrations, and product purchasing capabilities. It's Amazon's latest effort to generate a profit from Alexa, which lost $25 billion in revenue between 2007-2021 according to The Wall Street Journal last year. While Alexa+ will be added to all Prime subscriptions, users without Prime can enroll in the program for $19.99 per month.
Panmodal Information Interaction
The chat interface is an essential component of many generative artificial intelligence (GenAI)-based systems. Multi-turn dialog has long shown promise as a way to engage with information systems [5], but is now going mainstream in support of complex tasks via progress in GenAI [15] and in GenAI-based conversational systems such as ChatGPT and Pi. SearchGPT, recently trialed by OpenAI, provides highly relevant, verifiable answers in a conversational experience. Search engines can now show GenAI answers directly on result pages--minimizing user effort in examining search results but also removing human control over answer generation [9], which can have its own drawbacks (for example, fewer learning opportunities)--and let users follow up via multi-turn conversation for clarification or to seek additional information. Traditional search still has utility for some tasks, such as fact finding or navigation, and may be preferred by some searchers given its focus on providing information sources directly rather than synthesized answers. GenAI is also prone to hallucinate (that is, generate nonsensical or inaccurate outputs), making sole reliance on its generated answers inadvisable, although source attribution and answer verification are now creeping in to help users better assess what they can use and trust.
The Download: Google playing AI search catchup, and forming relationships with chatbots
I've been mulling over something that Will Heaven, our senior editor for AI, pointed out not too long ago: all the big players in AI seem to be moving in the same directions and converging on the same things. Google has just announced it's adding new AI features from Gemini to search, and adding search features to Gemini. What strikes me more than how well they work is that they are really just about catching up with OpenAI's ChatGPT. And their belated appearance in March of the year 2025 doesn't seem like a great sign for Google. This story originally appeared in The Debrief with Mat Honan, a weekly newsletter about the biggest stories in tech from our editor in chief.
Is Google playing catchup on search with OpenAI?
Take AI Mode, which it announced March 5. It's cool. But it's pretty much a follow-along of what OpenAI was already doing. (Google already had something called AI Overviews in search, but AI Mode is different and deeper.) As the company explained in a blog post, "This new Search mode expands what AI Overviews can do with more advanced reasoning, thinking and multimodal capabilities so you can get help with even your toughest questions." Rather than a brief overview with links out, the AI will dig in and offer more robust answers. You can ask followup questions too, something AI Overviews doesn't support.
Position: Model Collapse Does Not Mean What You Think
Schaeffer, Rylan, Kazdan, Joshua, Arulandu, Alvan Caleb, Koyejo, Sanmi
The proliferation of AI-generated content online has fueled concerns over model collapse, a degradation in future generative models' performance when trained on synthetic data generated by earlier models. Industry leaders, premier research journals and popular science publications alike have prophesied catastrophic societal consequences stemming from model collapse. In this position piece, we contend this widespread narrative fundamentally misunderstands the scientific evidence. We highlight that research on model collapse actually encompasses eight distinct and at times conflicting definitions of model collapse, and argue that inconsistent terminology within and between papers has hindered building a comprehensive understanding of model collapse. To assess how significantly different interpretations of model collapse threaten future generative models, we posit what we believe are realistic conditions for studying model collapse and then conduct a rigorous assessment of the literature's methodologies through this lens. While we leave room for reasonable disagreement, our analysis of research studies, weighted by how faithfully each study matches real-world conditions, leads us to conclude that certain predicted claims of model collapse rely on assumptions and conditions that poorly match real-world conditions, and in fact several prominent collapse scenarios are readily avoidable. Altogether, this position paper argues that model collapse has been warped from a nuanced multifaceted consideration into an oversimplified threat, and that the evidence suggests specific harms more likely under society's current trajectory have received disproportionately less attention.
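The basic mechanism described above, a model trained on synthetic data generated by an earlier model, can be illustrated with a toy simulation. The sketch below is not the paper's experimental setup: it assumes a one-dimensional Gaussian "model", and it assumes every generation discards all earlier data and fits only to samples drawn from its predecessor. The function name `refit_chain` and its parameters are hypothetical, chosen for illustration.

```python
import random
import statistics


def refit_chain(generations: int, n_samples: int, seed: int = 0) -> list[float]:
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted variance at generation 0 (the "real" distribution)
    and after each of the subsequent generations.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # generation 0: standard normal (assumption)
    variances = [sigma ** 2]
    for _ in range(generations):
        # Draw purely synthetic data from the current model...
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # ...and fit the next model to it by maximum likelihood.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        variances.append(sigma ** 2)
    return variances


if __name__ == "__main__":
    history = refit_chain(generations=200, n_samples=10)
    print(f"variance at generation 0:   {history[0]:.4f}")
    print(f"variance at generation 200: {history[-1]:.6f}")
```

Under these assumptions the maximum-likelihood variance estimate shrinks in expectation by a factor of (n - 1)/n per generation, so small sample sizes tend to lose variance quickly. Whether anything similar happens in practice depends on the training conditions, which is the question the position paper raises.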
Generative AI for Software Architecture. Applications, Trends, Challenges, and Future Directions
Esposito, Matteo, Li, Xiaozhou, Moreschini, Sergio, Ahmad, Noman, Cerny, Tomas, Vaidhyanathan, Karthik, Lenarduzzi, Valentina, Taibi, Davide
Context: Generative Artificial Intelligence (GenAI) is transforming much of software development, yet its application in software architecture is still in its infancy, and no prior study has systematically addressed the topic. Aim: We aim to systematically synthesize the use, rationale, contexts, usability, and future challenges of GenAI in software architecture. Method: We performed a multivocal literature review (MLR), analyzing peer-reviewed and gray literature, identifying current practices, models, adoption contexts, and reported challenges, extracting themes via open coding. Results: Our review identified significant adoption of GenAI for architectural decision support and architectural reconstruction. OpenAI GPT models are predominantly applied, and there is consistent use of techniques such as few-shot prompting and retrieval-augmented generation (RAG). GenAI has been applied mostly to initial stages of the Software Development Life Cycle (SDLC), such as Requirements-to-Architecture and Architecture-to-Code. Monolithic and microservice architectures were the dominant targets. However, rigorous testing of GenAI outputs was typically missing from the studies. Among the most frequent challenges are model precision, hallucinations, ethical aspects, privacy issues, lack of architecture-specific datasets, and the absence of sound evaluation frameworks. Conclusions: GenAI shows significant potential in software design, but several challenges remain on its path to greater adoption. Research efforts should target designing general evaluation methodologies, handling ethics and precision, increasing transparency and explainability, and promoting architecture-specific datasets and benchmarks to bridge the gap between theoretical possibilities and practical use.
Are LLMs (Really) Ideological? An IRT-based Analysis and Alignment Tool for Perceived Socio-Economic Bias in LLMs
Wachter, Jasmin, Radloff, Michael, Smolej, Maja, Kinder-Kurlanda, Katharina
We introduce an Item Response Theory (IRT)-based framework to detect and quantify socioeconomic bias in large language models (LLMs) without relying on subjective human judgments. Unlike traditional methods, IRT accounts for item difficulty, improving ideological bias estimation. We fine-tune two LLM families (Meta-LLaMa 3.2-1B-Instruct and ChatGPT 3.5) to represent distinct ideological positions and introduce a two-stage approach: (1) modeling response avoidance and (2) estimating perceived bias in answered responses. Our results show that off-the-shelf LLMs often avoid ideological engagement rather than exhibit bias, challenging prior claims of partisanship. This empirically validated framework enhances AI alignment research and promotes fairer AI governance.
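To make "accounts for item difficulty" concrete, here is a minimal sketch of the two-parameter logistic (2PL) model, a standard IRT formulation. The abstract does not state which IRT model the authors use, so the 2PL choice is an assumption, and the function names `p_endorse` and `p_observed_agreement` are hypothetical. The second function only illustrates the general shape of a two-stage setup (answering at all, then the response given an answer) and is not the authors' estimator.

```python
import math


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL item response function.

    theta: latent trait of the respondent (here, an ideological position)
    a:     item discrimination (how sharply the item separates respondents)
    b:     item difficulty (trait level at which endorsement is 50% likely)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def p_observed_agreement(p_answer: float, theta: float, a: float, b: float) -> float:
    """Illustrative two-stage combination (assumption, not the paper's model):
    the respondent first answers rather than avoids with probability p_answer,
    and only then endorses the item according to the 2PL curve."""
    return p_answer * p_endorse(theta, a, b)


if __name__ == "__main__":
    # Same respondent, two items that differ only in difficulty.
    print(p_endorse(theta=0.0, a=1.0, b=0.0))   # 0.5
    print(p_endorse(theta=0.0, a=1.0, b=1.0))   # lower: the item is harder to endorse
```

Because difficulty enters the curve explicitly, the same response to an "easy" and a "hard" item carries different information about the latent position, which raw agreement counts would miss.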