
Generative AI


Fox News AI Newsletter: North Korea's suicide drone test

FOX News

North Korean leader Kim Jong Un supervises the test of suicide drones with artificial intelligence technology, according to local media, at an unknown location, in this photo released by North Korea's official Korean Central News Agency on March 27, 2025.

KIM POWER PLAY: North Korean dictator Kim Jong Un oversaw tests of newly developed AI-powered suicide drones and called for their increased production, North Korean state media said Thursday.

A photo taken on October 4, 2023 in Manta, near Turin, shows a smartphone and a laptop displaying the logos of the artificial intelligence OpenAI research company and ChatGPT chatbot.

SUZANNE'S TWIN: Suzanne Somers passed away two years ago, but her memory lives on, not only through her Hollywood career and businesses but also through artificial intelligence. Her widower, Alan Hamel, worked with an AI company called Hollo to create a "twin" of his late wife.


Who Owns the Output? Bridging Law and Technology in LLMs Attribution

arXiv.org Artificial Intelligence

Since the introduction of ChatGPT in 2022, large language models (LLMs) and large multimodal models (LMMs) have transformed content creation, enabling the generation of human-quality content in every medium: text, images, videos, and audio. The opportunities offered by generative AI models are vast: they drastically reduce the time required to generate content and often raise its quality. However, the complexity and poor traceability of generated content make attributing AI-generated content challenging. Attribution is difficult for a variety of reasons, ranging from the lack of systematic fingerprinting of generated content to the enormous amount of data on which LLMs and LMMs are trained, which makes it hard to connect generated content to its training data. This situation raises concerns about intellectual property and ethical responsibility. To address these concerns, in this paper we bridge the technological, ethical, and legislative aspects by reviewing the legislative and technological instruments available today and proposing a legal framework to ensure accountability. Finally, we present three use cases showing how these instruments can be combined to ensure that attribution is respected. However, although today's techniques can substantially improve attribution, strong limitations remain that can only be overcome by developing new attribution techniques for LLMs and LMMs.


The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction

arXiv.org Artificial Intelligence

Large language models (LLMs) excel on a variety of reasoning benchmarks, but previous studies suggest they sometimes struggle to generalize to unseen questions, potentially due to over-reliance on memorized training examples. However, the precise conditions under which LLMs switch between reasoning and memorization during text generation remain unclear. In this work, we provide a mechanistic understanding of LLMs' reasoning-memorization dynamics by identifying a set of linear features in the model's residual stream that govern the balance between genuine reasoning and memory recall. These features not only distinguish reasoning tasks from memory-intensive ones but can also be manipulated to causally influence model performance on reasoning tasks. Additionally, we show that intervening in these reasoning features helps the model more accurately activate the most relevant problem-solving capabilities during answer generation. Our findings offer new insights into the underlying mechanisms of reasoning and memory in LLMs and pave the way for the development of more robust and interpretable generative AI systems.
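One common way to obtain a single linear direction like the one in the title is a difference of means: average the residual-stream activations on reasoning-heavy prompts, subtract the average on memory-heavy prompts, and normalize. The abstract does not spell out the authors' exact extraction procedure, so the sketch below is an assumption-labeled illustration, not their method. The function names (`reasoning_direction`, `projection`, `steer`) and the toy 3-dimensional vectors are hypothetical; real activations would come from a model's hidden states and have thousands of dimensions.

```python
import math
from statistics import fmean


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length activation vectors."""
    return [fmean(column) for column in zip(*vectors)]


def unit(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero vector has no direction")
    return [x / norm for x in v]


def reasoning_direction(reasoning_acts, memory_acts):
    """Hypothetical helper: unit vector mean(reasoning) - mean(memory)."""
    mu_r = mean_vector(reasoning_acts)
    mu_m = mean_vector(memory_acts)
    return unit([r - m for r, m in zip(mu_r, mu_m)])


def projection(h, d):
    """Scalar projection of activation h onto unit direction d."""
    return sum(a * b for a, b in zip(h, d))


def steer(h, d, alpha):
    """Shift h along d by alpha; with a unit d, the projection rises by alpha."""
    return [a + alpha * b for a, b in zip(h, d)]


# Toy "activations" for illustration only.
reasoning = [[1.0, 0.0, 0.5], [1.2, 0.1, 0.4]]
memory = [[0.0, 1.0, 0.5], [0.1, 0.9, 0.6]]
d = reasoning_direction(reasoning, memory)
print(projection(reasoning[0], d) > 0, projection(memory[0], d) < 0)  # True True
```

The sign of the projection separates the two toy groups, and `steer` is the kind of causal intervention the abstract describes: pushing an activation along the direction before the next layer reads it. In a real model the scale `alpha` would need tuning, since large shifts can degrade outputs.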


Can Multi-modal (reasoning) LLMs work as deepfake detectors?

arXiv.org Artificial Intelligence

Deepfake detection remains a critical challenge in the era of advanced generative models, particularly as synthetic media becomes more sophisticated. In this study, we explore the potential of state-of-the-art multi-modal (reasoning) large language models (LLMs), such as OpenAI o1/4o, Gemini 2.0 Flash Thinking, DeepSeek Janus, Grok 3, Llama 3.2, Qwen 2/2.5 VL, Mistral Pixtral, and Claude 3.5/3.7 Sonnet, for deepfake image detection. We benchmark 12 of the latest multi-modal LLMs against traditional deepfake detection methods across multiple datasets, including recently published real-world deepfake imagery. To enhance performance, we employ prompt tuning and conduct an in-depth analysis of the models' reasoning pathways to identify key factors in their decision-making. Our findings indicate that the best multi-modal LLMs achieve competitive zero-shot performance with promising generalization ability, even surpassing traditional deepfake detection pipelines on out-of-distribution datasets, while the remaining LLM families perform very poorly, some worse than random guessing. Furthermore, we found that newer model versions and reasoning capabilities do not improve performance on niche tasks such as deepfake detection, while model size does help in some cases. This study highlights the potential of integrating multi-modal reasoning into future deepfake detection frameworks and provides insights into model interpretability for robustness in real-world scenarios.
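"Worse than random guessing" depends on the metric. On a skewed test set, a detector that labels every image as fake can post high raw accuracy while learning nothing. Balanced accuracy (the mean of per-class recall) has a chance level of 0.5 regardless of class balance, so it makes the comparison explicit. The abstract does not say which metric the authors report (it could be accuracy, AUC, or something else), so this is only an illustrative sketch; the label convention (1 = fake, 0 = real) and the tiny test set are hypothetical.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def balanced_accuracy(labels, preds):
    """Mean per-class recall for binary labels (1 = fake, 0 = real).

    Chance level is 0.5 no matter how imbalanced the classes are.
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        if not idx:
            raise ValueError(f"no examples with label {cls}")
        hits = sum(1 for i in idx if preds[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / 2


# Hypothetical skewed test set: three fakes, one real image.
labels = [1, 1, 1, 0]
always_fake = [1, 1, 1, 1]  # degenerate "detector" that calls everything fake
print(accuracy(labels, always_fake))           # 0.75
print(balanced_accuracy(labels, always_fake))  # 0.5
```

The degenerate detector looks respectable on raw accuracy (0.75) but sits exactly at chance on balanced accuracy; a model that systematically inverts the labels would score below 0.5, which is one concrete meaning of "worse than random."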


Evaluating Compositional Scene Understanding in Multimodal Generative Models

arXiv.org Artificial Intelligence

The visual world is fundamentally compositional. Visual scenes are defined by the composition of objects and their relations. Hence, it is essential for computer vision systems to reflect and exploit this compositionality to achieve robust and generalizable scene understanding. While major strides have been made toward the development of general-purpose, multimodal generative models, including both text-to-image models and multimodal vision-language models, it remains unclear whether these systems are capable of accurately generating and interpreting scenes involving the composition of multiple objects and relations. In this work, we present an evaluation of the compositional visual processing capabilities in the current generation of text-to-image (DALL-E 3) and multimodal vision-language models (GPT-4V, GPT-4o, Claude 3.5 Sonnet, Qwen2-VL-72B, and InternVL2.5-38B), and compare the performance of these systems with that of human participants. The results suggest that these systems display some ability to solve compositional and relational tasks, showing notable improvements over the previous generation of multimodal models, but with performance nevertheless well below the level of human participants, particularly for more complex scenes involving many (>5) objects and multiple relations. These results highlight the need for further progress toward compositional understanding of visual scenes.


The first trial of generative AI therapy shows it might help with depression

MIT Technology Review

Many psychologists and psychiatrists have shared the vision, noting that fewer than half of people with a mental disorder receive therapy, and those who do might get only 45 minutes per week. Researchers have tried to build tech so that more people can access therapy, but they have been held back by two things. One, a therapy bot that says the wrong thing could result in real harm. That's why many researchers have built bots using explicit programming: The software pulls from a finite bank of approved responses (as was the case with Eliza, a mock-psychotherapist computer program built in the 1960s). But this makes them less engaging to chat with, and people lose interest.
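The "explicit programming" approach can be illustrated with a minimal sketch: the bot matches a message against trigger patterns and returns only replies from a fixed, pre-approved bank. This is a simplified keyword-to-reply table, not a reconstruction of Eliza's actual script (which also used decomposition and reassembly rules to echo the user's words). The patterns, replies, `APPROVED_RESPONSES`, `FALLBACK`, and `reply` are all hypothetical placeholders; a real system would have its bank vetted by clinicians.

```python
import re

# Hypothetical bank of pre-approved replies. Each entry pairs a trigger
# pattern with one fixed reply; nothing outside this bank can be emitted.
APPROVED_RESPONSES = [
    (re.compile(r"\b(sad|down|hopeless)\b", re.IGNORECASE),
     "I'm sorry you're feeling that way. What has been weighing on you lately?"),
    (re.compile(r"\b(anxious|worried|nervous)\b", re.IGNORECASE),
     "That sounds stressful. When do you notice the worry most?"),
    (re.compile(r"\b(sleep|tired|exhausted)\b", re.IGNORECASE),
     "How have you been sleeping over the past week?"),
]
FALLBACK = "Can you tell me more about that?"


def reply(message: str) -> str:
    """Return the first approved reply whose pattern matches, else FALLBACK."""
    for pattern, response in APPROVED_RESPONSES:
        if pattern.search(message):
            return response
    return FALLBACK


print(reply("I feel so sad today"))
print(reply("hello"))  # Can you tell me more about that?
```

The trade-off the article describes falls straight out of this design: every output is guaranteed to come from the vetted set, which limits harm, but the same few replies repeat no matter what the user says, which is why such bots feel less engaging.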


How Those Studio Ghibli Memes Are a Sign of OpenAI's Trump-Era Shift

TIME - Tech

In one sense, the pivot has been a long time coming. OpenAI began its decade-long life as a research lab that kept its tools under strict lock and key; when it did release early chatbots and image generation models, they had strict content filters that aimed to prevent misuse. But for years it has been widening the accessibility of its tools in an approach it calls "iterative deployment." The release of ChatGPT in November 2022 was the most popular example of this strategy, which the company believes is necessary to help society adapt to the changes AI is bringing. Still, in another sense, the change to OpenAI's model behavior policies has a more recent proximate cause: the 2024 election of President Donald Trump, and the cultural shift that has accompanied the new administration.


Hayao Miyazaki Would Hate You Losers and Your A.I. Slop

Slate

Since OpenAI released an update earlier this week that improved ChatGPT's ability to generate images based on detailed requests, a dark evil has infected the internet, responsible for the shriveling of souls and the wanton destruction of life and nature itself: Studio Ghibli A.I. slop. Social media has been flooded with images of the most random shit imaginable rendered in the signature style of Hayao Miyazaki, the legendary animator and co-founder of the Japanese company Studio Ghibli, renowned for hand-drawn animated films such as Princess Mononoke, Spirited Away, and My Neighbor Totoro. X in particular, Elon Musk's land of the rising bot, is rife with viral posts extolling the virtues of an innovation that steals human-made creations, chews them into paste, and spits out the reassembled remains, stripped of any of the originality, spirit, and labor that makes art art. It's been 24 hours since OpenAI unexpectedly shook the AI image world with 4o image generation.


Copyright questions loom as ChatGPT's Ghibli-style images go viral

The Japan Times

The release of the latest image generator on OpenAI's ChatGPT has triggered a flood of online memes featuring images done in the style of Studio Ghibli, the Japanese studio behind classic animated films like "My Neighbor Totoro" and "Princess Mononoke." Since the release on Wednesday, AI-generated images depicting Studio Ghibli versions of Elon Musk with U.S. President Donald Trump, "The Lord of the Rings," and even a recreation of the Sept. 11 attacks have gone viral across online platforms.


Generalization Bias in Large Language Model Summarization of Scientific Research

arXiv.org Artificial Intelligence

Artificial intelligence chatbots driven by large language models (LLMs) have the potential to increase public science literacy and support scientific research, as they can quickly summarize complex scientific information in accessible terms. However, when summarizing scientific texts, LLMs may omit details that limit the scope of research conclusions, leading to generalizations of results broader than warranted by the original study. We tested 10 prominent LLMs, including ChatGPT-4o, ChatGPT-4.5, DeepSeek, LLaMA 3.3 70B, and Claude 3.7 Sonnet, comparing 4,900 LLM-generated summaries to their original scientific texts. Even when explicitly prompted for accuracy, most LLMs produced broader generalizations of scientific results than those in the original texts, with DeepSeek, ChatGPT-4o, and LLaMA 3.3 70B overgeneralizing in 26 to 73% of cases. In a direct comparison of LLM-generated and human-authored science summaries, LLM summaries were nearly five times more likely to contain broad generalizations (OR = 4.85, 95% CI [3.06, 7.70]). Notably, newer models tended to perform worse in generalization accuracy than earlier ones. Our results indicate a strong bias in many widely used LLMs towards overgeneralizing scientific conclusions, posing a significant risk of large-scale misinterpretations of research findings. We highlight potential mitigation strategies, including lowering LLM temperature settings and benchmarking LLMs for generalization accuracy.
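To make the "OR = 4.85, 95% CI [3.06, 7.70]" figure concrete: for a 2×2 table (summary source × broad generalization yes/no), the odds ratio is (a·d)/(b·c), and a standard Wald interval is exp(ln OR ± 1.96·SE) with SE = sqrt(1/a + 1/b + 1/c + 1/d). The abstract does not say how the authors computed their interval (a regression model is also plausible), so this is a generic textbook sketch. The counts in the example are hypothetical and are not the study's data.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: group 1 with the outcome     b: group 1 without it
    c: group 2 with the outcome     d: group 2 without it
    z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only: 50 of 100 LLM summaries
# generalize broadly, versus 10 of 60 human-written summaries.
or_, lo, hi = odds_ratio_wald_ci(50, 50, 10, 50)
print(or_)  # 5.0
print(round(lo, 2), round(hi, 2))
```

Because the interval is symmetric on the log scale, the point estimate is the geometric mean of the bounds. The reported figures are consistent with that: sqrt(3.06 × 7.70) ≈ 4.85.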