 Media


The Limits of A.I.-Generated Miyazaki

The New Yorker

If asked to come up with a quintessentially "human" work of art, one could do worse than to name a film by Studio Ghibli. The Japanese animation studio, founded by the legendary eighty-four-year-old director Hayao Miyazaki, is known for its hand-drawn imagery, lushly organic color palettes, epic narratives, and evocation of both the emotional ambiguities of childhood and the twisting path to becoming an adult. We American millennials were blessed to have the films translated and distributed in English just as we were growing up, and so movies including "My Neighbor Totoro," "Princess Mononoke," and "Spirited Away" are nigh-universally recognizable touchstones of our youth. Any Ghibli imagery is primed to make us feel a combination of pleasurable nostalgia and mournful shivers, evoking the doomed forest creatures, greedy bathhouse ghosts, and missed connections featured in Miyazaki's cinematic story lines. Unfortunately, that sense of poignancy quickly erodes when you are bombarded with thousands of Ghibli-esque copycat images, as we all were online last week, thanks to OpenAI's latest version of its ChatGPT tool.


Interview with Joseph Marvin Imperial: aligning generative AI with technical standards

AIHub

In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. The Doctoral Consortium provides an opportunity for a group of PhD students to discuss and explore their research interests and career objectives in an interdisciplinary workshop together with a panel of established researchers. In the latest interview, we hear from Joseph Marvin Imperial, who is focussed on aligning generative AI with technical standards for regulatory and operational compliance. Standards are documents created by industry and/or academic experts that have been recognized to ensure the quality, accuracy, and interoperability of systems and processes (aka "the best way of doing things"). You'll see standards in almost all sectors and domains, including the sciences, healthcare, education, finance, journalism, law, and engineering.


Comparative Analysis of Deepfake Detection Models: New Approaches and Perspectives

arXiv.org Machine Learning

The growing threat posed by deepfake videos, capable of manipulating realities and disseminating misinformation, drives the urgent need for effective detection methods. This work investigates and compares different approaches for identifying deepfakes, focusing on the GenConViT model and its performance relative to other architectures present in the DeepfakeBenchmark. To contextualize the research, the social and legal impacts of deepfakes are addressed, as well as the technical fundamentals of their creation and detection, including digital image processing, machine learning, and artificial neural networks, with emphasis on Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), and Transformers. The performance evaluation of the models was conducted using relevant metrics and new datasets established in the literature, such as WildDeepfake and DeepSpeak, aiming to identify the most effective tools in the battle against misinformation and media manipulation. The obtained results indicated that GenConViT, after fine-tuning, exhibited superior performance in terms of accuracy (93.82%) and generalization capacity, surpassing other architectures in the DeepfakeBenchmark on the DeepSpeak dataset. This study contributes to the advancement of deepfake detection techniques and to the development of more robust and effective solutions against the dissemination of false information.


UK needs to relax AI laws or risk transatlantic ties, thinktank warns

The Guardian

To enforce a strict licensing model, the UK would also need to restrict access to models that have been trained on such content, which could include US-owned AI systems. With the Trump administration signalling it will not pursue strict AI regulations and China pursuing AI growth at "breakneck speed", the UK could weaken its economic and national security interests by lagging in the AI race, said TBI. "If the UK imposes laws that are too strict, it risks falling behind in the AI-driven economy and weakening its capacity to protect national security interests," said TBI. The report said arguing that commercial AI models cannot be trained on content from the open web was close to saying knowledge workers – a broad category of professionals ranging from lawyers to researchers – cannot profit from insights they get when reading the same content. Rather than fighting to uphold outdated regulations, said TBI, rights holders and policymakers should help build a future where creativity is valued alongside AI innovation. Fernando Garibay, a record producer who has worked with artists including Lady Gaga and U2, said history has been dotted with "end-of-time claims" related to technological breakthroughs, from the printing press to music streaming.


AI's development is critically important for America – and it all hinges on these freedoms

FOX News

Fox News anchor Bret Baier has the latest on the Murdoch Children's Research Institute's partnership with the Gladstone Institutes for the 'Decoding Broken Hearts' initiative on 'Special Report.' The Trump administration recently asked American developers, including OpenAI, for input on what the U.S. needs to do to stay ahead in the global AI competition. We believe that preserving AI's ability to learn should be at the top of the list. Today, artificial intelligence is poised to scale human ingenuity itself – the sum of our freedoms to learn and know, think, create, and produce. Humans have never created a technology that can do as much to advance education, science, and discovery – and we're already seeing its benefits.


WikiVideo: Article Generation from Multiple Videos

arXiv.org Artificial Intelligence

We present the challenging task of automatically creating a high-level Wikipedia-style article that aggregates information from multiple diverse videos about real-world events, such as natural disasters or political elections. Videos are intuitive sources for retrieval-augmented generation (RAG), but most contemporary RAG workflows focus heavily on text, and existing methods for video-based summarization focus on low-level scene understanding rather than high-level event semantics. To close this gap, we introduce WikiVideo, a benchmark consisting of expert-written articles and densely annotated videos that provide evidence for articles' claims, facilitating the integration of video into RAG pipelines and enabling the creation of in-depth content that is grounded in multimodal sources. We further propose Collaborative Article Generation (CAG), a novel interactive method for article creation from multiple videos. CAG leverages an iterative interaction between an r1-style reasoning model and a VideoLLM to draw higher-level inferences about the target event than is possible with VideoLLMs alone, which fixate on low-level visual features. We benchmark state-of-the-art VideoLLMs and CAG in both oracle retrieval and RAG settings and find that CAG consistently outperforms alternative methods, while suggesting intriguing avenues for future work.


WorldScore: A Unified Evaluation Benchmark for World Generation

arXiv.org Artificial Intelligence

We introduce the WorldScore benchmark, the first unified benchmark for world generation. We decompose world generation into a sequence of next-scene generation tasks with explicit camera trajectory-based layout specifications, enabling unified evaluation of diverse approaches from 3D and 4D scene generation to video generation models. The WorldScore benchmark encompasses a curated dataset of 3,000 test examples that span diverse worlds: static and dynamic, indoor and outdoor, photorealistic and stylized. The WorldScore metrics evaluate generated worlds through three key aspects: controllability, quality, and dynamics. Through extensive evaluation of 19 representative models, including both open-source and closed-source ones, we reveal key insights and challenges for each category of models. Our dataset, evaluation code, and leaderboard can be found at https://haoyi-duan.github.io/WorldScore/


RECKON: Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model

arXiv.org Artificial Intelligence

As large language models (LLMs) advance, efficient knowledge evaluation becomes crucial to verifying their capabilities. Traditional methods, relying on benchmarks, face limitations such as high resource costs and information loss. We propose the Large-scale Reference-based Efficient Knowledge Evaluation for Large Language Model (RECKON), which directly uses reference data to evaluate models. RECKON organizes unstructured data into manageable units and generates targeted questions for each cluster, improving evaluation accuracy and efficiency. Experimental results show that RECKON reduces resource consumption by 56.5% compared to traditional methods while achieving over 97% accuracy across various domains, including world knowledge, code, legal, and biomedical datasets. Code is available at https://github.com/MikeGu721/reckon


News is More than a Collection of Facts: Moral Frame Preserving News Summarization

arXiv.org Artificial Intelligence

News articles are more than collections of facts; they reflect journalists' framing, shaping how events are presented to the audience. One key aspect of framing is the choice to write in (or quote verbatim) morally charged language as opposed to using neutral terms. This moral framing carries implicit judgments that automated news summarizers should recognize and preserve to maintain the original intent of the writer. In this work, we perform the first study on the preservation of moral framing in AI-generated news summaries. We propose an approach that leverages the intuition that journalists intentionally use or report specific moral-laden words, which should be retained in summaries. Through automated, crowd-sourced, and expert evaluations, we demonstrate that our approach enhances the preservation of moral framing while maintaining overall summary quality.


Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models

arXiv.org Artificial Intelligence

As language models improve and become capable of performing more complex tasks across modalities, evaluating them automatically becomes increasingly challenging. Developing strong and robust task-specific automatic metrics gets harder, and human-annotated test sets -- which are expensive to create -- saturate more quickly. A compelling alternative is to design reliable strategies to automate the creation of test data and evaluation, but previous attempts either rely on pre-existing data or focus solely on individual tasks. We present Zero-shot Benchmarking (ZSB), a framework for creating high-quality benchmarks for any task by leveraging language models for both synthetic test data creation and evaluation. ZSB is simple and flexible: it requires only the creation of a prompt for data generation and one for evaluation; it is scalable to tasks and languages where collecting real-world data is costly or impractical; it is model-agnostic, allowing the creation of increasingly challenging benchmarks as models improve. To assess the effectiveness of our framework, we create benchmarks for five text-only tasks and a multi-modal one: general capabilities in four languages (English, Chinese, French, and Korean), translation, and general vision-language capabilities in English. We then rank a broad range of open and closed systems on our benchmarks. ZSB rankings consistently correlate strongly with human rankings, outperforming widely adopted standard benchmarks. Through ablations, we find that strong benchmarks can be created with open models, and that judge model size and dataset variety are crucial drivers of performance. We release all our benchmarks, and code to reproduce our experiments and to produce new benchmarks.
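
The claim that benchmark rankings "correlate strongly with human rankings" can be made concrete with a rank correlation. The sketch below computes Spearman's rho between two score lists. It is only an illustration: the abstract does not say which correlation measure the authors use, and the function names and scores here are hypothetical, not taken from the paper or its code.

```python
def ranks(scores):
    """Return 1-based ranks (1 = highest score). Assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(a, b):
    """Spearman's rho for two equal-length score lists without ties."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    rank_a, rank_b = ranks(a), ranks(b)
    # Sum of squared rank differences, then the closed-form no-ties formula.
    d_squared = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores for four systems (illustrative values only).
benchmark_scores = [0.81, 0.74, 0.69, 0.55]  # automatic benchmark
human_scores = [0.78, 0.80, 0.60, 0.52]      # human evaluation

rho = spearman(benchmark_scores, human_scores)
print(round(rho, 3))
```

A value near 1 means the benchmark orders systems much as human raters do. In this toy data the two rankings differ only in which of the top two systems comes first.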