The Limits of A.I.-Generated Miyazaki
If asked to come up with a quintessentially "human" work of art, one could do worse than to name a film by Studio Ghibli. The Japanese animation studio, founded by the legendary eighty-four-year-old director Hayao Miyazaki, is known for its hand-drawn imagery, lushly organic color palettes, epic narratives, and evocation of both the emotional ambiguities of childhood and the twisting path to becoming an adult. We American millennials were blessed to have the films translated and distributed in English just as we were growing up, and so movies including "My Neighbor Totoro," "Princess Mononoke," and "Spirited Away" are nigh-universally recognizable touchstones of our youth. Any Ghibli imagery is primed to make us feel a combination of pleasurable nostalgia and mournful shivers, evoking the doomed forest creatures, greedy bathhouse ghosts, and missed connections featured in Miyazaki's cinematic story lines. Unfortunately, that sense of poignancy quickly erodes when you are bombarded with thousands of Ghibli-esque copycat images, as we all were online last week, thanks to OpenAI's latest version of its ChatGPT tool.
Interview with Joseph Marvin Imperial: aligning generative AI with technical standards
In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. The Doctoral Consortium provides an opportunity for a group of PhD students to discuss and explore their research interests and career objectives in an interdisciplinary workshop together with a panel of established researchers. In the latest interview, we hear from Joseph Marvin Imperial, who is focused on aligning generative AI with technical standards for regulatory and operational compliance. Standards are documents, created by industry and/or academic experts, that are recognized as ensuring the quality, accuracy, and interoperability of systems and processes (aka "the best way of doing things"). You'll see standards in almost all sectors and domains, including the sciences, healthcare, education, finance, journalism, law, and engineering.
Rapidus begins pilot production of 2-nanometer chips in Hokkaido
Government-backed Rapidus started pilot production of advanced chips in Hokkaido this week, taking Japan one step closer to its goal of a return to semiconductor-manufacturing leadership. The company announced Tuesday that its plant near New Chitose Airport is ready for test production of next-generation 2-nanometer chips. Rapidus aims to mass produce those semiconductors -- which are vital for advanced technologies, such as artificial intelligence and autonomous driving -- in 2027.
Nintendo set to unveil Switch 2, follow-up to megahit game console
Nintendo is set to reveal key details of its follow-up to the Switch in a presentation that has been eagerly anticipated by gaming fans and investors alike. The Japanese gaming giant will kick off the hour-long presentation for the Switch 2 at 13:00 GMT on Wednesday, with the hybrid handheld and home console's price and release date among the details expected to be revealed. The Nintendo Direct event comes after the Kyoto-based company confirmed the existence of the next-generation system in a brief teaser video released in mid-January. The teaser showed that the upcoming console has a similar appearance to the original Switch, albeit with a larger screen, and the same hybrid functionality. The video also provided a short preview of the upcoming instalment of Nintendo's popular Mario Kart franchise.
Drone footage shows scale of Greek island flooding
Heavy rainfall caused flash flooding on Greece's Paros island on Monday, 31 March. Drone footage captures the scale of the destruction in the capital, Naousa, with damage to vehicles and authorities working to clear mud from the streets. Schools were closed on Monday, and authorities urged residents to avoid travel, according to local media. Further heavy rainfall is expected this week.
RobuNFR: Evaluating the Robustness of Large Language Models on Non-Functional Requirements Aware Code Generation
Lin, Feng, Kim, Dong Jae, Li, Zhenhao, Yang, Jinqiu, Chen, Tse-Hsun
When using LLMs to address Non-Functional Requirements (NFRs), developers may behave differently (e.g., expressing the same NFR in different words). Robust LLMs should output consistent results across these variations; however, this aspect remains underexplored. We propose RobuNFR for evaluating the robustness of LLMs in NFR-aware code generation across four NFR dimensions: design, readability, reliability, and performance, using three methodologies: prompt variation, regression testing, and diverse workflows. Our experiments show that RobuNFR reveals robustness issues in the tested LLMs when considering NFRs in code generation. Specifically, under prompt variation, including NFRs leads to a decrease in Pass@1 by up to 39 percent and an increase in the standard deviation from 0.48 to 2.48 compared to the baseline without NFRs (i.e., Function-Only). While incorporating NFRs generally improves overall NFR metrics, it also results in higher prompt sensitivity. In regression settings, some LLMs exhibit differences across versions, with improvements in one aspect (e.g., reduced code smells) often accompanied by regressions in another (e.g., decreased correctness), revealing inconsistencies that challenge their robustness. When varying workflows, the tested LLMs show significantly different NFR-aware code generation capabilities between two workflows: (1) integrating NFRs and functional requirements into the initial prompt and (2) enhancing Function-Only-generated code with the same NFR.
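The prompt-variation methodology described above can be illustrated with a small, self-contained sketch: score several semantically equivalent paraphrases of the same NFR-aware prompt with Pass@1 and measure the spread across them. The function names and outcome data below are hypothetical, not from the RobuNFR implementation.

```python
import statistics

def pass_at_1(results):
    """Pass@1 (in percent): fraction of tasks whose first generated
    solution passes its test suite (1 = pass, 0 = fail)."""
    return 100 * sum(results) / len(results)

# Hypothetical outcomes: for each of 3 paraphrases of the same NFR-aware
# prompt, whether the model's first completion passed on 8 tasks.
outcomes_per_paraphrase = [
    [1, 1, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 0],
]

scores = [pass_at_1(r) for r in outcomes_per_paraphrase]
# Prompt sensitivity: spread of Pass@1 across equivalent prompts.
sensitivity = statistics.stdev(scores)
print(scores, round(sensitivity, 2))  # [75.0, 50.0, 50.0] 14.43
```

A robust model would keep `sensitivity` near zero across paraphrases; the paper's reported jump in standard deviation (0.48 to 2.48) is exactly this kind of spread at benchmark scale.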
KD$^{2}$M: A unifying framework for feature knowledge distillation
Knowledge Distillation (KD) seeks to transfer the knowledge of a teacher network to a student network. This process is often done by matching the networks' predictions (i.e., their outputs), but recently several works have proposed to match the distributions of the networks' activations (i.e., their features), a process known as \emph{distribution matching}. In this paper, we propose a unifying framework, Knowledge Distillation through Distribution Matching (KD$^{2}$M), which formalizes this strategy. Our contributions are threefold: we i) provide an overview of distribution metrics used in distribution matching, ii) benchmark the framework on computer vision datasets, and iii) derive new theoretical results for KD.
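To make the idea of distribution matching concrete, here is a minimal sketch of one such feature-level loss: matching the per-dimension means and standard deviations of teacher and student activations (a simplified Gaussian-style metric; the paper surveys a range of richer distribution metrics, and this particular function is an illustrative assumption, not the paper's code).

```python
import statistics

def feature_match_loss(teacher_feats, student_feats):
    """Per-dimension Gaussian matching: sum of squared gaps between the
    means and standard deviations of teacher and student activations.
    Minimizing this pulls the student's feature distribution toward
    the teacher's, rather than matching individual predictions."""
    loss = 0.0
    for t_dim, s_dim in zip(zip(*teacher_feats), zip(*student_feats)):
        loss += (statistics.fmean(t_dim) - statistics.fmean(s_dim)) ** 2
        loss += (statistics.pstdev(t_dim) - statistics.pstdev(s_dim)) ** 2
    return loss

# Hypothetical activations: 4 samples x 2 feature dimensions.
teacher = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
student = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
print(feature_match_loss(teacher, student))  # 0.0: identical distributions
```

Note that the loss is zero whenever the two distributions agree in mean and spread, even if individual samples differ — the defining property that separates distribution matching from prediction matching.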
An Investigation into the Causal Mechanism of Political Opinion Dynamics: A Model of Hierarchical Coarse-Graining with Community-Bounded Social Influence
Widler, Valeria, Kaminska, Barbara, Martins, Andre C. R., Puga-Gonzalez, Ivan
The increasing polarization in democratic societies is an emergent outcome of political opinion dynamics. Yet, the fundamental mechanisms behind the formation of political opinions, from individual beliefs to collective consensus, remain unknown. Understanding that a causal mechanism must account for both bottom-up and top-down influences, we conceptualize political opinion dynamics as hierarchical coarse-graining, where micro-scale opinions integrate into a macro-scale state variable. Using the CODA (Continuous Opinions Discrete Actions) model, we simulate Bayesian opinion updating, social identity-based information integration, and migration between social identity groups to represent higher-level connectivity. This results in coarse-graining across micro, meso, and macro levels. Our findings show that higher-level connectivity shapes information integration, yielding three regimes: independent (disconnected, local convergence), parallel (fast, global convergence), and iterative (slow, stepwise convergence). In the iterative regime, low connectivity fosters transient diversity, indicating an informed consensus. In all regimes, time-scale separation leads to downward causation, where agents converge on the aggregate majority choice, driving consensus. Critically, any degree of coherent higher-level information integration can overcome misalignment via global downward causation. The results highlight how emergent properties of the causal mechanism, such as downward causation, are essential for consensus and may inform more precise investigations into polarized political discourse.
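The core of the CODA model — continuous internal opinions updated via Bayes' rule from neighbors' discrete actions — can be sketched in a few lines. In log-odds form the Bayesian update reduces to adding or subtracting a fixed step. The well-mixed interaction scheme and the trust parameter `ALPHA` below are illustrative assumptions; the paper's hierarchical, identity-based setup is considerably richer.

```python
import math
import random

ALPHA = 0.7  # assumed probability that a neighbor's action reflects the better option

def coda_update(log_odds, neighbor_action):
    """Bayesian update of an agent's log-odds that option +1 is better,
    after observing a neighbor's discrete action (+1 or -1).
    In log-odds form, Bayes' rule becomes a fixed additive step."""
    step = math.log(ALPHA / (1 - ALPHA))
    return log_odds + step * neighbor_action

def action(log_odds):
    """Discrete action: the option the agent currently favors."""
    return 1 if log_odds >= 0 else -1

random.seed(1)
agents = [random.uniform(-1, 1) for _ in range(20)]  # initial continuous opinions
for _ in range(200):  # repeated pairwise interactions in a fully mixed group
    i, j = random.sample(range(20), 2)
    agents[i] = coda_update(agents[i], action(agents[j]))
print(sorted(action(v) for v in agents))
```

A characteristic CODA behavior visible even in this toy version: repeated reinforcement drives log-odds to extreme values, so opinions harden — the micro-scale seed of the polarization the paper studies at higher levels.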
AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World
Zhou, Zhiyuan, Atreya, Pranav, Tan, You Liang, Pertsch, Karl, Levine, Sergey
Scalable and reproducible policy evaluation has been a long-standing challenge in robot learning. Evaluations are critical to assess progress and build better policies, but evaluation in the real world, especially at a scale that would provide statistically reliable results, is costly in terms of human time and hard to obtain. Evaluation of increasingly generalist robot policies requires an increasingly diverse repertoire of evaluation environments, making the evaluation bottleneck even more pronounced. To make real-world evaluation of robotic policies more practical, we propose AutoEval, a system to autonomously evaluate generalist robot policies around the clock with minimal human intervention. Users interact with AutoEval by submitting evaluation jobs to the AutoEval queue, much like how software jobs are submitted with a cluster scheduling system, and AutoEval will schedule the policies for evaluation within a framework supplying automatic success detection and automatic scene resets. We show that AutoEval can nearly fully eliminate human involvement in the evaluation process, permitting around-the-clock evaluations, and that the evaluation results correspond closely to ground-truth evaluations conducted by hand. To facilitate the evaluation of generalist policies in the robotics community, we provide public access to multiple AutoEval scenes in the popular BridgeData robot setup with WidowX robot arms. In the future, we hope that AutoEval scenes can be set up across institutions to form a diverse and distributed evaluation network.
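The cluster-scheduler analogy above can be made concrete with a toy queue: users submit policy-evaluation jobs, and a scheduler drains them in order, with success detection and scene resets stubbed out. The class and method names here are hypothetical illustrations, not the AutoEval API.

```python
from collections import deque

class EvalQueue:
    """Toy sketch of a cluster-style evaluation queue: submitted jobs are
    run in FIFO order against a scene whose success detection and reset
    are automated (stubbed here as a caller-provided rollout function)."""

    def __init__(self):
        self.jobs = deque()
        self.results = {}

    def submit(self, policy_name, episodes):
        """Enqueue an evaluation job, like submitting to a cluster scheduler."""
        self.jobs.append((policy_name, episodes))

    def run_all(self, rollout):
        """Drain the queue; rollout(policy) returns 1 on success, 0 on failure."""
        while self.jobs:
            policy, n = self.jobs.popleft()
            successes = sum(rollout(policy) for _ in range(n))
            self.results[policy] = successes / n  # automated success rate

# Stub rollout standing in for a real robot episode plus scene reset.
q = EvalQueue()
q.submit("policy_a", 4)
q.run_all(lambda policy: 1)  # every stub episode "succeeds"
print(q.results)  # {'policy_a': 1.0}
```

The design point the paper emphasizes lives in what is stubbed here: replacing the human who judges success and resets the scene with automatic detectors and resets is what lets such a queue run unattended around the clock.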
Towards Interpretable Soft Prompts
Patel, Oam, Wang, Jason, Nayak, Nikhil Shivakumar, Srinivas, Suraj, Lakkaraju, Himabindu
Soft prompts have been popularized as a cheap and easy way to improve task-specific LLM performance beyond few-shot prompts. Despite their origin as an automated prompting method, however, soft prompts and other trainable prompts remain a black-box method with no immediately interpretable connections to prompting. We create a novel theoretical framework for evaluating the interpretability of trainable prompts based on two desiderata: faithfulness and scrutability. We find that existing methods do not naturally satisfy our proposed interpretability criterion. Instead, our framework inspires a new direction of trainable prompting methods that explicitly optimizes for interpretability. To this end, we formulate and test new interpretability-oriented objective functions for two state-of-the-art prompt tuners: Hard Prompts Made Easy (PEZ) and RLPrompt. Our experiments with GPT-2 demonstrate a fundamental trade-off between interpretability and the task-performance of the trainable prompt, explicating the hardness of the soft prompt interpretability problem and revealing odd behavior that arises when one optimizes for an interpretability proxy.
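The interpretability-oriented objectives described above amount to regularizing the prompt tuner's task loss with an interpretability proxy, exposing the trade-off the authors report. The function below is a hypothetical illustration of that structure, not the paper's actual objective for PEZ or RLPrompt.

```python
def combined_objective(task_loss, proxy_loss, lam):
    """Interpretability-regularized prompt-tuning objective: task loss plus
    a weighted interpretability proxy (e.g., how fluent or readable the
    decoded prompt tokens are). lam trades task performance for
    interpretability; lam = 0 recovers standard soft-prompt tuning."""
    return task_loss + lam * proxy_loss

# Hypothetical losses: sweeping lam shows the trade-off the paper reports.
for lam in (0.0, 0.5, 1.0):
    print(lam, combined_objective(task_loss=0.8, proxy_loss=2.0, lam=lam))
```

At `lam = 0` the optimizer ignores readability entirely; as `lam` grows, it increasingly favors prompts that score well on the proxy even when task loss rises — the fundamental tension the GPT-2 experiments explicate.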