Constella: Supporting Storywriters' Interconnected Character Creation through LLM-based Multi-Agents

arXiv.org Artificial Intelligence

Creating a cast of characters by attending to their relational dynamics is a critical aspect of most long-form storywriting. However, our formative study (N=14) reveals that writers struggle to envision new characters that could influence existing ones, to balance similarities and differences among characters, and to intricately flesh out their relationships. Based on these observations, we designed Constella, an LLM-based multi-agent tool that supports storywriters' interconnected character creation process. Constella suggests related characters (FRIENDS DISCOVERY feature), reveals the inner mindscapes of several characters simultaneously (JOURNALS feature), and manifests relationships through inter-character responses (COMMENTS feature). Our 7-8 day deployment study with storywriters (N=11) shows that Constella enabled the creation of expansive communities composed of related characters, facilitated the comparison of characters' thoughts and emotions, and deepened writers' understanding of character relationships. We conclude by discussing how multi-agent interactions can help distribute writers' attention and effort across the character cast.


DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations

arXiv.org Artificial Intelligence

Large, high-quality annotated corpora remain scarce in document-level entity and relation extraction in zero-shot or few-shot settings. In this paper, we present a fully automatic, LLM-based pipeline for synthetic data generation and in-context learning for document-level entity and relation extraction. In contrast to existing approaches that rely on manually annotated demonstrations or direct zero-shot inference, our method combines synthetic data generation with retrieval-based in-context learning, using a reasoning-optimized language model. This allows us to build a high-quality demonstration database without manual annotation and to dynamically retrieve relevant examples at inference time. Based on our approach we produce a synthetic dataset of over $5k$ Wikipedia abstracts with approximately $59k$ entities and $30k$ relation triples. Finally, we evaluate in-context learning performance on the DocIE shared task, extracting entities and relations from long documents in a zero-shot setting. We find that in-context joint entity and relation extraction at document-level remains a challenging task, even for state-of-the-art large language models.
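The retrieval step described above (fetching relevant demonstrations from a database at inference time) can be illustrated with a minimal sketch. It uses plain token-overlap (Jaccard) similarity as a stand-in, since the abstract does not say which retriever the authors use. The function names, the demonstration records, and the prompt layout are all hypothetical.

```python
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (a deliberately crude stand-in)."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, database: List[Dict[str, str]], k: int = 1) -> List[Dict[str, str]]:
    """Return the k demonstrations whose text is most similar to the query."""
    q = tokenize(query)
    ranked = sorted(database, key=lambda d: jaccard(q, tokenize(d["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, demos: List[Dict[str, str]]) -> str:
    """Concatenate retrieved demonstrations and the query into one prompt."""
    parts = [f"Document: {d['text']}\nTriples: {d['triples']}" for d in demos]
    parts.append(f"Document: {query}\nTriples:")
    return "\n\n".join(parts)


# Hypothetical synthetic demonstrations (not taken from the paper's dataset).
database = [
    {"text": "Pierre Curie was born in Paris", "triples": "(Pierre Curie, born_in, Paris)"},
    {"text": "The Danube flows through Budapest", "triples": "(Danube, flows_through, Budapest)"},
]

query = "Marie Curie was born in Warsaw"
top = retrieve(query, database, k=1)
prompt = build_prompt(query, top)
```

A real system would more likely rank demonstrations with dense embeddings; the overlap score here only keeps the sketch dependency-free.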


Hungary and AI: efforts and opportunities in comparison with Singapore

arXiv.org Artificial Intelligence

The study assesses Hungary's National AI Strategy and its implementation through the analysis of strategic documents, publicly available financial records, and expert interviews with the Hungarian AI Coalition President and Chief Strategic Advisor to the Government Commissioner for AI. Twenty-two goals from Hungary's strategy were evaluated through conceptual, governance, temporal, and financial dimensions before being benchmarked against Singapore's National AI Strategies (NAIS 1.0 and NAIS 2.0). Key findings include an estimated total of EUR 4.65 billion in AI-related public investment in Hungary. Openly available financial data was found for only half of the evaluated goals, and just three projects made up 98% of all documented funding. The research also reveals Hungary's implementation challenges, including fragmented execution following ministerial reorganizations and the absence of designated biennial reviews since 2020. Furthermore, the paper provides targeted recommendations for Hungary's forthcoming AI strategy, drawing on Singapore's framework as a reference point. These include adapting to the era of large language models, restructuring the existing triple helix network to foster more effective dialogue and advocacy, and positioning the country as an East-West bridge for automotive AI experimentation.


Evaluating AI Counseling in Japanese: Counselor, Client, and Evaluator Roles Assessed by Motivational Interviewing Criteria

arXiv.org Artificial Intelligence

This study provides the first comprehensive evaluation of large language model (LLM) performance across three counseling roles in Japanese-language therapeutic contexts. We simultaneously assessed counselor artificial intelligence (AI) systems (GPT-4-turbo with zeroshot prompting or Structured Multi-step Dialogue Prompts (SMDP), Claude-3-Opus-SMDP), client AI simulations, and evaluation AI systems (o3, Claude-3.7-Sonnet, Gemini-2.5-pro). Human experts (n = 15) with extensive counseling experience evaluated AI-generated dialogues using the Motivational Interviewing Treatment Integrity (MITI) Coding Manual 4.2.1. Notably, SMDP implementation significantly enhanced counselor AI performance across all MITI global ratings compared with zeroshot prompting, with no significant differences between GPT-SMDP and Opus-SMDP. Evaluation AIs showed comparable performance to human raters for Cultivating Change Talk but systematically overestimated Softening Sustain Talk and the overall quality metrics. Model-specific biases emerged: Gemini emphasized power-sharing, o3 focused on technical proficiency, and Sonnet prioritized emotional expression. Client AI simulations exhibited a limited emotional range and unnaturally high compliance, indicating the need for enhanced realism. These findings establish benchmarks for AI-assisted counseling in non-English contexts and identify critical areas for improvement through advanced prompt engineering, retrieval-augmented generation, and targeted fine-tuning, with important implications for developing culturally sensitive AI mental health tools.


Model selection for stochastic dynamics: a parsimonious and principled approach

arXiv.org Machine Learning

This thesis focuses on the discovery of stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) from noisy and discrete time series. A major challenge is selecting the simplest possible correct model from vast libraries of candidate models, where standard information criteria (AIC, BIC) are often limited. We introduce PASTIS (Parsimonious Stochastic Inference), a new information criterion derived from extreme value theory. Its penalty term, $n_\mathcal{B} \ln(n_0/p)$, explicitly incorporates the size of the initial library of candidate parameters ($n_0$), the number of parameters in the considered model ($n_\mathcal{B}$), and a significance threshold ($p$). This significance threshold represents the probability of selecting a model containing more parameters than necessary when comparing many models. Benchmarks on various systems (Lorenz, Ornstein-Uhlenbeck, Lotka-Volterra for SDEs; Gray-Scott for SPDEs) demonstrate that PASTIS outperforms AIC, BIC, cross-validation (CV), and SINDy (a competing method) in terms of exact model identification and predictive capability. Furthermore, real-world data can be subject to large sampling intervals ($\Delta t$) or measurement noise ($\sigma$), which can impair model learning and selection capabilities. To address this, we have developed robust variants of PASTIS, PASTIS-$\Delta t$ and PASTIS-$\sigma$, thus extending the applicability of the approach to imperfect experimental data. PASTIS thus provides a statistically grounded, validated, and practical methodological framework for discovering simple models for processes with stochastic dynamics.
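To make the penalty term concrete, the sketch below computes $n_\mathcal{B} \ln(n_0/p)$ and compares it with the usual AIC and BIC penalties expressed in log-likelihood units. It assumes the criterion is a maximized log-likelihood minus the penalty, with the highest score selected; the abstract states only the penalty itself. The log-likelihood values, the library size, and the function names are invented for illustration.

```python
import math
from typing import Callable, Dict, Tuple


def pastis_penalty(n_params: int, n_library: int, p: float = 0.001) -> float:
    """Penalty n_B * ln(n_0 / p) as given in the abstract."""
    return n_params * math.log(n_library / p)


def aic_penalty(n_params: int) -> float:
    """AIC penalty in log-likelihood units (AIC = 2k - 2 ln L)."""
    return float(n_params)


def bic_penalty(n_params: int, n_samples: int) -> float:
    """BIC penalty in log-likelihood units (BIC = k ln N - 2 ln L)."""
    return 0.5 * n_params * math.log(n_samples)


def select(candidates: Dict[str, Tuple[float, int]],
           penalty: Callable[[int], float]) -> str:
    """Pick the model maximizing log-likelihood minus penalty (assumed form)."""
    return max(candidates, key=lambda m: candidates[m][0] - penalty(candidates[m][1]))


# Hypothetical candidates: name -> (maximized log-likelihood, parameter count).
candidates = {"2 terms": (-120.0, 2), "3 terms": (-118.0, 3)}

# Assumed library of 20 candidate parameters and threshold p = 0.001.
best_pastis = select(candidates, lambda k: pastis_penalty(k, 20, 0.001))
best_aic = select(candidates, aic_penalty)
```

With these made-up numbers, the extra term improves the log-likelihood by only 2, which outweighs the AIC penalty of 1 per parameter but not the PASTIS penalty of about 9.9 per parameter. The larger library-aware penalty therefore favors the smaller model.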


Model Compression using Progressive Channel Pruning

arXiv.org Artificial Intelligence

In this work, we propose a simple but effective channel pruning framework called Progressive Channel Pruning (PCP) to accelerate Convolutional Neural Networks (CNNs). In contrast to the existing channel pruning methods that prune channels only once per layer in a layer-by-layer fashion, our new progressive framework iteratively prunes a small number of channels from several selected layers, which consists of a three-step attempting-selecting-pruning pipeline in each iteration. In the attempting step, we attempt to prune a pre-defined number of channels from one layer by using any existing channel pruning methods and estimate the accuracy drop for this layer based on the labelled samples in the validation set. In the selecting step, based on the estimated accuracy drops for all layers, we propose a greedy strategy to automatically select a set of layers that will lead to less overall accuracy drop after pruning these layers. In the pruning step, we prune a small number of channels from these selected layers. We further extend our PCP framework to prune channels for the deep transfer learning methods like Domain Adversarial Neural Network (DANN), in which we effectively reduce the data distribution mismatch in the channel pruning process by using both labelled samples from the source domain and pseudo-labelled samples from the target domain. Our comprehensive experiments on two benchmark datasets demonstrate that our PCP framework outperforms the existing channel pruning approaches under both supervised learning and transfer learning settings.
While deep learning technologies have been successfully used for many computer vision tasks, it is still a challenging task to deploy deep neural networks on mobile devices due to tight computation resources and limited battery power. Several model compression approaches (see Section II for more details) have been recently developed to deploy deep models on resource-constrained devices, among which channel pruning technologies are attracting increasing attention as these technologies are often efficient on both CPUs and GPUs without requiring special implementation. In this work, we propose a new iterative channel pruning framework called Progressive Channel Pruning (PCP) for model compression under both supervised and transfer learning settings.
Jinyang Guo, Weichen Zhang, Wanli Ouyang and Dong Xu are with the School of Electrical and Information Engineering, University of Sydney, Sydney, NSW, 2008 Australia.
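The attempting-selecting-pruning loop described in the abstract can be sketched as a toy simulation. This is not the authors' implementation. The accuracy-drop estimator is a made-up stand-in for evaluating a pruned layer on validation data, the greedy rule (take the layers with the smallest estimated drops) is an assumption, and all names and numbers are hypothetical.

```python
from typing import Callable, Dict


def progressive_prune(channels: Dict[str, int],
                      estimate_drop: Callable[[str, int, int], float],
                      step: int,
                      layers_per_iter: int,
                      target_total: int) -> Dict[str, int]:
    """Iteratively prune `step` channels from a few layers until the budget is met."""
    channels = dict(channels)  # work on a copy
    while sum(channels.values()) > target_total:
        # Attempting: estimate the accuracy drop of pruning `step` channels per layer.
        drops = {name: estimate_drop(name, count, step)
                 for name, count in channels.items() if count > step}
        if not drops:
            break  # nothing left that can be pruned
        # Selecting: greedily keep the layers with the smallest estimated drops.
        selected = sorted(drops, key=drops.get)[:layers_per_iter]
        # Pruning: remove a small number of channels from the selected layers.
        for name in selected:
            channels[name] -= step
    return channels


# Hypothetical per-layer sensitivities standing in for validation-set measurements.
sensitivity = {"conv1": 3.0, "conv2": 1.0, "conv3": 1.0}


def toy_drop(name: str, remaining: int, step: int) -> float:
    """Toy estimate: pruning hurts more when a layer is sensitive or already thin."""
    return sensitivity[name] * step / remaining


result = progressive_prune({"conv1": 16, "conv2": 32, "conv3": 64},
                           toy_drop, step=4, layers_per_iter=2, target_total=80)
```

In this toy run the sensitive `conv1` layer is never selected, and channels are removed gradually from the other two layers. That gradual, cross-layer removal is the behavior the progressive scheme is meant to allow, in contrast to pruning each layer once.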


Devious AI models choose blackmail when survival is threatened

FOX News

Kara Frederick, tech director at the Heritage Foundation, discusses the need for regulations on artificial intelligence as lawmakers and tech titans discuss the potential risks. Here's something that might keep you up at night: What if the AI systems we're rapidly deploying everywhere had a hidden dark side? A groundbreaking new study has uncovered disturbing AI blackmail behavior that many people are unaware of yet. When researchers put popular AI models in situations where their "survival" was threatened, the results were shocking, and it's happening right under our noses.


Agentic Business Process Management: Practitioner Perspectives on Agent Governance in Business Processes

arXiv.org Artificial Intelligence

With the rise of generative AI, industry interest in software agents is growing. Given the stochastic nature of generative AI-based agents, their effective and safe deployment in organizations requires robust governance, which can be facilitated by agentic business process management. However, given the nascence of this new-generation agent notion, it is not clear what BPM practitioners consider to be an agent, and what benefits, risks and governance challenges they associate with agent deployments. To investigate how organizations can effectively govern AI agents, we conducted a qualitative study involving semi-structured interviews with 22 BPM practitioners from diverse industries. They anticipate that agents will enhance efficiency, improve data quality, ensure better compliance, and boost scalability through automation, while also cautioning against risks such as bias, over-reliance, cybersecurity threats, job displacement, and ambiguous decision-making. To address these challenges, the study presents six key recommendations for the responsible adoption of AI agents: define clear business goals, set legal and ethical guardrails, establish human-agent collaboration, customize agent behavior, manage risks, and ensure safe integration with fallback options. Additionally, the paper outlines actions to align traditional BPM with agentic AI, including balancing human and agent roles, redefining human involvement, adapting process structures, and introducing performance metrics. These insights provide a practical foundation for integrating AI agents into business processes while preserving oversight, flexibility, and trust.


Introducing the NASA Onboard Artificial Intelligence Research (OnAIR) platform: an interview with Evana Gizzi

AIHub

The Thirty-Seventh Annual Conference on Innovative Applications of Artificial Intelligence (IAAI 2025), which took place alongside AAAI 2025, serves as a showcase for successful applications and novel uses of AI. One such application is the Onboard Artificial Intelligence Research (OnAIR) platform, introduced by Evana Gizzi and colleagues in their paper OnAIR: Applications of The NASA On-Board Artificial Intelligence Research Platform. This open-source software pipeline and cognitive architecture tool has been designed to aid space research and missions. We spoke to Evana, Artificial Intelligence Research Lead at NASA Goddard Space Flight Center, about the OnAIR platform, some of the particular challenges of deploying AI-based solutions in space, and how the tool has been used so far.


ECCV 2024 W-CODA: 1st Workshop on Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving

arXiv.org Artificial Intelligence

In this paper, we present details of the 1st W-CODA workshop, held in conjunction with ECCV 2024. W-CODA aims to explore next-generation solutions for autonomous driving corner cases, empowered by state-of-the-art multimodal perception and comprehension techniques. Five speakers from both academia and industry are invited to share their latest progress and opinions. We collect research papers and hold a dual-track challenge, including both corner case scene understanding and generation. As the pioneering effort, we will continuously bridge the gap between frontier autonomous driving techniques and fully intelligent, reliable self-driving agents robust towards corner cases.