Goto

Collaborating Authors

 urgency


FABRIC: Framework for Agent-Based Realistic Intelligence Creation

Verma, Abhigya, Subramanian, Seganrasan, Kandasamy, Nandhakumar, Gupta, Naman

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly deployed as agents, expected to decompose goals, invoke tools, and verify results in dynamic environments. Realizing these capabilities requires access to agentic data-structured interaction records that couple user intents with tool specifications, argument-grounded calls, and verifiable execution traces. However, collecting such data from human annotators is costly, time-consuming, and difficult to scale. We present a unified framework for synthesizing agentic data using only LLMs, without any human-in-the-loop supervision. This framework decomposes generation into modular pipelines that produce complete interaction records spanning task specifications, tool definitions, policy pseudocode, natural language exchanges, and execution traces. Records conform to strict syntactic and semantic constraints, ensuring machine-parseability and faithful alignment across inputs, outputs, and tool calls. Beyond single tasks, there is support for both multi-task and multi-turn agent interactions, enabling the construction of datasets that reflect the full spectrum of tool-use competencies. To ensure quality and consistency, the framework integrates constrained generation formats, JSON-schema validation, and judge-based filtering. This paper formalizes the schema for agentic records, details the prompt design principles that guide generation, and introduces scalable pipelines for high-quality synthetic data. By providing a reproducible, LLM-only alternative to manual collection, hence advancing the development of agentic LLMs capable of robust tool use.


RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity

Shin, Jisu, Song, Hoyun, Oh, Juhyun, Ko, Changgeon, Kim, Eunsu, Jung, Chani, Oh, Alice

arXiv.org Artificial Intelligence

Humans often encounter role conflicts -- social dilemmas where the expectations of multiple roles clash and cannot be simultaneously fulfilled. As large language models (LLMs) become increasingly influential in human decision-making, understanding how they behave in complex social situations is essential. While previous research has evaluated LLMs' social abilities in contexts with predefined correct answers, role conflicts represent inherently ambiguous social dilemmas that require contextual sensitivity: the ability to recognize and appropriately weigh situational cues that can fundamentally alter decision priorities. To address this gap, we introduce RoleConflictBench, a novel benchmark designed to evaluate LLMs' contextual sensitivity in complex social dilemmas. Our benchmark employs a three-stage pipeline to generate over 13K realistic role conflict scenarios across 65 roles, systematically varying their associated expectations (i.e., their responsibilities and obligations) and situational urgency levels. By analyzing model choices across 10 different LLMs, we find that while LLMs show some capacity to respond to these contextual cues, this sensitivity is insufficient. Instead, their decisions are predominantly governed by a powerful, inherent bias related to social roles rather than situational information. Our analysis quantifies these biases, revealing a dominant preference for roles within the Family and Occupation domains, as well as a clear prioritization of male roles and Abrahamic religions across most evaluatee models.


Visual Authority and the Rhetoric of Health Misinformation: A Multimodal Analysis of Social Media Videos

Zarei, Mohammad Reza, Stead-Coyle, Barbara, Christensen, Michael, Everts, Sarah, Komeili, Majid

arXiv.org Artificial Intelligence

Short form video platforms are central sites for health advice, where alternative narratives mix useful, misleading, and harmful content. Rather than adjudicating truth, this study examines how credibility is packaged in nutrition and supplement videos by analyzing the intersection of authority signals, narrative techniques, and monetization. We assemble a cross platform corpus of 152 public videos from TikTok, Instagram, and YouTube and annotate each on 26 features spanning visual authority, presenter attributes, narrative strategies, and engagement cues. A transparent annotation pipeline integrates automatic speech recognition, principled frame selection, and a multimodal model, with human verification on a stratified subsample showing strong agreement. Descriptively, a confident single presenter in studio or home settings dominates, and clinical contexts are rare. Analytically, authority cues such as titles, slides and charts, and certificates frequently occur with persuasive elements including jargon, references, fear or urgency, critiques of mainstream medicine, and conspiracies, and with monetization including sales links and calls to subscribe. References and science like visuals often travel with emotive and oppositional narratives rather than signaling restraint.


ParaEQsA: Parallel and Asynchronous Embodied Questions Scheduling and Answering

Wang, Haisheng, Zhi, Weiming

arXiv.org Artificial Intelligence

This paper formulates the Embodied Questions Answering (EQsA) problem, introduces a corresponding benchmark, and proposes a system to tackle the problem. Classical Embodied Question Answering (EQA) is typically formulated as answering one single question by actively exploring a 3D environment. Real deployments, however, often demand handling multiple questions that may arrive asynchronously and carry different urgencies. We formalize this setting as Embodied Questions Answering (EQsA) and present ParaEQsA, a framework for parallel, urgency-aware scheduling and answering. ParaEQsA leverages a group memory module shared among questions to reduce redundant exploration, and a priority-planning module to dynamically schedule questions. To evaluate this setting, we contribute the Parallel Asynchronous Embodied Questions (PAEQs) benchmark containing 40 indoor scenes and five questions per scene (200 in total), featuring asynchronous follow-up questions and urgency labels. We further propose metrics for EQsA performance: Direct Answer Rate (DAR), and Normalized Urgency-Weighted Latency (NUWL), which jointly measure efficiency and responsiveness of this system. ParaEQsA consistently outperforms strong sequential baselines adapted from recent EQA systems, while reducing exploration and delay. Empirical evaluations investigate the relative contributions of priority, urgency modeling, spatial scope, reward estimation, and dependency reasoning within our framework. Together, these results demonstrate that urgency-aware, parallel scheduling is key to making embodied agents responsive and efficient under realistic, multi-question workloads.


Building the AI-enabled enterprise of the future

MIT Technology Review

"This is one of those inflection points where I don't think anybody really has a full view of the significance of the change this is going to have on not just companies but society as a whole," says Patrick Milligan, chief information security officer at Ford, which is making AI an important part of its transformation efforts and expanding its use across company operations. Given its game-changing potential--and the breakneck speed with which it is evolving--it is perhaps not surprising that companies are feeling the pressure to deploy AI as soon as possible: 98% say they feel an increased sense of urgency in the last year. And 85% believe they have less than 18 months to deploy an AI strategy or they will see negative business effects. Companies that take a "wait and see" approach will fall behind, says Jeetu Patel, president and chief product officer at Cisco. "If you wait for too long, you risk becoming irrelevant," he says.


CRMAgent: A Multi-Agent LLM System for E-Commerce CRM Message Template Generation

Quan, Yinzhu, Li, Xinrui, Chen, Ying

arXiv.org Artificial Intelligence

In e-commerce private-domain channels such as instant messaging and e-mail, merchants engage customers directly as part of their Customer Relationship Management (CRM) programmes to drive retention and conversion. While a few top performers excel at crafting outbound messages, most merchants struggle to write persuasive copy because they lack both expertise and scalable tools. We introduce CRMAgent, a multi-agent system built on large language models (LLMs) that generates high-quality message templates and actionable writing guidance through three complementary modes. First, group-based learning enables the agent to learn from a merchant's own top-performing messages within the same audience segment and rewrite low-performing ones. Second, retrieval-and-adaptation fetches templates that share the same audience segment and exhibit high similarity in voucher type and product category, learns their successful patterns, and adapts them to the current campaign. Third, a rule-based fallback provides a lightweight zero-shot rewrite when no suitable references are available. Extensive experiments show that CRMAgent consistently outperforms merchants' original templates, delivering significant gains in both audience-match and marketing-effectiveness metrics.


Symmetric Policy Design for Multi-Agent Dispatch Coordination in Supply Chains

Sudhakara, Sagar

arXiv.org Artificial Intelligence

We study a decentralized dispatch coordination problem in a multi-agent supply chain setting with shared logistics capacity. We propose symmetric (identical) dispatch strategies for all agents, enabling efficient coordination without centralized control. Using a common information approach, we derive a dynamic programming solution that computes optimal symmetric dispatch strategies by transforming the multi-agent problem into a tractable dynamic program on the agents common information state. Simulation results demonstrate that our method significantly reduces coordination cost compared to baseline heuristics, including belief-based strategies and an always-dispatch policy. These findings highlight the benefits of combining symmetric strategy design with a common information-based dynamic programming framework for improving multi-agent coordination performance.


LEAVS: An LLM-based Labeler for Abdominal CT Supervision

Lanfredi, Ricardo Bigolin, Zhuang, Yan, Finkelstein, Mark, Balamuralikrishna, Praveen Thoppey Srinivasan, Krembs, Luke, Khoury, Brandon, Reddy, Arthi, Mukherjee, Pritam, Rofsky, Neil M., Summers, Ronald M.

arXiv.org Artificial Intelligence

Extracting structured labels from radiology reports has been employed to create vision models to simultaneously detect several types of abnormalities. However, existing works focus mainly on the chest region. Few works have been investigated on abdominal radiology reports due to more complex anatomy and a wider range of pathologies in the abdomen. We propose LEAVS (Large language model Extractor for Abdominal Vision Supervision). This labeler can annotate the certainty of presence and the urgency of seven types of abnormalities for nine abdominal organs on CT radiology reports. To ensure broad coverage, we chose abnormalities that encompass most of the finding types from CT reports. Our approach employs a specialized chain-of-thought prompting strategy for a locally-run LLM using sentence extraction and multiple-choice questions in a tree-based decision system. We demonstrate that the LLM can extract several abnormality types across abdominal organs with an average F1 score of 0.89, significantly outperforming competing labelers and humans. Additionally, we show that extraction of urgency labels achieved performance comparable to human annotations. Finally, we demonstrate that the abnormality labels contain valuable information for training a single vision model that classifies several organs as normal or abnormal. We release our code and structured annotations for a public CT dataset containing over 1,000 CT volumes.


Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services

Venkateshperumal, Danush, Rafi, Rahman Abdul, Ahmed, Shakil, Khokhar, Ashfaq

arXiv.org Artificial Intelligence

Emergency communication systems face disruptions due to packet loss, bandwidth constraints, poor signal quality, delays, and jitter in VoIP systems, leading to degraded real-time service quality. Victims in distress often struggle to convey critical information due to panic, speech disorders, and background noise, further complicating dispatchers' ability to assess situations accurately. Staffing shortages in emergency centers exacerbate delays in coordination and assistance. This paper proposes leveraging Large Language Models (LLMs) to address these challenges by reconstructing incomplete speech, filling contextual gaps, and prioritizing calls based on severity. The system integrates real-time transcription with Retrieval-Augmented Generation (RAG) to generate contextual responses, using Twilio and AssemblyAI APIs for seamless implementation. Evaluation shows high precision, favorable BLEU and ROUGE scores, and alignment with real-world needs, demonstrating the model's potential to optimize emergency response workflows and prioritize critical cases effectively.


The United Nations Wants to Treat AI With the Same Urgency as Climate Change

WIRED

A United Nations report released today proposes having the international body oversee the first truly global effort for monitoring and governing artificial intelligence. The report, produced by the UN secretary general's High Level Advisory Body on AI, recommends the creation of a body similar to the Intergovernmental Panel on Climate Change to gather up-to-date information on AI and its risks. The report calls for a new policy dialog on AI so that the UN's 193 members can discuss risks and agree upon actions. It further recommends that the UN take steps to empower poorer nations, especially those in the global south, to benefit from AI and contribute to its governance. These should include, it says, creating an AI fund to back projects in these nations, establishing AI standards and data-sharing systems, and creating resources such as training to help nations with AI governance.