Generative AI


Will AI wipe out the first rung of the career ladder?

The Guardian

This week, I'm wondering what my first jobs in journalism would have been like had generative AI been around. In other news: Elon Musk leaves a trail of chaos, and influencers are selling the text they fed to AI to make art. Generative artificial intelligence may eliminate the job you got with your diploma still in hand, say executives who offered grim assessments of the entry-level job market last week in multiple forums. Dario Amodei, CEO of Anthropic, which makes the multifunctional AI model Claude, told Axios last week that he believes AI could cut half of all entry-level white-collar jobs and send overall unemployment rocketing to 20% within the next five years. One explanation for why an AI company CEO might make such a dire prediction is to hype the capabilities of his product.


'Nobody wants a robot to read them a story!' The creatives and academics rejecting AI – at work and at home

The Guardian

The novelist Ewan Morrison was alarmed, though amused, to discover he had written a book called Nine Inches Pleases a Lady. Intrigued by the limits of generative artificial intelligence (AI), he had asked ChatGPT to give him the names of the 12 novels he had written. "I've only written nine," he says. "Always eager to please, it decided to invent three." The "nine inches" from the fake title it hallucinated was stolen from a filthy Robert Burns poem.


Respond Beyond Language: A Benchmark for Video Generation in Response to Realistic User Intents

arXiv.org Artificial Intelligence

Querying generative AI models, e.g., large language models (LLMs), has become a prevalent method for information acquisition. However, existing query-answer datasets primarily focus on textual responses, making it challenging to address complex user queries that require visual demonstrations or explanations for better understanding. To bridge this gap, we construct a benchmark, RealVideoQuest, designed to evaluate the abilities of text-to-video (T2V) models in answering real-world, visually grounded queries. It identifies 7.5K real user queries with video response intents from Chatbot-Arena and builds 4.5K high-quality query-video pairs through a multistage video retrieval and refinement process. We further develop a multi-angle evaluation system to assess the quality of generated video answers. Experiments indicate that current T2V models struggle with effectively addressing real user queries, pointing to key challenges and future research opportunities in multimodal AI.


Human-Centric Evaluation for Foundation Models

arXiv.org Artificial Intelligence

Currently, nearly all evaluations of foundation models focus on objective metrics, emphasizing quiz performance to define model capabilities. While this model-centric approach enables rapid performance assessment, it fails to reflect authentic human experiences. To address this gap, we propose a Human-Centric subjective Evaluation (HCE) framework, focusing on three core dimensions: problem-solving ability, information quality, and interaction experience. Through experiments involving Deepseek R1, OpenAI o3 mini, Grok 3, and Gemini 2.5, we conduct over 540 participant-driven evaluations, where humans and models collaborate on open-ended research tasks, yielding a comprehensive subjective dataset. This dataset captures diverse user feedback across multiple disciplines, revealing distinct model strengths and adaptability. Our findings highlight Grok 3's superior performance, followed by Deepseek R1 and Gemini 2.5, with OpenAI o3 mini lagging behind. By offering a novel framework and a rich dataset, this study not only enhances subjective evaluation methodologies but also lays the foundation for standardized, automated assessments, advancing LLM development for research and practical scenarios. Our dataset link is https://github.com/yijinguo/Human-Centric-Evaluation.


Agentic AI and Multiagentic: Are We Reinventing the Wheel?

arXiv.org Artificial Intelligence

The terms Agentic AI and Multiagentic AI have recently gained popularity in discussions on generative artificial intelligence, often used to describe autonomous software agents and systems composed of such agents. However, current usage conflates these buzzwords with well-established concepts in the AI literature: intelligent agents and multi-agent systems. This article offers a critical analysis of this conceptual misuse. We review the theoretical origins of "agentic" in the social sciences (Bandura, 1986) and philosophical notions of intentionality (Dennett, 1971), and then summarise foundational works on intelligent agents and multi-agent systems by Wooldridge, Jennings and others. We examine classic agent architectures, from simple reactive agents to Belief-Desire-Intention (BDI) models, and highlight key properties (autonomy, reactivity, proactivity, social capability) that define agency in AI. We then discuss recent developments in large language models (LLMs) and agent platforms based on LLMs, including the emergence of LLM-powered AI agents and open-source multi-agent orchestration frameworks. We argue that the term Agentic AI is often used as a buzzword for what are essentially AI agents, and Multiagentic AI for what are multi-agent systems. This confusion overlooks decades of research in the field of autonomous agents and multi-agent systems. The article advocates for scientific and technological rigour and the use of established terminology from the state of the art in AI, incorporating the wealth of existing knowledge, including standards for multi-agent system platforms, communication languages and coordination and cooperation algorithms, agreement technologies (automated negotiation, argumentation, virtual organisations, trust, reputation, etc.), into the new and promising wave of LLM-based AI agents, so as not to end up reinventing the wheel.


FinRobot: Generative Business Process AI Agents for Enterprise Resource Planning in Finance

arXiv.org Artificial Intelligence

Enterprise Resource Planning (ERP) systems serve as the digital backbone of modern financial institutions, yet they continue to rely on static, rule-based workflows that limit adaptability, scalability, and intelligence. As business operations grow more complex and data-rich, conventional ERP platforms struggle to integrate structured and unstructured data in real time and to accommodate dynamic, cross-functional workflows. In this paper, we present the first AI-native, agent-based framework for ERP systems, introducing a novel architecture of Generative Business Process AI Agents (GBPAs) that bring autonomy, reasoning, and dynamic optimization to enterprise workflows. The proposed system integrates generative AI with business process modeling and multi-agent orchestration, enabling end-to-end automation of complex tasks such as budget planning, financial reporting, and wire transfer processing. Unlike traditional workflow engines, GBPAs interpret user intent, synthesize workflows in real time, and coordinate specialized sub-agents for modular task execution. We validate the framework through case studies in bank wire transfers and employee reimbursements, two representative financial workflows with distinct complexity and data modalities. Results show that GBPAs achieve up to 40% reduction in processing time, 94% drop in error rate, and improved regulatory compliance by enabling parallelism, risk control insertion, and semantic reasoning. These findings highlight the potential of GBPAs to bridge the gap between generative AI capabilities and enterprise-grade automation, laying the groundwork for the next generation of intelligent ERP systems.


DeepSeek in Healthcare: A Survey of Capabilities, Risks, and Clinical Applications of Open-Source Large Language Models

arXiv.org Artificial Intelligence

ABSTRACT

DeepSeek-R1 is a cutting-edge open-source large language model (LLM) developed by DeepSeek, showcasing advanced reasoning capabilities through a hybrid architecture that integrates mixture of experts (MoE), chain of thought (CoT) reasoning, and reinforcement learning. Released under the permissive MIT license, DeepSeek-R1 offers a transparent and cost-effective alternative to proprietary models like GPT-4o and Claude-3 Opus; it excels in structured problem-solving domains such as mathematics, healthcare diagnostics, code generation, and pharmaceutical research. Its architecture enables efficient inference while preserving reasoning depth, making it suitable for deployment in resource-constrained settings. However, DeepSeek-R1 also exhibits increased vulnerability to bias, misinformation, adversarial manipulation, and safety failures, especially in multilingual and ethically sensitive contexts. This survey highlights the model's strengths, including interpretability, scalability, and adaptability, alongside its limitations in general language fluency and safety alignment. Future research priorities include improving bias mitigation, natural language comprehension, domain-specific validation, and regulatory compliance. Overall, DeepSeek-R1 represents a major advance in open, scalable AI, underscoring the need for collaborative governance to ensure responsible and equitable deployment.

INTRODUCTION

The rise of AI and generative models in health and technology

Artificial Intelligence (AI) has undergone transformative growth in recent years, profoundly reshaping numerous fields including language processing, automation, and complex decision-making. At its core, AI refers to the simulation of human intelligence by machines, enabling them to perform tasks such as speech recognition, natural language understanding, visual perception, and predictive analytics.

One of the recent remarkable advancements in the Generative AI domain is the emergence of DeepSeek-R1, a large language model (LLM) developed by the Chinese company DeepSeek. In benchmarking evaluations, it has demonstrated results competitive with, and in some domains superior to, models like OpenAI's GPT-4o and GPT-o1 [4]. This has positioned DeepSeek-R1 as a notable advancement not only in LLM capability but also in the global AI development race.

DeepSeek-R1: a paradigm shift in LLM development

What sets DeepSeek-R1 apart from conventional LLMs is its novel training architecture. This hybrid approach mimics certain aspects of human learning, allowing the model to refine its behavior over time and adapt to more complex reasoning tasks.


Regulatory Graphs and GenAI for Real-Time Transaction Monitoring and Compliance Explanation in Banking

arXiv.org Artificial Intelligence

This paper presents a real-time transaction monitoring framework that integrates graph-based modeling, narrative field embedding, and generative explanation to support automated financial compliance. The system constructs dynamic transaction graphs, extracts structural and contextual features, and classifies suspicious behavior using a graph neural network. A retrieval-augmented generation module generates natural-language explanations aligned with regulatory clauses for each flagged transaction. Experiments conducted on a simulated stream of financial data show that the proposed method achieves superior results, with 98.2% F1-score, 97.8% precision, and 97.0% recall. Expert evaluation further confirms the quality and interpretability of generated justifications. The findings demonstrate the potential of combining graph intelligence and generative models to support explainable, audit-ready compliance in high-risk financial environments. Graph-based analytics have become essential in financial crime detection due to their ability to represent relationships between clients, transactions, and geographic entities [1].
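
As a reading aid for the precision, recall and F1 figures quoted in the abstract above, the sketch below computes the standard single-class F1, the harmonic mean of precision and recall. The function name `f1_score` is our own, and the abstract does not say how its scores were averaged, so this is only an illustration of the textbook definition, not the authors' evaluation code. A harmonic mean always lies between its two inputs, so the reported 98.2% F1 may come from a different averaging scheme (for example, averaging per-class F1 scores); the source does not specify.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract (97.8% and 97.0%).
precision = 0.978
recall = 0.970

# F1 implied by those two numbers under the single-class definition.
f1_from_pr = f1_score(precision, recall)
print(f"F1 implied by precision and recall: {f1_from_pr:.3f}")
```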


How do Transformer Embeddings Represent Compositions? A Functional Analysis

arXiv.org Artificial Intelligence

Compositionality is a key aspect of human intelligence, essential for reasoning and generalization. While transformer-based models have become the de facto standard for many language modeling tasks, little is known about how they represent compound words, and whether these representations are compositional. In this study, we test compositionality in Mistral, OpenAI Large, and Google embedding models, and compare them with BERT. First, we evaluate compositionality in the representations by examining six diverse models of compositionality (addition, multiplication, dilation, regression, etc.). We find that ridge regression, albeit linear, best accounts for compositionality. Surprisingly, we find that the classic vector addition model performs almost as well as any other model. Next, we verify that most embedding models are highly compositional, while BERT shows much poorer compositionality. We verify and visualize our findings with a synthetic dataset consisting of fully transparent adjective-noun compositions. Overall, we present a thorough investigation of compositionality.
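
To make the idea of a "model of compositionality" concrete, here is a minimal sketch of two of the models named in the abstract above, vector addition and element-wise multiplication, each scored by cosine similarity against the embedding of the full phrase. The four-dimensional vectors are made-up toy values and the helper names (`cosine`, `compose_add`, `compose_mul`) are our own; none of this reproduces the paper's models, data, or results.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def compose_add(u, v):
    """Additive model: predicted phrase vector is the sum of the parts."""
    return [a + b for a, b in zip(u, v)]


def compose_mul(u, v):
    """Multiplicative model: predicted phrase vector is the element-wise product."""
    return [a * b for a, b in zip(u, v)]


# Hypothetical toy embeddings (assumed values, not taken from any real model).
adjective = [0.9, 0.1, 0.0, 0.3]
noun = [0.1, 0.8, 0.4, 0.0]
compound = [1.0, 0.9, 0.5, 0.2]  # stand-in for the embedding of the whole phrase

# Higher cosine similarity means the composition model predicts the phrase better.
add_score = cosine(compose_add(adjective, noun), compound)
mul_score = cosine(compose_mul(adjective, noun), compound)
print(f"additive: {add_score:.3f}  multiplicative: {mul_score:.3f}")
```

With real embeddings, the same comparison would be run over many compounds and the scores averaged per composition model; a learned model such as ridge regression would replace the fixed `compose_*` functions with a fitted linear map.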


'Humanity deserves better': iPhone designer on new partnership with OpenAI

The Guardian

The designer of the iPhone has promised his next artificial intelligence-enabled device will be driven by a sense that "humanity deserves better", after admitting feeling "responsibility" for some of the negative consequences of modern technology. Sir Jony Ive said his new partnership with OpenAI, the company behind ChatGPT, would renew his optimism about technology, amid widespread concerns about the impact of smartphones and social media. In an interview with the Financial Times, London-born Ive declined to give details about the device he is developing with OpenAI, but indicated unease about people's relationship with some tech products. "Many of us would say we have an uneasy relationship with technology at the moment," he said. He added that the device's design would be driven by "a sense of 'we deserve better'". However, Ive, Apple's former chief design officer, said he felt the burden of the negative impact of modern technology products. "While some of the less positive consequences were unintentional, I still feel responsibility."