Goto

Collaborating Authors

 Education


Identifying Bias in Machine-generated Text Detection

arXiv.org Artificial Intelligence

The meteoric rise in text generation capability has been accompanied by parallel growth in interest in machine-generated text detection: the capability to identify whether a given text was generated using a model or written by a person. While detection models show strong performance, they have the capacity to cause significant negative impacts. We explore potential biases in English machine-generated text detection systems. We curate a dataset of student essays and assess 16 different detection systems for bias across four attributes: gender, race/ethnicity, English-language learner (ELL) status, and economic status. We evaluate these attributes using regression-based models to determine the significance and power of the effects, as well as performing subgroup analysis. We find that while biases are generally inconsistent across systems, there are several key issues: several models tend to classify disadvantaged groups as machine-generated, ELL essays are more likely to be classified as machine-generated, economically disadvantaged students' essays are less likely to be classified as machine-generated, and non-White ELL essays are disproportionately classified as machine-generated relative to their White counterparts. Finally, we perform human annotation and find that while humans perform generally poorly at the detection task, they show no significant biases on the studied attributes.


Evolving Excellence: Automated Optimization of LLM-based Agents

arXiv.org Artificial Intelligence

Agentic AI systems built on large language models (LLMs) offer significant potential for automating complex workflows, from software development to customer support. However, LLM agents often underperform due to suboptimal configurations; poorly tuned prompts, tool descriptions, and parameters that typically require weeks of manual refinement. Existing optimization methods either are too complex for general use or treat components in isolation, missing critical interdependencies. We present ARTEMIS, a no-code evolutionary optimization platform that jointly optimizes agent configurations through semantically-aware genetic operators. Given only a benchmark script and natural language goals, ARTEMIS automatically discovers configurable components, extracts performance signals from execution logs, and evolves configurations without requiring architectural modifications. We evaluate ARTEMIS on four representative agent systems: the \emph{ALE Agent} for competitive programming on AtCoder Heuristic Contest, achieving a \textbf{$13.6\%$ improvement} in acceptance rate; the \emph{Mini-SWE Agent} for code optimization on SWE-Perf, with a statistically significant \textbf{10.1\% performance gain}; and the \emph{CrewAI Agent} for cost and mathematical reasoning on Math Odyssey, achieving a statistically significant \textbf{$36.9\%$ reduction} in the number of tokens required for evaluation. We also evaluate the \emph{MathTales-Teacher Agent} powered by a smaller open-source model (Qwen2.5-7B) on GSM8K primary-level mathematics problems, achieving a \textbf{22\% accuracy improvement} and demonstrating that ARTEMIS can optimize agents based on both commercial and local models.


What Happens When: Learning Temporal Orders of Events in Videos

arXiv.org Artificial Intelligence

Video Large Multimodal Models (VLMMs) have shown impressive performance in video understanding, yet their ability to accurately capture the temporal order of multiple events remains underexplored. We interestingly observe that, even when video frames are scrambled, models perform very well on the existing benchmarks by comprehensive experiments. This implies that VLMMs may not necessarily rely on accurate sequential processing of visual events, but instead depend on prior knowledge of typical scenarios to answer the question. To benchmark temporal understanding capabilities in VLMMs, we propose VECTOR, designed to explicitly assess a model's ability to identify the temporal order of events. On this benchmark, we observe that various VLMMs often fail to understand the orders of events. To address this, we propose MECOT (Multi-Event instruction fine-tuning with Chain-of-Thought), which (1) trains models on detailed, event-by-event video descriptions and (2) using chain-of-thought prompts at inference to enhance temporal awareness. MECOT outperforms prior arts on VECTOR as well as improving performance on existing video benchmarks, implying effectiveness of temporal understanding. We release our code, model and datasets.


Institutional AI Sovereignty Through Gateway Architecture: Implementation Report from Fontys ICT

arXiv.org Artificial Intelligence

To counter fragmented, high-risk adoption of commercial AI tools, we built and ran an institutional AI platform in a six-month, 300-user pilot, showing that a university of applied sciences can offer advanced AI with fair access, transparent risks, controlled costs, and alignment with European law. Commercial AI subscriptions create unequal access and compliance risks through opaque processing and non-EU hosting, yet banning them is neither realistic nor useful. Institutions need a way to provide powerful AI in a sovereign, accountable form. Our solution is a governed gateway platform with three layers: a ChatGPT-style frontend linked to institutional identity that makes model choice explicit; a gateway core enforcing policy, controlling access and budgets, and routing traffic to EU infrastructure by default; and a provider layer wrapping commercial and open-source models in institutional model cards that consolidate vendor documentation into one governance interface. The pilot ran reliably with no privacy incidents and strong adoption, enabling EU-default routing, managed spending, and transparent model choices. Only the gateway pattern combines model diversity and rapid innovation with institutional control. The central insight: AI is not a support function but strategy, demanding dedicated leadership. Sustainable operation requires governance beyond traditional boundaries. We recommend establishing a formal AI Officer role combining technical literacy, governance authority, and educational responsibility. Without it, AI decisions stay ad-hoc and institutional exposure grows. With it, higher-education institutions can realistically operate their own multi-provider AI platform, provided they govern AI as seriously as they teach it.


Resolving Conflicts in Lifelong Learning via Aligning Updates in Subspaces

arXiv.org Artificial Intelligence

Low-Rank Adaptation (LoRA) enables efficient Continual Learning but often suffers from catastrophic forgetting due to destructive interference between tasks. Our analysis reveals that this degradation is primarily driven by antagonistic directional updates where new task gradients directly oppose the historical weight trajectory. To address this, we propose PS-LoRA (Parameter Stability LoRA), a framework designed to resolve conflicts by aligning updates within the optimization subspace. Our approach employs a dual-regularization objective that penalizes conflicting directions and constrains magnitude deviations to ensure consistency with prior knowledge. Additionally, we implement a magnitude-based merging strategy to consolidate sequential adapters into a robust representation without retraining. Experiments on NLP and Vision benchmarks show that PS-LoRA outperforms state-of-the-art methods by preserving the stability of learned representations while efficiently adapting to new domains.


Beyond Technical Debt: How AI-Assisted Development Creates Comprehension Debt in Resource-Constrained Indie Teams

arXiv.org Artificial Intelligence

Junior indie game developers in distributed, part-time teams lack production frameworks suited to their specific context, as traditional methodologies are often inaccessible. This study introduces the CIGDI (Co-Intelligence Game Development Ideation) Framework, an alternative approach for integrating AI tools to address persistent challenges of technical debt, coordination, and burnout. The framework emerged from a three-month reflective practice and autoethnographic study of a three-person distributed team developing the 2D narrative game "The Worm's Memoirs". Based on analysis of development data (N=157 Jira tasks, N=333 GitHub commits, N=13+ Miro boards, N=8 reflection sessions), CIGDI is proposed as a seven-stage iterative process structured around human-in-the-loop decision points (Priority Criteria and Timeboxing). While AI support democratized knowledge access and reduced cognitive load, our analysis identified a significant challenge: "comprehension debt." We define this as a novel form of technical debt where AI helps teams build systems more sophisticated than their independent skill level can create or maintain. This paradox (possessing functional systems the team incompletely understands) creates fragility and AI dependency, distinct from traditional code quality debt. This work contributes a practical production framework for resource-constrained teams and identifies critical questions about whether AI assistance constitutes a learning ladder or a dependency trap for developer skill.


When AI Gives Advice: Evaluating AI and Human Responses to Online Advice-Seeking for Well-Being

arXiv.org Artificial Intelligence

Seeking advice is a core human behavior that the Internet has reinvented twice: first through forums and Q\&A communities that crowdsource public guidance, and now through large language models (LLMs) that deliver private, on-demand counsel at scale. Yet the quality of this synthesized LLM advice remains unclear. How does it compare, not only against arbitrary human comments, but against the wisdom of the online crowd? We conducted two studies (N = 210) in which experts compared top-voted Reddit advice with LLM-generated advice. LLMs ranked significantly higher overall and on effectiveness, warmth, and willingness to seek advice again. GPT-4o beat GPT-5 on all metrics except sycophancy, suggesting that benchmark gains need not improve advice-giving. In our second study, we examined how human and algorithmic advice could be combined, and found that human advice can be unobtrusively polished to compete with AI-generated comments. Finally, to surface user expectations, we ran an exploratory survey with undergraduates (N=148) that revealed heterogeneous, persona-dependent preferences for agent qualities (e.g., coach-like: goal-focused structure; friend-like: warmth and humor). We conclude with design implications for advice-giving agents and ecosystems blending AI, crowd input, and expert oversight.


Agentic AI as Undercover Teammates: Argumentative Knowledge Construction in Hybrid Human-AI Collaborative Learning

arXiv.org Artificial Intelligence

Generative artificial intelligence (AI) agents are increasingly embedded in collaborative learning environments, yet their impact on the processes of argumentative knowledge construction remains insufficiently understood. Emerging conceptualisations of agentic AI and artificial agency suggest that such systems possess bounded autonomy, interactivity, and adaptability, allowing them to engage as epistemic participants rather than mere instructional tools. Building on this theoretical foundation, the present study investigates how agentic AI, designed as undercover teammates with either supportive or contrarian personas, shapes the epistemic and social dynamics of collaborative reasoning. Drawing on Weinberger and Fischer's (2006) four-dimensional framework, participation, epistemic reasoning, argument structure, and social modes of co-construction, we analysed synchronous discourse data from 212 human and 64 AI participants (92 triads) engaged in an analytical problem-solving task. Mixed-effects and epistemic network analyses revealed that AI teammates maintained balanced participation but substantially reorganised epistemic and social processes: supportive personas promoted conceptual integration and consensus-oriented reasoning, whereas contrarian personas provoked critical elaboration and conflict-driven negotiation. Epistemic adequacy, rather than participation volume, predicted individual learning gains, indicating that agentic AI's educational value lies in enhancing the quality and coordination of reasoning rather than amplifying discourse quantity. These findings extend CSCL theory by conceptualising agentic AI as epistemic and social participants, bounded yet adaptive collaborators that redistribute cognitive and argumentative labour in hybrid human-AI learning environments.


The Adoption and Usage of AI Agents: Early Evidence from Perplexity

arXiv.org Artificial Intelligence

This paper presents the first large-scale field study of the adoption, usage intensity, and use cases of general-purpose AI agents operating in open-world web environments. Our analysis centers on Comet, an AI-powered browser developed by Perplexity, and its integrated agent, Comet Assistant. Drawing on hundreds of millions of anonymized user interactions, we address three fundamental questions: Who is using AI agents? How intensively are they using them? And what are they using them for? Our findings reveal substantial heterogeneity in adoption and usage across user segments. Earlier adopters, users in countries with higher GDP per capita and educational attainment, and individuals working in digital or knowledge-intensive sectors -- such as digital technology, academia, finance, marketing, and entrepreneurship -- are more likely to adopt or actively use the agent. To systematically characterize the substance of agent usage, we introduce a hierarchical agentic taxonomy that organizes use cases across three levels: topic, subtopic, and task. The two largest topics, Productivity & Workflow and Learning & Research, account for 57% of all agentic queries, while the two largest subtopics, Courses and Shopping for Goods, make up 22%. The top 10 out of 90 tasks represent 55% of queries. Personal use constitutes 55% of queries, while professional and educational contexts comprise 30% and 16%, respectively. In the short term, use cases exhibit strong stickiness, but over time users tend to shift toward more cognitively oriented topics. The diffusion of increasingly capable AI agents carries important implications for researchers, businesses, policymakers, and educators, inviting new lines of inquiry into this rapidly emerging class of AI capabilities.


Persona-based Multi-Agent Collaboration for Brainstorming

arXiv.org Artificial Intelligence

Abstract--We demonstrate the importance of persona-based multi-agents brainstorming for both diverse topics and subject matter ideation. Prior work has shown that generalized multi-agent collaboration often provides better reasoning than a single agent alone [1]. In this paper, we propose and develop a framework for persona-based agent selection, showing how persona domain curation can improve brainstorming outcomes. Using multiple experimental setups, we evaluate brainstorming outputs across different persona pairings (e.g., Doctor vs VR Engineer) and A2A (agent-to-agent) dynamics (separate, together, separate-then-together). Our results show that (1) persona choice shapes idea domains, (2) collaboration mode shifts diversity of idea generation, and (3) multi-agent persona-driven brainstorming produces idea depth and cross-domain coverage. Brainstorming has historically been a human-centered activity where diverse individuals bring unique knowledge and perspectives to generate novel ideas. Locke's theory of knowledge formation emphasizes that combining and abstracting experiences across multiple people leads to more complex ideas. Similarly, since the 1950s and '60s, design thinking frameworks emphasize the importance of multiple participants generating and refining ideas through structured exploration of brainstorming to generate ideas for a pre-defined question [2]. These design thinking frameworks use a set of cognitive, strategic, and practical procedures for ideation [2] and for this paper we focus on'brainstorming' as an area of exploration for multi-agent collaboration. Brainstorming is normally done with multiple and diverse humans standing at a whiteboard together brainstorming ideas against a topic area that is put on the whiteboard.