AITopics

2504.12682

Country: North America > United States > California (0.67)

Genre: Research Report (0.66)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceApr-18-2025

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

Garg, Divyansh, VanWeelden, Shaun, Caples, Diego, Draguns, Andis, Ravi, Nikil, Putta, Pranav, Garg, Naman, Abraham, Tomas, Lara, Michael, Lopez, Federico, Liu, James, Gundawar, Atharva, Hebbar, Prannay, Joo, Youngchul, Gu, Jindong, London, Charles, de Witt, Christian Schroeder, Motwani, Sumeet

We introduce REAL, a benchmark and framework for multi-turn agent evaluations on deterministic simulations of real-world websites. REAL comprises high-fidelity, deterministic replicas of 11 widely-used websites across domains such as e-commerce, travel, communication, and professional networking. We also release a benchmark consisting of 112 practical tasks that mirror everyday complex user interactions requiring both accurate information retrieval and state-changing actions. All interactions occur within this fully controlled setting, eliminating safety risks and enabling robust, reproducible evaluation of agent capability and reliability. Our novel evaluation framework combines programmatic checks of website state for action-based tasks with rubric-guided LLM-based judgments for information retrieval. The framework supports both open-source and proprietary agent systems through a flexible evaluation harness that accommodates black-box commands within browser environments, allowing research labs to test agentic systems without modification. Our empirical results show that frontier language models achieve at most a 41% success rate on REAL, highlighting critical gaps in autonomous web navigation and task completion capabilities. Our framework supports easy integration of new tasks, reproducible evaluation, and scalable post-training data generation, marking a significant step forward in evaluating and advancing agent capabilities.

large language model, machine learning, natural language, (18 more...)

2504.11543

Country: Europe (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Services > e-Commerce Services (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Zhu, Feng, Mitra, Aritra, Heath, Robert W.

Achieving Tighter Finite-Time Rates for Heterogeneous Federated Stochastic Approximation under Markovian Sampling

arXiv.org Artificial IntelligenceApr-17-2025

Motivated by collaborative reinforcement learning (RL) and optimization with time-correlated data, we study a generic federated stochastic approximation problem involving $M$ agents, where each agent is characterized by an agent-specific (potentially nonlinear) local operator. The goal is for the agents to communicate intermittently via a server to find the root of the average of the agents' local operators. The generality of our setting stems from allowing for (i) Markovian data at each agent and (ii) heterogeneity in the roots of the agents' local operators. The limited recent work that has accounted for both these features in a federated setting fails to guarantee convergence to the desired point or to show any benefit of collaboration; furthermore, they rely on projection steps in their algorithms to guarantee bounded iterates. Our work overcomes each of these limitations. We develop a novel algorithm titled \texttt{FedHSA}, and prove that it guarantees convergence to the correct point, while enjoying an $M$-fold linear speedup in sample-complexity due to collaboration. To our knowledge, \emph{this is the first finite-time result of its kind}, and establishing it (without relying on a projection step) entails a fairly intricate argument that accounts for the interplay between complex temporal correlations due to Markovian sampling, multiple local steps to save communication, and the drift-effects induced by heterogeneous local operators. Our results have implications for a broad class of heterogeneous federated RL problems (e.g., policy evaluation and control) with function approximation, where the agents' Markov decision processes can differ in their probability transition kernels and reward functions.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2504.11645

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

arXiv.org Artificial IntelligenceApr-16-2025

Dynamic Compressing Prompts for Efficient Inference of Large Language Models

Hu, Jinwu, Zhang, Wei, Wang, Yufeng, Hu, Yu, Xiao, Bin, Tan, Mingkui, Du, Qing

--Large Language Models (LLMs) have shown outstanding performance across a variety of tasks, partly due to advanced prompting techniques. However, these techniques often require lengthy prompts, which increase computational costs and can hinder performance because of the limited context windows of LLMs. While prompt compression is a straightforward solution, existing methods confront the challenges of retaining essential information, adapting to context changes, and remaining effective across different tasks. T o tackle these issues, we propose a task-agnostic method called Dynamic Compressing Prompts (LLM-DCP). Our method reduces the number of prompt tokens while aiming to preserve the performance as much as possible. We model prompt compression as a Markov Decision Process (MDP), enabling the DCP-Agent to sequentially remove redundant tokens by adapting to dynamic contexts and retaining crucial content. We develop a reward function for training the DCP-Agent that balances the compression rate, the quality of the LLM output, and the retention of key information. This allows for prompt token reduction without needing an external black-box LLM. Inspired by the progressive difficulty adjustment in curriculum learning, we introduce a Hierarchical Prompt Compression (HPC) training strategy that gradually increases the compression difficulty, enabling the DCP-Agent to learn an effective compression method that maintains information integrity. Experiments demonstrate that our method outperforms state-of-the-art techniques, especially at higher compression rates. The code for our approach will be available at https://github.com/Fhujinwu/DCP .

large language model, machine learning, natural language, (18 more...)

2504.11004

Country: Asia > China (0.69)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

arXiv.org Artificial IntelligenceApr-16-2025

Breaking the Data Barrier -- Building GUI Agents Through Task Generalization

Zhang, Junlei, Ding, Zichen, Ma, Chang, Chen, Zijie, Sun, Qiushi, Lan, Zhenzhong, He, Junxian

Graphical User Interface (GUI) agents offer cross-platform solutions for automating complex digital tasks, with significant potential to transform productivity workflows. However, their performance is often constrained by the scarcity of high-quality trajectory data. To address this limitation, we propose training Vision Language Models (VLMs) on data-rich, reasoning-intensive tasks during a dedicated mid-training stage, and then examine how incorporating these tasks facilitates generalization to GUI planning scenarios. Specifically, we explore a range of tasks with readily available instruction-tuning data, including GUI perception, multimodal reasoning, and textual reasoning. Through extensive experiments across 11 mid-training tasks, we demonstrate that: (1) Task generalization proves highly effective, yielding substantial improvements across most settings. For instance, multimodal mathematical reasoning enhances performance on AndroidWorld by an absolute 6.3%. Remarkably, text-only mathematical data significantly boosts GUI web agent performance, achieving a 5.6% improvement on WebArena and 5.4% improvement on AndroidWorld, underscoring notable cross-modal generalization from text-based to visual domains; (2) Contrary to prior assumptions, GUI perception data - previously considered closely aligned with GUI agent tasks and widely utilized for training - has a comparatively limited impact on final performance; (3) Building on these insights, we identify the most effective mid-training tasks and curate optimized mixture datasets, resulting in absolute performance gains of 8.0% on WebArena and 12.2% on AndroidWorld. Our work provides valuable insights into cross-domain knowledge transfer for GUI agents and offers a practical approach to addressing data scarcity challenges in this emerging field. The code, data and models will be available at https://github.com/hkust-nlp/GUIMid.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2504.10127

Country: Asia (0.28)

Genre:

Research Report (1.00)
Workflow (0.88)

Industry: Education (0.46)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(3 more...)

Nishida, Naoto, Ishiguro, Yoshio, Rekiomto, Jun, Yamashita, Naomi

Dynamik: Syntactically-Driven Dynamic Font Sizing for Emphasis of Key Information

In today's globalized world, there are increasing opportunities for individuals to communicate using a common non-native language (lingua franca). Non-native speakers often have opportunities to listen to foreign languages, but may not comprehend them as fully as native speakers do. To aid real-time comprehension, live transcription of subtitles is frequently used in everyday life (e.g., during Zoom conversations, watching YouTube videos, or on social networking sites). However, simultaneously reading subtitles while listening can increase cognitive load. In this study, we propose Dynamik, a system that reduces cognitive load during reading by decreasing the size of less important words and enlarging important ones, thereby enhancing sentence contrast. Our results indicate that Dynamik can reduce certain aspects of cognitive load, specifically, participants' perceived performance and effort among individuals with low proficiency in English, as well as enhance the users' sense of comprehension, especially among people with low English ability. We further discuss our methods' applicability to other languages and potential improvements and further research directions.

artificial intelligence, machine learning, social media, (18 more...)

doi: 10.1145/3708359.3712115

2504.09734

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Italy > Sardinia > Cagliari (0.06)
North America > United States > New York > New York County > New York City (0.06)
(39 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media (1.00)
Education (1.00)
Government > Regional Government > North America Government > United States Government (0.68)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Otter, Rick den, Weindel, Gabriel, Stuit, Sjoerd, van Maanen, Leendert

Sequence models for by-trial decoding of cognitive strategies from neural data

Understanding the sequence of cognitive operations that underlie decision-making is a fundamental challenge in cognitive neuroscience. Traditional approaches often rely on group-level statistics, which obscure trial-by-trial variations in cognitive strategies. In this study, we introduce a novel machine learning method that combines Hidden Multivariate Pattern analysis with a Structured State Space Sequence model to decode cognitive strategies from electroencephalography data at the trial level. We apply this method to a decision-making task, where participants were instructed to prioritize either speed or accuracy in their responses. Our results reveal an additional cognitive operation, labeled Confirmation, which seems to occur predominantly in the accuracy condition but also frequently in the speed condition. The modeled probability that this operation occurs is associated with higher probability of responding correctly as well as changes of mind, as indexed by electromyography data. By successfully modeling cognitive operations at the trial level, we provide empirical evidence for dynamic variability in decision strategies, challenging the assumption of homogeneous cognitive processes within experimental conditions. Our approach shows the potential of sequence modeling in cognitive neuroscience to capture trial-level variability that is obscured by aggregate analyses. The introduced method offers a new way to detect and understand cognitive strategies in a data-driven manner, with implications for both theoretical research and practical applications in many fields.

artificial intelligence, machine learning, opération, (20 more...)

2504.10028

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Charvet, Valentin, Stein, Sebastian, Murray-Smith, Roderick

Improving Controller Generalization with Dimensionless Markov Decision Processes

Controllers trained with Reinforcement Learning tend to be very specialized and thus generalize poorly when their testing environment differs from their training one. We propose a Model-Based approach to increase generalization where both world model and policy are trained in a dimensionless state-action space. To do so, we introduce the Dimensionless Markov Decision Process ($Π$-MDP): an extension of Contextual-MDPs in which state and action spaces are non-dimensionalized with the Buckingham-$Π$ theorem. This procedure induces policies that are equivariant with respect to changes in the context of the underlying dynamics. We provide a generic framework for this approach and apply it to a model-based policy search algorithm using Gaussian Process models. We demonstrate the applicability of our method on simulated actuated pendulum and cartpole systems, where policies trained on a single environment are robust to shifts in the distribution of the context.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2504.10006

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

Nash Equilibrium Between Consumer Electronic Devices and DoS Attacker for Distributed IoT-enabled RSE Systems

Chen, Gengcan, Cai, Donghong, Khan, Zahid, Ahmad, Jawad, Boulila, Wadii

In electronic consumer Internet of Things (IoT), consumer electronic devices as edge devices require less computational overhead and the remote state estimation (RSE) of consumer electronic devices is always at risk of denial-of-service (DoS) attacks. Therefore, the adversarial strategy between consumer electronic devices and DoS attackers is critical. This paper focuses on the adversarial strategy between consumer electronic devices and DoS attackers in IoT-enabled RSE Systems. We first propose a remote joint estimation model for distributed measurements to effectively reduce consumer electronic device workload and minimize data leakage risks. The Kalman filter is deployed on the remote estimator, and the DoS attacks with open-loop as well as closed-loop are considered. We further introduce advanced reinforcement learning techniques, including centralized and distributed Minimax-DQN, to address high-dimensional decision-making challenges in both open-loop and closed-loop scenarios. Especially, the Q-network instead of the Q-table is used in the proposed approaches, which effectively solves the challenge of Q-learning. Moreover, the proposed distributed Minimax-DQN reduces the action space to expedite the search for Nash Equilibrium (NE). The experimental results validate that the proposed model can expeditiously restore the RSE error covariance to a stable state in the presence of DoS attacks, exhibiting notable attack robustness. The proposed centralized and distributed Minimax-DQN effectively resolves the NE in both open and closed-loop case, showcasing remarkable performance in terms of convergence. It reveals that substantial advantages in both efficiency and stability are achieved compared with the state-of-the-art methods.

consumer electronic device, machine learning, reinforcement learning, (14 more...)

2504.09415

Country: Asia > China (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Semiconductors & Electronics (1.00)
Information Technology > Security & Privacy (1.00)
Energy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Broadbent, Dominic, Whiteley, Nick, Allison, Robert, Lovett, Tom

Conditional Distribution Compression via the Kernel Conditional Mean Embedding

arXiv.org Machine LearningApr-14-2025

Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribution of labelled data. To address this gap, we first introduce the Average Maximum Conditional Mean Discrepancy (AMCMD), a natural metric for comparing conditional distributions. We then derive a consistent estimator for the AMCMD and establish its rate of convergence. Next, we make a key observation: in the context of distribution compression, the cost of constructing a compressed set targeting the AMCMD can be reduced from $\mathcal{O}(n^3)$ to $\mathcal{O}(n)$. Building on this, we extend the idea of KH to develop Average Conditional Kernel Herding (ACKH), a linear-time greedy algorithm that constructs a compressed set targeting the AMCMD. To better understand the advantages of directly compressing the conditional distribution rather than doing so via the joint distribution, we introduce Joint Kernel Herding (JKH), a straightforward adaptation of KH designed to compress the joint distribution of labelled data. While herding methods provide a simple and interpretable selection process, they rely on a greedy heuristic. To explore alternative optimisation strategies, we propose Joint Kernel Inducing Points (JKIP) and Average Conditional Kernel Inducing Points (ACKIP), which jointly optimise the compressed set while maintaining linear complexity. Experiments show that directly preserving conditional distributions with ACKIP outperforms both joint distribution compression (via JKH and JKIP) and the greedy selection used in ACKH. Moreover, we see that JKIP consistently outperforms JKH.

artificial intelligence, machine learning, tr 3, (16 more...)

arXiv.org Machine Learning

2504.10139

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Virginia > Arlington County > Arlington (0.04)
Europe > United Kingdom > England > Bristol (0.04)
(5 more...)

Genre:

Research Report (0.64)
Overview (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)