AITopics

2503.17454

Genre: Research Report (0.69)

Industry: Information Technology > Security & Privacy (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.87)

A Comprehensive Survey on Long Context Language Modeling

Liu, Jiaheng, Zhu, Dawei, Bai, Zhiqi, He, Yancheng, Liao, Huanxuan, Que, Haoran, Wang, Zekun, Zhang, Chenchen, Zhang, Ge, Zhang, Jiebin, Zhang, Yuanxing, Chen, Zhuo, Guo, Hangyu, Li, Shilong, Liu, Ziqiang, Shan, Yong, Song, Yifan, Tian, Jiayi, Wu, Wenhao, Zhou, Zhejian, Zhu, Ruijie, Feng, Junlan, Gao, Yang, He, Shizhu, Li, Zhoujun, Liu, Tianyu, Meng, Fanyu, Su, Wenbo, Tan, Yingshui, Wang, Zili, Yang, Jian, Ye, Wei, Zheng, Bo, Zhou, Wangchunshu, Huang, Wenhao, Li, Sujian, Zhang, Zhaoxiang

Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing number of long documents, dialogues, and other textual data, it is important to develop Long Context Language Models (LCLMs) that can process and analyze extensive inputs effectively and efficiently. In this paper, we present a comprehensive survey of recent advances in long-context modeling for large language models. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively. For the first aspect, we discuss data strategies, architectural designs, and workflow approaches oriented toward long-context processing. For the second aspect, we provide a detailed examination of the infrastructure required for LCLM training and inference. For the third aspect, we present evaluation paradigms for long-context comprehension and long-form generation, as well as behavioral analysis and mechanism interpretability of LCLMs. Beyond these three key aspects, we thoroughly explore the diverse application scenarios where existing LCLMs have been deployed and outline promising future development directions. This survey provides an up-to-date review of the literature on long-context LLMs, which we hope will serve as a valuable resource for both researchers and engineers. The associated GitHub repository LCLM-Horizon (https://github.com/LCLM-Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling) collects the latest papers and repos.

information retrieval, large language model, machine learning, (25 more...)

2503.17407

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
(28 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)
Research Report > Experimental Study (0.45)

Industry:

Health & Medicine (1.00)
Information Technology (0.92)
Leisure & Entertainment (0.67)
Education > Curriculum > Subject-Specific Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Survey on Evaluation of LLM-based Agents

Yehudai, Asaf, Eden, Lilach, Li, Alan, Uziel, Guy, Zhao, Yilun, Bar-Haim, Roy, Cohan, Arman, Shmueli-Scheuer, Michal

The emergence of LLM-based agents represents a paradigm shift in AI, enabling autonomous systems to plan, reason, use tools, and maintain memory while interacting with dynamic environments. This paper provides the first comprehensive survey of evaluation methodologies for these increasingly capable agents. We systematically analyze evaluation benchmarks and frameworks across four critical dimensions: (1) fundamental agent capabilities, including planning, tool use, self-reflection, and memory; (2) application-specific benchmarks for web, software engineering, scientific, and conversational agents; (3) benchmarks for generalist agents; and (4) frameworks for evaluating agents. Our analysis reveals emerging trends, including a shift toward more realistic, challenging evaluations with continuously updated benchmarks. We also identify critical gaps that future research must address, particularly in assessing cost-efficiency, safety, and robustness, and in developing fine-grained and scalable evaluation methods. This survey maps the rapidly evolving landscape of agent evaluation, reveals the emerging trends in the field, identifies current limitations, and proposes directions for future research.

large language model, machine learning, natural language, (17 more...)

2503.16416

Country:

Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(8 more...)

Genre: Overview (1.00)

Industry:

Information Technology (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Qin, Yiran, Kang, Li, Song, Xiufeng, Yin, Zhenfei, Liu, Xiaohong, Liu, Xihui, Zhang, Ruimao, Bai, Lei

Designing effective embodied multi-agent systems is critical for solving complex real-world tasks across domains. Due to the complexity of multi-agent embodied systems, existing methods fail to automatically generate safe and efficient training data for such systems. To this end, we propose the concept of compositional constraints for embodied multi-agent systems, addressing the challenges arising from collaboration among embodied agents. We design various interfaces tailored to different types of constraints, enabling seamless interaction with the physical world. Leveraging compositional constraints and specifically designed interfaces, we develop an automated data collection framework for embodied multi-agent systems and introduce the first benchmark for embodied multi-agent manipulation, RoboFactory. Based on the RoboFactory benchmark, we adapt and evaluate imitation learning methods and analyze their performance on agent tasks of varying difficulty. Furthermore, we explore the architectures and training strategies for multi-agent imitation learning, aiming to build safe and efficient embodied multi-agent systems.

agent, artificial intelligence, constraint, (14 more...)

2503.16408

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Consensus Tracking Control of Multi-agent Systems with A Time-varying Reference State under Binary-valued Communication

Wang, Ting, Qiu, Zhuangzhuang, Lu, Xiaodong, Zhao, Yanlong

This paper investigates the problem of consensus tracking control of discrete-time multi-agent systems under binary-valued communication. Unlike most existing studies on consensus tracking, the information transmitted between agents is binary-valued. Parameter identification with binary-valued observations is applied to the estimation of neighbors' states, and the tracking control is designed based on the estimation. Two Lyapunov functions are constructed to deal with the strong coupling of estimation and control. Compared with consensus problems under binary-valued communication, a reference state is required for consensus tracking control. Two scenarios of the time-varying reference state are studied. (1) The reference state is asymptotically convergent. An online algorithm that performs estimation and control simultaneously is proposed, in which the estimation step size and the control gain decrease with time. With this algorithm, the multi-agent system is proved to achieve consensus tracking with convergence rate $O(1/k^{\epsilon})$ under certain conditions. (2) The reference state is bounded, which is a less conservative assumption than in the first case. In this case, the estimation step size and control gain are designed to be constant. With this algorithm, all the followers reach a neighborhood of the leader at an exponential rate. Finally, simulations are given to demonstrate the theoretical results.

algorithm, artificial intelligence, reference state, (16 more...)

2503.15955

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Shandong Province > Jinan (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Dispersion is (Almost) Optimal under (A)synchrony

Kshemkalyani, Ajay D., Kumar, Manish, Molla, Anisur Rahaman, Sharma, Gokarna

The dispersion problem has received much attention recently in the distributed computing literature. In this problem, $k\leq n$ agents placed initially arbitrarily on the nodes of an $n$-node, $m$-edge anonymous graph of maximum degree $\Delta$ have to reposition autonomously to reach a configuration in which each agent is on a distinct node of the graph. Dispersion is both interesting and important due to its connections to many fundamental coordination problems by mobile agents on graphs, such as exploration, scattering, load balancing, and relocation of self-driven electric cars (robots) to recharge stations (nodes). The objective has been to provide a solution that simultaneously optimizes time and memory complexities. There exist graphs for which the lower bound on time complexity is $\Omega(k)$. Memory complexity is $\Omega(\log k)$ per agent, independent of graph topology. The state-of-the-art algorithms have (i) time complexity $O(k\log^2k)$ and memory complexity $O(\log(k+\Delta))$ under the synchronous setting [DISC'24] and (ii) time complexity $O(\min\{m,k\Delta\})$ and memory complexity $O(\log(k+\Delta))$ under the asynchronous setting [OPODIS'21]. In this paper, we improve substantially on this state of the art. Under the synchronous setting as in [DISC'24], we present the first optimal $O(k)$ time algorithm keeping memory complexity $O(\log(k+\Delta))$. Under the asynchronous setting as in [OPODIS'21], we present the first algorithm with time complexity $O(k\log k)$ keeping memory complexity $O(\log(k+\Delta))$, which is time-optimal within an $O(\log k)$ factor despite asynchrony. Both results were obtained through novel techniques to quickly find empty nodes to settle agents, which may be of independent interest.
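To make the problem statement concrete, the following is a minimal, centralized simulation of a simple depth-first baseline for the special case where all $k$ agents start on one node. It is not the algorithm of this paper: the function name `dfs_dispersion`, the use of explicit node identifiers (the model in the abstract is anonymous), and the move-counting convention are all assumptions made for illustration, and the graph is assumed to be connected.

```python
def dfs_dispersion(adj, start, k):
    """Disperse k co-located agents by walking the group along a DFS.

    adj   : dict mapping node -> list of neighbor nodes (connected graph assumed)
    start : node on which all k agents are initially placed
    Returns (settled, moves): the nodes that received one agent each, in
    settling order, and the number of edge traversals made by the group.
    """
    if k > len(adj):
        raise ValueError("dispersion requires k <= n")
    settled = [start]            # one agent settles on the start node
    visited = {start}
    moves = 0
    stack = [(start, iter(adj[start]))]
    while len(settled) < k and stack:
        node, neighbors = stack[-1]
        for nxt in neighbors:
            moves += 1           # group crosses the edge node -> nxt
            if nxt not in visited:
                visited.add(nxt)
                settled.append(nxt)   # leave one agent on the empty node
                stack.append((nxt, iter(adj[nxt])))
                break
            moves += 1           # node already occupied: step back
        else:
            stack.pop()          # all edges explored here
            if stack:
                moves += 1       # backtrack to the DFS parent
    return settled, moves


# Illustrative input: a path graph 0 - 1 - 2 - 3 with three agents on node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
settled_nodes, total_moves = dfs_dispersion(path, start=0, k=3)
```

In this baseline the group may probe many already-occupied neighbors before it finds an empty node, which is the cost that faster empty-node search aims to reduce.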

agent, artificial intelligence, node, (16 more...)

2503.16216

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report (1.00)

Industry: Energy (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Whenever, Wherever: Towards Orchestrating Crowd Simulations with Spatio-Temporal Spawn Dynamics

Kreutz, Thomas, Mühlhäuser, Max, Guinea, Alejandro Sanchez

Realistic crowd simulations are essential for immersive virtual environments, relying on both individual behaviors (microscopic dynamics) and overall crowd patterns (macroscopic characteristics). While recent data-driven methods like deep reinforcement learning improve microscopic realism, they often overlook critical macroscopic features such as crowd density and flow, which are governed by spatio-temporal spawn dynamics, namely, when and where agents enter a scene. Traditional methods, such as random spawn rates, stochastic processes, or fixed schedules, either are not guaranteed to capture the underlying complexity or lack diversity and realism. To address this issue, we propose a novel approach called nTPP-GMM that models spatio-temporal spawn dynamics using Neural Temporal Point Processes (nTPPs) coupled with a spawn-conditional Gaussian Mixture Model (GMM) for agent spawn and goal positions. We evaluate our approach by orchestrating crowd simulations of three diverse real-world datasets with nTPP-GMM. Our experiments demonstrate that orchestration with nTPP-GMM leads to realistic simulations that reflect real-world crowd scenarios and allow crowd analysis.
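The split between "when" and "where" can be illustrated with a much simpler stand-in. The sketch below replaces the neural temporal point process with a homogeneous Poisson process (exponential inter-arrival times) and uses a fixed, isotropic two-dimensional Gaussian mixture for spawn positions, with no conditioning on the spawn time. The function name `sample_spawn_events`, the component format, and all numeric values are assumptions for illustration and do not come from the paper.

```python
import random


def sample_spawn_events(rate, horizon, components, seed=0):
    """Sample (time, x, y) spawn events on the interval [0, horizon).

    rate       : expected spawns per unit time (Poisson stand-in for an nTPP)
    components : list of (weight, (mean_x, mean_y), std) mixture components
    """
    rng = random.Random(seed)
    weights = [w for w, _, _ in components]
    events = []
    t = rng.expovariate(rate)          # waiting time until the first spawn
    while t < horizon:
        # "Where": pick a mixture component, then draw a position from it.
        _, (mx, my), std = rng.choices(components, weights=weights, k=1)[0]
        events.append((t, rng.gauss(mx, std), rng.gauss(my, std)))
        # "When": advance by the next exponential inter-arrival time.
        t += rng.expovariate(rate)
    return events


# Illustrative values: two entrances, the first used 70% of the time.
entrances = [(0.7, (0.0, 0.0), 1.0), (0.3, (10.0, 5.0), 0.5)]
events = sample_spawn_events(rate=2.0, horizon=10.0, components=entrances)
```

A constant rate cannot reproduce bursts or lulls in arrivals, which is one reason to model the temporal part with a learned point process instead.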

machine learning, reinforcement learning, simulation, (20 more...)

2503.16639

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Greece (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.30)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.87)
(2 more...)

Advancing Mobile GUI Agents: A Verifier-Driven Approach to Practical Deployment

Dai, Gaole, Jiang, Shiqi, Cao, Ting, Li, Yuanchun, Yang, Yuqing, Tan, Rui, Li, Mo, Qiu, Lili

We propose V-Droid, a mobile GUI task automation agent. Unlike previous mobile agents that utilize Large Language Models (LLMs) as generators to directly generate actions at each step, V-Droid employs LLMs as verifiers to evaluate candidate actions before making final decisions. To realize this novel paradigm, we introduce a comprehensive framework for constructing verifier-driven mobile agents: the discretized action space construction coupled with the prefilling-only workflow to accelerate the verification process, the pair-wise progress preference training to significantly enhance the verifier's decision-making capabilities, and the scalable human-agent joint annotation scheme to efficiently collect the necessary data at scale. V-Droid sets a new state-of-the-art task success rate across several public mobile task automation benchmarks: 59.5% on AndroidWorld, 38.3% on AndroidLab, and 49% on MobileAgentBench, surpassing existing agents by 9.5%, 2.1%, and 9%, respectively. Furthermore, V-Droid achieves an impressively low latency of 0.7 seconds per step, making it the first mobile agent capable of delivering near-real-time, effective decision-making capabilities.
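The generator-versus-verifier distinction can be sketched in a few lines: rather than asking a model to produce an action, the agent enumerates a discrete set of candidate actions, scores each with a verifier, and executes the highest-scoring one. The sketch below is only a schematic of that selection step. `select_action` and `toy_verifier` are hypothetical names, and the keyword-overlap scorer is a placeholder for an LLM-based verifier, not V-Droid's implementation.

```python
from typing import Any, Callable, List, Sequence, Tuple


def select_action(
    state: Any,
    candidates: Sequence[str],
    verifier: Callable[[Any, str], float],
) -> Tuple[str, List[float]]:
    """Score every candidate action with the verifier and return the best one."""
    if not candidates:
        raise ValueError("at least one candidate action is required")
    scores = [verifier(state, action) for action in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


def toy_verifier(state: dict, action: str) -> float:
    """Placeholder scorer: number of words shared by the goal and the action."""
    goal_words = set(state["goal"].lower().split())
    return float(len(goal_words & set(action.lower().split())))


# Illustrative state and candidate set (assumed, not taken from a benchmark).
state = {"goal": "open settings"}
candidates = ["tap Settings icon", "scroll down", "press back"]
chosen, scores = select_action(state, candidates, toy_verifier)
```

Because each candidate is scored independently, the scoring calls can be batched, which fits the abstract's emphasis on a discretized action space and a prefilling-only workflow for low latency.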

large language model, machine learning, natural language, (19 more...)

2503.15937

Country: Asia > China > Hong Kong (0.04)

Genre: Workflow (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Unreal-MAP: Unreal-Engine-Based General Platform for Multi-Agent Reinforcement Learning

Hu, Tianyi, Fu, Qingxu, Pu, Zhiqiang, Wang, Yuan, Qiu, Tenghai

In this paper, we propose Unreal Multi-Agent Playground (Unreal-MAP), a general MARL platform based on the Unreal Engine (UE). Unreal-MAP allows users to freely create multi-agent tasks using the vast visual and physical resources available in the UE community, and to deploy state-of-the-art (SOTA) MARL algorithms within them. Unreal-MAP is user-friendly in terms of deployment, modification, and visualization, and all its components are open-source. We also develop an experimental framework compatible with algorithms, ranging from rule-based to learning-based, provided by third-party frameworks. Lastly, we deploy several SOTA algorithms in example tasks developed via Unreal-MAP and conduct corresponding experimental analyses. We believe Unreal-MAP can play an important role in the MARL field by closely integrating existing algorithms with user-customized tasks, thus advancing the field of MARL.

machine learning, reinforcement learning, unreal-map, (14 more...)

2503.15947

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Tennessee (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Causally Aligned Curriculum Learning

Li, Mingxuan, Zhang, Junzhe, Bareinboim, Elias

A pervasive challenge in Reinforcement Learning (RL) is the "curse of dimensionality", which is the exponential growth in the state-action space when optimizing a high-dimensional target task. The framework of curriculum learning trains the agent in a curriculum composed of a sequence of related and more manageable source tasks. The expectation is that when some optimal decision rules are shared across source tasks and the target task, the agent could more quickly pick up the necessary skills to behave optimally in the environment, thus accelerating the learning process. However, this critical assumption of invariant optimal decision rules does not necessarily hold in many practical applications, specifically when the underlying environment contains unobserved confounders. This paper studies the problem of curriculum RL through a causal lens. We derive a sufficient graphical condition characterizing causally aligned source tasks, i.e., tasks for which the invariance of optimal decision rules holds. We further develop an efficient algorithm to generate a causally aligned curriculum, provided with qualitative causal knowledge of the target task. Finally, we validate our proposed methodology through experiments in discrete and continuous confounded tasks with pixel observations.

machine learning, reinforcement learning, source task, (15 more...)

2503.16799

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(16 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.93)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)