Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Mar-16-2025, 03:05:49 GMT–Neural Information Processing Systems

Mobile device operation is becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, cannot function effectively as operation assistants. Instead, MLLM-based agents, which extend their capabilities through tool invocation, are gradually being applied to this scenario. However, the two major navigation challenges in mobile device operation tasks, task progress navigation and focus content navigation, are difficult to solve under the single-agent architecture of existing work, because overly long token sequences and the interleaved text-image data format limit performance.

agent, artificial intelligence, mobile-agent-v2

Technology:
- Information Technology
  - Artificial Intelligence > Representation & Reasoning
    - Agents (1.00)
  - Communications > Mobile (1.00)