Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
–Neural Information Processing Systems
Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation assistants. Instead, MLLM-based agents, which enhance capabilities through tool invocation, are gradually being applied to this scenario. However, the two major navigation challenges in mobile device operation tasks -- task progress navigation and focus content navigation -- are difficult to effectively solve under the single-agent architecture of existing work. This is due to the overly long token sequences and the interleaved text-image data format, which limit performance.
Neural Information Processing Systems
May-28-2025, 07:53:12 GMT
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology (0.46)
- Leisure & Entertainment (0.46)
- Media (0.46)
- Technology: