 Mobile


FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data

arXiv.org Artificial Intelligence

Mobile agents have recently attracted considerable research attention. Traditional approaches to mobile agent training rely on centralized data collection, leading to high costs and limited scalability. Distributed training utilizing federated learning offers an alternative by harnessing real-world user data, providing scalability and reducing costs. However, pivotal challenges, including the absence of standardized benchmarks, hinder progress in this field. To tackle these challenges, we introduce FedMABench, the first benchmark for federated training and evaluation of mobile agents, specifically designed for heterogeneous scenarios. FedMABench features 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories, providing a comprehensive framework for evaluating mobile agents across diverse environments. Through extensive experiments, we uncover several key insights: federated algorithms consistently outperform local training; the distribution of specific apps plays a crucial role in heterogeneity; and even apps from distinct categories can exhibit correlations during training. FedMABench is publicly available at: https://github.com/wwh0411/FedMABench with the datasets at: https://huggingface.co/datasets/wwh0411/FedMABench.


CrowdHMTware: A Cross-level Co-adaptation Middleware for Context-aware Mobile DL Deployment

arXiv.org Artificial Intelligence

Many deep learning (DL) powered mobile and wearable applications today continuously and unobtrusively sense their ambient surroundings to enhance all aspects of human life. To enable robust and private mobile sensing, DL models are often deployed locally on resource-constrained mobile devices using techniques such as model compression or offloading. However, existing methods, whether at the front-end algorithm level (i.e., DL model compression/partitioning) or the back-end scheduling level (i.e., operator/resource scheduling), cannot adapt locally online: they either require offline retraining to ensure accuracy or rely on manually pre-defined strategies, and so struggle with dynamic adaptability. The primary challenge lies in feeding back runtime performance from the back-end level to the front-end level optimization decision. Moreover, adaptive mobile DL model porting middleware with cross-level co-adaptation is less explored, particularly in mobile environments with diversity and dynamics. In response, we introduce CrowdHMTware, a dynamic context-adaptive DL model deployment middleware for heterogeneous mobile devices. It establishes an automated adaptation loop between cross-level functional components, i.e., elastic inference, scalable offloading, and a model-adaptive engine, enhancing scalability and adaptability. Experiments with four typical tasks across 15 platforms and a real-world case study demonstrate that CrowdHMTware can effectively scale DL model, offloading, and engine actions across diverse platforms and tasks. It hides run-time system issues from developers, reducing the required developer expertise.


Apple announces MacBook Air with M4 chip for less than $1,000

Mashable

Apple's new MacBook Air has arrived, and as expected, it has an M4 chip. Slightly less expected is the lower price point. The M4 MacBook Air starts at $999, which is $100 less than previous models. It also comes in a new sky blue color, joining the other options: midnight, starlight, and silver. Thanks to the M4 chip, Apple says the new MacBook Air is two times faster than the M1 MacBook Air, and "accelerates AI-based tasks," which means, yep, Apple Intelligence and ChatGPT.


SpiritSight Agent: Advanced GUI Agent with One Look

arXiv.org Artificial Intelligence

Graphical User Interface (GUI) agents show remarkable abilities in assisting human-computer interaction, automating users' navigation on digital devices. An ideal GUI agent is expected to achieve high accuracy, low latency, and compatibility with different GUI platforms. Recent vision-based approaches have shown promise by leveraging advanced Vision Language Models (VLMs). While they generally meet the requirements of compatibility and low latency, these vision-based GUI agents tend to have low accuracy due to their limitations in element grounding. To address this issue, we propose SpiritSight, a vision-based, end-to-end GUI agent that excels in GUI navigation tasks across various GUI platforms. First, we create a multi-level, large-scale, high-quality GUI dataset called GUI-Lasagne using scalable methods, empowering SpiritSight with robust GUI understanding and grounding capabilities. Second, we introduce the Universal Block Parsing (UBP) method to resolve the ambiguity problem in dynamic high-resolution visual inputs, further enhancing SpiritSight's ability to ground GUI objects. Through these efforts, the SpiritSight agent outperforms other advanced methods on diverse GUI benchmarks, demonstrating its superior capability and compatibility in GUI navigation tasks. Models are available at https://huggingface.co/SenseLLM/SpiritSight-Agent-8B.


CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning

arXiv.org Artificial Intelligence

The advancement of visual language models (VLMs) has enhanced mobile device operations, allowing simulated human-like actions to address user requirements. Current VLM-based mobile operating assistants can be structured into three levels: task, subtask, and action. The subtask level, linking high-level goals with low-level executable actions, is crucial for task completion but faces two challenges: ineffective subtasks that lower-level agents cannot execute and inefficient subtasks that fail to contribute to the completion of the higher-level task. These challenges stem from the VLM's lack of experience in decomposing subtasks within GUI scenarios in a multi-agent architecture. To address these, we propose a new mobile assistant architecture with constrained high-frequency optimized planning (CHOP). Our approach overcomes the VLM's deficiency in GUI scenario planning by using human-planned subtasks as the basis vector. We evaluate our architecture in both English and Chinese contexts across 20 apps, demonstrating significant improvements in both effectiveness and efficiency. Our dataset and code are available at https://github.com/Yuqi-Zhou/CHOP


MWC 2025: All the news from Samsung, Nothing, Lenovo, Xiaomi and more

Engadget

Mobile World Congress is taking place in Barcelona this week, offering manufacturers an opportunity to show off new gear without needing to hold their own splashy event. So far, we've learned about some new laptops and phones, as well as upcoming AI updates to Android and an internet connectivity announcement from Meta. Here's a look at everything announced at Mobile World Congress that caught our eye. We'll update this story throughout the week. Among the bigger-name manufacturers, Lenovo has arguably had the busiest MWC so far.


I tested every Lenovo laptop released at MWC - and these are the very best

ZDNet

MWC 2025, or Mobile World Congress, has officially kicked off in Barcelona. It's an annual conference where tech companies come together to showcase upcoming mobile devices. Lenovo has joined the festivities by unveiling a slew of new laptops, from lightweight machines like the convertible ThinkPad T14s to powerful workhorses such as the Yoga Pro 9i Aura Edition. In addition to these computers, the company showed off some very interesting prototypes. It's unknown if the concept hardware will ever be made into official products, but it provides interesting insight into what may be coming in the not-so-distant future.


A new iPad and iPad Air are coming -- pre-order now

Mashable

PRE-ORDER NOW: On March 4, Apple dropped the new Apple iPad Air with M3 chip as well as the updated Apple iPad, now with an A16 chip. Both models are available for pre-order and will ship on March 12. It's been less than a year since Apple debuted its 2024 iPad Air with M2 chip, and yet the company is already back with an upgraded model. On March 4, Apple introduced its latest model, the iPad Air with M3 chip. Apple CEO Tim Cook teased earlier this week that a new product was coming, and at Mashable we suspected it was the launch of the MacBook Air with M4 chip. However, it turns out we'll be waiting a little longer for that device.


5 easy Gemini settings tweaks to protect your privacy from AI

ZDNet

If you're an Android user, you are familiar with Gemini, as it has replaced Google Assistant as the default. Although Gemini is a powerful and helpful tool, some worry that it invades their privacy. If you use the default settings, that concern is not too far from the truth. If you happen to share that mindset, I have five tips to help you maximize your privacy when using Gemini on your Android device. Fortunately, these tips aren't challenging, so anyone can use them.


Apple announces the M3 iPad Air with Apple Intelligence and a new Magic Keyboard

Mashable

Apple just dropped a new iPad Air with an M3 chip, and yes, it has Apple Intelligence. The M3 iPad Air is twice as fast as the M1 iPad Air, which was released in 2022, according to the announcement. The M3 chip also gives the iPad Air faster graphics performance and the same dynamic caching support that comes in other M3 models, which boosts performance and response time. The M3 iPad Air comes with iPadOS 18, which supports Apple Intelligence features, including Writing Tools with ChatGPT integration, Type to Siri, Image Playground, and Genmoji creation. Apple Intelligence for iPad also has photo and graphics editing tools like the Clean Up tool in Photos and Image Wand in the Notes app, which works with the Apple Pencil.