Mobile
SpiritSight Agent: Advanced GUI Agent with One Look
Huang, Zhiyuan, Cheng, Ziming, Pan, Junting, Hou, Zhaohui, Zhan, Mingjie
Graphical User Interface (GUI) agents have shown remarkable capabilities in assisting human-computer interaction by automating users' navigation on digital devices. An ideal GUI agent is expected to achieve high accuracy, low latency, and compatibility across different GUI platforms. Recent vision-based approaches have shown promise by leveraging advanced Vision Language Models (VLMs). While they generally meet the requirements of compatibility and low latency, these vision-based GUI agents tend to have low accuracy due to their limitations in element grounding. To address this issue, we propose $\textbf{SpiritSight}$, a vision-based, end-to-end GUI agent that excels in GUI navigation tasks across various GUI platforms. First, we create a multi-level, large-scale, high-quality GUI dataset called $\textbf{GUI-Lasagne}$ using scalable methods, empowering SpiritSight with robust GUI understanding and grounding capabilities. Second, we introduce the $\textbf{Universal Block Parsing (UBP)}$ method to resolve the ambiguity problem in dynamic high-resolution visual inputs, further enhancing SpiritSight's ability to ground GUI objects. Through these efforts, the SpiritSight agent outperforms other advanced methods on diverse GUI benchmarks, demonstrating its superior capability and compatibility in GUI navigation tasks. Models are available at $\href{https://huggingface.co/SenseLLM/SpiritSight-Agent-8B}{this\ URL}$.
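To make the ambiguity concrete: when a high-resolution screenshot is dynamically tiled into fixed-size blocks before being fed to a VLM, a point predicted relative to one block must be mapped back into the full-image coordinate frame, otherwise the same local coordinates refer to different screen locations depending on the block. The sketch below is a minimal illustration of that coordinate mapping, not the paper's actual UBP implementation; the grid layout and function names are assumptions.

```python
# Illustrative sketch of the block-to-global coordinate mapping that
# dynamic high-resolution tiling requires. Grid shape and names are assumed,
# not taken from the SpiritSight paper.

def block_to_global(x: float, y: float, row: int, col: int,
                    n_rows: int, n_cols: int) -> tuple[float, float]:
    """Map normalized block-local coordinates (x, y) in [0, 1] within
    block (row, col) of an n_rows x n_cols tiling back to normalized
    full-image coordinates."""
    gx = (col + x) / n_cols
    gy = (row + y) / n_rows
    return gx, gy

# A point at the center of the bottom-right block of a 2x2 tiling lies at
# (0.75, 0.75) in the full image.
print(block_to_global(0.5, 0.5, row=1, col=1, n_rows=2, n_cols=2))  # → (0.75, 0.75)
```

Without such a mapping, identical block-local predictions are ambiguous across tiles, which is the grounding failure mode the UBP method is designed to resolve.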
CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
Zhou, Yuqi, Wang, Shuai, Dai, Sunhao, Jia, Qinglin, Du, Zhaocheng, Dong, Zhenhua, Xu, Jun
The advancement of visual language models (VLMs) has enhanced mobile device operations, allowing simulated human-like actions to address user requirements. Current VLM-based mobile operating assistants can be structured into three levels: task, subtask, and action. The subtask level, linking high-level goals with low-level executable actions, is crucial for task completion but faces two challenges: ineffective subtasks that the lower-level agent cannot execute, and inefficient subtasks that fail to contribute to the completion of the higher-level task. These challenges stem from the VLM's lack of experience in decomposing subtasks within GUI scenarios in a multi-agent architecture. To address these, we propose a new mobile assistant architecture with constrained high-frequency optimized planning (CHOP). Our approach overcomes the VLM's deficiency in GUI scenario planning by using human-planned subtasks as the basis vector. We evaluate our architecture in both English and Chinese contexts across 20 apps, demonstrating significant improvements in both effectiveness and efficiency. Our dataset and code are available at https://github.com/Yuqi-Zhou/CHOP
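The "basis vector" idea can be illustrated with a toy sketch: a free-form subtask proposed by the planner is snapped to the nearest entry in a small set of human-planned subtasks, so the lower-level agent only ever receives subtasks it is known to be able to execute. This is a hypothetical illustration of the constraint, not the CHOP implementation; the similarity measure and all names are assumptions.

```python
# Hypothetical sketch of constrained subtask planning: a VLM-proposed subtask
# is replaced with the most similar human-planned subtask from a fixed basis.
# The token-overlap similarity and the example basis are illustrative only.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two subtask descriptions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def constrain_to_basis(proposed: str, basis: list[str]) -> str:
    """Snap a proposed subtask to the closest human-planned subtask."""
    return max(basis, key=lambda s: jaccard(proposed, s))

# Toy basis of executable, human-planned subtasks for a shopping app.
BASIS = [
    "open the search bar and type the product name",
    "add the selected item to the shopping cart",
    "proceed to checkout and confirm the order",
]

print(constrain_to_basis("type product name into search", BASIS))
# → "open the search bar and type the product name"
```

Constraining proposals this way directly targets the two failure modes the abstract names: subtasks outside the basis (ineffective) never reach the executor, and a frequency-weighted basis would favor subtasks that historically advance the task (efficient).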
MWC 2025: All the news from Samsung, Nothing, Lenovo, Xiaomi and more
Mobile World Congress is taking place in Barcelona this week, offering manufacturers an opportunity to show off new gear without needing to hold their own splashy event. So far, we've learned about some new laptops and phones, as well as upcoming AI updates to Android and an internet connectivity announcement from Meta. Here's a look at everything announced at Mobile World Congress that caught our eye. We'll update this story throughout the week. Among the bigger-name manufacturers, Lenovo has arguably had the busiest MWC so far.
I tested every Lenovo laptop released at MWC - and these are the very best
MWC 2025, or Mobile World Congress, has officially kicked off in Barcelona. It's an annual conference where tech companies come together to showcase upcoming mobile devices. Lenovo has joined the festivities by unveiling a slew of new laptops, from lightweight machines like the convertible ThinkPad T14s to powerful workhorses such as the Yoga Pro 9i Aura Edition. In addition to these computers, the company showed off some very interesting prototypes. It's unknown if the concept hardware will ever be made into official products, but it provides interesting insight into what may be coming in the not-so-distant future.
A new iPad and iPad Air are coming -- pre-order now
PRE-ORDER NOW: On March 4, Apple dropped the new Apple iPad Air with M3 chip as well as the updated Apple iPad, now with an A16 chip. Both models are available for preorder and will ship on March 12. It's been less than a year since Apple debuted its 2024 iPad Air with M2 chip, and yet, it's already back with an upgraded model. On March 4, Apple introduced its latest model, the iPad Air with M3 chip. Apple CEO Tim Cook teased that a new product was coming earlier this week, and at Mashable we suspected it was the launch of the MacBook Air with M4 chip. However, it turns out we'll be waiting a little longer for that device.
5 easy Gemini settings tweaks to protect your privacy from AI
If you're an Android user, you are familiar with Gemini, as it has replaced Google Assistant as the default. Although Gemini is a powerful and helpful tool, some worry that it invades their privacy. If you use the default settings, that concern is not too far from the truth. If you happen to share that mindset, I have five tips to help you maximize your privacy when using Gemini on your Android device. Fortunately, these tips aren't challenging, so anyone can use them.
Apple announces the M3 iPad Air with Apple Intelligence and a new Magic Keyboard
Apple just dropped a new iPad Air with an M3 chip, and yes, it has Apple Intelligence. The M3 iPad Air is twice as fast as the M1 iPad Air, which was released in 2022, according to the announcement. The M3 chip also gives the iPad Air faster graphics performance and the same dynamic caching support that comes in other M3 models, which boosts performance and response time. The M3 iPad Air comes with iPadOS 18, which supports Apple Intelligence features, including Writing Tools with ChatGPT integration, Type to Siri, Image Playground, and Genmoji creation. Apple Intelligence for iPad also has photo and graphics editing tools like the Clean Up tool in Photos and Image Wand in the Notes app, which works with the Apple Pencil.
Goodbye Gemini, hello Pixel Sense? What we know about Google's AI assistant for Pixel 10
As far back as 2023, Google was reportedly working on an AI assistant for Pixel phones called "Pixie." Many people expected to see that assistant debut with the Pixel 9, but we haven't really heard anything about that project since. According to a report from Android Authority, Google is dropping a new context-aware assistant with the Pixel 10 -- Pixel Sense. Android Authority says Pixel Sense will use information on your phone to provide a much more personal assistant experience. It will be able to pull information from a number of other apps, including Calendar, Chrome, Contacts, Docs, Files, Gmail, Keep Notes, Maps, Messages, Phone, Photos, Recorder, Screenshots, Wallet, YouTube Music, and YouTube. The AI-powered assistant will run fully on-device, meaning you'll be able to use it offline and "not even Google can see" your data.
iPhone 15 Pro users just got a major AI upgrade with Visual Intelligence
Owners of the iPhone 15 Pro and Pro Max can now tap into a helpful AI-powered feature, courtesy of the latest iOS 18.4 developer beta. Launched on Monday, the new beta gives users of these older phones the ability to set up and use Visual Intelligence. Previously accessible only on the iPhone 16, Visual Intelligence lets you run web searches and ask questions about the people, places, and things you view through the camera. Beyond supporting the AI-powered feature, the new beta adds a couple of new ways to trigger it. The four iPhone 16 models use the physical Camera Control to launch Visual Intelligence.
The iPhone 15 Pro will get Visual Intelligence with iOS 18.4
What started as an Apple Intelligence feature exclusive to the Camera Control-endowed iPhone 16 line is coming to older iPhones, and soon. We already knew that the iPhone 15 Pro and Pro Max would get Visual Intelligence at some point in the future, and thanks to 9to5Mac, we now know it's one of several options you can assign to the Action Button in the second iOS 18.4 beta. That likely means the feature could end up in the final release of the update. Visual Intelligence lets you draw on AI models from Google and OpenAI to find information (and websites) about anything you point your iPhone's camera at. You can also use the feature to add information from a flyer to your calendar and oddly, identify dog breeds.