 maco






Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters

Li, Zonghang, Li, Tao, Feng, Wenjiao, Xiao, Rongxing, She, Jianshu, Huang, Hong, Guizani, Mohsen, Yu, Hongfang, Ho, Qirong, Xiang, Wei, Liu, Steve

arXiv.org Artificial Intelligence

On-device inference offers privacy, offline use, and instant response, but consumer hardware restricts large language models (LLMs) to low throughput and capability. To overcome this challenge, we present prima.cpp, a distributed on-device inference system that runs 30-70B LLMs on consumer home clusters with mixed CPUs/GPUs, insufficient RAM/VRAM, slow disks, Wi-Fi links, and heterogeneous OSs. We introduce pipelined-ring parallelism (PRP) to overlap disk I/O with compute and communication, and address the prefetch-release conflict in mmap-based offloading. We further propose Halda, a heterogeneity-aware scheduler that co-optimizes per-device CPU/GPU workloads and device selection under RAM/VRAM constraints. On four consumer home devices, a 70B model reaches 674 ms/token TPOT with <6% memory pressure, and a 32B model with speculative decoding achieves 26 tokens/s. Compared with llama.cpp, exo, and dllama, our proposed prima.cpp achieves 5-17x lower TPOT, supports fine-grained model sizes from 8B to 70B, ensures broader cross-OS and quantization compatibility, and remains OOM-free, while also being Wi-Fi tolerant, privacy-preserving, and hardware-independent. The code is available at https://gitee.com/zonghang-li/prima.cpp.
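The abstract reports one result as latency (674 ms/token TPOT) and another as throughput (26 tokens/s). As a rough aid for comparing the two, the sketch below converts between the units. It assumes steady-state decoding, where throughput is simply the reciprocal of time per output token; the function names are illustrative and do not come from prima.cpp.

```python
def tpot_ms_to_tokens_per_s(tpot_ms: float) -> float:
    """Convert time per output token (ms/token) to throughput (tokens/s)."""
    if tpot_ms <= 0:
        raise ValueError("tpot_ms must be positive")
    return 1000.0 / tpot_ms


def tokens_per_s_to_tpot_ms(tokens_per_s: float) -> float:
    """Convert throughput (tokens/s) to time per output token (ms/token)."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return 1000.0 / tokens_per_s


if __name__ == "__main__":
    # Figures quoted in the abstract above.
    print(f"70B: {tpot_ms_to_tokens_per_s(674):.2f} tokens/s")
    print(f"32B: {tokens_per_s_to_tpot_ms(26):.1f} ms/token")
```

Under that assumption, the 70B figure corresponds to roughly 1.5 tokens/s, and the 32B figure with speculative decoding to roughly 38 ms/token.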


ChatGPT can now access Gmail, Outlook, and Google Drive in real time

PCWorld

Earlier this week, OpenAI announced that ChatGPT can now be connected to more apps and services, allowing you to pull your data from those sources in real time. Newly connectable sources that were explicitly mentioned include Outlook, Teams, Google Drive, Gmail, and Linear. ChatGPT can now connect to more internal sources & pull in real-time context--keeping existing user-level permissions. These new connections are only available to paid ChatGPT Plus, Pro, Team, Enterprise, and Edu users. Furthermore, the feature is not available to users in the European Economic Area (EEA), Switzerland, and the UK.


9 settings to change on your Mac

Popular Science

You've unwrapped your new Mac desktop or laptop and you're ready to dive in: Where should you start? Modern-day macOS is designed to be intuitive and straightforward, but it's also stuffed with options and features you can tweak to fit your needs. Here we'll look at some of the fundamental settings that you should change first, to ensure you're getting the best possible experience. All of these options can be found by opening the Apple menu in the top left corner of the macOS interface, then choosing System Settings.


Apple should focus on fixing Siri, not redesigning iOS again

Engadget

Now that Apple's recent slew of hardware releases is behind us, we got some news on the software side last week. First, the company publicly announced that it was delaying the smarter, more personal version of Siri that'll be powered by Apple Intelligence. Then, rumors sprang up again that Apple was giving an extensive visual update to its software platforms, including iOS 19 and macOS 16, which are expected to be revealed at WWDC in June. The sources for this redesign rumor are solid. Jon Prosser dropped a video on his YouTube channel Front Page Tech back in January where he said that he had seen a redesigned Camera app for the next version of iOS with a number of interface changes that made it feel more like a visionOS app. His thinking is that Apple wouldn't redesign a core app like Camera without bringing changes to some of the rest of the OS as well.


ChatGPT for macOS can now directly edit Xcode projects

Engadget

ChatGPT on macOS is about to become more useful for coding. ChatGPT can now edit code directly within an integrated development environment -- no need to copy and paste. You can find the full list of supported IDEs on OpenAI's website, but some of the more notable inclusions are Apple's own Xcode, Visual Studio Code and offshoots of JetBrains like Android Studio and PyCharm. According to OpenAI, IDE integration has been one of the most-requested features from macOS users since the company released its "works with app" framework back in November. If you're a Plus, Pro or Team subscriber, you can start using the integration today.


How to use Apple Intelligence to sort your emails

Popular Science

If there's one area of digital life where some AI help is welcome, it's with processing emails: For most of us who spend all day sitting at a desk, sifting through the deluge of incoming messages and making any sort of progress towards inbox zero is a daunting task. And given the time-consuming and repetitive nature of email management, it feels like a job AI is well suited for. Apple has clearly been thinking along the same lines, because Apple Intelligence features have been rolled out to the Apple Mail app on iOS, iPadOS, and macOS. The built-in AI can now sort through your messages, summarize incoming emails, and help you write your own missives if you're stuck. Here's what's available on iPhones, iPads, and Macs--you just need to make sure you have enabled Apple Intelligence, which you'll find under Apple Intelligence & Siri in Settings (iOS and iPadOS) or System Settings (macOS).