Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters

Li, Zonghang, Li, Tao, Feng, Wenjiao, Xiao, Rongxing, She, Jianshu, Huang, Hong, Guizani, Mohsen, Yu, Hongfang, Ho, Qirong, Xiang, Wei, Liu, Steve

arXiv.org Artificial Intelligence

On-device inference offers privacy, offline use, and instant response, but consumer hardware restricts large language models (LLMs) to low throughput and capability. To overcome this challenge, we present prima.cpp, a distributed on-device inference system that runs 30-70B LLMs on consumer home clusters with mixed CPUs/GPUs, insufficient RAM/VRAM, slow disks, Wi-Fi links, and heterogeneous OSs. We introduce pipelined-ring parallelism (PRP) to overlap disk I/O with compute and communication, and address the prefetch-release conflict in mmap-based offloading. We further propose Halda, a heterogeneity-aware scheduler that co-optimizes per-device CPU/GPU workloads and device selection under RAM/VRAM constraints. On four consumer home devices, a 70B model reaches 674 ms/token TPOT with <6% memory pressure, and a 32B model with speculative decoding achieves 26 tokens/s. Compared with llama.cpp, exo, and dllama, our proposed prima.cpp achieves 5-17x lower TPOT, supports fine-grained model sizes from 8B to 70B, ensures broader cross-OS and quantization compatibility, and remains OOM-free, while also being Wi-Fi tolerant, privacy-preserving, and hardware-independent. The code is available at https://gitee.com/zonghang-li/prima.cpp.
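The core of the abstract's pipelined-ring parallelism claim is overlapping slow disk I/O with layer compute. As a rough illustration only (not prima.cpp's actual code; all function names here are invented), a background prefetch thread can load the next layer's weights from disk while the current layer is being computed:

```python
import threading
import queue
import time

# Toy sketch of overlapping disk I/O with compute, in the spirit of
# pipelined parallelism. The sleeps stand in for real disk reads and
# real layer computation; a bounded queue acts as the prefetch window.

def load_layer(i):
    time.sleep(0.01)          # stand-in for reading layer i's weights from disk
    return f"weights_{i}"

def compute(weights):
    time.sleep(0.01)          # stand-in for running one transformer layer
    return weights + ":out"

def pipelined_forward(n_layers):
    q = queue.Queue(maxsize=2)  # small prefetch window bounds memory use

    def prefetcher():
        for i in range(n_layers):
            q.put(load_layer(i))  # blocks when the window is full

    t = threading.Thread(target=prefetcher, daemon=True)
    t.start()

    outputs = []
    for _ in range(n_layers):
        outputs.append(compute(q.get()))  # I/O for layer i+1 overlaps this
    t.join()
    return outputs
```

Because loading layer i+1 proceeds while layer i is computing, the wall-clock time approaches max(I/O, compute) per layer instead of their sum; the real system additionally overlaps network communication around the device ring.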


The A.I. Surveillance Companies That Say They Can Thwart Mass Shootings and Suicides

Slate

Our world has long been filled with cameras peering out over streets, malls, and schools. Many have been recording for years. But for the most part, no one ever looks at the footage. These little devices, perched on shelves and poles, exist primarily to create a record. If something happens and someone wants to learn more, they can go back.