hola
LLMs on a Budget? Say HOLA
Siddiqui, Zohaib Hasan, Gao, Jiechao, Shabbir, Ebad, Azeez, Mohammad Anas, Ali, Rafiq, Kashyap, Gautam Siddharth, Naseem, Usman
Running Large Language Models (LLMs) on edge devices is constrained by high compute and memory demands posing a barrier for real-time applications in sectors like healthcare, education, and embedded systems. Current solutions such as quantization, pruning, and retrieval-augmented generation (RAG) offer only partial optimizations and often compromise on speed or accuracy. We introduce HOLA, an end-to-end optimization framework for efficient LLM deployment. Internally, it leverages Hierarchical Speculative Decoding (HSD) for faster inference without quality loss. Externally, AdaComp-RAG adjusts retrieval complexity based on context needs. Together with LoBi, which blends structured pruning (LoRA) and quantization, HOLA delivers significant gains: 17.6% EMA on GSM8K, 10.5% MCA on ARC, and reduced latency and memory on edge devices like Jetson Nano--proving both scalable and production-ready.
HOLa: HoloLens Object Labeling
Schwimmbeck, Michael, Khajarian, Serouj, Holzapfel, Konstantin, Schmidt, Johannes, Remmele, Stefanie
In the context of medical Augmented Reality (AR) applications, object tracking is a key challenge and requires a significant amount of annotation masks. As segmentation foundation models like the Segment Anything Model (SAM) begin to emerge, zero-shot segmentation requires only minimal human participation obtaining high-quality object masks. We introduce a HoloLens-Object-Labeling (HOLa) Unity and Python application based on the SAM-Track algorithm that offers fully automatic single object annotation for HoloLens 2 while requiring minimal human participation. HOLa does not have to be adjusted to a specific image appearance and could thus alleviate AR research in any application field. We evaluate HOLa for different degrees of image complexity in open liver surgery and in medical phantom experiments. Using HOLa for image annotation can increase the labeling speed by more than 500 times while providing Dice scores between 0.875 and 0.982, which are comparable to human annotators. Our code is publicly available at: https://github.com/mschwimmbeck/HOLa
HoLA Robots: Mitigating Plan-Deviation Attacks in Multi-Robot Systems with Co-Observations and Horizon-Limiting Announcements
Wardega, Kacper, von Hippel, Max, Tron, Roberto, Nita-Rotaru, Cristina, Li, Wenchao
Emerging multi-robot systems rely on cooperation between humans and robots, with robots following automatically generated motion plans to service application-level tasks. Given the safety requirements associated with operating in proximity to humans and expensive infrastructure, it is important to understand and mitigate the security vulnerabilities of such systems caused by compromised robots who diverge from their assigned plans. We focus on centralized systems, where a *central entity* (CE) is responsible for determining and transmitting the motion plans to the robots, which report their location as they move following the plan. The CE checks that robots follow their assigned plans by comparing their expected location to the location they self-report. We show that this self-reporting monitoring mechanism is vulnerable to *plan-deviation attacks* where compromised robots don't follow their assigned plans while trying to conceal their movement by mis-reporting their location. We propose a two-pronged mitigation for plan-deviation attacks: (1) an attack detection technique leveraging both the robots' local sensing capabilities to report observations of other robots and *co-observation schedules* generated by the CE, and (2) a prevention technique where the CE issues *horizon-limiting announcements* to the robots, reducing their instantaneous knowledge of forward lookahead steps in the global motion plan. On a large-scale automated warehouse benchmark, we show that our solution enables attack prevention guarantees from a stealthy attacker that has compromised multiple robots.
- North America > United States (0.05)
- Europe > Portugal (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
- Information Technology > Artificial Intelligence > Robots > Robots in the Workplace (0.61)
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.48)
A Light-Weight Multi-Objective Asynchronous Hyper-Parameter Optimizer
Maher, Gabriel, Boyd, Stephen, Kochenderfer, Mykel, Matache, Cristian, Reuter, Dylan, Ulitsky, Alex, Yukhymuk, Slava, Kopman, Leonid
We describe a light-weight yet performant system for hyper-parameter optimization that approximately minimizes an overall scalar cost function that is obtained by combining multiple performance objectives using a target-priority-limit scalarizer. It also supports a trade-off mode, where the goal is to find an appropriate trade-off among objectives by interacting with the user. We focus on the common scenario where there are on the order of tens of hyper-parameters, each with various attributes such as a range of continuous values, or a finite list of values, and whether it should be treated on a linear or logarithmic scale. The system supports multiple asynchronous simulations and is robust to simulation stragglers and failures. While the algorithm we describe will work in principle for any value of n, our focus is on the case where n is modest (e.g., < 10). Each of these objectives has a sense, which means that we either want the objective to be large (i.e., to maximize it) or small (i.e., to minimize it).