Edinburgh
GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind Travelers
Hwang, Hochul, Yang, Soowan, Monon, Jahir Sadik, Giudice, Nicholas A, Lee, Sunghoon Ivan, Biswas, Joydeep, Kim, Donghyun
While commendable progress has been made in user-centric research on mobile assistive systems for blind and low-vision (BLV) individuals, findings that directly inform robot navigation design remain rare. To bridge this gap, we conducted a comprehensive human study involving interviews with 26 guide dog handlers, four white cane users, nine guide dog trainers, and one O&M trainer, along with 15+ hours of observing guide dog-assisted walking. After de-identification, we open-sourced the dataset to promote human-centered development and informed decision-making for assistive systems for BLV people. Building on insights from this formative study, we developed GuideNav, a vision-only, teach-and-repeat navigation system. Inspired by how guide dogs are trained and assist their handlers, GuideNav autonomously repeats a path demonstrated by a sighted person using a robot. Specifically, the system constructs a topological representation of the taught route, integrates visual place recognition with temporal filtering, and employs a relative pose estimator to compute navigation actions - all without relying on costly, heavy, power-hungry sensors such as LiDAR. In field tests, GuideNav consistently achieved kilometer-scale route following across five outdoor environments, maintaining reliability despite noticeable scene variations between teach and repeat runs. A user study with three guide dog handlers and one guide dog trainer further confirmed the system's feasibility, marking (to our knowledge) the first demonstration of a quadruped mobile system retrieving a path in a manner comparable to guide dogs.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > Maine > Penobscot County > Orono (0.14)
- (28 more...)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.46)
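The teach-and-repeat loop above - match the current camera view against nodes of the taught route, with temporal filtering to constrain the search - can be sketched in a few lines. This is an illustrative toy, not GuideNav's implementation: the 1-D descriptors, the window size, and the `match_node` helper are assumptions standing in for learned image embeddings and the paper's actual filter.

```python
import numpy as np

def match_node(query_desc, route_descs, last_idx, window=3):
    """Temporally filtered place recognition: only nodes near the
    previously matched node are candidates, which suppresses matches
    to visually similar but distant places along the route."""
    lo = max(0, last_idx)
    hi = min(len(route_descs), last_idx + window + 1)
    dists = [np.linalg.norm(query_desc - route_descs[i]) for i in range(lo, hi)]
    return lo + int(np.argmin(dists))

# Toy route: 1-D descriptors standing in for image embeddings.
route = [np.array([float(i)]) for i in range(10)]
query = np.array([4.2])  # robot is currently near node 4
print(match_node(query, route, last_idx=3))  # → 4
```

Restricting the search window around the last matched node is what keeps repeated or look-alike scenery from derailing the route follower.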
MinatoLoader: Accelerating Machine Learning Training Through Efficient Data Preprocessing
Nouaji, Rahma, Bitchebe, Stella, Macedo, Ricardo, Balmau, Oana
Data loaders are used by Machine Learning (ML) frameworks like PyTorch and TensorFlow to apply transformations to data before feeding it into the accelerator, an operation called data preprocessing. Data preprocessing plays an important role in the ML training workflow: if it is inefficiently pipelined with training, it can yield high GPU idleness, resulting in significant training delays. Unfortunately, existing data loaders waste GPU resources, with 76% GPU idleness when using the PyTorch data loader, for example. One key source of inefficiency is the variability in preprocessing time across samples within the same dataset. Existing data loaders are oblivious to this variability and construct batches without any consideration of slow or fast samples. As a result, an entire batch can be delayed by a single slow sample, stalling the training pipeline and causing head-of-line blocking. To address these inefficiencies, we present MinatoLoader, a general-purpose data loader for PyTorch that accelerates training and improves GPU utilization. MinatoLoader is designed for a single-server setup containing multiple GPUs. It continuously prepares data in the background and actively constructs batches by prioritizing fast-to-preprocess samples, while slower samples are processed in parallel. We evaluate MinatoLoader on servers with V100 and A100 GPUs. On a machine with four A100 GPUs, MinatoLoader improves the training time of a wide range of workloads by up to 7.5× (3.6× on average) over the PyTorch DataLoader and Pecan, and by up to 3× (2.2× on average) over DALI. It also increases average GPU utilization from 46.4% with PyTorch to 90.45%, while preserving model accuracy and enabling faster convergence.
- North America > Canada > Quebec > Montreal (0.76)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.05)
- North America > United States > Virginia (0.04)
- (3 more...)
- Information Technology (0.47)
- Health & Medicine (0.46)
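The batching idea - keep a single slow-to-preprocess sample from delaying a whole batch - can be illustrated with a minimal sketch. This is not MinatoLoader's code: the `cost_fn` oracle and the eager up-front sort are placeholder assumptions standing in for its background pipeline, which processes slow samples in parallel while fast ones keep the GPU fed.

```python
def build_batches(samples, batch_size, cost_fn):
    """Variability-aware batching sketch: order samples by estimated
    preprocessing cost so fast samples form early batches and slow
    samples cannot cause head-of-line blocking."""
    ordered = sorted(samples, key=cost_fn)  # fast-to-preprocess first
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# Hypothetical per-sample preprocessing costs (ms).
costs = {"a": 5, "b": 1, "c": 9, "d": 2}
batches = build_batches(list(costs), 2, costs.get)
print(batches)  # → [['b', 'd'], ['a', 'c']]
```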
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production
Jin, Chao, Jiang, Ziheng, Bai, Zhihao, Zhong, Zheng, Liu, Juncai, Li, Xiang, Zheng, Ningxin, Wang, Xi, Xie, Cong, Huang, Qi, Heng, Wen, Ma, Yiyuan, Bao, Wenlei, Zheng, Size, Peng, Yanghua, Lin, Haibin, Liu, Xuanzhe, Jin, Xin, Liu, Xin
We present MegaScale-MoE, a production system tailored for the efficient training of large-scale mixture-of-experts (MoE) models. MoE has emerged as a promising architecture for scaling large language models (LLMs) to unprecedented sizes, thereby enhancing model performance. However, existing MoE training systems experience a degradation in training efficiency, exacerbated by the escalating scale of MoE models and the continuous evolution of hardware. Recognizing the pivotal role of efficient communication in MoE training, MegaScale-MoE customizes communication-efficient parallelism strategies for attention and FFNs in each MoE layer and adopts a holistic approach to overlap communication with computation at both inter- and intra-operator levels. Additionally, MegaScale-MoE applies communication compression with adjusted communication patterns to lower precision, further improving training efficiency. When training a 352B MoE model on 1,440 NVIDIA Hopper GPUs, MegaScale-MoE achieves a training throughput of 1.41M tokens/s, improving efficiency by 1.88× compared to Megatron-LM. We share our operational experience in accelerating MoE training and hope that, by offering our insights in system design, this work will motivate future research in MoE systems.
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.05)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > China (0.04)
- North America > United States > New York > New York County > New York City (0.04)
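The inter-operator overlap the abstract centers on can be simulated with plain threads: while one layer's gradients are being communicated in the background, the next layer's backward compute proceeds, hiding communication latency. The timings and the `threading` stand-in are illustrative assumptions, not the system's actual NCCL/NVLink machinery.

```python
import threading, time

def overlapped_backward(layers, compute_t=0.05, comm_t=0.05):
    """Toy inter-operator overlap: launch each layer's gradient
    'communication' asynchronously and immediately continue with the
    next layer's backward 'compute'."""
    comm_threads = []
    start = time.perf_counter()
    for _ in layers:
        time.sleep(compute_t)  # backward compute for this layer
        t = threading.Thread(target=time.sleep, args=(comm_t,))
        t.start()              # gradient communication runs in background
        comm_threads.append(t)
    for t in comm_threads:
        t.join()               # wait for the last in-flight transfer
    return time.perf_counter() - start

overlapped = overlapped_backward(range(4))
serial = 4 * (0.05 + 0.05)  # what strictly alternating phases would cost
print(f"overlapped {overlapped:.2f}s vs serial {serial:.2f}s")
```

With perfect overlap only the final transfer is exposed, which is why the measured time approaches the compute-only cost rather than the serial sum.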
Grand Theft Auto made him a legend. His latest game was a disaster
In July this year workers at Build a Rocket Boy, a video game studio in Edinburgh, were called to an all-staff meeting. Their first ever game, a sci-fi adventure called MindsEye, had been released three weeks earlier - and it had been a total disaster. Critics and players called it broken, buggy, and the worst game of 2025. Addressing staff via video link, the company's boss, Leslie Benzies, assured them there was a plan to get things back on track and said the negativity they'd seen was uncalled for.
- Europe > United Kingdom (0.96)
- North America (0.95)
- Asia (0.70)
Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering
Hong, Ke, Li, Xiuhong, Liu, Minxu, Mao, Qiuli, Wu, Tianqi, Huang, Zixiao, Chen, Lufang, Wang, Zhong, Zhang, Yichong, Zhu, Zhenhua, Dai, Guohao, Wang, Yu
Generative models have achieved remarkable success across various applications, driving the demand for multi-GPU computing. Inter-GPU communication becomes a bottleneck in multi-GPU computing systems, particularly on consumer-grade GPUs. By exploiting concurrent hardware execution, overlapping computation and communication becomes an effective technique for mitigating communication overhead. We identify that an efficient and adaptable overlapping design should satisfy (1) tile-wise overlapping to maximize the overlapping opportunity, (2) interference-free computation to maintain the original computational performance, and (3) communication agnosticism to reduce the development burden against varying communication primitives. Nevertheless, current designs fail to optimize for all of those features simultaneously. To address the issue, we propose FlashOverlap, which utilizes a novel signaling mechanism: when part of the output finishes, the computation kernel sends a signal to trigger the communication of that part, while continuing the computation of the remaining part (interference-free computation). Consequently, the communication of the finished part and the computation of the remaining part can be overlapped. On top of the signaling mechanism, FlashOverlap comprises two key components: (1) the determination of signaling timing to boost overlap efficiency (tile-wise overlapping), and (2) a pre-communication reordering that creates a contiguous address range for finished data, enabling communication by simply calling NCCL APIs (communication agnosticism), together with a post-communication reordering to restore the data order. Experiments show that FlashOverlap achieves up to 1.65× speedup through overlap, outperforming existing works in most cases. Code is available at https://github.com/infinigence/FlashOverlap.
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.05)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
- (2 more...)
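The signaling mechanism can be mimicked with a producer-consumer queue: the "kernel" signals each finished output tile, and communication of that tile begins immediately while the remaining tiles are still being computed. `compute` and `communicate` are hypothetical stand-ins for the CUDA kernel and the NCCL call; only the control flow is the point here.

```python
import threading, queue

def compute(tiles, done_q):
    """Producer ('kernel'): finish output tiles one by one and signal
    each completion instead of waiting for the full output."""
    for t in tiles:
        done_q.put(t * t)   # one finished tile of the output
    done_q.put(None)        # sentinel: computation complete

def communicate(done_q, received):
    """Consumer ('NCCL side'): start sending a tile as soon as its
    signal arrives, overlapping with computation of remaining tiles."""
    while (t := done_q.get()) is not None:
        received.append(t)

q, out = queue.Queue(), []
comm = threading.Thread(target=communicate, args=(q, out))
comm.start()
compute(range(4), q)
comm.join()
print(out)  # → [0, 1, 4, 9]
```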
TokenFlow: Responsive LLM Text Streaming Serving under Request Burst via Preemptive Scheduling
Chen, Junyi, Du, Chuheng, Liu, Renyuan, Yao, Shuochao, Yan, Dingtian, Liao, Jiang, Liu, Shengzhong, Wu, Fan, Chen, Guihai
Real-time LLM interactions demand streamed token generation, where text tokens are progressively generated and delivered to users while balancing two objectives: responsiveness (i.e., low time-to-first-token) and steady generation (i.e., required time-between-tokens). Standard LLM serving systems suffer from the inflexibility caused by non-preemptive request scheduling and reactive memory management, leading to poor resource utilization and low request processing parallelism under request bursts. Therefore, we present TokenFlow, a novel LLM serving system with enhanced text streaming performance via preemptive request scheduling and proactive key-value (KV) cache management. TokenFlow dynamically prioritizes requests based on real-time token buffer occupancy and token consumption rate, while actively transferring KV cache between GPU and CPU memory in the background and overlapping I/O with computation to minimize request preemption overhead. Extensive experiments on Llama3-8B and Qwen2.5-32B across multiple GPUs (RTX 4090, A6000, H200) demonstrate that TokenFlow achieves up to 82.5% higher effective throughput (accounting for actual user consumption) while reducing P99 TTFT by up to 80.2%, without degrading overall token throughput.
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.05)
- Asia > China > Shanghai > Shanghai (0.05)
- (4 more...)
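The prioritization rule - schedule by real-time buffer occupancy and consumption rate - reduces to picking the request whose client-side token buffer will drain soonest. A minimal sketch, with the `buffered`/`rate` fields and the time-to-empty heuristic as assumptions rather than TokenFlow's actual scheduler state:

```python
def next_request(requests):
    """Pick the request whose client-side buffer will underflow
    soonest: time-to-empty = buffered tokens / consumption rate.
    A request about to stall its reader preempts the others."""
    return min(requests, key=lambda r: r["buffered"] / r["rate"])

reqs = [
    {"id": "r1", "buffered": 30, "rate": 5},  # 6.0 s of slack
    {"id": "r2", "buffered": 4,  "rate": 8},  # 0.5 s of slack -> urgent
    {"id": "r3", "buffered": 12, "rate": 4},  # 3.0 s of slack
]
print(next_request(reqs)["id"])  # → r2
```

Because users consume tokens at reading speed, a request with a full buffer can safely be preempted without the user ever noticing a gap.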
Scaling LLM Test-Time Compute with Mobile NPU on Smartphones
Hao, Zixu, Wei, Jianyu, Wang, Tuowei, Huang, Minxing, Jiang, Huiqiang, Jiang, Shiqi, Cao, Ting, Ren, Ju
Deploying Large Language Models (LLMs) on mobile devices faces the challenge of insufficient performance in smaller models and excessive resource consumption in larger ones. This paper highlights that mobile Neural Processing Units (NPUs) have underutilized computational resources, particularly their matrix multiplication units, during typical LLM inference. To leverage this wasted compute capacity, we propose applying parallel test-time scaling techniques on mobile NPUs to enhance the performance of smaller LLMs. However, this approach confronts inherent NPU challenges, including inadequate hardware support for fine-grained quantization and low efficiency in general-purpose computations. To overcome these, we introduce two key techniques: a hardware-aware tile quantization scheme that aligns group quantization with NPU memory access patterns, and efficient LUT-based replacements for complex operations such as Softmax and dequantization. We design and implement an end-to-end inference system that leverages the NPU's compute capability to support test-time scaling on Qualcomm Snapdragon platforms. Experiments show our approach brings significant speedups: up to 19.0× for mixed-precision GEMM and 2.2× for Softmax. More importantly, we demonstrate that smaller models using test-time scaling can match or exceed the accuracy of larger models, achieving a new performance-cost Pareto frontier.
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.06)
- Asia > China (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Middle East > Jordan (0.04)
- Energy (0.69)
- Telecommunications (0.49)
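The LUT-based replacement idea can be shown on Softmax: precompute exp() on a coarse grid once, then look values up instead of evaluating the transcendental per element - the kind of substitution an NPU with weak general-purpose compute benefits from. The grid range, resolution, and `lut_softmax` helper are illustrative assumptions, not the paper's kernel.

```python
import numpy as np

# Hypothetical LUT: exp() tabulated on a fixed grid. After the max-shift,
# softmax inputs are <= 0, so [-10, 0] covers all non-negligible values.
GRID = np.linspace(-10.0, 0.0, 1024)
LUT = np.exp(GRID)

def lut_softmax(x):
    z = x - x.max()  # standard stability shift; maps inputs into (-inf, 0]
    idx = np.clip(np.searchsorted(GRID, z), 0, len(GRID) - 1)
    e = LUT[idx]     # table lookup replaces per-element exp()
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
ref = np.exp(x - x.max()); ref /= ref.sum()
print(np.abs(lut_softmax(x) - ref).max())  # small approximation error
```

The trade-off is table resolution versus accuracy; 1,024 entries already keeps the error well below typical quantization noise.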
Appendices for: Gradient-based Hyperparameter Optimization Over Long Horizons
Micaelli, Paul, Storkey, Amos (University of Edinburgh)
Now we return to the second part of (9), which illustrates how tight the upper bound is. We use a GeForce RTX 2080 Ti GPU for all experiments, and we always carve out a validation set from our training set. The batch size is set to 128, and 1,000 fixed images are used for the validation data. Here we provide the raw hypergradients corresponding to the outer optimization shown in Figure 1.
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
Pan, Haowen, Wang, Xiaozhi, Cao, Yixin, Shi, Zenglin, Yang, Xun, Li, Juanzi, Wang, Meng
Knowledge editing aims to update outdated information in Large Language Models (LLMs). A representative line of study is locate-then-edit methods, which typically employ causal tracing to identify the modules responsible for recalling factual knowledge about entities. However, we find these methods are often sensitive only to changes in the subject entity, leaving them less effective at adapting to changes in relations. This limitation results in poor editing locality, which can lead to the persistence of irrelevant or inaccurate facts, ultimately compromising the reliability of LLMs. We believe this issue arises from the insufficient precision of knowledge localization. To address this, we propose a Fine-grained Neuron-level Knowledge Editing (FiNE) method that enhances editing locality without affecting overall success rates. By precisely identifying and modifying specific neurons within feed-forward networks, FiNE significantly improves knowledge localization and editing. Quantitative experiments demonstrate that FiNE efficiently achieves better overall performance compared to existing techniques, providing new insights into the localization and modification of knowledge within LLMs. Recently, various methods for the precise editing of outdated or wrong knowledge within Large Language Models (LLMs) (Touvron et al., 2023a;b; Jiang et al., 2024; Dubey et al., 2024) have been proposed (Mazzia et al., 2023; Yao et al., 2023; Wang et al., 2023). This paper primarily focuses on locate-then-edit methods, which have emerged as a promising and mainstream approach for knowledge editing in LLMs. A key representative of these approaches is ROME (Meng et al., 2022), which employs causal tracing to identify specific modules responsible for recalling facts about subject entities.
- Asia > Russia (0.15)
- Europe > United Kingdom (0.15)
- Europe > Italy (0.05)
- (29 more...)
- Government (0.94)
- Education > Curriculum > Subject-Specific Education (0.46)
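Neuron-level localization, as opposed to ROME-style module-level tracing, amounts to scoring individual FFN hidden units for a target fact and editing only the top-ranked few. The sketch below uses an activation-times-gradient attribution score as a generic proxy; this is an assumption for illustration, not necessarily FiNE's actual scoring function.

```python
import numpy as np

def top_neurons(activations, grads, k=3):
    """Rank FFN hidden units by |activation * gradient| with respect to
    the target fact's loss - an attribution-style proxy for how much
    each neuron contributes to recalling it. Only the top-k returned
    here would then be edited, keeping the intervention local."""
    scores = np.abs(activations * grads)
    return np.argsort(scores)[::-1][:k].tolist()

# Toy per-neuron activations and gradients for one FFN layer.
acts = np.array([0.1, 2.0, -0.5, 0.05, 0.3])
grads = np.array([1.0, 0.4, -2.0, 0.1, 0.2])
print(top_neurons(acts, grads))  # → [2, 1, 0]
```

Editing a handful of neurons instead of a whole projection matrix is what gives the method its locality: unrelated facts routed through other neurons are left untouched.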
Model-based optimisation for the personalisation of robot-assisted gait training
Christou, Andreas, Gordon, Daniel F. N., Stouraitis, Theodoros, Moreno, Juan C., Vijayakumar, Sethu
PAPER ID: TMRB-06-24-OA-0958. Personalised rehabilitation can be key to promoting gait independence and quality of life. Robots can enhance therapy by systematically delivering support in gait training, but they often use one-size-fits-all control methods, which can be suboptimal. Here, we describe a model-based optimisation method for designing and fine-tuning personalised robotic controllers. As a case study, we formulate the objective of providing assistance as needed as an optimisation problem, and we demonstrate how musculoskeletal modelling can be used to develop personalised interventions. Eighteen healthy participants (age = 26 ± 4) were recruited, and the personalised control parameters for each were obtained to provide assistance as needed during a unilateral tracking task. A comparison was carried out between the personalised and the non-personalised controllers. In simulation, a significant improvement was predicted when the personalised parameters were used. Experimentally, responses varied: six subjects showed significant improvements with the personalised parameters, eight showed no obvious change, while four performed worse. High inter-personal and intra-personal variability was observed with both controllers. This study highlights the importance of personalised control in robot-assisted gait training, and the need for better estimation of human-robot interaction and human behaviour to realise the benefits of model-based optimisation.
Introduction. Motor function deficits are often the result of neurological disorders and can significantly impact the quality of life. (Footnotes: This research was supported in part by the Engineering and Physical Sciences Research Council (EPSRC, grant reference EP/L016834/1) as part of the Centre for Doctoral Training in Robotics and Autonomous Systems at Heriot-Watt University and The University of Edinburgh; in part by the Alan Turing Institute, U.K.; in part by Project I+D+i RED2022-134319-T (Spain); and in part by the Japan Science and Technology Agency (JST) Moonshot R&D Program (Grant No. JPMJMS2239). Supplementary material includes one MP4 movie clip (24.1 MB) showing the experimental setup. T. Stouraitis is with DeepSea Technologies, 105 64 Athens, Greece (email: stoutheo@gmail.com).)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
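The assist-as-needed formulation can be made concrete with a toy model: predicted tracking error falls as assistance covers a subject's deficit but rises again if excess assistance induces over-reliance, and the optimiser picks the gain minimising the model's prediction per subject. The error model, the single `gain` parameter, and the grid search are all illustrative assumptions; the paper uses full musculoskeletal simulation, not this caricature.

```python
def tracking_error(gain, capability):
    """Toy subject model: error shrinks as assistance covers the
    subject's deficit, with a penalty on excess assistance
    (over-reliance / 'slacking')."""
    deficit = max(0.0, 1.0 - capability)
    return (deficit - gain) ** 2 + 0.1 * gain

def personalise(capability):
    """Model-based optimisation sketch: choose the assistance gain
    the subject-specific model predicts minimises tracking error."""
    gains = [g / 20 for g in range(21)]
    return min(gains, key=lambda g: tracking_error(g, capability))

weak, strong = personalise(0.4), personalise(0.9)
print(weak, strong)  # the weaker subject is assigned more assistance
```

The point of personalisation is visible even in this toy: the optimal gain tracks the modelled deficit, so one-size-fits-all parameters necessarily over- or under-assist most subjects.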