locator
SpecAgent: A Speculative Retrieval and Forecasting Agent for Code Completion
Ma, George, Koul, Anurag, Chen, Qi, Wu, Yawen, Kuhar, Sachit, Yu, Yu, Sengupta, Aritra, Kumar, Varun, Ramanathan, Murali Krishna
Large Language Models (LLMs) excel at code-related tasks but often struggle in realistic software repositories, where project-specific APIs and cross-file dependencies are crucial. Retrieval-augmented methods mitigate this by injecting repository context at inference time. However, the tight inference-time latency budget forces a trade-off: either retrieval quality suffers, or the added latency degrades the user experience. We address this limitation with SpecAgent, an agent that improves both latency and code-generation quality by proactively exploring repository files during indexing and constructing speculative context that anticipates future edits in each file. This indexing-time asynchrony allows thorough context computation while masking latency, and the speculative nature of the context improves code-generation quality. Additionally, we identify the problem of future context leakage in existing benchmarks, which can inflate reported performance. To address this, we construct a synthetic, leakage-free benchmark that enables a more realistic evaluation of our agent against baselines. Experiments show that SpecAgent consistently achieves absolute gains of 9-11% (48-58% relative) compared to the best-performing baselines, while significantly reducing inference latency.
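The relationship between the absolute and relative gains quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and pairing the 9-point gain with 48% and the 11-point gain with 58% is an assumption, since the abstract does not say which figures belong together or which metric they refer to.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


def implied_baseline(absolute_gain: float, relative: float) -> float:
    """Baseline score implied by an absolute gain and its relative gain.

    Follows from relative = absolute_gain / baseline.
    """
    return absolute_gain / relative


if __name__ == "__main__":
    # Assumed pairing of the figures quoted in the abstract.
    for absolute, relative in [(9.0, 0.48), (11.0, 0.58)]:
        base = implied_baseline(absolute, relative)
        print(f"+{absolute} points at {relative:.0%} relative "
              f"implies a baseline near {base:.2f}")
```

Under that assumed pairing, both figures point to a best-baseline score a little under 20 points, which would make the two ranges mutually consistent.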
Generalist Scanner Meets Specialist Locator: A Synergistic Coarse-to-Fine Framework for Robust GUI Grounding
Li, Zhecheng, Song, Guoxian, Wang, Yiwei, Xiong, Zhen, Yuan, Junsong, Cai, Yujun
Grounding natural language queries in graphical user interfaces (GUIs) is a challenging task that requires models to comprehend diverse UI elements across various applications and systems, while also accurately predicting the spatial coordinates for the intended operation. To tackle this problem, we propose GMS: Generalist Scanner Meets Specialist Locator, a synergistic coarse-to-fine framework that effectively improves GUI grounding performance. GMS leverages the complementary strengths of general vision-language models (VLMs) and small, task-specific GUI grounding models by assigning them distinct roles within the framework. Specifically, the general VLM acts as a 'Scanner' to identify potential regions of interest, while the fine-tuned grounding model serves as a 'Locator' that outputs precise coordinates within these regions. This design is inspired by how humans perform GUI grounding, where the eyes scan the interface and the brain focuses on interpretation and localization. The full framework consists of five stages and incorporates hierarchical search with cross-modal communication to achieve promising prediction results. Experimental results on the ScreenSpot-Pro dataset show that while the 'Scanner' and 'Locator' models achieve only $2.0\%$ and $3.7\%$ accuracy respectively when used independently, their integration within the GMS framework yields an overall accuracy of $35.7\%$, representing a $10 \times$ improvement. Additionally, GMS significantly outperforms other strong baselines under various settings, demonstrating its robustness and potential for general-purpose GUI grounding.
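One step that any coarse-to-fine scheme of this kind needs is mapping a point predicted inside a cropped region back to full-screenshot coordinates. The sketch below illustrates that step only; it is not the GMS implementation. The `Box` type, the function names, and the convention that the fine model returns normalized [0, 1] coordinates within the crop are all assumptions made for illustration.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle in full-screenshot pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float


def crop_to_screen(region: Box, local_x: float, local_y: float) -> Tuple[float, float]:
    """Map a point given in normalized crop coordinates to screenshot pixels.

    `region` is the coarse region of interest; (`local_x`, `local_y`) is the
    fine prediction inside that region, each in [0, 1].
    """
    x = region.left + local_x * (region.right - region.left)
    y = region.top + local_y * (region.bottom - region.top)
    return x, y


def is_hit(target: Box, x: float, y: float) -> bool:
    """A prediction counts as correct when it falls inside the target element."""
    return target.left <= x <= target.right and target.top <= y <= target.bottom


if __name__ == "__main__":
    region = Box(100, 200, 300, 400)           # hypothetical coarse region
    x, y = crop_to_screen(region, 0.5, 0.25)   # hypothetical fine prediction
    print((x, y), is_hit(Box(190, 240, 210, 260), x, y))
```

Scoring the mapped point against the target element's bounding box is a common way to evaluate GUI grounding; whether ScreenSpot-Pro is scored exactly this way is not stated in the abstract.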
Motion-Aware Optical Camera Communication with Event Cameras
Su, Hang, Gao, Ling, Liu, Tao, Kneip, Laurent
As smart mobile devices become increasingly ubiquitous, Optical Camera Communication systems have gained more attention as a solution for efficient and private data streaming. These systems use optical cameras to receive data from digital screens via visible light. Despite their promise, most of them are hindered by dynamic factors such as screen refreshing and rapid camera motion. CMOS cameras, which often serve as the receivers, suffer from limited frame rates and motion-induced image blur, which degrade overall performance. To address these challenges, this paper presents a novel system that uses event cameras. We introduce a dynamic visual marker and design event-based tracking algorithms to achieve fast localization and data streaming. Remarkably, the event camera's unique capabilities mitigate issues related to screen refresh rates and camera motion, enabling a high throughput of up to 114 Kbps in static conditions, and a 1 cm localization accuracy with 1% bit error rate under various camera motions.
EEGUnity: Open-Source Tool in Facilitating Unified EEG Datasets Towards Large-Scale EEG Model
Qin, Chengxuan, Yang, Rui, You, Wenlong, Chen, Zhige, Zhu, Longsheng, Huang, Mengjie, Wang, Zidong
The increasing number of dispersed EEG dataset publications and the advancement of large-scale Electroencephalogram (EEG) models have increased the demand for practical tools to manage diverse EEG datasets. However, the inherent complexity of EEG data, characterized by variability in content data, metadata, and data formats, poses challenges for integrating multiple datasets and conducting large-scale EEG model research. To tackle the challenges, this paper introduces EEGUnity, an open-source tool that incorporates modules of 'EEG Parser', 'Correction', 'Batch Processing', and 'Large Language Model Boost'. Leveraging the functionality of such modules, EEGUnity facilitates the efficient management of multiple EEG datasets, such as intelligent data structure inference, data cleaning, and data unification. In addition, the capabilities of EEGUnity ensure high data quality and consistency, providing a reliable foundation for large-scale EEG data research. EEGUnity is evaluated across 25 EEG datasets from different sources, offering several typical batch processing workflows. The results demonstrate the high performance and flexibility of EEGUnity in parsing and data processing. The project code is publicly available at github.com/Baizhige/EEGUnity.
Investigating Consistency in Query-Based Meeting Summarization: A Comparative Study of Different Embedding Methods
Jia-Chen, Chen, Senabre, Guillem, Caron, Allane
As increasingly advanced data analysis techniques emerge, people expect them to be applied to more complex tasks and to solve problems in daily life. Text summarization is one of the best-known applications in the field of Natural Language Processing (NLP). It aims to automatically generate a summary containing the important information in a given context, which is valuable when dealing with piles of documents. Summarization techniques help capture key points quickly and make work more convenient. One applicable scenario is meeting summarization, especially for important meetings that tend to be long, complicated, multi-topic, and multi-person. When people want to review specific content from such a meeting, finding the related spans in the transcript is hard and time-consuming. However, most previous work focuses on summarizing newsletters, scientific articles, etc., which have a clear document structure and a formal format. For documents with a complex structure, such as transcripts, we think those approaches are not well suited to meeting summarization. Besides, the consistency of summaries is another commonly discussed issue in the NLP field. To address the challenges of meeting summarization, we take inspiration from "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization" proposed by Microsoft, and we propose our Locater model, designed to extract relevant spans given a transcript and a query; these spans are then summarized by a Summarizer model. Furthermore, we perform a comparative study applying different word embedding techniques to improve summary consistency.
Benchmarking Automated Clinical Language Simplification: Dataset, Algorithm, and Evaluation
Luo, Junyu, Zheng, Zifei, Ye, Hanzhong, Ye, Muchao, Wang, Yaqing, You, Quanzeng, Xiao, Cao, Ma, Fenglong
Patients with low health literacy usually have difficulty understanding medical jargon and the complex structure of professional medical language. Although some studies have proposed automatically translating expert language into layperson-understandable language, only a few of them focus on both accuracy and readability simultaneously in the clinical domain. Thus, simplification of clinical language remains a challenging task that previous work has not yet fully addressed. To benchmark this task, we construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. Besides, we propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance compared with eight strong baselines. To fairly evaluate the performance, we also propose three specific evaluation metrics. Experimental results demonstrate the utility of the annotated MedLane dataset and the effectiveness of the proposed model DECLARE.
Nexus sine qua non: Essentially Connected Networks for Traffic Forecasting
Nie, Tong, Qin, Guoyang, Sun, Lijun, Wang, Yunpeng, Sun, Jian
Spatiotemporal graph neural networks (STGNNs) have emerged as a leading approach for learning representations and forecasting on traffic datasets with underlying topological and correlational structures. However, current STGNNs use intricate techniques with high complexities to capture these structures, making them difficult to understand and scale. The existence of simple yet efficient architectures remains an open question. Upon closer examination, we find what lies at the core of STGNN's representations are certain forms of spatiotemporal contextualization. In light of this, we design Nexus sine qua non (NexuSQN), an essentially connected network built on an efficient message-passing backbone. NexuSQN simply uses learnable "where" and "when" locators for the aforementioned contextualization and omits any intricate components such as RNNs, Transformers, and diffusion convolutions. Results show that NexuSQN outperforms intricately designed benchmarks in terms of size, computational efficiency, and accuracy. This suggests a promising future for developing simple yet efficient neural predictors.
Deep Learning for Reference-Free Geolocation for Poplar Trees
John, Cai W., Queen, Owen, Muchero, Wellington, Emrich, Scott J.
A core task in precision agriculture is the identification of climatic and ecological conditions that are advantageous for a given crop. The most succinct approach is geolocation, which is concerned with locating the native region of a given sample based on its genetic makeup. Here, we investigate genomic geolocation of Populus trichocarpa, or poplar, which has been identified by the US Department of Energy as a fast-rotation biofuel crop to be harvested nationwide. In particular, we approach geolocation from a reference-free perspective, circumventing the need for compute-intensive processes such as variant calling and alignment. Our model, MashNet, predicts latitude and longitude for poplar trees from randomly-sampled, unaligned sequence fragments. We show that our model performs comparably to Locator, a state-of-the-art method based on aligned whole-genome sequence data. MashNet achieves an error of 34.0 km^2 compared to Locator's 22.1 km^2. MashNet allows growers to quickly and efficiently identify natural varieties that will be most productive in their growth environment based on genotype. This paper explores geolocation for precision agriculture while providing a framework and data source for further development by the machine learning community.
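For readers unfamiliar with how a latitude/longitude prediction is typically scored, the sketch below computes the great-circle distance between a predicted and a true location with the haversine formula. This is a generic illustration, not the paper's evaluation code: the abstract does not state how its error figures are computed, and the function name and the mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Hypothetical true and predicted sample origins.
    true_location = (45.5, -122.7)
    predicted_location = (45.8, -122.3)
    print(f"{haversine_km(*true_location, *predicted_location):.1f} km")
```

Averaging this distance over a held-out set gives a single error figure per model, which is one plausible way to compare two geolocation methods.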
MAGIC: Microlensing Analysis Guided by Intelligent Computation
When a distant star (called the source) gets sufficiently aligned with a massive foreground object (called the lens), the gravitational field of the lens focuses the light out of the distant star, thus making the distant star appear brighter (Einstein 1936; Paczynski 1986). For a typical source star inside the Milky Way, one can observe the time evolution of their brightness (i.e., light curves) and infer the existence and properties of companion objects to the lens by monitoring the deviations in the light curve from the single lens scenario (e.g., Mao & Paczynski 1991; Gould & Loeb 1992). This so-called gravitational microlensing technique has been frequently used to detect exoplanets and stellar binaries and are complementary to other techniques (see reviews by Gaudi 2012 and Zhu & Dong 2021). For a microlensing event with multiple lenses, the interpretation of the light curve can be challenging. First of all, the computation of the multiple-lens microlensing light curve can be time-consuming due to the finite-source effect (e.g., Dong et al. 2006; Bozza 2010). This is especially true when the microlens system consists of three or more objects (e.g., Gaudi et al. 2008; Kuang et al. 2021). Additionally, the likelihood landscape of the high-dimensional parameter space can be so pathological that traditional sampling-based methods may have a hard time searching for the correct solution (or solutions). This remains to be true even when the brute force search on a fine grid that is defined by a subset of model parameters is conducted. As a result, the current analysis of multiple-lens microlensing events is still case-by-case, with each event requiring hundreds of (or
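The single-lens scenario that serves as the reference case in this passage has a simple closed form, which the sketch below evaluates. It uses the standard point-source, point-lens magnification A(u) = (u^2 + 2) / (u * sqrt(u^2 + 4)), with the lens-source separation u(t) = sqrt(u0^2 + ((t - t0) / tE)^2) in units of the Einstein radius. The parameter names and the example values are assumptions for illustration, and the finite-source and multiple-lens effects discussed in the text are deliberately not modelled.

```python
import math


def separation(t: float, t0: float, u0: float, t_e: float) -> float:
    """Lens-source separation in Einstein radii at time t.

    t0: time of closest approach, u0: impact parameter,
    t_e: Einstein-radius crossing time (same time unit as t).
    """
    return math.hypot(u0, (t - t0) / t_e)


def magnification(u: float) -> float:
    """Point-source, point-lens magnification for separation u > 0."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))


def light_curve(times, t0: float, u0: float, t_e: float):
    """Magnification at each time for a single-lens event."""
    return [magnification(separation(t, t0, u0, t_e)) for t in times]


if __name__ == "__main__":
    # Hypothetical event: peak at day 50, impact parameter 0.1, 20-day timescale.
    times = range(0, 101, 10)
    for t, a in zip(times, light_curve(times, t0=50.0, u0=0.1, t_e=20.0)):
        print(f"t = {t:3d} d   A = {a:.3f}")
```

Deviations of an observed light curve from this smooth, symmetric shape are what signal companions to the lens; computing the corresponding multiple-lens curves is the expensive part the text refers to.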
Benchmarking Learnt Radio Localisation under Distribution Shift
Arnold, Maximilian, Alloulah, Mohammed
Deploying radio frequency (RF) localisation systems invariably entails non-trivial effort, particularly for the latest learning-based breeds. There has been little prior work on characterising and comparing how learnt localiser networks can be deployed in the field under real-world RF distribution shifts. In this paper, we present RadioBench: a suite of 8 learnt localiser nets from the state-of-the-art to study and benchmark their real-world deployability, utilising five novel industry-grade datasets. We train 10k models to analyse the inner workings of these learnt localiser nets and uncover their differing behaviours across three performance axes: (i) learning, (ii) proneness to distribution shift, and (iii) localisation. We use insights gained from this analysis to recommend best practices for the deployability of learning-based RF localisation under practical constraints. Decades of radio frequency (RF) localisation research have given us a variety of classic methods (Patwari et al., 2005; Gezici et al., 2005). Newer machine learning incarnations can enhance location estimation considerably (Zanjani et al., 2022; Karmanov et al., 2021), albeit at the expense of proneness to distributional shift in wireless signals. For example, models trained on signals from a warehouse environment may not work well in a different environment (Arnold et al., 2018). If learnt localiser networks are to be productised and deployed, it is imperative that we robustify them.