 Inductive Learning


Scalable Wi-Fi RSS-Based Indoor Localization via Automatic Vision-Assisted Calibration

arXiv.org Artificial Intelligence

Wi-Fi-based positioning promises a scalable and privacy-preserving solution for location-based services in indoor environments such as malls, airports, and campuses. RSS-based methods are widely deployable as RSS data is available on all Wi-Fi-capable devices, but RSS is highly sensitive to multipath, channel variations, and receiver characteristics. While supervised learning methods offer improved robustness, they require large amounts of labeled data, which is often costly to obtain. We introduce a lightweight framework that solves this by automating high-resolution synchronized RSS-location data collection using a short, camera-assisted calibration phase. An overhead camera is calibrated only once with ArUco markers and then tracks a device collecting RSS data from broadcast packets of nearby access points across Wi-Fi channels. The resulting (x, y, RSS) dataset is used to automatically train mobile-deployable localization algorithms, avoiding the privacy concerns of continuous video monitoring. We quantify the accuracy limits of such vision-assisted RSS data collection under key factors such as tracking precision and label synchronization. Using the collected experimental data, we benchmark traditional and supervised learning approaches under varying signal conditions and device types, demonstrating improved accuracy and generalization, validating the utility of the proposed framework for practical use. All code, tools, and datasets are released as open source.
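
To make the role of the (x, y, RSS) dataset concrete, here is a minimal sketch of k-nearest-neighbor fingerprinting, one classical RSS localization approach. It is an illustrative assumption rather than the method benchmarked by the authors; the function name `knn_locate` and the sample values are hypothetical.

```python
import math

def knn_locate(fingerprints, query_rss, k=3):
    """Estimate (x, y) as the mean position of the k fingerprints
    whose RSS vectors are closest (Euclidean) to the query."""
    # fingerprints: list of ((x, y), [rss_dbm per access point])
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[1], query_rss))
    nearest = ranked[:k]
    x = sum(fp[0][0] for fp in nearest) / len(nearest)
    y = sum(fp[0][1] for fp in nearest) / len(nearest)
    return x, y

# Hypothetical calibration samples: position in metres, RSS in dBm for two APs.
fingerprints = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((0.0, 2.0), [-45.0, -65.0]),
    ((4.0, 0.0), [-70.0, -40.0]),
    ((4.0, 2.0), [-65.0, -45.0]),
]
estimate = knn_locate(fingerprints, [-42.0, -68.0], k=2)
print(estimate)
```

With k=2 the query is matched to the two samples near the first access point, so the estimate falls between them.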


Influence-driven Curriculum Learning for Pre-training on Limited Data

arXiv.org Artificial Intelligence

Curriculum learning, a training technique where data is presented to the model in order of example difficulty (e.g., from simpler to more complex documents), has shown limited success for pre-training language models. In this work, we investigate whether curriculum learning becomes competitive if we replace conventional human-centered difficulty metrics with one that more closely corresponds to example difficulty as observed during model training. Specifically, we experiment with sorting training examples by their training data influence, a score that estimates the effect of individual training examples on the model's output. Models trained on our curricula outperform ones trained in random order by over 10 percentage points in benchmarks, confirming that curriculum learning is beneficial for language model pre-training, as long as a more model-centric notion of difficulty is adopted.


Variance-Based Defense Against Blended Backdoor Attacks

arXiv.org Artificial Intelligence

Backdoor attacks represent a subtle yet effective class of cyberattacks targeting AI models, primarily due to their stealthy nature. The model behaves normally on clean data but exhibits malicious behavior only when the attacker embeds a specific trigger into the input. This attack is performed during the training phase, where the adversary corrupts a small subset of the training data by embedding a pattern and modifying the labels to a chosen target. The objective is to make the model associate the pattern with the target label while maintaining normal performance on unaltered data. Several defense mechanisms have been proposed to sanitize training datasets. However, these methods often rely on the availability of a clean dataset to compute statistical anomalies, which may not always be feasible in real-world scenarios where datasets can be unavailable or compromised. To address this limitation, we propose a novel defense method that trains a model on the given dataset, detects poisoned classes, and extracts the critical part of the attack trigger before identifying the poisoned instances. This approach enhances explainability by explicitly revealing the harmful part of the trigger. The effectiveness of our method is demonstrated through experimental evaluations on well-known image datasets and comparative analysis against three state-of-the-art algorithms: SCAn, ABL, and AGPD.


SlotFM: A Motion Foundation Model with Slot Attention for Diverse Downstream Tasks

arXiv.org Artificial Intelligence

Wearable accelerometers are used for a wide range of applications, such as gesture recognition, gait analysis, and sports monitoring. Yet most existing foundation models focus primarily on classifying common daily activities such as locomotion and exercise, limiting their applicability to the broader range of tasks that rely on other signal characteristics. SlotFM uses Time-Frequency Slot Attention, an extension of Slot Attention that processes both time and frequency representations of the raw signals. It generates multiple small embeddings (slots), each capturing different signal components, enabling task-specific heads to focus on the most relevant parts of the data. We also introduce two loss regularizers that capture local structure and frequency patterns, which improve reconstruction of fine-grained details and help the embeddings preserve task-relevant information. We evaluate SlotFM on 16 classification and regression downstream tasks that extend beyond standard human activity recognition. It outperforms existing self-supervised approaches on 13 of these tasks and achieves comparable results to the best performing approaches on the remaining tasks. On average, our method yields a 4.5% performance gain, demonstrating strong generalization for sensing foundation models. Advances in self-supervised learning (SSL) and large-scale datasets have enabled foundation models that support multiple tasks through shared representations (Yang et al., 2024; Oquab et al., 2023). This is particularly valuable for wearable devices, where maintaining separate models dedicated to each task is often impractical due to memory and compute constraints. Accelerometers are widely used sensors in wearables for diverse motion-related tasks. Recent studies show that SSL approaches can train foundation models effective in Human Activity Recognition (HAR) tasks such as exercise and locomotion classification (Logacjov, 2024). However, their applicability to broader accelerometer tasks, such as gait analysis and gesture recognition, remains largely unexplored. This contrasts with domains such as audio, where foundation models have been applied beyond a single task, spanning speech-to-text, speaker identification, and emotion recognition.


Label-Efficient Grasp Joint Prediction with Point-JEPA

arXiv.org Artificial Intelligence

We study whether 3D self-supervised pretraining with Point-JEPA enables label-efficient grasp joint-angle prediction. Meshes are sampled to point clouds and tokenized; a ShapeNet-pretrained Point-JEPA encoder feeds a K=5 multi-hypothesis head trained with winner-takes-all and evaluated by top-logit selection. On a multi-finger hand dataset with strict object-level splits, Point-JEPA improves top-logit RMSE and Coverage@15 in low-label regimes (e.g., 26% lower RMSE at 25% data) and reaches parity at full supervision, suggesting JEPA-style pretraining is a practical lever for data-efficient grasp learning. Self-supervised learning (SSL) for 3D data has largely progressed along three directions. On point clouds this includes point/voxel masked autoencoding; e.g., Voxel-MAE reconstructs masked voxels for sparse automotive LiDAR and improves downstream tasks with fewer labels [1]-[4].
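
The multi-hypothesis head can be illustrated with a small sketch of winner-takes-all training and top-logit selection. This is a generic rendering of the two ideas, assuming a mean-squared-error criterion and one confidence logit per hypothesis; the function names and numbers are hypothetical and are not taken from the paper.

```python
def wta_loss(hypotheses, target):
    """Winner-takes-all: compute the error of every hypothesis, but
    return only the smallest one, so just the winner would be trained."""
    errors = [
        sum((h - t) ** 2 for h, t in zip(hyp, target)) / len(target)
        for hyp in hypotheses
    ]
    winner = min(range(len(errors)), key=errors.__getitem__)
    return errors[winner], winner

def top_logit_select(hypotheses, logits):
    """At test time no target is available, so pick the hypothesis
    with the highest confidence logit."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return hypotheses[best]

# K=5 hypothetical joint-angle predictions (two joints, radians).
hypotheses = [[0.0, 0.0], [0.5, 1.2], [1.0, 1.0], [0.4, 0.9], [2.0, 2.0]]
target = [0.5, 1.0]
logits = [0.1, 2.0, 0.3, 1.5, -1.0]

loss, winner = wta_loss(hypotheses, target)
selected = top_logit_select(hypotheses, logits)
print(loss, winner, selected)
```

Note that the hypothesis chosen by the logits need not be the one closest to the target, which is why top-logit RMSE is reported separately from coverage-style metrics.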


AMLgentex: Mobilizing Data-Driven Research to Combat Money Laundering

arXiv.org Artificial Intelligence

Money laundering enables organized crime by moving illicit funds into the legitimate economy. Although trillions of dollars are laundered each year, detection rates remain low because launderers evade oversight, confirmed cases are rare, and institutions see only fragments of the global transaction network. Since access to real transaction data is tightly restricted, synthetic datasets are essential for developing and evaluating detection methods. However, existing datasets fall short: they often neglect partial observability, temporal dynamics, strategic behavior, uncertain labels, class imbalance, and network-level dependencies. We introduce AMLGentex, an open-source suite for generating realistic, configurable transaction data and benchmarking detection methods. AMLGentex enables systematic evaluation of anti-money laundering systems under conditions that mirror real-world challenges. By releasing multiple country-specific datasets and practical parameter guidance, we aim to empower researchers and practitioners and provide a common foundation for collaboration and progress in combating money laundering.


AIRwaves at CheckThat! 2025: Retrieving Scientific Sources for Implicit Claims on Social Media with Dual Encoders and Neural Re-Ranking

arXiv.org Artificial Intelligence

Linking implicit scientific claims made on social media to their original publications is crucial for evidence-based fact-checking and scholarly discourse, yet it is hindered by lexical sparsity, very short queries, and domain-specific language. Team AIRwaves ranked second in Subtask 4b of the CLEF-2025 CheckThat! Lab with an evidence-retrieval approach that markedly outperforms the competition baseline. The optimized sparse-retrieval baseline (BM25) achieves MRR@5 = 0.5025 on the gold-label blind test set. To surpass this baseline, a two-stage retrieval pipeline is introduced: (i) a first stage that uses a dual encoder based on E5-large, fine-tuned using in-batch and mined hard negatives and enhanced through chunked tokenization and rich document metadata; and (ii) a neural re-ranking stage using a SciBERT cross-encoder. Replacing purely lexical matching with neural representations lifts performance to MRR@5 = 0.6174, and the complete pipeline further improves to MRR@5 = 0.6828. The findings demonstrate that coupling dense retrieval with neural re-rankers delivers a powerful and efficient solution for tweet-to-study matching and provides a practical blueprint for future evidence-retrieval pipelines.
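
For readers unfamiliar with the reported metric, the following sketch computes MRR@5 in its standard form, assuming one gold document per query. The function name `mrr_at_k` and the toy document ids are hypothetical.

```python
def mrr_at_k(ranked_lists, gold_ids, k=5):
    """Mean reciprocal rank at cutoff k: each query scores 1/rank of its
    gold document if it appears in the top k, and 0 otherwise."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_ids):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id == gold:
                total += 1.0 / rank
                break
    return total / len(gold_ids)

# Three hypothetical queries: gold at rank 1, at rank 2, and not retrieved.
ranked_lists = [
    ["d1", "d7", "d3"],
    ["d9", "d2", "d4"],
    ["d5", "d6", "d8"],
]
gold_ids = ["d1", "d2", "d0"]
score = mrr_at_k(ranked_lists, gold_ids, k=5)
print(score)
```

Here the three queries contribute 1, 1/2, and 0, giving a mean of 0.5.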


Self-evolved Imitation Learning in Simulated World

arXiv.org Artificial Intelligence

Imitation learning has recently become popular, yet training a generalist agent across multiple tasks still requires large-scale expert demonstrations, which are costly and labor-intensive to collect. To address the challenge of limited supervision, we propose Self-Evolved Imitation Learning (SEIL), a framework that progressively improves a few-shot model through simulator interactions. The model first attempts tasks in the simulator, from which successful trajectories are collected as new demonstrations for iterative refinement. To enhance the diversity of these demonstrations, SEIL employs dual-level augmentation: (i) Model-level, using an Exponential Moving Average (EMA) model to collaborate with the primary model, and (ii) Environment-level, introducing slight variations in initial object positions. We further introduce a lightweight selector that filters complementary and informative trajectories from the generated pool to ensure demonstration quality. These curated samples enable the model to achieve competitive performance with far fewer training examples. Extensive experiments on the LIBERO benchmark show that SEIL achieves new state-of-the-art performance in few-shot imitation learning scenarios. Code is available at https://github.com/Jasper-aaa/SEIL.git.
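
The EMA model mentioned above follows a standard update rule, sketched below on a flat list of parameters. The decay value and the function name `ema_update` are assumptions for illustration; the abstract does not state how SEIL configures or combines the EMA model with the primary one.

```python
def ema_update(ema_params, model_params, decay=0.99):
    """Standard exponential moving average of model parameters:
    ema <- decay * ema + (1 - decay) * current."""
    return [
        decay * e + (1.0 - decay) * p
        for e, p in zip(ema_params, model_params)
    ]

# Hypothetical two-parameter model.
ema = [1.0, 0.0]
current = [0.0, 1.0]
ema = ema_update(ema, current, decay=0.9)
print(ema)
```

A decay close to 1 makes the EMA copy change slowly, so it behaves like a smoothed version of the primary model.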


SAMSA: Segment Anything Model Enhanced with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation

arXiv.org Artificial Intelligence

Hyperspectral imaging (HSI) provides rich spectral information for medical imaging, yet encounters significant challenges due to data limitations and hardware variations. We introduce SAMSA, a novel interactive segmentation framework that combines an RGB foundation model with spectral analysis. SAMSA efficiently utilizes user clicks to guide both RGB segmentation and spectral similarity computations. The method addresses key limitations in HSI segmentation through a unique spectral feature fusion strategy that operates independently of spectral band count and resolution. Performance evaluation on publicly available datasets shows 81.0% 1-click and 93.4% 5-click DICE on a neurosurgical hyperspectral dataset and 81.1% 1-click and 89.2% 5-click DICE on an intraoperative porcine hyperspectral dataset. Experimental results demonstrate SAMSA's effectiveness in few-shot and zero-shot learning scenarios with minimal training examples. Our approach enables seamless integration of datasets with different spectral characteristics, providing a flexible framework for hyperspectral medical image analysis.
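
The spectral angle named in the title has a standard definition: the angle between two spectra treated as vectors. The sketch below shows that definition only; how SAMSA fuses it with the RGB model is not described in the abstract, and the function name and sample spectra are hypothetical.

```python
import math

def spectral_angle(a, b):
    """Angle in radians between two spectra with the same number of bands.
    It is insensitive to overall brightness (scaling of either vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos)

# Hypothetical spectrum at a clicked pixel, and a brighter pixel of the same material.
clicked = [1.0, 2.0, 3.0]
brighter = [2.0, 4.0, 6.0]
same_material = spectral_angle(clicked, brighter)
orthogonal = spectral_angle([1.0, 0.0], [0.0, 1.0])
print(same_material, orthogonal)
```

Because the function works for any vector length, the same comparison applies regardless of how many bands a given camera records.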


Learning from Similarity-Confidence and Confidence-Difference

arXiv.org Artificial Intelligence

In practical machine learning applications, it is often challenging to assign accurate labels to data, and the number of labeled instances that can be obtained is often limited. In such cases, Weakly Supervised Learning (WSL), which enables training with incomplete or imprecise supervision, provides a practical and effective solution. However, most existing WSL methods focus on leveraging a single type of weak supervision. In this paper, we propose a novel WSL framework that leverages complementary weak supervision signals from multiple relational perspectives, which can be especially valuable when labeled data is limited. Specifically, we introduce SconfConfDiff Classification, a method that integrates two distinct forms of weak labels: similarity-confidence and confidence-difference, which are assigned to unlabeled data pairs. To implement this method, we derive two types of unbiased risk estimators for classification: one based on a convex combination of existing estimators, and another newly designed by modeling the interaction between two weak labels. We prove that both estimators achieve optimal convergence rates with respect to estimation error bounds. Furthermore, we introduce a risk correction approach to mitigate overfitting caused by negative empirical risk, and provide theoretical analysis on the robustness of the proposed method against inaccurate class prior probability and label noise. Experimental results demonstrate that the proposed method consistently outperforms existing baselines across a variety of settings.