AITopics | activity detection

Collaborating Authors

activity detection

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Transforming volcanic monitoring: A dataset and benchmark for onboard volcano activity detection

Priyasad, Darshana, Fernando, Tharindu, Haghighat, Maryam, Gammulle, Harshala, Fookes, Clinton

arXiv.org Artificial IntelligenceOct-28-2025

Natural disasters, such as volcanic eruptions, pose significant challenges to daily life and incur considerable global economic losses. The emergence of next-generation small-satellites, capable of constellation-based operations, offers unparalleled opportunities for near-real-time monitoring and onboard processing of such events. However, a major bottleneck remains the lack of extensive annotated datasets capturing volcanic activity, which hinders the development of robust detection systems. This paper introduces a novel dataset explicitly designed for volcanic activity and eruption detection, encompassing diverse volcanoes worldwide. The dataset provides binary annotations to identify volcanic anomalies or non-anomalies, covering phenomena such as temperature anomalies, eruptions, and volcanic ash emissions. These annotations offer a foundational resource for developing and evaluating detection models, addressing a critical gap in volcanic monitoring research. Additionally, we present comprehensive benchmarks using state-of-the-art models to establish baselines for future studies. Furthermore, we explore the potential for deploying these models onboard next-generation satellites. Using the Intel Movidius Myriad X VPU as a testbed, we demonstrate the feasibility of volcanic activity detection directly onboard. This capability significantly reduces latency and enhances response times, paving the way for advanced early warning systems. This paves the way for innovative solutions in volcanic disaster management, encouraging further exploration and refinement of onboard monitoring technologies.

artificial intelligence, detection, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.22889

Country: Oceania > Australia (0.46)

Genre: Research Report > Promising Solution (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)

Add feedback

Tiny Noise-Robust Voice Activity Detector for Voice Assistants

Asl, Hamed Jafarzadeh, Nejad, Mahsa Ghazvini, Edraki, Amin, Asgharian, Masoud, Nia, Vahid Partovi

arXiv.org Artificial IntelligenceJul-31-2025

Voice Activity Detection (VAD) in the presence of background noise remains a challenging problem in speech processing. Accurate VAD is essential in automatic speech recognition, voice-to-text, conversational agents, etc, where noise can severely degrade the performance. A modern application includes the voice assistant, specially mounted on Artificial Intelligence of Things (AIoT) devices such as cell phones, smart glasses, earbuds, etc, where the voice signal includes background noise. Therefore, VAD modules must remain light-weight due to their practical on-device limitation. The existing models often struggle with low signal-to-noise ratios across diverse acoustic environments. A simple VAD often detects human voice in a clean environment, but struggles to detect the human voice in noisy conditions. We propose a noise-robust VAD that comprises a light-weight VAD, with data pre-processing and post-processing added modules to handle the background noise. This approach significantly enhances the VAD accuracy in noisy environments and requires neither a larger model, nor fine-tuning. Experimental results demonstrate that our approach achieves a notable improvement compared to baselines, particularly in environments with high background noise interference. This modified VAD additionally improving clean speech detection.

artificial intelligence, machine learning, speech, (20 more...)

arXiv.org Artificial Intelligence

2507.22157

Country: North America > Canada > Quebec > Montreal (0.29)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Add feedback

Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion

Tripathi, Kumud, Kumar, Chowdam Venkata, Wasnik, Pankaj

arXiv.org Artificial IntelligenceJun-3-2025

V oice Activity Detection (V AD) plays a key role in speech processing, often utilizing hand-crafted or neural features. This study examines the effectiveness of Mel-Frequency Cepstral Coefficients (MFCCs) and pre-trained model (PTM) features, including wav2vec 2.0, HuBERT, WavLM, UniSpeech, MMS, and Whisper. We propose FusionV AD, a unified framework that combines both feature types using three fusion strategies: concatenation, addition, and cross-attention (CA). Experimental results reveal that simple fusion techniques, particularly addition, outperform CA in both accuracy and efficiency. Fusion-based models consistently surpass single-feature models, highlighting the complementary nature of MFCCs and PTM features. Notably, our best-performing fusion model exceeds the state-of-the-art Pyannote across multiple datasets, achieving an absolute average improvement of 2.04%. These results confirm that simple feature fusion enhances V AD robustness while maintaining computational efficiency.

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2506.01365

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.73)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)

Add feedback

Fast MLE and MAPE-Based Device Activity Detection for Grant-Free Access via PSCA and PSCA-Net

Tan, Bowen, Cui, Ying

arXiv.org Artificial IntelligenceMar-19-2025

Fast and accurate device activity detection is the critical challenge in grant-free access for supporting massive machine-type communications (mMTC) and ultra-reliable low-latency communications (URLLC) in 5G and beyond. The state-of-the-art methods have unsatisfactory error rates or computation times. To address these outstanding issues, we propose new maximum likelihood estimation (MLE) and maximum a posterior estimation (MAPE) based device activity detection methods for known and unknown pathloss that achieve superior error rate and computation time tradeoffs using optimization and deep learning techniques. Specifically, we investigate four non-convex optimization problems for MLE and MAPE in the two pathloss cases, with one MAPE problem being formulated for the first time. For each non-convex problem, we develop an innovative parallel iterative algorithm using the parallel successive convex approximation (PSCA) method. Each PSCA-based algorithm allows parallel computations, uses up to the objective function's second-order information, converges to the problem's stationary points, and has a low per-iteration computational complexity compared to the state-of-the-art algorithms. Then, for each PSCA-based iterative algorithm, we present a deep unrolling neural network implementation, called PSCA-Net, to further reduce the computation time. Each PSCA-Net elegantly marries the underlying PSCA-based algorithm's parallel computation mechanism with the parallelizable neural network architecture and effectively optimizes its step sizes based on vast data samples to speed up the convergence. Numerical results demonstrate that the proposed methods can significantly reduce the error rate and computation time compared to the state-of-the-art methods, revealing their significant values for grant-free access.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TWC.2025.3545649

2503.15259

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
Oceania > Australia (0.04)
(4 more...)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.86)

Add feedback

Lightweight Learning for Grant-Free Activity Detection in Cell-Free Massive MIMO Networks

Elkeshawy, Ali, Fares, Haifa, Nafkha, Amor

arXiv.org Artificial IntelligenceMar-14-2025

Grant-free random access (GF-RA) is a promising access technique for massive machine-type communications (mMTC) in future wireless networks, particularly in the context of 5G and beyond (6G) systems. Within the context of GF-RA, this study investigates the efficiency of employing supervised machine learning techniques to tackle the challenges on the device activity detection (AD). GF-RA addresses scalability by employing non-orthogonal pilot sequences, which provides an efficient alternative comparing to conventional grant-based random access (GB-RA) technique that are constrained by the scarcity of orthogonal preamble resources. In this paper, we propose a novel lightweight data-driven algorithmic framework specifically designed for activity detection in GF-RA for mMTC in cell-free massive multiple-input multiple-output (CF-mMIMO) networks. We propose two distinct framework deployment strategies, centralized and decentralized, both tailored to streamline the proposed approach implementation across network infrastructures. Moreover, we introduce optimized post-detection methodologies complemented by a clustering stage to enhance overall detection performances. Our 3GPP-compliant simulations have validated that the proposed algorithm achieves state-of-the-art model-based activity detection accuracy while significantly reducing complexity. Achieving 99% accuracy, it demonstrates real-world viability and effectiveness.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.11305

Country: Europe > France (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Robust Learning-Based Sparse Recovery for Device Activity Detection in Grant-Free Random Access Cell-Free Massive MIMO: Enhancing Resilience to Impairments

Elkeshawy, Ali, Fares, Haifa, Nafkha, Amor

arXiv.org Artificial IntelligenceMar-13-2025

Massive MIMO is considered a key enabler to support massive machine-type communication (mMTC). While massive access schemes have been extensively analyzed for co-located massive MIMO arrays, this paper explores activity detection in grant-free random access for mMTC within the context of cell-free massive MIMO systems, employing distributed antenna arrays. This sparse support recovery of device activity status is performed by a finite cluster of access points (APs) from a large number of geographically distributed APs collaborating to serve a larger number of devices. Active devices transmit non-orthogonal pilot sequences to APs, which forward the received signals to a central processing unit (CPU) for collaborative activity detection. This paper proposes a simple and efficient data-driven algorithm tailored for device activity detection, implemented centrally at the CPU. Furthermore, the study assesses the algorithm's robustness to input perturbations and examines the effects of adopting fixed-point representation on its performance.

activity detection, detection, detector, (11 more...)

arXiv.org Artificial Intelligence

2503.1028

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems

Ren, Zeyi, Lin, Qingfeng, Lei, Jingreng, Li, Yang, Wu, Yik-Chung

arXiv.org Artificial IntelligenceFeb-27-2025

In the realm of activity detection for massive machine-type communications, intelligent reflecting surfaces (IRS) have shown significant potential in enhancing coverage for devices lacking direct connections to the base station (BS). However, traditional activity detection methods are typically designed for a single type of channel model, which does not reflect the complexities of real-world scenarios, particularly in systems incorporating IRS. To address this challenge, this paper introduces a novel approach that combines model-driven deep unfolding with a mixture of experts (MoE) framework. By automatically selecting one of three expert designs and applying it to the unfolded projected gradient method, our approach eliminates the need for prior knowledge of channel types between devices and the BS. Simulation results demonstrate that the proposed MoE-augmented deep unfolding method surpasses the traditional covariance-based method and black-box neural network design, delivering superior detection performance under mixed channel fading conditions.

activity detection, detection, ieee trans, (14 more...)

arXiv.org Artificial Intelligence

2502.20183

Country:

North America > United States (0.84)
Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Shenzhen (0.05)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Industry:

Government > Tax (0.84)
Government > Regional Government > North America Government > United States Government (0.84)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Add feedback

Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining

Bovbjerg, Holger Severin, Østergaard, Jan, Jensen, Jesper, Tan, Zheng-Hua

arXiv.org Artificial IntelligenceJan-6-2025

Target-Speaker Voice Activity Detection (TS-VAD) is the task of detecting the presence of speech from a known target-speaker in an audio frame. Recently, deep neural network-based models have shown good performance in this task. However, training these models requires extensive labelled data, which is costly and time-consuming to obtain, particularly if generalization to unseen environments is crucial. To mitigate this, we propose a causal, Self-Supervised Learning (SSL) pretraining framework, called Denoising Autoregressive Predictive Coding (DN-APC), to enhance TS-VAD performance in noisy conditions. We also explore various speaker conditioning methods and evaluate their performance under different noisy conditions. Our experiments show that DN-APC improves performance in noisy conditions, with a general improvement of approx. 2% in both seen and unseen noise. Additionally, we find that FiLM conditioning provides the best overall performance. Representation analysis via tSNE plots reveals robust initial representations of speech and non-speech from pretraining. This underscores the effectiveness of SSL pretraining in improving the robustness and performance of TS-VAD models in noisy environments.

artificial intelligence, information, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2501.03184

Country: Europe > Denmark (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.94)

Add feedback

WiFi CSI Based Temporal Activity Detection Via Dual Pyramid Network

Liu, Zhendong, Zhang, Le, Li, Bing, Zhou, Yingjie, Chen, Zhenghua, Zhu, Ce

arXiv.org Artificial IntelligenceDec-18-2024

We address the challenge of WiFi-based temporal activity detection and propose an efficient Dual Pyramid Network that integrates Temporal Signal Semantic Encoders and Local Sensitive Response Encoders. The Temporal Signal Semantic Encoder splits feature learning into high and low-frequency components, using a novel Signed Mask-Attention mechanism to emphasize important areas and downplay unimportant ones, with the features fused using ContraNorm. The Local Sensitive Response Encoder captures fluctuations without learning. These feature pyramids are then combined using a new cross-attention fusion mechanism. We also introduce a dataset with over 2,114 activity segments across 553 WiFi CSI samples, each lasting around 85 seconds. Extensive experiments show our method outperforms challenging baselines. Code and dataset are available at https://github.com/AVC2-UESTC/WiFiTAD.

artificial intelligence, information, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.16233

Country: Asia (0.14)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.93)
(2 more...)

Add feedback

TCG CREST System Description for the Second DISPLACE Challenge

Raghav, Nikhil, Saha, Subhajit, Sahidullah, Md, Das, Swagatam

arXiv.org Artificial IntelligenceSep-16-2024

In this report, we describe the speaker diarization (SD) and language diarization (LD) systems developed by our team for the Second DISPLACE Challenge, 2024. Our contributions were dedicated to Track 1 for SD and Track 2 for LD in multilingual and multi-speaker scenarios. We investigated different speech enhancement techniques, voice activity detection (VAD) techniques, unsupervised domain categorization, and neural embedding extraction architectures. We also exploited the fusion of various embedding extraction models. We implemented our system with the open-source SpeechBrain toolkit. Our final submissions use spectral clustering for both the speaker and language diarization. We achieve about $7\%$ relative improvement over the challenge baseline in Track 1. We did not obtain improvement over the challenge baseline in Track 2.

diarization, speaker diarization, track 1, (13 more...)

arXiv.org Artificial Intelligence

2409.15356

Country:

Asia > India > West Bengal > Kolkata (0.06)
Asia > China (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.49)

Add feedback