Goto

Collaborating Authors

 cnn-bilstm


Beyond Sub-6 GHz: Leveraging mmWave Wi-Fi for Gait-Based Person Identification

Bhat, Nabeel Nisar, Karnaukh, Maksim, Struye, Jakob, Berkvens, Rafael, Famaey, Jeroen

arXiv.org Artificial Intelligence

Person identification plays a vital role in enabling intelligent, personalized, and secure human-computer interaction. Recent research has demonstrated the feasibility of leveraging Wi-Fi signals for passive person identification using a person's unique gait pattern. Although most existing work focuses on sub-6 GHz frequencies, the emergence of mmWave offers new opportunities through its finer spatial resolution, though its comparative advantages for person identification remain unexplored. This work presents the first comparative study between sub-6 GHz and mmWave Wi-Fi signals for person identification with commercial off-the-shelf (COTS) Wi-Fi, using a novel dataset of synchronized measurements from the two frequency bands in an indoor environment. To ensure a fair comparison, we apply identical training pipelines and model configurations across both frequency bands. Leveraging end-to-end deep learning, we show that even at low sampling rates (10 Hz), mmWave Wi-Fi signals can achieve high identification accuracy (91.2% on 20 individuals) when combined with effective background subtraction.


CNN-BiLSTM for sustainable and non-invasive COVID-19 detection via salivary ATR-FTIR spectroscopy

Junior, Anisio P. Santos, Sabino-Silva, Robinson, Martins, Mário Machado, Cunha, Thulio Marquez, Carneiro, Murillo G.

arXiv.org Artificial Intelligence

The COVID-19 pandemic has placed unprecedented strain on healthcare systems and remains a global health concern, especially with the emergence of new variants. Although real-time polymerase chain reaction (RT-PCR) is considered the gold standard for COVID-19 detection, it is expensive, time-consuming, labor-intensive, and sensitive to issues with RNA extraction. In this context, ATR-FTIR spectroscopy analysis of biofluids offers a reagent-free, cost-effective alternative for COVID-19 detection. We propose a novel architecture that combines Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (BiLSTM) networks, referred to as CNN-BiLSTM, to process spectra generated by ATR-FTIR spectroscopy and diagnose COVID-19 from spectral samples. We compare the performance of this architecture against a standalone CNN and other state-of-the-art machine learning techniques. Experimental results demonstrate that our CNN-BiLSTM model outperforms all other models, achieving an average accuracy and F1-score of 0.80 on a challenging real-world COVID-19 dataset. The addition of the BiLSTM layer to the CNN architecture significantly enhances model performance, making CNN-BiLSTM a more accurate and reliable choice for detecting COVID-19 using ATR-FTIR spectra of non-invasive saliva samples.


MAWIFlow Benchmark: Realistic Flow-Based Evaluation for Network Intrusion Detection

Schraven, Joshua, Windmann, Alexander, Niggemann, Oliver

arXiv.org Artificial Intelligence

Benchmark datasets for network intrusion detection commonly rely on synthetically generated traffic, which fails to reflect the statistical variability and temporal drift encountered in operational environments. This paper introduces MAWIFlow, a flow-based benchmark derived from the MAWILAB v1.1 dataset, designed to enable realistic and reproducible evaluation of anomaly detection methods. A reproducible preprocessing pipeline is presented that transforms raw packet captures into flow representations conforming to the CICFlowMeter format, while preserving MAWILab's original anomaly labels. The resulting datasets comprise temporally distinct samples from January 2011, 2016, and 2021, drawn from trans-Pacific backbone traffic. To establish reference baselines, traditional machine learning methods, including Decision Trees, Random Forests, XGBoost, and Logistic Regression, are compared to a deep learning model based on a CNN-BiLSTM architecture. Empirical results demonstrate that tree-based classifiers perform well on temporally static data but experience significant performance degradation over time. In contrast, the CNN-BiLSTM model maintains better performance, thus showing improved generalization. These findings underscore the limitations of synthetic benchmarks and static models, and motivate the adoption of realistic datasets with explicit temporal structure. All datasets, pipeline code, and model implementations are made publicly available to foster transparency and reproducibility.


A light-weight and efficient punctuation and word casing prediction model for on-device streaming ASR

You, Jian, Li, Xiangfeng

arXiv.org Artificial Intelligence

Punctuation and word casing prediction are necessary for automatic speech recognition (ASR). With the popularity of on-device end-to-end streaming ASR systems, the on-device punctuation and word casing prediction become a necessity while we found little discussion on this. With the emergence of Transformer, Transformer based models have been explored for this scenario. However, Transformer based models are too large for on-device ASR systems. In this paper, we propose a light-weight and efficient model that jointly predicts punctuation and word casing in real time. The model is based on Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM). Experimental results on the IWSLT2011 test set show that the proposed model obtains 9% relative improvement compared to the best of non-Transformer models on overall F1-score. Compared to the representative of Transformer based models, the proposed model achieves comparable results to the representative model while being only one-fortieth its size and 2.5 times faster in terms of inference time. It is suitable for on-device streaming ASR systems. Our code is publicly available.


Interpretable Multi Labeled Bengali Toxic Comments Classification using Deep Learning

Belal, Tanveer Ahmed, Shahariar, G. M., Kabir, Md. Hasanul

arXiv.org Artificial Intelligence

This paper presents a deep learning-based pipeline for categorizing Bengali toxic comments, in which at first a binary classification model is used to determine whether a comment is toxic or not, and then a multi-label classifier is employed to determine which toxicity type the comment belongs to. For this purpose, we have prepared a manually labeled dataset consisting of 16,073 instances among which 8,488 are Toxic and any toxic comment may correspond to one or more of the six toxic categories - vulgar, hate, religious, threat, troll, and insult simultaneously. Long Short Term Memory (LSTM) with BERT Embedding achieved 89.42% accuracy for the binary classification task while as a multi-label classifier, a combination of Convolutional Neural Network and Bi-directional Long Short Term Memory (CNN-BiLSTM) with attention mechanism achieved 78.92% accuracy and 0.86 as weighted F1-score. To explain the predictions and interpret the word feature importance during classification by the proposed models, we utilized Local Interpretable Model-Agnostic Explanations (LIME) framework. We have made our dataset public and can be accessed at - https://github.com/deepu099cse/Multi-Labeled-Bengali-Toxic-Comments-Classification


Short-Term Aggregated Residential Load Forecasting using BiLSTM and CNN-BiLSTM

Bohara, Bharat, Fernandez, Raymond I., Gollapudi, Vysali, Li, Xingpeng

arXiv.org Artificial Intelligence

Higher penetration of renewable and smart home technologies at the residential level challenges grid stability as utility-customer interactions add complexity to power system operations. In response, short-term residential load forecasting has become an increasing area of focus. However, forecasting at the residential level is challenging due to the higher uncertainties involved. Recently deep neural networks have been leveraged to address this issue. This paper investigates the capabilities of a bidirectional long short-term memory (BiLSTM) and a convolutional neural network-based BiLSTM (CNN-BiLSTM) to provide a day ahead (24 hr.) forecasting at an hourly resolution while minimizing the root mean squared error (RMSE) between the actual and predicted load demand. Using a publicly available dataset consisting of 38 homes, the BiLSTM and CNN-BiLSTM models are trained to forecast the aggregated active power demand for each hour within a 24 hr. span, given the previous 24 hr. load data. The BiLSTM model achieved the lowest RMSE of 1.4842 for the overall daily forecast. In addition, standard LSTM and CNN-LSTM models are trained and compared with the BiLSTM architecture. The RMSE of BiLSTM is 5.60%, 2.85% and 2.60% lower than the LSTM, CNN-LSTM and CNN-BiLSTM models respectively. The source code of this work is available at https://github.com/Varat7v2/STLF-BiLSTM-CNNBiLSTM.git.


End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings

Deschamps-Berger, Théo, Lamel, Lori, Devillers, Laurence

arXiv.org Artificial Intelligence

Recognizing a speaker's emotion from their speech can be a key element in emergency call centers. End-to-end deep learning systems for speech emotion recognition now achieve equivalent or even better results than conventional machine learning approaches. In this paper, in order to validate the performance of our neural network architecture for emotion recognition from speech, we first trained and tested it on the widely used corpus accessible by the community, IEMOCAP. We then used the same architecture as the real life corpus, CEMO, composed of 440 dialogs (2h16m) from 485 speakers. The most frequent emotions expressed by callers in these real life emergency dialogues are fear, anger and positive emotions such as relief. In the IEMOCAP general topic conversations, the most frequent emotions are sadness, anger and happiness. Using the same end-to-end deep learning architecture, an Unweighted Accuracy Recall (UA) of 63% is obtained on IEMOCAP and a UA of 45.6% on CEMO, each with 4 classes. Using only 2 classes (Anger, Neutral), the results for CEMO are 76.9% UA compared to 81.1% UA for IEMOCAP. We expect that these encouraging results with CEMO can be improved by combining the audio channel with the linguistic channel. Real-life emotions are clearly more complex than acted ones, mainly due to the large diversity of emotional expressions of speakers. Index Terms-emotion detection, end-to-end deep learning architecture, call center, real-life database, complex emotions.