Goto

Collaborating Authors

 rcnn




Gated Recurrent Convolution Neural Network for OCR

Neural Information Processing Systems

Optical Character Recognition (OCR) aims to recognize text in natural images. Inspired by a recently proposed model for general image classification, Recurrent Convolution Neural Network (RCNN), we propose a new architecture named Gated RCNN (GRCNN) for solving this problem. Its critical component, Gated Recurrent Convolution Layer (GRCL), is constructed by adding a gate to the Recurrent Convolution Layer (RCL), the critical component of RCNN. The gate controls the context modulation in RCL and balances the feed-forward information and the recurrent information. In addition, an efficient Bidirectional Long Short-Term Memory (BLSTM) is built for sequence modeling. The GRCNN is combined with BLSTM to recognize text in natural images. The entire GRCNN-BLSTM model can be trained end-to-end. Experiments show that the proposed model outperforms existing methods on several benchmark datasets including the IIIT-5K, Street View Text (SVT) and ICDAR.


Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms

Neural Information Processing Systems

Neural networks (NNs) struggle to efficiently solve certain problems, such as learning parities, even when there are simple learning algorithms for those problems. Can NNs discover learning algorithms on their own?




Uncertainty in AI-driven Monte Carlo simulations

Tzivrailis, Dimitrios, Rosso, Alberto, Kawasaki, Eiji

arXiv.org Machine Learning

In the study of complex systems, evaluating physical observables often requires sampling representative configurations via Monte Carlo techniques. These methods rely on repeated evaluations of the system's energy and force fields, which can become computationally expensive. To accelerate these simulations, deep learning models are increasingly employed as surrogate functions to approximate the energy landscape or force fields. However, such models introduce epistemic uncertainty in their predictions, which may propagate through the sampling process and affect the system's macroscopic behavior. In our work, we present the Penalty Ensemble Method (PEM) to quantify epistemic uncertainty and mitigate its impact on Monte Carlo sampling. Our approach introduces an uncertainty-aware modification of the Metropolis acceptance rule, which increases the rejection probability in regions of high uncertainty, thereby enhancing the reliability of the simulation outcomes.


Idiom Detection in Sorani Kurdish Texts

Omer, Skala Kamaran, Hassani, Hossein

arXiv.org Artificial Intelligence

Idiom detection using Natural Language Processing (NLP) is the computerized process of recognizing figurative expressions within a text that convey meanings beyond the literal interpretation of the words. While idiom detection has seen significant progress across various languages, the Kurdish language faces a considerable research gap in this area despite the importance of idioms in tasks like machine translation and sentiment analysis. This study addresses idiom detection in Sorani Kurdish by approaching it as a text classification task using deep learning techniques. To tackle this, we developed a dataset containing 10,580 sentences embedding 101 Sorani Kurdish idioms across diverse contexts. Using this dataset, we developed and evaluated three deep learning models: KuBERT-based transformer sequence classification, a Recurrent Convolutional Neural Network (RCNN), and a BiLSTM model with an attention mechanism. The evaluations revealed that the transformer model, the fine-tuned BERT, consistently outperformed the others, achieving nearly 99% accuracy while the RCNN achieved 96.5% and the BiLSTM 80%. These results highlight the effectiveness of Transformer-based architectures in low-resource languages like Kurdish. This research provides a dataset, three optimized models, and insights into idiom detection, laying a foundation for advancing Kurdish NLP.


Convolutional Neural Networks with Intra-Layer Recurrent Connections for Scene Labeling

Neural Information Processing Systems

Scene labeling is a challenging computer vision task. It requires the use of both local discriminative features and global context information. We adopt a deep recurrent convolutional neural network (RCNN) for this task, which is originally proposed for object recognition. Different from traditional convolutional neural networks (CNN), this model has intra-layer recurrent connections in the convolutional layers. Therefore each convolutional layer becomes a two-dimensional recurrent neural network.


TRGR: Transmissive RIS-aided Gait Recognition Through Walls

Huang, Yunlong, Liu, Junshuo, Zhang, Jianan, Mi, Tiebin, Shi, Xin, Qiu, Robert Caiming

arXiv.org Artificial Intelligence

Gait recognition with radio frequency (RF) signals enables many potential applications requiring accurate identification. However, current systems require individuals to be within a line-of-sight (LOS) environment and struggle with low signal-to-noise ratio (SNR) when signals traverse concrete and thick walls. To address these challenges, we present TRGR, a novel transmissive reconfigurable intelligent surface (RIS)-aided gait recognition system. TRGR can recognize human identities through walls using only the magnitude measurements of channel state information (CSI) from a pair of transceivers. Specifically, by leveraging transmissive RIS alongside a configuration alternating optimization algorithm, TRGR enhances wall penetration and signal quality, enabling accurate gait recognition. Furthermore, a residual convolution network (RCNN) is proposed as the backbone network to learn robust human information. Experimental results confirm the efficacy of transmissive RIS, highlighting the significant potential of transmissive RIS in enhancing RF-based gait recognition systems. Extensive experiment results show that TRGR achieves an average accuracy of 97.88\% in identifying persons when signals traverse concrete walls, demonstrating the effectiveness and robustness of TRGR.