Collaborating Authors

 Gutierrez-Osuna, Ricardo


End-to-end Streaming model for Low-Latency Speech Anonymization

arXiv.org Artificial Intelligence

Speaker anonymization aims to conceal cues to speaker identity while preserving linguistic content. Current machine-learning-based approaches require substantial computational resources, which hinders real-time streaming applications. To address these concerns, we propose a streaming model that achieves speaker anonymization with low latency. The system is trained in an end-to-end autoencoder fashion using a lightweight content encoder that extracts HuBERT-like information, a pretrained speaker encoder that extracts speaker identity, and a variance encoder that injects pitch and energy information. These three disentangled representations are fed to a decoder that resynthesizes the speech signal. We present evaluation results from two implementations of our system: a full model that achieves a latency of 230ms, and a lite version (0.1x in size) that further reduces latency to 66ms while maintaining state-of-the-art performance in naturalness, intelligibility, and privacy preservation.


Font Identification in Historical Documents Using Active Learning

arXiv.org Machine Learning

Identifying the type of font (e.g., Roman, Blackletter) used in historical documents can help optical character recognition (OCR) systems produce more accurate text transcriptions. To this end, we present an active-learning strategy that can significantly reduce the number of labeled samples needed to train a font classifier. Our approach extracts image-based features that exploit geometric differences between fonts at the word level, and combines them into a bag-of-words representation for each page in a document. We evaluate six sampling strategies based on uncertainty, dissimilarity and diversity criteria, and test them on a database containing over 3,000 historical documents with Blackletter, Roman and Mixed fonts. Our results show that a combination of uncertainty and diversity achieves the highest predictive accuracy (89% of test cases correctly classified) while requiring only a small fraction of the data (17%) to be labeled. We discuss the implications of this result for mass digitization projects of historical documents.
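
One way to picture a sampling strategy that combines uncertainty and diversity is the minimal sketch below. It is an illustration under stated assumptions, not the strategy evaluated in the paper: the names `select_batch` and `pool_factor`, the use of Shannon entropy as the uncertainty score, and the greedy farthest-point rule for diversity are all hypothetical choices made here.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_batch(probs, features, k, pool_factor=3):
    """Return indices of k unlabeled pages to send for labeling.

    probs[i]    : classifier's class probabilities for page i (assumed input)
    features[i] : page-level feature vector, e.g. a bag-of-words histogram
    pool_factor : hypothetical knob; shortlist size is k * pool_factor
    """
    n = len(probs)
    k = min(k, n)
    if k == 0:
        return []
    # Uncertainty step: shortlist the pages the classifier is least sure about.
    ranked = sorted(range(n), key=lambda i: entropy(probs[i]), reverse=True)
    shortlist = ranked[: min(n, k * pool_factor)]
    # Diversity step: greedily add the shortlisted page that is farthest
    # from everything chosen so far.
    chosen = [shortlist[0]]
    while len(chosen) < k:
        best = max(
            (i for i in shortlist if i not in chosen),
            key=lambda i: min(euclidean(features[i], features[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen
```

In an active-learning loop, the returned pages would be labeled by a person, added to the training set, and the classifier retrained before the next call.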


Automatic Assessment of OCR Quality in Historical Documents

AAAI Conferences

Mass digitization of historical documents is a challenging problem for optical character recognition (OCR) tools. Issues include noisy backgrounds and faded text due to aging, border/marginal noise, bleed-through, skewing, warping, as well as irregular fonts and page layouts. As a result, OCR tools often produce a large number of spurious bounding boxes (BBs) in addition to those that correspond to words in the document. This paper presents an iterative classification algorithm to automatically label BBs (i.e., as text or noise) based on their spatial distribution and geometry. The approach uses a rule-based classifier to generate initial text/noise labels for each BB, followed by an iterative classifier that refines the initial labels by incorporating local information about each BB, its spatial location, shape and size. When evaluated on a dataset containing over 72,000 manually-labeled BBs from 159 historical documents, the algorithm can classify BBs with 0.95 precision and 0.96 recall. Further evaluation on a collection of 6,775 documents with ground-truth transcriptions shows that the algorithm can also be used to predict document quality (0.7 correlation) and improve OCR transcriptions in 85% of the cases.
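
The two-stage idea (rule-based initial labels, then iterative refinement from local context) can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: the refinement step here is a plain neighborhood majority vote swapped in for the paper's iterative classifier, and the `Box` type, the height and aspect-ratio thresholds, and the `radius` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels
    w: float  # width
    h: float  # height


def center(b):
    """Center point of a bounding box."""
    return (b.x + b.w / 2, b.y + b.h / 2)


def initial_label(b, min_h=8, max_h=60, max_aspect=15):
    """Hypothetical rule: True (text) if height and aspect ratio are plausible."""
    if b.w <= 0 or b.h <= 0:
        return False
    return min_h <= b.h <= max_h and b.w / b.h <= max_aspect


def refine(boxes, radius=100.0, max_iters=10):
    """Relabel each box by majority vote over itself and its nearby boxes.

    Ties keep the current label; iteration stops when labels no longer
    change or after max_iters passes.
    """
    centers = [center(b) for b in boxes]
    labels = [initial_label(b) for b in boxes]
    for _ in range(max_iters):
        new_labels = []
        for i, (cx, cy) in enumerate(centers):
            # Votes from every box whose center lies within `radius` on both
            # axes; this includes box i itself (distance zero).
            votes = [
                labels[j]
                for j, (ox, oy) in enumerate(centers)
                if abs(ox - cx) <= radius and abs(oy - cy) <= radius
            ]
            text_votes = sum(votes)
            if text_votes * 2 > len(votes):
                new_labels.append(True)
            elif text_votes * 2 < len(votes):
                new_labels.append(False)
            else:
                new_labels.append(labels[i])
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

With this sketch, a text-shaped box surrounded by specks is relabeled as noise, while boxes in a regular text line keep their text label.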


LOLA: Probabilistic Navigation for Topological Maps

AI Magazine

LOLA's entry in the Office Delivery event of the 1995 Robot Competition and Exhibition, held in conjunction with the Fourteenth International Joint Conference on Artificial Intelligence, was the culmination of a three-month design and implementation period for an indoor navigation system for topological maps. This article describes the major components of the robot's navigation architecture. It also summarizes the experiences and lessons learned from the competition.

