Goto

Collaborating Authors

 Armenia


Methods to Increase the Amount of Data for Speech Recognition for Low Resource Languages

arXiv.org Artificial Intelligence

This study explores methods to increase data volume for low-resource languages using techniques such as crowdsourcing, pseudo-labeling, advanced data preprocessing and various permissive data sources such as audiobooks, Common Voice, YouTube. While these methods are well-explored for highresource languages, their application for low-resource languages remains underexplored. Using Armenian and Georgian as case studies, we demonstrate how linguistic and resource-specific characteristics influence the success of these methods. This work provides practical guidance for researchers to choose cost-effective and quality-driven dataset extension strategies for low-resource languages. The key takeaway from various data extension approaches is that paid crowd-sourcing offers the best balance between cost and quality, outperforming volunteer crowd-sourcing, open-source audiobooks, and unlabeled data usage. Ablation study shows that models trained on the expanded datasets outperform existing baselines and achieve 5.73% for Gergian and 9.9% for Armenian ASR word error rate using a relatively small FastConformer architecture. We open-sourced both the Armenian and Georgian models to allow further research and practical applications.


AI in the Cosmos

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is revolutionizing research by enabling the efficient analysis of large datasets and the discovery of hidden patterns. In astrophysics, AI has become essential, transforming the classification of celestial sources, data modeling, and the interpretation of observations. In this review, I highlight examples of AI applications in astrophysics, including source classification, spectral energy distribution modeling, and discuss the advancements achievable through generative AI. However, the use of AI introduces challenges, including biases, errors, and the "black box" nature of AI models, which must be resolved before their application. These issues can be addressed through the concept of Human-Guided AI (HG-AI), which integrates human expertise and domain-specific knowledge into AI applications. This approach aims to ensure that AI is applied in a robust, interpretable, and ethical manner, leading to deeper insights and fostering scientific excellence.


Vision Transformers for Efficient Indoor Pathloss Radio Map Prediction

arXiv.org Artificial Intelligence

Vision Transformers (ViTs) have demonstrated remarkable success in achieving state-of-the-art performance across various image-based tasks and beyond. In this study, we employ a ViT-based neural network to address the problem of indoor pathloss radio map prediction. The network's generalization ability is evaluated across diverse settings, including unseen buildings, frequencies, and antennas with varying radiation patterns. By leveraging extensive data augmentation techniques and pretrained DINOv2 weights, we achieve promising results, even under the most challenging scenarios.


Formulation of probability theory problem with subtle condition

arXiv.org Artificial Intelligence

Problems in probability theory prove to be one of the most challenging for students. Here, we formulate and discuss four related problems in probability theory that proved difficult for first to fourth-year undergraduate students whose first language was not English. These examples emphasize how crucial it is to understand the conditions and requirements of the problems precisely before starting to solve them. We discuss the solutions to those problems in detail, complement them with numerical estimations, and link the conditions in the problems to the logical statements in Python programming language. We also tested two widely used chatbots (GPT-4o and Claude 3.5 Sonnet) by checking their responses to these problems.


ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation

arXiv.org Artificial Intelligence

Previous open-source large multimodal models (LMMs) have faced several limitations: (1) they often lack native integration, requiring adapters to align visual representations with pre-trained large language models (LLMs); (2) many are restricted to single-modal generation; (3) while some support multimodal generation, they rely on separate diffusion models for visual modeling and generation. To mitigate these limitations, we present Anole, an open, autoregressive, native large multimodal model for interleaved image-text generation. We build Anole from Meta AI's Chameleon, adopting an innovative fine-tuning strategy that is both data-efficient and parameter-efficient. Anole demonstrates high-quality, coherent multimodal generation capabilities. We have open-sourced our model, training framework, and instruction tuning data.


Self-Trained Model for ECG Complex Delineation

arXiv.org Artificial Intelligence

Electrocardiogram (ECG) delineation plays a crucial role in assisting cardiologists with accurate diagnoses. Prior research studies have explored various methods, including the application of deep learning techniques, to achieve precise delineation. However, existing approaches face limitations primarily related to dataset size and robustness. In this paper, we introduce a dataset for ECG delineation and propose a novel self-trained method aimed at leveraging a vast amount of unlabeled ECG data. Our approach involves the pseudolabeling of unlabeled data using a neural network trained on our dataset. Subsequently, we train the model on the newly labeled samples to enhance the quality of delineation. We conduct experiments demonstrating that our dataset is a valuable resource for training robust models and that our proposed self-trained method improves the prediction quality of ECG delineation.


Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach

arXiv.org Artificial Intelligence

In recent years, automatic speech recognition (ASR) systems have significantly improved, especially in languages with a vast amount of transcribed speech data. However, ASR systems tend to perform poorly for low-resource languages with fewer resources, such as minority and regional languages. This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks, which typically feature a single transcript associated with hours-long audios. The common structure of these audiobooks poses a unique challenge due to the extensive length of audio segments, whereas optimal ASR training requires segments ranging from 4 to 15 seconds. To address this, we propose a method for effectively aligning audio with its corresponding text and segmenting it into lengths suitable for ASR training. Our approach simplifies data preparation for ASR systems in low-resource languages and demonstrates its application through a case study involving the Armenian language. Our method, which is "portable" to many low-resource languages, not only mitigates the issue of data scarcity but also enhances the performance of ASR models for underrepresented languages.


Transition Graph Properties of Target Class Classification

arXiv.org Artificial Intelligence

Target class classification is a mixed classification and transition model whose integrated goal is to assign objects to a certain, so called target or normal class. The classification process is iterative, and in each step an object in a certain class undergoes an action attached to that class, initiating the transition of the object to one of the classes. The sequence of transitions, which we call class transitions, must be designed to provide the final assignment of objects to the target class. The transition process can be described in the form of a directed graph, and the success of the final classification is mainly due to the properties of this graph. In our previous research we showed that the desirable structure of the transition graph is an oriented rooted tree with orientation towards the root vertex, which corresponds to the normal class. It is clear that the transition graph of an arbitrary algorithm (policy) may not have this property. In this paper we study the structure of realistic transition graphs, which makes it possible to find classification inconsistencies, helping to transfer it into the desired form. The medical interpretation of "dynamic treatment regime" considered in the article further clarifies the investigated framework.


MFL Data Preprocessing and CNN-based Oil Pipeline Defects Detection

arXiv.org Artificial Intelligence

Recently, the application of computer vision for anomaly detection has been under attention in several industrial fields. An important example is oil pipeline defect detection. Failure of one oil pipeline can interrupt the operation of the entire transportation system or cause a far-reaching failure. The automated defect detection could significantly decrease the inspection time and the related costs. However, there is a gap in the related literature when it comes to dealing with this task. The existing studies do not sufficiently cover the research of the Magnetic Flux Leakage data and the preprocessing techniques that allow overcoming the limitations set by the available data. This work focuses on alleviating these issues. Moreover, in doing so, we exploited the recent convolutional neural network structures and proposed robust approaches, aiming to acquire high performance considering the related metrics. The proposed approaches and their applicability were verified using real-world data.


Unsupervised extraction of local and global keywords from a single text

arXiv.org Artificial Intelligence

We propose an unsupervised, corpus-independent method to extract keywords from a single text. It is based on the spatial distribution of words and the response of this distribution to a random permutation of words. As compared to existing methods (such as e.g. YAKE) our method has three advantages. First, it is significantly more effective at extracting keywords from long texts. Second, it allows inference of two types of keywords: local and global. Third, it uncovers basic themes in texts. Additionally, our method is language-independent and applies to short texts. The results are obtained via human annotators with previous knowledge of texts from our database of classical literary works (the agreement between annotators is from moderate to substantial). Our results are supported via human-independent arguments based on the average length of extracted content words and on the average number of nouns in extracted words. We discuss relations of keywords with higher-order textual features and reveal a connection between keywords and chapter divisions.