Bălți
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Omnilingual ASR team, Keren, Gil, Kozhevnikov, Artyom, Meng, Yen, Ropers, Christophe, Setzler, Matthew, Wang, Skyler, Adebara, Ife, Auli, Michael, Balioglu, Can, Chan, Kevin, Cheng, Chierh, Chuang, Joe, Droof, Caley, Duppenthaler, Mark, Duquenne, Paul-Ambroise, Erben, Alexander, Gao, Cynthia, Gonzalez, Gabriel Mejia, Lyu, Kehan, Miglani, Sagar, Pratap, Vineel, Sadagopan, Kaushik Ram, Saleem, Safiyyah, Turkatenko, Arina, Ventayol-Boada, Albert, Yong, Zheng-Xin, Chung, Yu-An, Maillard, Jean, Moritz, Rashel, Mourachko, Alexandre, Williamson, Mary, Yates, Shireen
Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most--all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging an LLM-inspired decoder. This capability is grounded in a massive and diverse training corpus; by combining breadth of coverage with linguistic variety, the model learns representations robust enough to adapt to unseen languages. Incorporating public resources with community-sourced recordings gathered through compensated local partnerships, Omnilingual ASR expands coverage to over 1,600 languages, the largest such effort to date--including over 500 never before served by ASR. Automatic evaluations show substantial gains over prior systems, especially in low-resource conditions, and strong generalization. We release Omnilingual ASR as a family of models, from 300M variants for low-power devices to 7B for maximum accuracy. We reflect on the ethical considerations shaping this design and conclude by discussing its societal impact. In particular, we highlight how open-sourcing models and tools can lower barriers for researchers and communities, inviting new forms of participation. Open-source artifacts are available at https://github.com/facebookresearch/omnilingual-asr.
- North America > Canada > Alberta (0.14)
- Europe > Austria > Vienna (0.14)
- Africa > Sudan (0.14)
- (53 more...)
- Health & Medicine (1.00)
- Education (0.67)
- Information Technology (0.67)
CLuP practically achieves $\sim 1.77$ positive and $\sim 0.33$ negative Hopfield model ground state free energy
We study algorithmic aspects of finding $n$-dimensional \emph{positive} and \emph{negative} Hopfield ($\pm$Hop) model ground state free energies. This corresponds to classical maximization of random positive/negative semi-definite quadratic forms over binary $\left \{\pm \frac{1}{\sqrt{n}} \right \}^n$ vectors. The key algorithmic question is whether these problems can be computationally efficiently approximated within a factor $\approx 1$. Following the introduction and success of \emph{Controlled Loosening-up} (CLuP-SK) algorithms in finding near ground state energies of closely related Sherrington-Kirkpatrick (SK) models [82], we propose CLuP$\pm$Hop counterparts for $\pm$Hop models. Fully lifted random duality theory (fl RDT) [78] is utilized to characterize CLuP$\pm$Hop \emph{typical} dynamics. An excellent agreement between practical performance and theoretical predictions is observed. In particular, for $n$ as small as a few thousand, CLuP$\pm$Hop achieve $\sim 1.77$ and $\sim 0.33$ as the ground state free energies of the positive and negative Hopfield models. At the same time, we obtain on the 6th level of lifting (6-spl RDT) the corresponding theoretical thermodynamic ($n\rightarrow\infty$) limits $\approx 1.7784$ and $\approx 0.3281$. This positions the determination of near ground state energies of Hopfield models as \emph{typically} easy problems. Moreover, the very same 6th lifting level evaluations allow us to uncover a fundamental intrinsic difference between the two models: $+$Hop's near optimal configurations are \emph{typically close} to each other whereas $-$Hop's are \emph{typically far away}.
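As a rough illustration of the optimization problem described above (not of CLuP itself), the following sketch finds the extreme values of a random quadratic form over binary vectors by exhaustive search for a tiny $n$. Several things here are assumptions rather than details given in the abstract: the energy is taken to be $\|Gx\|_2/\sqrt{n}$ for an $m \times n$ standard Gaussian matrix $G$ and $x \in \{\pm 1/\sqrt{n}\}^n$, the matrix is square ($m = n$), and all function names are hypothetical. Exhaustive search costs $2^{n-1}$ evaluations, so it is only feasible for very small $n$ and the values it returns need not be close to the thermodynamic limits quoted above.

```python
import itertools
import math
import random


def normalized_energy(G, signs):
    """Return ||G x||_2 / sqrt(n) for x = signs / sqrt(n) (assumed normalization)."""
    n = len(signs)
    squared_norm = 0.0
    for row in G:
        dot = sum(g * s for g, s in zip(row, signs))
        squared_norm += dot * dot
    # ||G x|| = ||G s|| / sqrt(n); dividing by sqrt(n) once more gives ||G s|| / n.
    return math.sqrt(squared_norm) / n


def brute_force_extremes(G):
    """Exhaustively search all sign patterns; return (max energy, min energy)."""
    n = len(G[0])
    best_max, best_min = -math.inf, math.inf
    # The energy is unchanged under x -> -x, so the first sign can be fixed to +1.
    for rest in itertools.product((1, -1), repeat=n - 1):
        energy = normalized_energy(G, (1,) + rest)
        best_max = max(best_max, energy)
        best_min = min(best_min, energy)
    return best_max, best_min


def gaussian_matrix(m, n, seed):
    """m x n matrix of i.i.d. standard normal entries (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


if __name__ == "__main__":
    G = gaussian_matrix(10, 10, seed=0)
    positive, negative = brute_force_extremes(G)
    print(f"max energy (positive model): {positive:.4f}")
    print(f"min energy (negative model): {negative:.4f}")
```

The maximum corresponds to the positive model and the minimum to the negative model under the assumed normalization. The point of the abstract is that an efficient algorithm gets close to these extremes at sizes where such enumeration is impossible.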
- Africa > Sudan (0.04)
- North America > United States > Colorado > Denver County > Denver (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (6 more...)
The Case for "Thick Evaluations" of Cultural Representation in AI
Qadri, Rida, Diaz, Mark, Wang, Ding, Madaio, Michael
To address these gaps, prior work has sought to evaluate the cultural representations within AI generated output, but with few exceptions [30, 67], mostly through quantified, metricized approaches to representation such as statistical similarities and benchmark-style scoring [49, 84]. However, the use of these methods presumes that representation is an objective construct with an empirical, definitive ground truth that outputs can be compared against [e.g., 42, 84] [for a critique of ground truth, see 59]. Given limitations of these computational methods, evaluation of representation is reduced to basic recognition or factual generation of artifacts. Even when human feedback on representation is sought, it is solicited through narrow, constrained, quantitative scales from anonymized crowdworkers who often do not have the lived experiences to evaluate nuances of cultural representation of other cultures. However, this approach to measuring representation is in contravention to decades of scholarship in the social sciences that emphasizes the subjective nature of representation, where judgments about representation in visual media are constructed in conversation with the viewer's lived experiences and the broader context within which an image is
- Asia > Sri Lanka (0.06)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Pakistan > Gilgit-Baltistan > Gilgit (0.04)
- (21 more...)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (0.46)
Unification of Balti and trans-border sister dialects in the essence of LLMs and AI Technology
Sharif, Muhammad, Yi, Jiangyan, Shoaib, Muhammad
The Balti language belongs to the Sino-Tibetan family, specifically its Tibeto-Burman branch. It is understood, with variations, across populations in India, China, Pakistan, Nepal, Tibet, Burma, and Bhutan, where local cultures have influenced it and produced various dialects. Given these diverse cultural, socio-political, religious, and geographical influences, it is vital to move toward unifying the dialects on the basis of common roots, lexica, and phonology. In the era of globalization and increasingly frequent developments in AI technology, understanding this diversity and the efforts at dialect unification is important for identifying commonalities and narrowing the gaps created by unavoidable circumstances. This article examines how artificial intelligence (AI), in the form of large language models (LLMs), can assist in analyzing, documenting, and standardizing the endangered Balti language, building on the efforts made in different dialects so far.
Unlocking Real-Time Fluorescence Lifetime Imaging: Multi-Pixel Parallelism for FPGA-Accelerated Processing
Erbas, Ismail, Amarnath, Aporva, Pandey, Vikas, Swaminathan, Karthik, Wang, Naigang, Intes, Xavier
Fluorescence lifetime imaging (FLI) is a widely used technique in the biomedical field for measuring the decay times of fluorescent molecules, providing insights into metabolic states, protein interactions, and ligand-receptor bindings. However, its broader application in fast biological processes, such as dynamic activity monitoring, and clinical use, such as in guided surgery, is limited by long data acquisition times and computationally demanding data processing. While deep learning has reduced post-processing times, time-resolved data acquisition remains a bottleneck for real-time applications. To address this, we propose a method to achieve real-time FLI using an FPGA-based hardware accelerator. Specifically, we implemented a GRU-based sequence-to-sequence (Seq2Seq) model on an FPGA board compatible with time-resolved cameras. The GRU model balances accurate processing with the resource constraints of FPGAs, which have limited DSP units and BRAM. The limited memory and computational resources on the FPGA require efficient scheduling of operations and memory allocation to deploy deep learning models for low-latency applications. We address these challenges by using STOMP, a queue-based discrete-event simulator that automates and optimizes task scheduling and memory management on hardware. By integrating a GRU-based Seq2Seq model and its compressed version, called Seq2SeqLite, generated through knowledge distillation, we were able to process multiple pixels in parallel, reducing latency compared to sequential processing. We explore various levels of parallelism to achieve an optimal balance between performance and resource utilization. Our results indicate that the proposed techniques achieved a 17.7x and 52.0x speedup over manual scheduling for the Seq2Seq model and the Seq2SeqLite model, respectively.
- North America > United States > District of Columbia > Washington (0.05)
- North America > United States > New York > Rensselaer County > Troy (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- (2 more...)
- Information Technology (1.00)
- Health & Medicine (0.94)
From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models
Bhatia, Mehar, Ravi, Sahithya, Chinchure, Aditya, Hwang, Eunjeong, Shwartz, Vered
Despite recent advancements in vision-language models, their performance remains suboptimal on images from non-western cultures due to underrepresentation in training datasets. Various benchmarks have been proposed to test models' cultural inclusivity, but they have limited coverage of cultures and do not adequately assess cultural diversity across universal as well as culture-specific local concepts. To address these limitations, we introduce the GlobalRG benchmark, comprising two challenging tasks: retrieval across universals and cultural visual grounding. The former task entails retrieving culturally diverse images for universal concepts from 50 countries, while the latter aims at grounding culture-specific concepts within images from 15 countries. Our evaluation across a wide range of models reveals that the performance varies significantly across cultures -- underscoring the necessity for enhancing multicultural understanding in vision-language models.
- Asia > East Asia (0.20)
- Asia > Southeast Asia (0.15)
- North America > Central America (0.14)
- (59 more...)
Automatic Logical Forms improve fidelity in Table-to-Text generation
Table-to-text systems generate natural language statements from structured data like tables. While end-to-end techniques suffer from low factual correctness (fidelity), a previous study reported gains when using manual logical forms (LF) that represent the selected content and the semantics of the target text. Given the manual step, it was not clear whether automatic LFs would be effective, or whether the improvement came from content selection alone. We present TlT which, given a table and a selection of the content, first produces LFs and then the textual statement. We show for the first time that automatic LFs improve quality, with an increase in fidelity of 30 points over a comparable system not using LFs. Our experiments allow us to quantify the remaining challenges for high factual correctness, with automatic selection of content coming first, followed by better Logic-to-Text generation and, to a lesser extent, better Table-to-Logic parsing.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria (0.04)
- Oceania > Fiji (0.04)
- (31 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
HoloNets: Spectral Convolutions do extend to Directed Graphs
Koke, Christian, Cremers, Daniel
Within the graph learning community, conventional wisdom dictates that spectral convolutional networks may only be deployed on undirected graphs: Only there could the existence of a well-defined graph Fourier transform be guaranteed, so that information may be translated between spatial- and spectral domains. Here we show this traditional reliance on the graph Fourier transform to be superfluous and -- making use of certain advanced tools from complex analysis and spectral theory -- extend spectral convolutions to directed graphs. We provide a frequency-response interpretation of newly developed filters, investigate the influence of the basis used to express filters and discuss the interplay with characteristic operators on which networks are based. In order to thoroughly test the developed theory, we conduct experiments in real world settings, showcasing that directed spectral convolutional networks provide new state of the art results for heterophilic node classification on many datasets and -- as opposed to baselines -- may be rendered stable to resolution-scale varying topological perturbations.
- Europe > Switzerland > Basel-City > Basel (0.04)
- Asia > Middle East > Jordan (0.04)
- Africa > Rwanda > Kigali > Kigali (0.04)
- (10 more...)
Revolutionizing Global Food Security: Empowering Resilience through Integrated AI Foundation Models and Data-Driven Solutions
Shoaib, Mohamed R., Emara, Heba M., Zhao, Jun
Food security, a global concern, necessitates precise and diverse data-driven solutions to address its multifaceted challenges. This paper explores the integration of AI foundation models across various food security applications, leveraging distinct data types, to overcome the limitations of current deep and machine learning methods. Specifically, we investigate their utilization in crop type mapping, cropland mapping, field delineation and crop yield prediction. By capitalizing on multispectral imagery, meteorological data, soil properties, historical records, and high-resolution satellite imagery, AI foundation models offer a versatile approach. The study demonstrates that AI foundation models enhance food security initiatives by providing accurate predictions, improving resource allocation, and supporting informed decision-making. These models serve as a transformative force in addressing global food security limitations, marking a significant leap toward a sustainable and secure food future.
- Africa > West Africa (0.05)
- Africa > Ethiopia (0.04)
- Africa > Southern Africa (0.04)
- (18 more...)
- Research Report (1.00)
- Overview (1.00)
- Food & Agriculture > Agriculture (1.00)
- Education > Health & Safety > School Nutrition (0.46)
- Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.36)
FIGO: Enhanced Fingerprint Identification Approach Using GAN and One Shot Learning Techniques
Yilmaz, Ibrahim, Abouyoussef, Mahmoud
Fingerprint evidence plays an important role in criminal investigations for the identification of individuals. Although various techniques have been proposed for fingerprint classification and feature extraction, automated fingerprint identification is still in its earliest stage. The performance of a traditional \textit{Automatic Fingerprint Identification System} (AFIS) depends on the presence of valid minutiae points and still requires human expert assistance in the feature extraction and identification stages. Based on this motivation, we propose a Fingerprint Identification approach based on Generative adversarial network and One-shot learning techniques (FIGO). Our solution contains two components: a fingerprint enhancement tier and a fingerprint identification tier. First, in the fingerprint enhancement tier, we propose a Pix2Pix model that directly transforms low-quality fingerprint images into higher-quality ones, pixel by pixel. With the proposed enhancement algorithm, the fingerprint identification model's performance is significantly improved. Furthermore, we implement an existing solution based on Gabor filters as a benchmark and compare it with the proposed model by observing the fingerprint device's recognition accuracy. Experimental results show that our proposed Pix2Pix model supports fingerprint identification better than the baseline approach. Second, we construct a fully automated fingerprint feature extraction model using a one-shot learning approach to differentiate each fingerprint from the others in the fingerprint identification process. Twin convolutional neural networks (CNNs) with shared weights and parameters are used to obtain the feature vectors in this process. Using the proposed method, we demonstrate that it is possible to learn the necessary information from only one training sample with high accuracy.
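The twin-network idea in the identification tier can be sketched in a few lines. The example below is a toy illustration and not the FIGO model: the "network" is a single convolution layer with two fixed, hand-picked 2x2 filters followed by ReLU and global average pooling, whereas the paper uses trained CNNs. All names (`embed`, `identify`, the gallery labels) are hypothetical. What it does show is the mechanism: the same weights embed both images, and a query is matched to whichever enrolled identity, represented by a single reference sample, has the nearest feature vector.

```python
import math


def conv2d_valid(image, kernel):
    """Plain 2D 'valid' cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def embed(image, kernels):
    """Shared-weight branch: conv -> ReLU -> global average pool, one feature per kernel."""
    features = []
    for kernel in kernels:
        feature_map = conv2d_valid(image, kernel)
        activations = [max(0.0, v) for row in feature_map for v in row]
        features.append(sum(activations) / len(activations))
    return features


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query, gallery, kernels):
    """gallery maps each label to ONE reference image (the one-shot setting)."""
    q = embed(query, kernels)
    return min(gallery, key=lambda label: distance(q, embed(gallery[label], kernels)))


# Hand-picked filters standing in for learned weights (an assumption of this sketch).
KERNELS = [
    [[1.0, 1.0], [-1.0, -1.0]],   # responds to horizontal ridges
    [[1.0, -1.0], [1.0, -1.0]],   # responds to vertical ridges
]

if __name__ == "__main__":
    horizontal = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
    vertical = [[1, 0, 1, 0]] * 4
    gallery = {"person_a": horizontal, "person_b": vertical}
    noisy_query = [[0.8, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
    print(identify(noisy_query, gallery, KERNELS))
```

In a trained system the filters would be learned, for example with a contrastive or similar pairwise loss, so that images of the same finger land close together. That training step is omitted here.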
- North America > United States > Arkansas > Faulkner County > Conway (0.14)
- North America > United States > Tennessee > Putnam County > Cookeville (0.04)
- Europe > Moldova > Bălți > Bălți (0.04)