mohamed
HARNESS: Lightweight Distilled Arabic Speech Foundation Models
sukhadia, Vrunda N., Chowdhury, Shammur Absar
Large pre-trained speech models excel in downstream tasks but their deployment is impractical for resource-limited environments. In this paper, we introduce HArnESS, the first Arabic-centric self-supervised speech model family, designed to capture Arabic speech nuances. Using iterative self-distillation, we train large bilingual HArnESS (HL) SSL models and then distill knowledge into compressed student models (HS, HST), preserving Arabic-specific representations. We use low-rank approximation to further compact the teacher's discrete supervision into shallow, thin models. We evaluate HArnESS on Arabic ASR, Speaker Emotion Recognition (SER), and Dialect Identification (DID), demonstrating effectiveness against HuBERT and XLS-R. With minimal fine-tuning, HArnESS achieves SOTA or comparable performance, making it a lightweight yet powerful alternative for real-world use. We release our distilled models and findings to support responsible research and deployment in low-resource settings.
- Asia > Middle East > Qatar (0.04)
- Asia > Singapore (0.04)
DeepChest: Dynamic Gradient-Free Task Weighting for Effective Multi-Task Learning in Chest X-ray Classification
Mohamed, Youssef, Mohamed, Noran, Abouhashad, Khaled, Tang, Feilong, Atito, Sara, Jameel, Shoaib, Razzak, Imran, Zaky, Ahmed B.
While Multi-Task Learning (MTL) offers inherent advantages in complex domains such as medical imaging by enabling shared representation learning, effectively balancing task contributions remains a significant challenge. This paper addresses this critical issue by introducing DeepChest, a novel, computationally efficient and effective dynamic task-weighting framework specifically designed for multi-label chest X-ray (CXR) classification. Unlike existing heuristic or gradient-based methods that often incur substantial overhead, DeepChest leverages a performance-driven weighting mechanism based on effective analysis of task-specific loss trends. Given a network architecture (e.g., ResNet18), our model-agnostic approach adaptively adjusts task importance without requiring gradient access, thereby significantly reducing memory usage and achieving a threefold increase in training speed. It can be easily applied to improve various state-of-the-art methods. Extensive experiments on a large-scale CXR dataset demonstrate that DeepChest not only outperforms state-of-the-art MTL methods by 7% in overall accuracy but also yields substantial reductions in individual task losses, indicating improved generalization and effective mitigation of negative transfer. The efficiency and performance gains of DeepChest pave the way for more practical and robust deployment of deep learning in critical medical diagnostic applications. The code is publicly available at https://github.com/youssefkhalil320/DeepChest-MTL
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > District of Columbia > Washington (0.05)
- (7 more...)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
A Simple Generative Network
Generative neural networks are able to mimic intricate probability distributions such as those representing handwritten text, natural images, etc. Since their inception several models were proposed. The most successful of these were based on adversarial (GAN), auto-encoding (VAE) and maximum mean discrepancy (MMD) relatively complex architectures and schemes. Surprisingly, a very simple architecture (a single feed-forward neural network) in conjunction with an obvious optimization goal (Kullback-Leibler divergence) was apparently overlooked. This paper demonstrates that such a model (denoted SGN for its simplicity) is able to generate samples visually and quantitatively competitive as compared with the fore-mentioned state of the art methods.
Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language Models
Liao, Haoran, Tian, Jidong, Hu, Shaohua, He, Hao, Jin, Yaohui
Large language models (LLMs) still grapple with complex tasks like mathematical reasoning. Despite significant efforts invested in improving prefix prompts or reasoning process, the crucial role of problem context might have been neglected. Accurate recognition of inputs is fundamental for solving mathematical tasks, as ill-formed problems could potentially mislead LLM's reasoning. In this study, we propose a new approach named Problem Elaboration Prompting (PEP) to enhance the mathematical capacities of LLMs. Specifically, PEP decomposes and elucidates the problem context before reasoning, therefore enhancing the context modeling and parsing efficiency. Experiments across datasets and models demonstrate promising performances: (1) PEP demonstrates an overall enhancement in various mathematical tasks. For instance, with the GPT-3.5 model, PEP exhibits improvements of 9.93% and 8.80% on GSM8k through greedy decoding and self-consistency, respectively. (2) PEP can be easily implemented and integrated with other prompting methods. (3) PEP shows particular strength in handling distraction problems.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > New York > Richmond County > New York City (0.04)
- North America > United States > New York > Queens County > New York City (0.04)
- (4 more...)
Adaptive-Solver Framework for Dynamic Strategy Selection in Large Language Model Reasoning
Zhou, Jianpeng, Zhong, Wanjun, Wang, Yanlin, Wang, Jiahai
Large Language Models (LLMs) are showcasing impressive ability in handling complex reasoning tasks. In real-world situations, problems often span a spectrum of complexities. Humans inherently adjust their problem-solving approaches based on task complexity. However, most methodologies that leverage LLMs tend to adopt a uniform approach: utilizing consistent models, prompting methods, and degrees of problem decomposition, regardless of the problem complexity. Inflexibility of them can bring unnecessary computational overhead or sub-optimal performance. To address this problem, we introduce an Adaptive-Solver framework. It strategically modulates solving strategies based on the difficulties of the problems. Given an initial solution, the framework functions with two primary modules. The initial evaluation module assesses the adequacy of the current solution. If improvements are needed, the subsequent adaptation module comes into play. Within this module, three key adaptation strategies are employed: (1) Model Adaptation: Switching to a stronger LLM when a weaker variant is inadequate. (2) Prompting Method Adaptation: Alternating between different prompting techniques to suit the problem's nuances. (3) Decomposition Granularity Adaptation: Breaking down a complex problem into more fine-grained sub-questions to enhance solvability. Through such dynamic adaptations, our framework not only enhances computational efficiency but also elevates the overall performance. This dual-benefit ensures both the efficiency of the system for simpler tasks and the precision required for more complex questions. Experimental results from complex reasoning tasks reveal that the prompting method adaptation and decomposition granularity adaptation enhance performance across all tasks. Furthermore, the model adaptation approach significantly reduces API costs (up to 50%) while maintaining superior performance.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (4 more...)
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Nguyen, Tu Anh, Hsu, Wei-Ning, D'Avirro, Antony, Shi, Bowen, Gat, Itai, Fazel-Zarani, Maryam, Remez, Tal, Copet, Jade, Synnaeve, Gabriel, Hassid, Michael, Kreuk, Felix, Adi, Yossi, Dupoux, Emmanuel
Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization). The adoption of these methods is still limited by the fact that most speech synthesis datasets are read, severely limiting spontaneity and expressivity. Here, we introduce Expresso, a high-quality expressive speech dataset for textless speech synthesis that includes both read speech and improvised dialogues rendered in 26 spontaneous expressive styles. We illustrate the challenges and potentials of this dataset with an expressive resynthesis benchmark where the task is to encode the input in low-bitrate units and resynthesize it in a target voice while preserving content and style. We evaluate resynthesis quality with automatic metrics for different self-supervised discrete encoders, and explore tradeoffs between quality, bitrate and invariance to speaker and style. All the dataset, evaluation metrics and baseline models are open source
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.69)
Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target
Wu, Guan-Wei, Lin, Guan-Ting, Li, Shang-Wen, Lee, Hung-yi
Spoken Language Understanding (SLU) is a task that aims to extract semantic information from spoken utterances. Previous research has made progress in end-to-end SLU by using paired speech-text data, such as pre-trained Automatic Speech Recognition (ASR) models or paired text as intermediate targets. However, acquiring paired transcripts is expensive and impractical for unwritten languages. On the other hand, Textless SLU extracts semantic information from speech without utilizing paired transcripts. However, the absence of intermediate targets and training guidance for textless SLU often results in suboptimal performance. In this work, inspired by the content-disentangled discrete units from self-supervised speech models, we proposed to use discrete units as intermediate guidance to improve textless SLU performance. Our method surpasses the baseline method on five SLU benchmark corpora. Additionally, we find that unit guidance facilitates few-shot learning and enhances the model's ability to handle noise.
- Asia > Taiwan (0.05)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Pennsylvania (0.04)
- Europe > United Kingdom (0.04)
normflows: A PyTorch Package for Normalizing Flows
Stimper, Vincent, Liu, David, Campbell, Andrew, Berenz, Vincent, Ryll, Lukas, Schölkopf, Bernhard, Hernández-Lobato, José Miguel
Normalizing flows model probability distributions through an expressive tractable density. They transform a simple base distribution, such as a Gaussian, through a sequence of invertible functions, which are referred to as layers. These layers typically use neural networks to become very expressive. Flows are ubiquitous in machine learning and have been applied to image generation, text modeling, variational inference, approximating Boltzmann distributions, and many other problems. Here, we present normflows, a Python package for normalizing flows. It allows to build normalizing flow models from a suite of base distributions, flow layers, and neural networks. The package is implemented in the popular deep learning framework PyTorch, which simplifies the integration of flows in larger machine learning models or pipelines. It supports most of the common normalizing flow architectures, such as Real NVP, Glow, Masked Autoregressive Flows, Neural Spline Flows, Residual Flows, and many more. The package can be easily installed via pip and the code is publicly available on GitHub.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.05)
- North America > United States (0.05)
Putting Natural in Natural Language Processing
Human language is firstly spoken and only secondarily written. Text, however, is a very convenient and efficient representation of language, and modern civilization has made it ubiquitous. Thus the field of NLP has overwhelmingly focused on processing written rather than spoken language. Work on spoken language, on the other hand, has been siloed off within the largely separate speech processing community which has been inordinately preoccupied with transcribing speech into text. Recent advances in deep learning have led to a fortuitous convergence in methods between speech processing and mainstream NLP. Arguably, the time is ripe for a unification of these two fields, and for starting to take spoken language seriously as the primary mode of human communication. Truly natural language processing could lead to better integration with the rest of language science and could lead to systems which are more data-efficient and more human-like, and which can communicate beyond the textual modality.
- North America > United States > New York (0.05)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Exploration of carbonate aggregates in road construction using ultrasonic and artificial intelligence approaches
Abdelhedi, Mohamed, Jabbar, Rateb, Abbes, Chedly
The COVID-19 pandemic has significantly impacted the construction sector, which is sensitive to economic cycles. In order to boost value and efficiency in this sector, the use of innovative exploration technologies such as ultrasonic and Artificial Intelligence techniques in building material research is becoming increasingly crucial. In this study, we developed two models for predicting the Los Angeles (LA) and Micro Deval (MDE) coefficients, two important geotechnical tests used to determine the quality of rock aggregates. These coefficients describe the resistance of aggregates to fragmentation and abrasion. The ultrasound velocity, porosity, and density of the rocks were determined and used as inputs to develop prediction models using multiple regression and an artificial neural network. These models may be used to assess the quality of rock aggregates at the exploration stage without the need for tedious laboratory analysis.
- North America > United States > California > Los Angeles County > Los Angeles (0.27)
- Asia > Middle East > Qatar (0.17)
- Africa > Middle East > Tunisia (0.15)
- Materials > Construction Materials (0.90)
- Energy > Oil & Gas > Upstream (0.68)
- Transportation > Ground > Road (0.51)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.49)