gop
Evaluating Logit-Based GOP Scores for Mispronunciation Detection
Parikh, Aditya Kamlesh, Tejedor-Garcia, Cristian, Cucchiarini, Catia, Strik, Helmer
Pronunciation assessment relies on goodness of pronunciation (GOP) scores, traditionally derived from softmax-based posterior probabilities. However, posterior probabilities may suffer from overconfidence and poor phoneme separation, limiting their effectiveness. This study compares logit-based GOP scores with probability-based GOP scores for mispronunciation detection. We conducted our experiment on two L2 English speech datasets spoken by Dutch and Mandarin speakers, assessing classification performance and correlation with human ratings. Logit-based methods outperform probability-based GOP in classification, but their effectiveness depends on dataset characteristics. The maximum logit GOP shows the strongest alignment with human perception, while a combination of different GOP scores balances probability and logit features. The findings suggest that hybrid GOP methods incorporating uncertainty modeling and phoneme-specific weighting improve pronunciation assessment.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > Netherlands (0.04)
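As a rough illustration of the contrast drawn in the abstract above, the sketch below computes a probability-based GOP and a maximum-logit GOP for one phoneme segment. The formulas are assumptions, not the paper's exact definitions: the probability-based score is taken as the mean log softmax posterior of the target phoneme over the segment's frames, and the logit-based score as the largest raw target logit in the segment. The function names and the toy `segment` array are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the phoneme axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def gop_posterior(logits, target):
    # Assumed probability-based GOP: mean log posterior of the target phoneme.
    return float(log_softmax(logits)[:, target].mean())

def gop_max_logit(logits, target):
    # Assumed logit-based GOP: maximum raw logit of the target phoneme.
    return float(logits[:, target].max())

# Toy segment: 2 frames x 3 phoneme classes of raw acoustic-model logits.
segment = np.array([[2.0, 0.5, -1.0],
                    [3.0, 0.0, -0.5]])

print(gop_posterior(segment, target=0))
print(gop_max_logit(segment, target=0))
```

Softmax normalization discards the absolute scale of the logits, so adding a constant to every logit leaves the probability-based score unchanged but shifts the maximum-logit score. That is one way the two families of scores can carry different information.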
Segmentation-free Goodness of Pronunciation
Cao, Xinwei, Fan, Zijian, Svendsen, Torbjørn, Salvi, Giampiero
Mispronunciation detection and diagnosis (MDD) is a significant part of modern computer-aided language learning (CALL) systems. Within MDD, phoneme-level pronunciation assessment is key to helping L2 learners improve their pronunciation. However, most systems are based on a form of goodness of pronunciation (GOP) which requires pre-segmentation of speech into phonetic units. This limits the accuracy of these methods and the possibility to use modern CTC-based acoustic models for their evaluation. In this study, we first propose self-alignment GOP (GOP-SA) that enables the use of CTC-trained ASR models for MDD. Next, we define a more general alignment-free method that takes all possible alignments of the target phoneme into account (GOP-AF). We give a theoretical account of our definition of GOP-AF, an implementation that solves potential numerical issues as well as a proper normalization which makes the method applicable with acoustic models with different peakiness over time. We provide extensive experimental results on the CMU Kids and Speechocean762 datasets comparing the different definitions of our methods, estimating the dependency of GOP-AF on the peakiness of the acoustic models and on the amount of context around the target phoneme. Finally, we compare our methods with recent studies over the Speechocean762 data showing that the feature vectors derived from the proposed method achieve state-of-the-art results on phoneme-level pronunciation assessment.
LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression
Huang, Wenjie, Yang, Qi, Xia, Shuting, Huang, He, Li, Zhu, Xu, Yiling
Existing AI-based point cloud compression methods struggle with dependence on specific training data distributions, which limits their real-world deployment. Implicit Neural Representation (INR) methods solve the above problem by encoding overfitted network parameters to the bitstream, resulting in more distribution-agnostic results. However, due to the limitation of encoding time and decoder size, current INR-based methods only consider lossy geometry compression. In this paper, we propose the first INR-based lossless point cloud geometry compression method called Lossless Implicit Neural Representations for Point Cloud Geometry Compression (LINR-PCGC). To accelerate encoding speed, we design a group-of-point-clouds-level coding framework with an effective network initialization strategy, which can reduce encoding time by around 60%. A lightweight coding network based on multiscale SparseConv, consisting of scale context extraction, child node prediction, and model compression modules, is proposed to realize fast inference and compact decoder size. Experimental results show that our method consistently outperforms traditional and AI-based methods: for example, with the convergence time in the MVUB dataset, our method reduces the bitstream by approximately 21.21% compared to G-PCC TMC13v23 and 21.95% compared to SparsePCGC. Our project can be seen on https://huangwenjie2023.github.io/LINR-PCGC/.
- North America > United States > Missouri > Jackson County > Kansas City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
UAR-NVC: A Unified AutoRegressive Framework for Memory-Efficient Neural Video Compression
Wang, Jia, Zhang, Xinfeng, Zhang, Gai, Zhu, Jun, Tang, Lv, Zhang, Li
Implicit Neural Representations (INRs) have demonstrated significant potential in video compression by representing videos as neural networks. However, as the number of frames increases, the memory consumption for training and inference increases substantially, posing challenges in resource-constrained scenarios. Inspired by the success of traditional video compression frameworks, which process video frame by frame and can efficiently compress long videos, we adopt this modeling strategy for INRs to decrease memory consumption, while aiming to unify the frameworks from the perspective of timeline-based autoregressive modeling. In this work, we present a novel understanding of INR models from an autoregressive (AR) perspective and introduce a Unified AutoRegressive Framework for memory-efficient Neural Video Compression (UAR-NVC). UAR-NVC integrates timeline-based and INR-based neural video compression under a unified autoregressive paradigm. It partitions videos into several clips and processes each clip using a different INR model instance, leveraging the advantages of both compression frameworks while allowing seamless adaptation to either in form. To further reduce temporal redundancy between clips, we design two modules to optimize the initialization, training, and compression of these model parameters. UAR-NVC supports adjustable latencies by varying the clip length. Extensive experimental results demonstrate that UAR-NVC, with its flexible video clip setting, can adapt to resource-constrained environments and significantly improve performance compared to different baseline models.
- Asia > China (0.14)
- North America > United States (0.14)
- North America > Canada (0.14)
Multi-Task Decision-Making for Multi-User 360 Video Processing over Wireless Networks
Badnava, Babak, Chakareski, Jacob, Hashemi, Morteza
We study a multi-task decision-making problem for 360 video processing in a wireless multi-user virtual reality (VR) system that includes an edge computing unit (ECU) to deliver 360 videos to VR users and offer computing assistance for decoding/rendering of video frames. However, this comes at the expense of increased data volume and required bandwidth. To balance this trade-off, we formulate a constrained quality of experience (QoE) maximization problem in which the rebuffering time and quality variation between video frames are bounded by user and video requirements. To solve the formulated multi-user QoE maximization, we leverage deep reinforcement learning (DRL) for multi-task rate adaptation and computation distribution (MTRC). The proposed MTRC approach does not rely on any predefined assumption about the environment and relies on video playback statistics (e.g., past throughput, decoding time, and transmission time), video information, and the resulting performance to adjust the video bitrate and computation distribution. We train MTRC with real-world wireless network traces and 360 video datasets to obtain evaluation results in terms of the average QoE, peak signal-to-noise ratio (PSNR), rebuffering time, and quality variation. Our results indicate that MTRC improves the users' QoE compared to state-of-the-art rate adaptation algorithms. Specifically, we show a 5.97 dB to 6.44 dB improvement in PSNR, a 1.66X to 4.23X improvement in rebuffering time, and a 4.21 dB to 4.35 dB improvement in quality variation.
- North America > United States > New Jersey (0.04)
- North America > United States > Kansas (0.04)
- Telecommunications (0.46)
- Information Technology > Hardware (0.35)
- Leisure & Entertainment (0.35)
GOP, Dems push to expand Trump-era deal, Father's Day celebration takes deadly turn and more top headlines
US President Joe Biden meets with China's President Xi Jinping during a virtual summit from the Roosevelt Room of the White House in Washington, DC, November 15, 2021. OLIVE BRANCH - GOP, Dems push Biden admin to expand Trump-era deal to blunt American adversaries. 'VIOLENCE PREVAILS' - Police commander doesn't mince words after Father's Day celebration takes deadly turn. IN GOOD COMPANY? - Americans react to Mark Cuban's claim on wokeness and business. JOURNALISM JAB - Media ignores Biden's 'dumb question' slam on reporter after hounding Trump.
- Asia > China (0.95)
- North America > United States > District of Columbia > Washington (0.26)
- North America > United States > Texas (0.06)
Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification
Yeo, Eun Jung, Choi, Kwanghee, Kim, Sunhee, Chung, Minhwa
This paper proposes an improved Goodness of Pronunciation (GoP) that utilizes Uncertainty Quantification (UQ) for automatic speech intelligibility assessment for dysarthric speech. Current GoP methods rely heavily on neural network-driven overconfident predictions, which are unsuitable for assessing dysarthric speech due to its significant acoustic differences from healthy speech. To alleviate the problem, UQ techniques were used on GoP by 1) normalizing the phoneme prediction (entropy, margin, maxlogit, logit-margin) and 2) modifying the scoring function (scaling, prior normalization). As a result, prior-normalized maxlogit GoP achieves the best performance, with a relative increase of 5.66%, 3.91%, and 23.65% compared to the baseline GoP for English, Korean, and Tamil, respectively. Furthermore, phoneme analysis is conducted to identify which phoneme scores significantly correlate with intelligibility scores in each language.
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
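The uncertainty measures named in the abstract above (entropy, margin, logit-margin) can be sketched per frame as follows. These are generic textbook definitions used as assumptions, not the paper's exact scoring functions: entropy is the Shannon entropy of the softmax distribution, and margin is the target score minus the best competing score, applied either to posteriors or to raw logits. All function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the phoneme axis.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def frame_entropy(logits):
    # Shannon entropy of each frame's posterior; higher means more uncertain.
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def margin(scores, target):
    # Target score minus the best competing score, per frame.
    # Pass posteriors for a probability margin, raw logits for a logit margin.
    others = np.delete(scores, target, axis=-1)
    return scores[:, target] - others.max(axis=-1)

logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])

print(frame_entropy(logits))            # second frame is maximally uncertain
print(margin(softmax(logits), 0))       # probability margin
print(margin(logits, 0))                # logit margin
```

A frame with identical logits for every phoneme has the maximum possible entropy and a margin of zero, which is the kind of low-confidence case an overconfident posterior-only score can hide.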
GOP reacts to Hunter Biden IRS whistleblower, Fetterman raises eyebrows and more top headlines
'DEEPLY CONCERNING' - Republicans respond after IRS whistleblower says Hunter Biden investigation is being mishandled. 'FRIGHTENING' - Fetterman's opening statement upon return to Senate after hospitalization raises eyebrows. WATCH THE WALLET - Crypto criminals beware: AI is after you. MISINFORMATION MACHINES - AI chatbot 'hallucinations' could pose political, intellectual, institutional dangers. ROYAL REACTION - Lee Cohen explains why Meghan Markle deserves praise for skipping the coronation.
- Oceania > Australia > New South Wales (0.06)
- North America > United States > North Carolina (0.06)
- North America > United States > Kansas > Cowley County (0.06)
- Media > News (0.80)
- Government > Tax (0.74)
- Government > Regional Government > North America Government > United States Government (0.74)
Spatially Constrained Geodesign Optimization (GOP) for Improving Agricultural Watershed Sustainability
Xie, Yiqun (University of Minnesota, Twin Cities) | Yang, KwangSoo (Florida Atlantic University) | Shekhar, Shashi (University of Minnesota, Twin Cities) | Dalzell, Brent (University of Minnesota, Twin Cities) | Mulla, David (University of Minnesota, Twin Cities)
Given an agricultural watershed containing a set of spatial units, and a set of land management practices, the Geodesign Optimization (GOP) aims to find a land management practice for each spatial unit that optimizes overall water quality improvements in the watershed under both a budget constraint and spatial constraints (e.g., minimum contiguous area, shape) arising from farm equipment operation practicalities. GOP is important for the redesign of agricultural watersheds in the Midwestern US to mitigate soil and water quality degradation and loss of habitat. The problem is computationally challenging as a large-scale combinatorial problem (NP-hard) under spatial constraints. Existing optimization techniques do not address spatial constraints, and lead to impractical solutions requiring frequent farm equipment reconfiguration. In this paper, we formalize the spatially-constrained GOP and propose a novel spatial optimizer which explores optimal solutions without constraint violations. Our approach is further validated through a Geodesign case study at the Seven Mile Creek watershed in the Midwestern US.
- North America > United States > Minnesota (0.05)
- North America > United States > Mississippi (0.04)
- North America > United States > Kentucky (0.04)
Generative Structure Learning for Markov Logic Networks Based on Graph of Predicates
Dinh, Quang-Thang (Universite d'Orleans) | Exbrayat, Matthieu (Universite d'Orleans) | Vrain, Christel (Universite d'Orleans)
In this paper we present a new algorithm for generatively learning the structure of Markov Logic Networks. This algorithm relies on a graph of predicates, which summarizes the links existing between predicates, and on relational information between ground atoms in the training database. Candidate clauses are produced by means of a heuristic variabilization technique. According to our first experiments, this approach appears to be promising.
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > France > Centre-Val de Loire > Loiret > Orleans (0.04)