AITopics

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing Systems, Jun-22-2026, 04:53:40 GMT

Corporate Needs You to Find the Difference: Revisiting Submodular and Supermodular Ratio Optimization Problems

We consider the following question: given a submodular/supermodular set function f: 2^V → R, how should one minimize/maximize its average value f(S)/|S| over non-empty subsets S ⊆ V? This problem generalizes several well-known objectives, including Densest Subgraph (DSG), Densest Supermodular Set (DSS), and Submodular Function Minimization (SFM). Motivated by recent applications [42, 34], we formalize two new broad problems: the Unrestricted Sparsest Submodular Set (USSS) and Unrestricted Densest Supermodular Set (UDSS), both of which allow negative and non-monotone functions. Using classical results, we show that DSS, SFM, USSS, UDSS, and Minimum Norm Point (MNP) are all equivalent under strongly polynomial-time reductions. This equivalence enables algorithmic cross-over: methods designed for one problem can be repurposed to solve others efficiently. In particular, we use the perspective of the minimum norm point in the base polyhedron of a sub/supermodular function, which, via Fujishige's results, yields the dense decomposition as a byproduct.
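As a small illustration of the ratio objective f(S)/|S|, the sketch below treats Densest Subgraph as one instance: f(S) is the number of edges induced by S, which is a supermodular function. It enumerates every non-empty subset, so it is only a brute-force check for tiny inputs and is not one of the algorithms the paper discusses. The toy graph and the function names are assumptions made for this example.

```python
from itertools import combinations


def induced_edges(edges, subset):
    """f(S): number of edges with both endpoints in S (a supermodular function)."""
    members = set(subset)
    return sum(1 for u, v in edges if u in members and v in members)


def densest_subset(vertices, edges):
    """Maximize f(S)/|S| over all non-empty subsets by exhaustive search."""
    best_ratio, best_set = float("-inf"), None
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            ratio = induced_edges(edges, subset) / size
            if ratio > best_ratio:
                best_ratio, best_set = ratio, subset
    return best_ratio, best_set


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} plus a pendant vertex 4.
VERTICES = [0, 1, 2, 3, 4]
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]

ratio, subset = densest_subset(VERTICES, EDGES)
print(ratio, subset)
```

On this graph the 4-clique wins with density 6/4, while the full vertex set only reaches 7/5.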

artificial intelligence, data mining, machine learning, (19 more...)

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)

Neural Information Processing Systems, Apr-29-2026, 23:05:54 GMT

Parallel Submodular Function Minimization

We consider the parallel complexity of submodular function minimization (SFM). We provide a pair of methods which obtain two new query versus depth tradeoffs for a submodular function defined on subsets of n elements that has integer values between -M and M. The first method has depth 2 and query complexity

artificial intelligence, machine learning, natural language, (17 more...)

Country:

Europe (0.68)
North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.89)

Neural Information Processing Systems, Mar-20-2026, 21:54:44 GMT

Categorical Flow Matching on Statistical Manifolds

We introduce Statistical Flow Matching (SFM), a novel and mathematically rigorous flow-matching framework on the manifold of parameterized probability measures inspired by the results from information geometry. We demonstrate the effectiveness of our method on the discrete generation problem by instantiating SFM on the manifold of categorical distributions whose geometric properties remain unexplored in previous discrete generative models. Utilizing the Fisher information metric, we equip the manifold with a Riemannian structure whose intrinsic geometries are effectively leveraged by following the shortest paths of geodesics. We develop an efficient training and sampling algorithm that overcomes numerical stability issues with a diffeomorphism between manifolds. Our distinctive geometric perspective of statistical manifolds allows us to apply optimal transport during training and interpret SFM as following the steepest direction of the natural gradient. Unlike previous models that rely on variational bounds for likelihood estimation, SFM enjoys the exact likelihood calculation for arbitrary probability measures. We manifest that SFM can learn more complex patterns on the statistical manifold where existing models often fail due to strong prior assumptions. Comprehensive experiments on real-world generative tasks ranging from image, text to biological domains further demonstrate that SFM achieves higher sampling quality and likelihood than other discrete diffusion or flow-based models.
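To make the geometric idea concrete, here is a minimal sketch of a geodesic between two categorical distributions under the Fisher information metric. It relies on the standard fact that the square-root map sends the probability simplex onto the positive orthant of the unit sphere, where geodesics are great-circle arcs. The abstract does not say which diffeomorphism the authors use, so the square-root map, the function name, and the example inputs are assumptions for illustration only.

```python
import math


def fisher_rao_geodesic(p, q, t):
    """Point at time t in [0, 1] on the Fisher-Rao geodesic from p to q.

    Assumption: the simplex is mapped to the sphere with x -> sqrt(x),
    interpolated along a great circle, then mapped back by squaring.
    """
    u = [math.sqrt(x) for x in p]
    v = [math.sqrt(x) for x in q]
    # Clamp the inner product to guard against rounding outside [-1, 1].
    cos_angle = min(1.0, max(-1.0, sum(a * b for a, b in zip(u, v))))
    angle = math.acos(cos_angle)
    if angle < 1e-12:
        return list(p)  # p and q coincide, so the path is constant
    weight_u = math.sin((1.0 - t) * angle) / math.sin(angle)
    weight_v = math.sin(t * angle) / math.sin(angle)
    on_sphere = [weight_u * a + weight_v * b for a, b in zip(u, v)]
    return [x * x for x in on_sphere]


# Midpoint between two one-hot distributions over two categories.
midpoint = fisher_rao_geodesic([1.0, 0.0], [0.0, 1.0], 0.5)
print(midpoint)
```

Interpolating on the sphere keeps every intermediate point a valid probability vector, even when the endpoints lie on the boundary of the simplex.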

artificial intelligence, machine learning, proceedings, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Neural Information Processing Systems, Feb-15-2026, 10:40:18 GMT

Categorical Flow Matching on Statistical Manifolds

We introduce Statistical Flow Matching (SFM), a novel and mathematically rigorous flow-matching framework on the manifold of parameterized probability measures inspired by the results from information geometry. We demonstrate the effectiveness of our method on the discrete generation problem by instantiating SFM on the manifold of categorical distributions whose geometric properties remain unexplored in previous discrete generative models. Utilizing the Fisher information metric, we equip the manifold with a Riemannian structure whose intrinsic geometries are effectively leveraged by following the shortest paths of geodesics. We develop an efficient training and sampling algorithm that overcomes numerical stability issues with a diffeomorphism between manifolds. Our distinctive geometric perspective of statistical manifolds allows us to apply optimal transport during training and interpret SFM as following the steepest direction of the natural gradient. Unlike previous models that rely on variational bounds for likelihood estimation, SFM enjoys the exact likelihood calculation for arbitrary probability measures. We manifest that SFM can learn more complex patterns on the statistical manifold where existing models often fail due to strong prior assumptions. Comprehensive experiments on real-world generative tasks ranging from image, text to biological domains further demonstrate that SFM achieves higher sampling quality and likelihood than other discrete diffusion or flow-based models. Our code is available at https://github.com/ccr-cheng/

large language model, machine learning, natural language, (19 more...)

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > New York (0.04)
North America > Canada > Quebec (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Ariyanti, Whenty, Chen, Kuan-Yu, Siniscalchi, Sabato Marco, Wang, Hsin-Min, Tsao, Yu

Towards Robust Assessment of Pathological Voices via Combined Low-Level Descriptors and Foundation Model Representations

arXiv.org Artificial Intelligence, Dec-12-2025

Perceptual voice quality assessment plays a vital role in diagnosing and monitoring voice disorders. Traditional methods, such as the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) and the Grade, Roughness, Breathiness, Asthenia, and Strain (GRBAS) scales, rely on expert raters and are prone to inter-rater variability, emphasizing the need for objective solutions. This study introduces the Voice Quality Assessment Network (VOQANet), a deep learning framework that employs an attention mechanism and Speech Foundation Model (SFM) embeddings to extract high-level features. To further enhance performance, we propose VOQANet+, which integrates self-supervised SFM embeddings with low-level acoustic descriptors, namely jitter, shimmer, and harmonics-to-noise ratio (HNR). Unlike previous approaches that focus solely on vowel-based phonation (PVQD-A), our models are evaluated on both vowel-level and sentence-level speech (PVQD-S) to assess generalizability. Experimental results demonstrate that sentence-based inputs yield higher accuracy, particularly at the patient level. Overall, VOQANet consistently outperforms baseline models in terms of root mean squared error (RMSE) and Pearson correlation coefficient across CAPE-V and GRBAS dimensions, with VOQANet+ achieving even greater performance gains. Additionally, VOQANet+ maintains consistent performance under noisy conditions, suggesting enhanced robustness for real-world and telehealth applications. This work highlights the value of combining SFM embeddings with low-level features for accurate and robust pathological voice assessment. This work was supported in part by the National Science and Technology Council and Academia Sinica.
Whenty Ariyanti is with the Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan, and also with the Research Center for Information Technology Innovation, Academia Sinica, Taipei 11529, Taiwan (e-mail: d11115805@mail.ntust.edu.tw). Kuan-Yu Chen is with the Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan (e-mail: kychen@mail.ntust.edu.tw). Sabato Marco Siniscalchi is with the University of Palermo, Palermo, Italy (e-mail: sabatomarco.siniscalchi@unipa.it).
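The two evaluation metrics named in the abstract, root mean squared error and the Pearson correlation coefficient, can be computed as in the sketch below. The formulas are the standard definitions; the rating values are made-up numbers for illustration and are not results from the paper.

```python
import math


def rmse(predicted, target):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical expert ratings and model predictions on a 0-100 scale.
expert_scores = [10.0, 35.0, 60.0, 80.0]
model_scores = [12.0, 30.0, 65.0, 78.0]

print(rmse(model_scores, expert_scores))
print(pearson(model_scores, expert_scores))
```

A lower RMSE and a Pearson coefficient closer to 1 both indicate closer agreement with the expert ratings.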

artificial intelligence, machine learning, natural language, (17 more...)

2505.21356

Country: Europe > Italy > Sicily > Palermo (0.24)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Technology (0.68)
Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (0.68)
Health & Medicine > Therapeutic Area > Musculoskeletal (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Nov-19-2025

AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models

Jiang, Yuhua, Cheng, Shuang, Ding, Yan, Gao, Feifei, Qi, Biqing

Vision-language-action (VLA) models have recently emerged as a powerful paradigm for building generalist robots. However, traditional VLA models that generate actions through flow matching (FM) typically rely on rigid and uniform time schedules, i.e., synchronous FM (SFM). Without action context awareness and asynchronous self-correction, SFM becomes unstable in long-horizon tasks, where a single action error can cascade into failure. In this work, we propose asynchronous flow matching VLA (AsyncVLA), a novel framework that introduces temporal flexibility in asynchronous FM (AFM) and enables self-correction in action generation. AsyncVLA breaks from the vanilla SFM in VLA models by generating the action tokens in a non-uniform time schedule with action context awareness. Besides, our method introduces the confidence rater to extract confidence of the initially generated actions, enabling the model to selectively refine inaccurate action tokens before execution. Moreover, we propose a unified training procedure for SFM and AFM that endows a single model with both modes, improving KV-cache utilization. Extensive experiments on robotic manipulation benchmarks demonstrate that AsyncVLA is data-efficient and exhibits self-correction ability. AsyncVLA achieves state-of-the-art results across general embodied evaluations due to its asynchronous generation in AFM. Our code is available at https://github.com/YuhuaJiang2002/AsyncVLA.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

2511.14148

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.54)

arXiv.org Artificial Intelligence, Oct-30-2025

SFMS-ALR: Script-First Multilingual Speech Synthesis with Adaptive Locale Resolution

Donepudi, Dharma Teja

Intra-sentence multilingual speech synthesis (code-switching TTS) remains a major challenge due to abrupt language shifts, varied scripts, and mismatched prosody between languages. Conventional TTS systems are typically monolingual and fail to produce natural, intelligible speech in mixed-language contexts. We introduce Script-First Multilingual Synthesis with Adaptive Locale Resolution (SFMS-ALR), an engine-agnostic framework for fluent, real-time code-switched speech generation. SFMS-ALR segments input text by Unicode script, applies adaptive language identification to determine each segment's language and locale, and normalizes prosody using sentiment-aware adjustments to preserve expressive continuity across languages. The algorithm generates a unified SSML representation with appropriate or spans and synthesizes the utterance in a single TTS request. Unlike end-to-end multilingual models, SFMS-ALR requires no retraining and integrates seamlessly with existing voices from Google, Apple, Amazon, and other providers. Comparative analysis with data-driven pipelines such as Unicom and Mask LID demonstrates SFMS-ALR's flexibility, interpretability, and immediate deployability. The framework establishes a modular baseline for high-quality, engine-independent multilingual TTS and outlines evaluation strategies for intelligibility, naturalness, and user preference.
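The script-first step described above can be sketched roughly as follows. This is not the SFMS-ALR implementation: the Python standard library exposes no Unicode script property, so the first word of each character name stands in for the script, a fixed lookup table replaces the adaptive language identification, and prosody normalization is left out. The table contents, function names, and the choice of the SSML lang element are all assumptions for illustration.

```python
import unicodedata
from xml.sax.saxutils import escape

# Hypothetical script-to-locale table; a real system would resolve this adaptively.
SCRIPT_TO_LOCALE = {"LATIN": "en-US", "DEVANAGARI": "hi-IN"}


def script_of(ch):
    """Heuristic script label: first word of the character name, letters only."""
    if not ch.isalpha():
        return None  # spaces, digits, punctuation, and marks are script-neutral
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def segment_by_script(text):
    """Split text into runs of one script; neutral characters join the current run."""
    runs = []  # each entry is [script, chunk]
    for ch in text:
        script = script_of(ch)
        if not runs:
            runs.append([script, ch])
        elif script is None or script == runs[-1][0]:
            runs[-1][1] += ch
        elif runs[-1][0] is None:
            runs[-1][0] = script  # a leading neutral run adopts the first script seen
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]


def to_ssml(segments, default_locale="en-US"):
    """Wrap each run in an SSML lang span inside a single speak element."""
    spans = []
    for script, chunk in segments:
        locale = SCRIPT_TO_LOCALE.get(script, default_locale)
        spans.append('<lang xml:lang="%s">%s</lang>' % (locale, escape(chunk)))
    return "<speak>" + "".join(spans) + "</speak>"


segments = segment_by_script("Hello नमस्ते world")
ssml = to_ssml(segments)
print(ssml)
```

Because the whole utterance ends up in one SSML document, it could be sent to a TTS engine in a single request, which is the property the abstract emphasizes.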

artificial intelligence, natural language, speech synthesis, (13 more...)

2510.25178

Genre: Research Report (0.51)

Industry: Information Technology > Services (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.74)

arXiv.org Artificial Intelligence, Oct-24-2025

Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis

Yang, Dong, Cai, Yiyi, Saito, Yuki, Wang, Lixu, Saruwatari, Hiroshi

We propose Shallow Flow Matching (SFM), a novel mechanism that enhances flow matching (FM)-based text-to-speech (TTS) models within a coarse-to-fine generation paradigm. Unlike conventional FM modules, which use the coarse representations from the weak generator as conditions, SFM constructs intermediate states along the FM paths from these representations. During training, we introduce an orthogonal projection method to adaptively determine the temporal position of these states, and apply a principled construction strategy based on a single-segment piecewise flow. The SFM inference starts from the intermediate state rather than pure noise, thereby focusing computation on the latter stages of the FM paths. We integrate SFM into multiple TTS models with a lightweight SFM head. Experiments demonstrate that SFM yields consistent gains in speech naturalness across both objective and subjective evaluations, and significantly accelerates inference when using adaptive-step ODE solvers. Demo and codes are available at https://ydqmkkx.github.io/SFMDemo/.
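A toy sketch can show why starting inference from an intermediate state saves computation. It assumes a one-dimensional flow whose velocity field is known in closed form, v(x, t) = (x_target - x) / (1 - t), which is the field of a straight-line path toward a fixed target. The learned SFM head, the orthogonal projection, and the piecewise flow construction from the paper are not modeled here; every name and number below is hypothetical.

```python
def integrate_to_target(x_start, x_target, t_start, step=0.1):
    """Euler-integrate dx/dt = (x_target - x) / (1 - t) from t_start to t = 1.

    Returns the final state and the number of Euler steps taken.
    """
    n_steps = max(1, round((1.0 - t_start) / step))
    h = (1.0 - t_start) / n_steps
    x = x_start
    for k in range(n_steps):
        t = t_start + k * h  # 1 - t stays positive, so the division is safe
        x = x + h * (x_target - x) / (1.0 - t)
    return x, n_steps


# Starting from noise at t = 0 versus from a coarse intermediate state at t = 0.7.
x_from_noise, steps_from_noise = integrate_to_target(-1.3, 2.0, 0.0)
x_from_mid, steps_from_mid = integrate_to_target(1.1, 2.0, 0.7)
print(steps_from_noise, steps_from_mid)
```

With the same step size, the run that starts at t = 0.7 needs 3 steps instead of 10, yet both end at the target. The solver only has to cover the later part of the path.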

artificial intelligence, machine learning, speech synthesis, (17 more...)