AITopics | interspeech

Collaborating Authors

interspeech

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Quaternion Self-Attention with Shared Scores

Yamauchi, Shogo, Nitta, Tohru, Tamori, Hideaki

arXiv.org Machine Learning May-26-2026

Quaternion neural networks are parameter-efficient and model multidimensional dependencies by representing four related features as a single entity. However, existing quaternion self-attention computes component-wise scores and applies independent softmax operations to each component, which increases the computational cost and allows attention distributions to diverge across components. We propose a shared-score quaternion self-attention mechanism that computes a single real-valued score using the quaternion inner product and applies a shared attention distribution across all components. This reduces score-computation multiplications by 75% and the number of softmax operations from four to one. We prove that, when queries and keys are produced by quaternion linear projections that induce component pre-mixing, the component-wise and shared scores lie in the same interaction subspace, indicating that independent component-wise attention primarily re-parameterizes the same interactions rather than expanding the feature interaction space. In speech enhancement, our method reduces inference time by up to 44.3% on a GPU and 58.1% on a CPU while maintaining quality, with consistent trends across vision and natural language processing.
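The shared-score idea in the abstract above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the tensor layout `(tokens, channels, 4)`, the function names, and the `1/sqrt(4n)` scaling are assumptions, and the quaternion linear projections that produce queries and keys are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def shared_score_attention(q, k, v):
    """Attention with one real-valued score per query/key pair.

    q has shape (T_q, n, 4); k and v have shape (T_k, n, 4).
    The last axis holds the four quaternion components (hypothetical layout).
    """
    t_q, n, _ = q.shape
    t_k = k.shape[0]
    # Quaternion inner product: sum of component-wise products, accumulated
    # over all n channels. This equals a dot product of the flattened features.
    scores = q.reshape(t_q, -1) @ k.reshape(t_k, -1).T
    scores = scores / np.sqrt(4 * n)  # assumption: usual 1/sqrt(dim) scaling
    attn = softmax(scores, axis=-1)   # a single softmax for all components
    # The same attention distribution weights every component of v.
    out = np.einsum("ij,jnc->inc", attn, v)
    return out, attn


rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8, 4))
k = rng.normal(size=(7, 8, 4))
v = rng.normal(size=(7, 8, 4))
out, attn = shared_score_attention(q, k, v)
print(out.shape, attn.shape)
```

Because only one score matrix is formed, the softmax runs once rather than once per component, which is the source of the saving the abstract describes.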

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2605.2492

Country:

Asia > Japan (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

b6404bf461c3c3186bdf5f55756af908-Paper-Conference.pdf

Neural Information Processing Systems Feb-16-2026, 17:11:31 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Poland > Lower Silesia Province > Wroclaw (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

9d276b0a087efdd2404f3295b26c24c1-Paper-Conference.pdf

Neural Information Processing Systems Feb-16-2026, 03:43:13 GMT

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Massachusetts (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.72)

Self-Supervised Generation of Spatial Audio for 360° Video

Pedro Morgado, Nuno Vasconcelos, Timothy Langlois, Oliver Wang

Neural Information Processing Systems Feb-12-2026, 02:41:08 GMT

As humans rely on audio localization cues for full scene awareness, spatial audio is a crucial component of 360° video.

artificial intelligence, machine learning, video, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Austria > Styria > Graz (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

be1bc7997695495f756312886f566110-Paper.pdf

Neural Information Processing Systems Feb-10-2026, 23:02:20 GMT

In this work, we propose to use a bio-inspired architecture called Fully Recurrent Convolutional Neural Network (FRCNN) to solve the separation task. This model contains bottom-up, top-down and lateral connections to fuse information processed at various time-scales represented by stages.

artificial intelligence, machine learning, speech and signal processing, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > Germany > Hamburg (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

Neural Information Processing Systems Feb-9-2026, 17:25:11 GMT

Adversarially Trained End-to-End Korean Singing Voice Synthesis System.

artificial intelligence, interspeech, natural language, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.47)
Information Technology > Artificial Intelligence > Speech (0.46)

6d7d394c9d0c886e9247542e06ebb705-Paper.pdf

Neural Information Processing Systems Feb-8-2026, 19:46:00 GMT

Our approach is based on a key observation about human speech: there is often a short pause between each sentence or word.

artificial intelligence, machine learning, urlhttp, (18 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Discrete Optimal Transport and Voice Conversion

Selitskiy, Anton, Kocharekar, Maitreya

arXiv.org Artificial Intelligence Dec-2-2025

In this work, we address the voice conversion (VC) task using a vector-based interface. To align audio embeddings between speakers, we employ discrete optimal transport mapping. Our evaluation results demonstrate the high quality and effectiveness of this method. Additionally, we show that applying discrete optimal transport as a post-processing step in audio generation can lead to the incorrect classification of synthetic audio as real.
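As a rough illustration of aligning two sets of embeddings with discrete optimal transport, the sketch below computes a transport plan between source and target vectors and maps each source vector to a weighted average of targets. Everything specific here is an assumption and not taken from the paper: the squared Euclidean cost, uniform weights, the entropic (Sinkhorn) approximation used in place of an exact solver, the barycentric projection, and all function names.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.05, n_iters=500):
    """Entropic-regularized transport plan between two uniform discrete measures."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform source weights (assumption)
    b = np.full(m, 1.0 / m)  # uniform target weights (assumption)
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def barycentric_map(source, target, reg=0.05):
    """Map each source embedding to the plan-weighted mean of target embeddings."""
    diff = source[:, None, :] - target[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / cost.max()  # normalize so that `reg` has a consistent scale
    plan = sinkhorn_plan(cost, reg)
    mapped = (plan @ target) / plan.sum(axis=1, keepdims=True)
    return mapped, plan


rng = np.random.default_rng(0)
source = rng.normal(size=(6, 3))        # stand-in for source-speaker embeddings
target = rng.normal(size=(9, 3)) + 2.0  # stand-in for target-speaker embeddings
mapped, plan = barycentric_map(source, target)
print(mapped.shape, plan.shape)
```

Each mapped vector is a convex combination of target vectors, so converted embeddings stay inside the region spanned by the target speaker's data.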

artificial intelligence, machine learning, vector, (13 more...)

arXiv.org Artificial Intelligence

2505.04382

Country: North America > United States (0.15)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Context-Aware Dynamic Chunking for Streaming Tibetan Speech Recognition

Wang, Chao, Cai, Yuqing, Duojie, Renzeng, Zhang, Jin, Liu, Yutong, Tashi, Nyima

arXiv.org Artificial Intelligence Nov-13-2025

In this work, we propose a streaming speech recognition framework for Amdo Tibetan, built upon a hybrid CTC/Attention architecture with a context-aware dynamic chunking mechanism. The proposed strategy adaptively adjusts chunk widths based on encoding states, enabling flexible receptive fields, cross-chunk information exchange, and robust adaptation to varying speaking rates, thereby alleviating the context truncation problem of fixed-chunk methods. To further capture the linguistic characteristics of Tibetan, we construct a lexicon grounded in its orthographic principles, providing linguistically motivated modeling units. During decoding, an external language model is integrated to enhance semantic consistency and improve recognition of long sentences. Experimental results show that the proposed framework achieves a word error rate (WER) of 6.23% on the test set, yielding a 48.15% relative improvement over the fixed-chunk baseline, while significantly reducing recognition latency and maintaining performance close to global decoding.
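The two figures in the abstract can be related with a short calculation. The snippet assumes the usual definition of relative improvement, (baseline - new) / baseline. The baseline WER it prints is derived under that assumption and is not a number reported in the listing, and the function names are illustrative.

```python
def relative_improvement(baseline_wer, new_wer):
    """Relative WER reduction, as a fraction of the baseline."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer, rel_improvement):
    """Invert the definition above to recover the baseline WER."""
    return new_wer / (1.0 - rel_improvement)


new_wer = 6.23   # percent, from the abstract
rel = 0.4815     # 48.15% relative improvement, from the abstract
baseline = implied_baseline(new_wer, rel)
print(f"implied fixed-chunk baseline WER: {baseline:.2f}%")
```

Under this definition the fixed-chunk baseline would sit at roughly 12% WER, about twice the reported error rate.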

machine learning, natural language, recognition, (16 more...)

arXiv.org Artificial Intelligence

2511.09085

Country: Asia > China > Tibet Autonomous Region (0.46)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)
