AITopics | phoneme classification

Collaborating Authors

phoneme classification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale

Neural Information Processing SystemsJun-15-2026, 02:41:59 GMT

LibriBrain represents the largest single-subject MEG dataset to date for speech decoding, with over 50 hours of recordings--5 larger than the next comparable dataset and 50 larger than most. This unprecedented'depth' of within-subject data enables exploration of neural representations at a scale previously unavailable with non-invasive methods. LibriBrain comprises high-quality MEG recordings together with detailed annotations from a single participant listening to naturalistic spoken English, covering nearly the full Sherlock Holmes canon. Designed to support advances in neural decoding, LibriBrain comes with a Python library for streamlined integration with deep learning frameworks, standard data splits for reproducibility, and baseline results for three foundational decoding tasks: speech detection, phoneme classification, and word classification. Baseline experiments demonstrate that increasing training data yields substantial improvements in decoding performance, highlighting the value of scaling up deep, within-subject datasets. By releasing this dataset, we aim to empower the research community to advance speech decoding methodologies and accelerate the development of safe, effective clinical brain-computer interfaces.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification

de Zuazo, Xabier, Saratxaga, Ibon, Navas, Eva

arXiv.org Artificial IntelligenceDec-2-2025

For Speech Detection, a MEG-oriented SpecAugment provided a first exploration of MEG-specific augmentation. For Phoneme Classification, we used inverse-square-root class weighting and a dynamic grouping loader to handle 100-sample averaged examples. In addition, a simple instance-level normalization proved critical to mitigate distribution shifts on the holdout split. Using the official Standard track splits and F1-macro for model selection, our best systems achieved 88.9% (Speech) and 65.8% (Phoneme) on the leaderboard, surpassing the competition baselines and ranking within the top-10 in both tasks.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2512.01443

Country: Europe > Spain (0.28)

Genre: Research Report > Experimental Study (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale

Özdogan, Miran, Landau, Gilad, Elvers, Gereon, Jayalath, Dulhan, Somaiya, Pratik, Mantegna, Francesco, Woolrich, Mark, Jones, Oiwi Parker

arXiv.org Artificial IntelligenceJun-4-2025

LibriBrain represents the largest single-subject MEG dataset to date for speech decoding, with over 50 hours of recordings -- 5$\times$ larger than the next comparable dataset and 50$\times$ larger than most. This unprecedented `depth' of within-subject data enables exploration of neural representations at a scale previously unavailable with non-invasive methods. LibriBrain comprises high-quality MEG recordings together with detailed annotations from a single participant listening to naturalistic spoken English, covering nearly the full Sherlock Holmes canon. Designed to support advances in neural decoding, LibriBrain comes with a Python library for streamlined integration with deep learning frameworks, standard data splits for reproducibility, and baseline results for three foundational decoding tasks: speech detection, phoneme classification, and word classification. Baseline experiments demonstrate that increasing training data yields substantial improvements in decoding performance, highlighting the value of scaling up deep, within-subject datasets. By releasing this dataset, we aim to empower the research community to advance speech decoding methodologies and accelerate the development of safe, effective clinical brain-computer interfaces.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.02098

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Understanding Probe Behaviors through Variational Bounds of Mutual Information

Choi, Kwanghee, Jung, Jee-weon, Watanabe, Shinji

arXiv.org Artificial IntelligenceDec-15-2023

With the success of self-supervised representations, researchers seek a better understanding of the information encapsulated within a representation. Among various interpretability methods, we focus on classification-based linear probing. We aim to foster a solid understanding and provide guidelines for linear probing by constructing a novel mathematical framework leveraging information theory. First, we connect probing with the variational bounds of mutual information (MI) to relax the probe design, equating linear probing with fine-tuning. Then, we investigate empirical behaviors and practices of probing through our mathematical framework. We analyze the layer-wise performance curve being convex, which seemingly violates the data processing inequality. However, we show that the intermediate representations can have the biggest MI estimate because of the tradeoff between better separability and decreasing MI. We further suggest that the margin of linearly separable representations can be a criterion for measuring the "goodness of representation." We also compare accuracy with MI as the measuring criteria. Finally, we empirically validate our claims by observing the self-supervised speech models on retaining word and phoneme information.

information, mi estimate, representation, (16 more...)

arXiv.org Artificial Intelligence

2312.10019

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

An Artificial Neural Network for Spatio-Temporal Bipolar Patterns: Application to Phoneme Classification

Neural Information Processing SystemsApr-6-2023, 20:08:07 GMT

An artificial neural network is developed to recognize spatio-temporal bipolar patterns associatively. The function of a formal neuron is generalized by replacing multiplication with convolution, weights with transfer functions, and thresholding with nonlinear transform following adaptation. The Hebbian learn(cid:173) ing rule and the delta learning rule are generalized accordingly, resulting in the learning of weights and delays. The neural network which was first developed for spatial patterns was thus generalized for spatio-temporal patterns. It was tested using a set of bipolar input patterns derived from speech signals, showing robust classification of 30 model phonemes.

artificial neural network, phoneme classification, spatio-temporal bipolar pattern, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Ensemble Methods for Phoneme Classification

Neural Information Processing SystemsApr-6-2023, 18:16:09 GMT

This paper investigates a number of ensemble methods for improv(cid:173) ing the performance of phoneme classification for use in a speech recognition system. Two ensemble methods are described; boosting and mixtures of experts, both in isolation and in combination. Re(cid:173) sults are presented on two speech recognition databases: an isolated word database and a large vocabulary continuous speech database. These results show that principled ensemble methods such as boost(cid:173) ing and mixtures provide superior performance to more naive en(cid:173) semble methods such as averaging.

cid, ensemble method, phoneme classification, (1 more...)

Neural Information Processing Systems

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

Neural Information Processing SystemsApr-6-2023, 12:21:19 GMT

This paper describes a new acoustic model based on variational Gaussian process dynamical system (VGPDS) for phoneme classification. The proposed model overcomes the limitations of the classical HMM in modeling the real speech data, by adopting a nonlinear and nonparametric model. In our model, the GP prior on the dynamics function enables representing the complex dynamic structure of speech, while the GP prior on the emission function successfully models the global dependency over the observations. Additionally, we introduce variance constraint to the original VGPDS for mitigating sparse approximation error of the kernel matrix. The effectiveness of the proposed model is demonstrated with extensive experimental results including parameter estimation, classification performance on the synthetic and benchmark datasets.

phoneme classification, variational gaussian process dynamical system, vgpd

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.89)
Information Technology > Scientific Computing (0.67)
Information Technology > Modeling & Simulation (0.67)

Add feedback

Review -- Bidirectional LSTM

#artificialintelligenceAug-28-2022, 03:00:18 GMT

[2005 IJCNN] [Bidirectional LSTM (BLSTM)] Framewise Phoneme Classification with Bidirectional LSTM Networks [2005 ICANN] [Bidirectional LSTM (BLSTM)] Bidirectional LSTM Networks for Improved Phoneme…

bidirectional lstm, framewise phoneme classification, phoneme classification, (1 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

Park, Hyunsin, Yun, Sungrack, Park, Sanghyuk, Kim, Jongmin, Yoo, Chang D.

Neural Information Processing SystemsFeb-14-2020, 23:26:41 GMT

phoneme classification, variational gaussian process dynamical system, vgpd

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.97)
Information Technology > Scientific Computing (0.67)
Information Technology > Modeling & Simulation (0.67)

Add feedback

Semi-supervised Learning with Sparse Autoencoders in Phone Classification

Dhaka, Akash Kumar, Salvi, Giampiero

arXiv.org Machine LearningOct-3-2016

We propose the application of a semi-supervised learning method to improve the performance of acoustic modelling for automatic speech recognition based on deep neural net- works. As opposed to unsupervised initialisation followed by supervised fine tuning, our method takes advantage of both unlabelled and labelled data simultaneously through mini- batch stochastic gradient descent. We tested the method with varying proportions of labelled vs unlabelled observations in frame-based phoneme classification on the TIMIT database. Our experiments show that the method outperforms standard supervised training for an equal amount of labelled data and provides competitive error rates compared to state-of-the-art graph-based semi-supervised learning techniques.

artificial intelligence, inductive learning, machine learning, (20 more...)

arXiv.org Machine Learning

1610.0052

Country:

North America > United States (0.14)
Europe > Sweden (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback