Reviews: Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

Oct-8-2024, 06:42:30 GMT–Neural Information Processing Systems

The authors analyze CTC-trained acoustic models to determine how information related to phonetic categories is preserved in CTC-based models that directly output graphemes. The work follows a long line of research analyzing neural network representations to determine how they model phonemic categories, although to the best of my knowledge this has not previously been done for CTC-based end-to-end architectures. The results and analysis presented by the authors are interesting, although I have some concerns about the conclusions the authors draw, which I would like them to clarify. Please see my detailed comments below.

In the paper, the authors conclude that (Lines 159--164) "... after the 5th recurrent layer accuracy goes down again. One possible explanation to this may be that higher layers in the model are more sensitive to long distance information that is needed for the speech recognition task, whereas the local information which is needed for classifying phones is better captured in lower layers."
