AITopics | Sarfjoo, Seyyed Saeed

Collaborating Authors

Sarfjoo, Seyyed Saeed

ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications

Zuluaga-Gomez, Juan, Veselý, Karel, Szöke, Igor, Blatt, Alexander, Motlicek, Petr, Kocour, Martin, Rigault, Mickael, Choukri, Khalid, Prasad, Amrutha, Sarfjoo, Seyyed Saeed, Nigmatulina, Iuliia, Cevenini, Claudia, Kolčárek, Pavel, Tart, Allan, Černocký, Jan, Klakow, Dietrich

arXiv.org Artificial Intelligence, Jun-15-2023

Personal assistants, automatic speech recognizers, and dialogue understanding systems are becoming increasingly critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims to guide aircraft and control the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots over very high frequency (VHF) radio channels. Incorporating these novel technologies into ATC, a low-resource domain, requires large-scale annotated datasets for developing data-driven AI systems; two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims to foster research in the challenging ATC field, which has lagged behind due to a lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotation of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts, and a subset of it carries gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker-turn information, a signal-to-noise ratio estimate, and an English language detection score per sample. Both subsets are available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set corpus, which we offer for free at https://www.atco2.org/data. We expect the ATCO2 corpus to foster research on robust ASR and NLU not only in the field of ATC communications but also in the general research community.
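The abstract lists per-sample metadata shipped with the ATCO2-PL-set (an automatic transcript, a signal-to-noise ratio estimate, an English language detection score) and the named-entity classes annotated in the test set (callsign, command, value). As a minimal sketch of how such metadata might be used to pick cleaner, English-only pseudo-labeled samples for training, the code below assumes a hypothetical in-memory record: the class `PseudoLabeledSample`, its field names, the dB unit for SNR, the [0, 1] score range, the thresholds, and the example utterance are all illustrative assumptions, not the corpus's actual file format or recommended settings.

```python
from dataclasses import dataclass, field


@dataclass
class PseudoLabeledSample:
    """Hypothetical record for one ATCO2-PL-set utterance (field names are illustrative)."""
    sample_id: str
    transcript: str       # automatic transcript from an in-domain ASR system
    snr_db: float         # signal-to-noise ratio estimate, assumed to be in dB
    english_score: float  # English language detection score, assumed to lie in [0, 1]
    # Optional named entities using the test set's classes (callsign, command, value).
    entities: dict = field(default_factory=dict)


def select_for_training(samples, min_snr_db=10.0, min_english_score=0.5):
    """Keep samples whose SNR and English score both clear the (assumed) thresholds."""
    return [
        s for s in samples
        if s.snr_db >= min_snr_db and s.english_score >= min_english_score
    ]


if __name__ == "__main__":
    # Made-up examples for illustration only.
    data = [
        PseudoLabeledSample(
            "a", "lufthansa one two three climb flight level three four zero", 18.2, 0.97,
            entities={"callsign": "lufthansa one two three", "command": "climb",
                      "value": "flight level three four zero"},
        ),
        PseudoLabeledSample("b", "noisy utterance", 4.1, 0.91),       # too noisy
        PseudoLabeledSample("c", "non english utterance", 15.0, 0.12),  # likely not English
    ]
    kept = select_for_training(data)
    print([s.sample_id for s in kept])  # ['a']
```

Filtering on two independent scores keeps the rule easy to audit; in practice the thresholds would have to be tuned against a held-out set such as the manually transcribed test subset.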

corpus, machine learning, natural language

arXiv:2211.04054

Country:

Europe (1.00)
Asia > Middle East > Qatar (0.14)
North America > United States > Pennsylvania (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report (0.50)
Overview (0.48)

Industry:

Transportation > Air (1.00)
Transportation > Infrastructure & Services > Airport (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

Villatoro-Tello, Esaú, Madikeri, Srikanth, Zuluaga-Gomez, Juan, Sharma, Bidisha, Sarfjoo, Seyyed Saeed, Nigmatulina, Iuliia, Motlicek, Petr, Ivanov, Alexei V., Ganapathiraju, Aravind

arXiv.org Artificial Intelligence, Mar-17-2023

In this paper, we perform an exhaustive evaluation of different representations for the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems on the SLU intent detection task: 1) text-based, 2) lattice-based, and 3) a novel multimodal approach. Our work provides a comprehensive analysis of the performance achievable by different state-of-the-art SLU systems under different conditions, e.g., automatically vs. manually generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) output, namely word consensus networks, improves the SLU system over the 1-best setup (a 5.5% relative improvement). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtain performance similar to the oracle setup, a relative improvement of 17.8% over the 1-best configuration, making them a recommended alternative for overcoming the limitations of working with automatically generated transcripts.
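To make the contrast between 1-best transcripts and word consensus networks concrete: a consensus network (a "sausage") can be viewed as a sequence of slots, each holding competing words with posterior probabilities. The sketch below assumes a simple list-of-dicts representation; the `ConfusionNetwork` alias, the `"<eps>"` token for an empty arc, and the example posteriors are hypothetical and not taken from the paper, whose lattice-based system is not necessarily built this way. It extracts the 1-best word sequence and, as one possible richer input, a posterior-weighted bag of words that keeps every competing hypothesis.

```python
from collections import defaultdict

# Hypothetical representation: one dict per slot, mapping word -> posterior.
# "<eps>" marks the empty (deletion) arc in a slot.
ConfusionNetwork = list[dict[str, float]]

EPS = "<eps>"


def one_best(cn: ConfusionNetwork) -> list[str]:
    """Pick the highest-posterior word in each slot, then drop empty arcs."""
    words = [max(slot, key=slot.get) for slot in cn]
    return [w for w in words if w != EPS]


def weighted_bag_of_words(cn: ConfusionNetwork) -> dict[str, float]:
    """Keep every competing hypothesis, summing its posteriors across slots."""
    bag = defaultdict(float)
    for slot in cn:
        for word, posterior in slot.items():
            if word != EPS:
                bag[word] += posterior
    return dict(bag)


if __name__ == "__main__":
    # Made-up example: the recognizer is unsure whether the user said "play" or "pay".
    cn = [
        {"pay": 0.55, "play": 0.45},
        {"the": 0.9, EPS: 0.1},
        {"music": 1.0},
    ]
    print(one_best(cn))               # ['pay', 'the', 'music']
    print(weighted_bag_of_words(cn))  # {'pay': 0.55, 'play': 0.45, 'the': 0.9, 'music': 1.0}
```

A classifier fed only the 1-best output never sees "play", whereas the weighted bag still carries it with substantial weight; this is one intuition for why richer ASR outputs can help intent detection when transcripts are generated automatically.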

artificial intelligence, representation, speech recognition

arXiv:2212.08489

Country:

Europe > Switzerland (0.47)
North America > United States (0.28)

Genre:

Research Report > New Finding (0.34)

Industry:

Education (0.54)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)