AITopics

2112.09097

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Thulke, David, Daheim, Nico, Dugast, Christian, Ney, Hermann

Adapting Document-Grounded Dialog Systems to Spoken Conversations using Data Augmentation and a Noisy Channel Model

arXiv.org Artificial Intelligence Dec-16-2021

This paper summarizes our submission to Task 2 of the second track of the 10th Dialog System Technology Challenge (DSTC10) "Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations". Similar to the previous year's iteration, the task consists of three subtasks: detecting whether a turn is knowledge seeking, selecting the relevant knowledge document and finally generating a grounded response. This year, the focus lies on adapting the system to noisy ASR transcripts. We explore different approaches to make the models more robust to this type of input and to adapt the generated responses to the style of spoken conversations. For the latter, we get the best results with a noisy channel model that additionally reduces the number of short and generic responses. Our best system achieved the 1st rank in the automatic and the 3rd rank in the human evaluation of the challenge.
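As a rough illustration of the noisy channel idea mentioned in the abstract, the sketch below reranks candidate responses by adding a channel score (how well the response explains the knowledge document) and a response prior to a direct model score. The class, the function names, the weights, and the toy log-probabilities are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    direct: float   # log p(response | context, knowledge), hypothetical value
    channel: float  # log p(knowledge | context, response), hypothetical value
    prior: float    # log p(response | context), hypothetical value


def noisy_channel_score(c: Candidate, channel_weight: float = 1.0,
                        prior_weight: float = 1.0) -> float:
    """Log-linear combination of direct, channel, and prior log-probabilities."""
    return c.direct + channel_weight * c.channel + prior_weight * c.prior


def rerank(candidates: List[Candidate], **weights: float) -> List[Candidate]:
    """Return candidates sorted from best to worst noisy channel score."""
    return sorted(candidates, key=lambda c: noisy_channel_score(c, **weights),
                  reverse=True)


candidates = [
    Candidate("Yes.", direct=-1.0, channel=-6.0, prior=-1.5),
    Candidate("Yes, the hotel offers free WiFi in all rooms.",
              direct=-2.0, channel=-1.0, prior=-3.0),
]
best = rerank(candidates)[0]
print(best.text)
```

In this toy setup the short generic reply wins on the direct score alone, but its low channel score pushes it below the grounded reply, which is the kind of effect the abstract attributes to the noisy channel model.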

computational linguistic, dialog, proceedings

2112.08844

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > China > Hong Kong (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.70)

#artificialintelligence Dec-15-2021, 21:50:09 GMT

Chinese TV introducing AI sign language presenter at the next Olympics

Chinese TV will introduce the first AI sign language presenter in time for the 2022 Winter Olympics in Beijing. China Central Television (CCTV) and Baidu AI Cloud said the launch of the AI sign language presenter represents a huge leap forwards in 'overcoming the barrier of sound with technology'. Nearly 28 million people in China are hearing impaired, and about 430 million people around the world suffer from hearing loss. The launch of the AI presenter will allow the state broadcaster to include sign language services for viewers around the clock, starting with updates from the Winter Olympics in Beijing early next year. The presenter achieves high-level sign language expression thanks to Baidu's natural action engine and its sign language translation engine.

ai sign language presenter, chinese tv, olympic

#artificialintelligence

Country: Asia > China > Beijing > Beijing (0.52)

Industry:

Leisure & Entertainment > Sports > Olympic Games (1.00)
Education > Curriculum > Subject-Specific Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.79)

Sosnowski, Witold, Wroblewska, Anna, Gawrysiak, Piotr

Applying SoftTriple Loss for Supervised Language Model Fine Tuning

Natural language processing (NLP) is a rapidly growing area of machine learning with applications wherever a computer needs to operate on text in a way that involves capturing its semantics. These applications include text classification, translation, text summarization, question answering, and dialogue. All these tasks are downstream and depend on the quality of the text representation (White et al., 2015). Many models can produce such text representations, from Bag-of-Words or Word2Vec word embeddings to the language representation model BERT and its variations, which are state of the art in most NLP tasks. The best performance on text classification tasks is obtained when the model is first trained on a general knowledge corpus to capture semantic relationships between words and then fine-tuned with an additional dense layer on a domain corpus with cross-entropy loss (Radford et al., 2019). We introduce a new loss function, TripleEntropy, based on cross-entropy loss and SoftTriple loss, to improve classification performance when fine-tuning general-knowledge pre-trained language models (Devlin et al., 2018; Qian et al., 2019). Triplet loss transforms the embedding space so that vector representations from the same class form separable subspaces, which stabilizes and generalizes the language model fine-tuning process. TripleEntropy can improve the fine-tuning of RoBERTa-based models, increasing performance on downstream tasks by about 0.02% to 2.29%.
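To make the idea of combining the two losses concrete, here is a minimal single-example sketch in plain Python. It assumes the SoftTriple formulation of Qian et al. (2019) without its center regularizer, unit-length embeddings and centers, and a convex combination controlled by a hypothetical weight `beta`; the exact weighting used for TripleEntropy is not stated in the text above, so treat this as an assumption rather than the authors' implementation.

```python
import math
from typing import List, Sequence


def log_softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(values)
    lse = m + math.log(sum(math.exp(v - m) for v in values))
    return [v - lse for v in values]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard cross-entropy for one example."""
    return -log_softmax(logits)[label]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def soft_triple(embedding: Sequence[float],
                centers: Sequence[Sequence[Sequence[float]]],
                label: int, lam: float = 20.0, gamma: float = 0.1,
                delta: float = 0.01) -> float:
    """SoftTriple loss for one example; centers[c][k] is center k of class c."""
    sims = []
    for class_centers in centers:
        raw = [dot(embedding, w) for w in class_centers]
        # Soft assignment of the example to the centers of this class.
        weights = [math.exp(v) for v in log_softmax([r / gamma for r in raw])]
        sims.append(sum(w * r for w, r in zip(weights, raw)))
    # The true class has to win by a margin delta.
    logits = [lam * (s - delta) if c == label else lam * s
              for c, s in enumerate(sims)]
    return cross_entropy(logits, label)


def triple_entropy(logits: Sequence[float], embedding: Sequence[float],
                   centers: Sequence[Sequence[Sequence[float]]],
                   label: int, beta: float = 0.5) -> float:
    """Hypothetical convex combination of cross-entropy and SoftTriple loss."""
    return (beta * cross_entropy(logits, label)
            + (1.0 - beta) * soft_triple(embedding, centers, label))
```

In a real fine-tuning setup the centers would be trainable parameters and the embedding would come from the language model's pooled output; both are passed in as plain lists here to keep the sketch self-contained.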

artificial intelligence, machine learning, natural language

doi: 10.15439/2022F185

2112.08462

Country: Europe > Poland > Masovia Province > Warsaw (0.05)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Wang, Zhecan, You, Haoxuan, Li, Liunian Harold, Zareian, Alireza, Park, Suji, Liang, Yiqing, Chang, Kai-Wei, Chang, Shih-Fu

Answering complex questions about images is an ambitious goal for machine intelligence, which requires a joint understanding of images, text, and commonsense knowledge, as well as a strong reasoning ability. Recently, multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning (VCR), by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not utilize the rich structure of the scene and the interactions between objects which are essential in answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs in commonsense reasoning. To exploit the scene graph structure, at the model structure level, we propose a multihop graph transformer for regularizing attention interaction among hops. As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs using textual annotations in a weakly-supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost compared with the state-of-the-art methods and prove the efficacy of each proposed component.

graph, scene graph, visual scene graph

2112.08587

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.05)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Textless Speech-to-Speech Translation on Real Data

Lee, Ann, Gong, Hongyu, Duquenne, Paul-Ambroise, Schwenk, Holger, Chen, Peng-Jen, Wang, Changhan, Popuri, Sravya, Pino, Juan, Gu, Jiatao, Hsu, Wei-Ning

We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another language and can be built without the need for any text data. Different from existing work in the literature, we tackle the challenge of modeling multi-speaker target speech and train the systems with real-world S2ST data. The key to our approach is a self-supervised unit-based speech normalization technique, which finetunes a pre-trained speech encoder with paired audio from multiple speakers and a single reference speaker to reduce the variations due to accents, while preserving the lexical content. With only 10 minutes of paired data for speech normalization, we obtain on average a 3.2 BLEU gain when training the S2ST model on the VoxPopuli S2ST dataset, compared to a baseline trained on un-normalized speech targets. We also incorporate automatically mined S2ST data and show an additional 2.0 BLEU gain. To our knowledge, we are the first to establish a textless S2ST technique that can be trained with real-world data and works for multiple language pairs.

speech, speech normalizer, translation

2112.08352

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Learning Cross-Lingual IR from an English Retriever

Li, Yulong, Franz, Martin, Sultan, Md Arafat, Iyer, Bhavani, Lee, Young-Suk, Sil, Avirup

We present a new cross-lingual information retrieval (CLIR) model trained using multi-stage knowledge distillation (KD). The teacher and the student are heterogeneous systems: the former is a pipeline that relies on machine translation and monolingual IR, while the latter executes a single CLIR operation. We show that the student can learn both multilingual representations and CLIR by optimizing two corresponding KD objectives. Learning multilingual representations from an English-only retriever is accomplished using a novel cross-lingual alignment algorithm that greedily re-positions the teacher tokens for alignment. Evaluation on the XOR-TyDi benchmark shows that the proposed model is far more effective than the existing approach of fine-tuning with cross-lingual labeled IR data, with a gain in accuracy of 25.4 Recall@5kt.
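For readers unfamiliar with knowledge distillation in a retrieval setting, the sketch below shows one generic form of a KD objective: the student's relevance distribution over a shared list of candidate passages is pulled towards the teacher's by minimizing a KL divergence. This is a textbook formulation under assumed names (`kd_loss`, `temperature`); the abstract does not spell out the two objectives actually used, so it should not be read as the paper's method.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn raw relevance scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_scores: Sequence[float], student_scores: Sequence[float],
            temperature: float = 1.0) -> float:
    """KL(teacher || student) over the same candidate passages."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical scores for three passages: the teacher pipeline (translation
# followed by English IR) and the single-step cross-lingual student.
teacher = [4.0, 1.0, 0.5]
student = [1.0, 2.0, 0.5]
print(kd_loss(teacher, student))
```

The loss is zero when the student ranks passages with exactly the teacher's distribution and grows as the two diverge.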

linear trans, representation, student

2112.08185

Country: Europe > France (0.05)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

Góis, António, Cho, Kyunghyun, Martins, André

Learning Non-Monotonic Automatic Post-Editing of Translations from Human Orderings

Recent research in neural machine translation has explored flexible generation orders, as an alternative to left-to-right generation. However, training non-monotonic models brings a new complication: how to search for a good ordering when there is a combinatorial explosion of orderings arriving at the same final result? Also, how do these automatic orderings compare with the actual behaviour of human translators? Current models rely on manually built biases or are left to explore all possibilities on their own. In this paper, we analyze the orderings produced by human post-editors and use them to train an automatic post-editing system. We compare the resulting system with those trained with left-to-right and random post-editing orderings. We observe that humans tend to follow a nearly left-to-right order, but with interesting deviations, such as preferring to start by correcting punctuation or verbs.

keystroke, left-to-right order, sequence

2004.1412

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligence Dec-14-2021, 18:25:49 GMT

Azure AI empowers organizations to serve users in more than 100 languages

Microsoft announced today that 12 new languages and dialects have been added to Translator. These additions mean that the service can now translate between more than 100 languages and dialects, making information in text and documents accessible to 5.66 billion people worldwide. "One hundred languages is a good milestone for us to achieve our ambition for everyone to be able to communicate regardless of the language they speak," said Xuedong Huang, Microsoft technical fellow and Azure AI chief technology officer. Translator today covers the world's most spoken languages including English, Chinese, Hindi, Arabic and Spanish. In recent years, advances in AI technology have allowed the company to grow its language library with low-resource and endangered languages, such as Inuktitut, a dialect of Inuktut that is spoken by about 40,000 Inuit in Canada.

huang, language and dialect, translator

#artificialintelligence

Country: North America > Canada (0.26)

Genre: Press Release (0.57)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.53)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.40)

Lee, Young-Suk, Astudillo, Ramon Fernandez, Hoang, Thanh Lam, Naseem, Tahira, Florian, Radu, Roukos, Salim

Maximum Bayes Smatch Ensemble Distillation for AMR Parsing

arXiv.org Artificial Intelligence Dec-14-2021

AMR parsing has experienced an unprecedented increase in performance in the last three years, due to a mixture of effects including architecture improvements and transfer learning. Self-learning techniques have also played a role in pushing performance forward. However, for the most recent high-performing parsers, the effect of self-learning and silver data generation seems to be fading. In this paper we show that it is possible to overcome these diminishing returns of silver data by combining Smatch-based ensembling techniques with ensemble distillation. In an extensive experimental setup, we push single-model English parser performance above 85 Smatch for the first time and return to substantial gains. We also attain a new state of the art for cross-lingual AMR parsing for Chinese, German, Italian and Spanish. Finally, we explore the impact of the proposed distillation technique on domain adaptation, and show that it can produce gains rivaling those of human-annotated data for QALD-9 and achieve a new state of the art for BioAMR.
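One simple way to picture Smatch-based ensembling is as a consensus selection: among the graphs proposed by several parsers, keep the one that agrees most with all the others. The sketch below assumes that simplified selection scheme and replaces Smatch with an exact triple-overlap F1, because real Smatch also searches over variable alignments; the function names and the toy graphs are hypothetical, and the paper's actual ensembling procedure may differ.

```python
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def triple_f1(a: Graph, b: Graph) -> float:
    """F1 over exactly matching triples; a crude stand-in for Smatch."""
    if not a and not b:
        return 1.0
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(a)
    recall = overlap / len(b)
    return 2 * precision * recall / (precision + recall)


def select_consensus(graphs: List[Graph]) -> int:
    """Index of the graph with the highest mean similarity to the others."""
    if not graphs:
        raise ValueError("need at least one candidate graph")

    def mean_similarity(i: int) -> float:
        others = [g for j, g in enumerate(graphs) if j != i]
        if not others:
            return 1.0
        return sum(triple_f1(graphs[i], g) for g in others) / len(others)

    return max(range(len(graphs)), key=mean_similarity)


# Toy candidates for "The boy wants to go": two parsers agree, one differs.
g_a = frozenset({("want-01", ":ARG0", "boy"), ("want-01", ":ARG1", "go-01")})
g_b = frozenset({("want-01", ":ARG0", "boy"), ("want-01", ":ARG1", "go-01")})
g_c = frozenset({("want-01", ":ARG0", "boy"), ("want-01", ":ARG1", "run-01")})
chosen = select_consensus([g_a, g_b, g_c])
print(chosen)
```

The selected graphs could then serve as silver training targets for a single student parser, which is the general shape of ensemble distillation.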

distillation, parser, proceedings

2112.0779

Country:

Europe > Bulgaria > Sofia City Province > Sofia (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)