 emnlp 2021



ML and NLP Research Highlights of 2021

#artificialintelligence

In this post, I will cover the papers and research areas that I found most inspiring. I tried to cover the papers that I was aware of but likely missed many relevant ones. Feel free to highlight them as well as ones that you found inspiring in the comments.

Pre-trained models were applied in many different domains and started to be considered critical for ML research [1]. In computer vision, supervised pre-trained models such as Vision Transformer [2] have been scaled up [3] and self-supervised pre-trained models have started to match their performance [4]. The latter have been scaled beyond the controlled environment of ImageNet to random collections of images [5]. In speech, new models have been built based on wav2vec 2.0 [6] such as W2v-BERT [7] as well as more powerful multilingual models such as XLS-R [8]. At the same time, we saw new unified pre-trained models for previously under-researched modality pairs such as for videos and language [9] as well as speech and language [10]. In vision and language, controlled studies shed new light on important components of such multi-modal models [11][12].


EMNLP 2021 in tweets

AIHub

The Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) took place from the 7th to the 11th of November, both in Punta Cana and online. If you did not have time to check the papers and the keynotes at the main conference, here are the livetweeted keynotes and papers sorted by language.

Live Notes of EMNLP 2021

#EMNLP2021 Keynote by Ido Dagan on 3 directions that #NLProc should pursue: https://t.co/LLeBjcffOP

At #EMNLP2021 Evelina Fedorenko makes a strong case to defuse criticism that neural language models cannot "think". Neither can the human language modules in the brain, she argues, based on human brain studies.


GitHub - tanyuqian/ctc-gen-eval: EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation

#artificialintelligence

This repo contains the code for an automatic evaluation metric described in the paper:

Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Mingkai Deng*, Bowen Tan* (equal contribution), Zhengzhong Liu, Eric P. Xing, Zhiting Hu
EMNLP 2021

Code to reproduce the paper's results can be found in the train/ folder.

We provide a command line interface (CLI) for the CTC score as well as a Python module. More details of these models can be found in our paper. We provide three scorers: StyleTransferScorer, SummarizationScorer, and DialogScorer. Usage examples can be found in demo.py.


Knowledge Graphs @ EMNLP 2021

#artificialintelligence

If you are an experienced reader of such digests (or previous posts), then you know pretty well the abundance of KG-augmented LMs published at every conference and uploaded to arXiv weekly. If you feel lost, I can assure you that you're not the only one. This year, we finally have a sound framework and taxonomy of various KG LM approaches! The authors define 3 big families: (1) no KG supervision, probing knowledge encoded in LM params with cloze-style prompts; (2) KG supervision with entities and IDs; (3) KG supervision with relation templates and surface forms. Each family has a few branches. For instance, let's have a look at 4 entity-aware models illustrated below.
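
To make the terms "cloze-style prompt" and "relation template" from the taxonomy above concrete, here is a minimal sketch of how a KG triple can be turned into text. It is an illustration only, not the method of any specific paper in this digest: the relation names, the template strings, the `[X]`/`[Y]` placeholders, and the helper names `triple_to_cloze` and `triple_to_statement` are all assumptions made up for this example.

```python
from typing import Dict, Tuple

# A KG triple as (subject, relation, object), using surface forms.
Triple = Tuple[str, str, str]

# Hypothetical relation templates; [X] is the subject slot, [Y] the object slot.
RELATION_TEMPLATES: Dict[str, str] = {
    "place_of_birth": "[X] was born in [Y].",
    "capital_of": "[X] is the capital of [Y].",
}


def triple_to_cloze(triple: Triple, mask_token: str = "[MASK]") -> Tuple[str, str]:
    """Build a cloze-style prompt that hides the object of the triple.

    Returns the prompt and the expected answer, so that a language model's
    prediction for the masked slot can be compared with the KG object.
    """
    subject, relation, obj = triple
    template = RELATION_TEMPLATES[relation]
    prompt = template.replace("[X]", subject).replace("[Y]", mask_token)
    return prompt, obj


def triple_to_statement(triple: Triple) -> str:
    """Verbalize the full triple as a sentence via its relation template."""
    subject, relation, obj = triple
    template = RELATION_TEMPLATES[relation]
    return template.replace("[X]", subject).replace("[Y]", obj)


if __name__ == "__main__":
    prompt, answer = triple_to_cloze(("Dante", "place_of_birth", "Florence"))
    print(prompt, "| expected:", answer)
    print(triple_to_statement(("Paris", "capital_of", "France")))
```

In this sketch, the cloze prompt corresponds to probing a model without KG supervision (family 1), while the fully verbalized statement is the kind of text that template-based KG supervision (family 3) could train on.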