AITopics | van Miltenburg, Emiel

Collaborating Authors

van Miltenburg, Emiel

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Natural Language Generation

van Miltenburg, Emiel, Lin, Chenghua

arXiv.org Artificial IntelligenceMar-20-2025

This article provides a brief overview of the field of Natural Language Generation. The term Natural Language Generation (NLG), in its broadest definition, refers to the study of systems that verbalize some form of information through natural language. That information could be stored in a large database or knowledge graph (in data-to-text applications), but NLG researchers may also study summarisation (text-to-text) or image captioning (image-to-text), for example. As a subfield of Natural Language Processing, NLG is closely related to other sub-disciplines such as Machine Translation (MT) and Dialog Systems. Some NLG researchers exclude MT from their definition of the field, since there is no content selection involved where the system has to determine what to say. Conversely, dialog systems do not typically fall under the header of Natural Language Generation since NLG is just one component of dialog systems (the others being Natural Language Understanding and Dialog Management). However, with the rise of Large Language Models (LLMs), different subfields of Natural Language Processing have converged on similar methodologies for the production of natural language and the evaluation of automatically generated text.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2503.16728

Country:

Asia (0.69)
Europe > United Kingdom > England (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Dual use issues in the field of Natural Language Generation

van Miltenburg, Emiel

arXiv.org Artificial IntelligenceJan-11-2025

This report documents the results of a recent survey in the SIGGEN community, focusing on Dual Use issues in Natural Language Generation (NLG). SIGGEN is the Special Interest Group (SIG) of the Association for Computational Linguistics (ACL) for researchers working on NLG. The survey was prompted by the ACL executive board, which asked all SIGs to provide an overview of dual use issues within their respective subfields. The survey was sent out in October 2024 and the results were processed in January 2025. With 23 respondents, the survey is presumably not representative of all SIGGEN members, but at least this document offers a helpful resource for future discussions. This report is open to feedback from the SIGGEN community. Let me know if you have any questions or comments!

artificial intelligence, dual use issue, natural language, (15 more...)

arXiv.org Artificial Intelligence

2501.06636

Country:

Europe > Sweden (0.14)
Europe > Germany (0.14)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)
Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)

Add feedback

Image captioning in different languages

van Miltenburg, Emiel

arXiv.org Artificial IntelligenceMay-31-2024

This short position paper provides a manually curated list of non-English image captioning datasets (as of May 2024). Through this list, we can observe the dearth of datasets in different languages: only 23 different languages are represented. With the addition of the Crossmodal-3600 dataset (Thapliyal et al., 2022, 36 languages) this number increases somewhat, but still this number is tiny compared to the thousands of spoken languages that exist. This paper closes with some open questions for the field of Vision & Language.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2407.09495

Country:

Europe > Germany (0.15)
Europe > Spain (0.14)
Europe > Italy (0.14)
(5 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evaluating Task-oriented Dialogue Systems: A Systematic Review of Measures, Constructs and their Operationalisations

Braggaar, Anouck, Liebrecht, Christine, van Miltenburg, Emiel, Krahmer, Emiel

arXiv.org Artificial IntelligenceDec-21-2023

This review gives an extensive overview of evaluation methods for task-oriented dialogue systems, paying special attention to practical applications of dialogue systems, for example for customer service. The review (1) provides an overview of the used constructs and metrics in previous work, (2) discusses challenges in the context of dialogue system evaluation and (3) develops a research agenda for the future of dialogue system evaluation. We conducted a systematic review of four databases (ACL, ACM, IEEE and Web of Science), which after screening resulted in 122 studies. Those studies were carefully analysed for the constructs and methods they proposed for evaluation. We found a wide variety in both constructs and methods. Especially the operationalisation is not always clearly reported. We hope that future work will take a more critical approach to the operationalisation and specification of the used constructs. To work towards this aim, this review ends with recommendations for evaluation and suggestions for outstanding questions.

data mining, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2312.13871

Country:

Europe > United Kingdom > England (0.27)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Education (1.00)
Information Technology > Services (0.92)
(2 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Human Computer Interaction (1.00)
Information Technology > Data Science > Data Mining (1.00)
(7 more...)

Add feedback

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

Belz, Anya, Thomson, Craig, Reiter, Ehud, Abercrombie, Gavin, Alonso-Moral, Jose M., Arvan, Mohammad, Braggaar, Anouck, Cieliebak, Mark, Clark, Elizabeth, van Deemter, Kees, Dinkar, Tanvi, Dušek, Ondřej, Eger, Steffen, Fang, Qixiang, Gao, Mingqi, Gatt, Albert, Gkatzia, Dimitra, González-Corbelle, Javier, Hovy, Dirk, Hürlimann, Manuela, Ito, Takumi, Kelleher, John D., Klubicka, Filip, Krahmer, Emiel, Lai, Huiyuan, van der Lee, Chris, Li, Yiru, Mahamood, Saad, Mieskes, Margot, van Miltenburg, Emiel, Mosteiro, Pablo, Nissim, Malvina, Parde, Natalie, Plátek, Ondřej, Rieser, Verena, Ruan, Jie, Tetreault, Joel, Toral, Antonio, Wan, Xiaojun, Wanner, Leo, Watson, Lewis, Yang, Diyi

arXiv.org Artificial IntelligenceAug-7-2023

We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include that just 13\% of papers had (i) sufficiently low barriers to reproduction, and (ii) enough obtainable information, to be considered for reproduction, and that all but one of the experiments we selected for reproduction was discovered to have flaws that made the meaningfulness of conducting a reproduction questionable. As a result, we had to change our coordinated study design from a reproduce approach to a standardise-then-reproduce-twice approach. Our overall (negative) finding that the great majority of human evaluations in NLP is not repeatable and/or not reproducible and/or too flawed to justify reproduction, paints a dire picture, but presents an opportunity for a rethink about how to design and report human evaluations in NLP.

artificial intelligence, experiment, natural language, (19 more...)

arXiv.org Artificial Intelligence

2305.01633

Country:

Europe (1.00)
North America > United States > Maine (0.14)
North America > United States > Illinois (0.14)
Asia > Japan > Honshū (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.46)

Add feedback

Evaluating NLG systems: A brief introduction

van Miltenburg, Emiel

arXiv.org Artificial IntelligenceMar-29-2023

Summary This year the International Conference on Natural Language Generation (INLG) will feature an award for the paper with the best evaluation. The purpose of this award is to provide an incentive for NLG researchers to pay more attention to the way they assess the output of their systems. This essay provides a short introduction to evaluation in NLG, explaining key terms and distinctions. How can I evaluate my system? It is hard to say in general how you should evaluate your NLG system.

artificial intelligence, evaluation, natural language, (17 more...)

arXiv.org Artificial Intelligence

2303.16742

Country: Europe > United Kingdom (0.29)

Genre: Research Report > Experimental Study (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)

Add feedback

Implicit causality in GPT-2: a case study

Huynh, Hien, Lentz, Tomas O., van Miltenburg, Emiel

arXiv.org Artificial IntelligenceDec-8-2022

This case study investigates the extent to which a language model (GPT-2) is able to capture native speakers' intuitions about implicit causality in a sentence completion task. We first reproduce earlier results (showing lower surprisal values for pronouns that are congruent with either the subject or object, depending on which one corresponds to the implicit causality bias of the verb), and then examine the effects of gender and verb frequency on model performance. Our second study examines the reasoning ability of GPT-2: is the model able to produce more sensible motivations for why the subject VERBed the object if the verbs have stronger causality biases? We also developed a methodology to avoid human raters being biased by obscenities and disfluencies generated by the model.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2212.04348

Country: North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Gehrmann, Sebastian, Adewumi, Tosin, Aggarwal, Karmanya, Ammanamanchi, Pawan Sasanka, Anuoluwapo, Aremu, Bosselut, Antoine, Chandu, Khyathi Raghavi, Clinciu, Miruna, Das, Dipanjan, Dhole, Kaustubh D., Du, Wanyu, Durmus, Esin, Dušek, Ondřej, Emezue, Chris, Gangal, Varun, Garbacea, Cristina, Hashimoto, Tatsunori, Hou, Yufang, Jernite, Yacine, Jhamtani, Harsh, Ji, Yangfeng, Jolly, Shailza, Kumar, Dhruv, Ladhak, Faisal, Madaan, Aman, Maddela, Mounica, Mahajan, Khyati, Mahamood, Saad, Majumder, Bodhisattwa Prasad, Martins, Pedro Henrique, McMillan-Major, Angelina, Mille, Simon, van Miltenburg, Emiel, Nadeem, Moin, Narayan, Shashi, Nikolaev, Vitaly, Niyongabo, Rubungo Andre, Osei, Salomey, Parikh, Ankur, Perez-Beltrachini, Laura, Rao, Niranjan Ramesh, Raunak, Vikas, Rodriguez, Juan Diego, Santhanam, Sashank, Sedoc, João, Sellam, Thibault, Shaikh, Samira, Shimorina, Anastasia, Cabezudo, Marco Antonio Sobrevilla, Strobelt, Hendrik, Subramani, Nishant, Xu, Wei, Yang, Diyi, Yerukola, Akhila, Zhou, Jiawei

arXiv.org Artificial IntelligenceFeb-3-2021

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. However, due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of corpora and evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the initial release for which we are organizing a shared task at our ACL 2021 Workshop and to which we invite the entire NLG community to participate.

artificial intelligence, deep learning, neural network, (21 more...)

arXiv.org Artificial Intelligence

2102.01672

Country:

Asia (1.00)
Europe > Germany (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(3 more...)

Genre:

Research Report (0.50)
Questionnaire & Opinion Survey (0.46)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback