AITopics | Kaffee, Lucie-Aimée

Collaborating Authors

Kaffee, Lucie-Aimée

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Local Differences, Global Lessons: Insights from Organisation Policies for International Legislation

Kaffee, Lucie-Aimée, Atanasova, Pepa, Rogers, Anna

arXiv.org Artificial IntelligenceFeb-19-2025

The rapid adoption of AI across diverse domains has led to the development of organisational guidelines that vary significantly, even within the same sector. This paper examines AI policies in two domains, news organisations and universities, to understand how bottom-up governance approaches shape AI usage and oversight. By analysing these policies, we identify key areas of convergence and divergence in how organisations address risks such as bias, privacy, misinformation, and accountability. We then explore the implications of these findings for international AI legislation, particularly the EU AI Act, highlighting gaps where practical policy insights could inform regulatory refinements. Our analysis reveals that organisational policies often address issues such as AI literacy, disclosure practices, and environmental impact, areas that are underdeveloped in existing international frameworks. We argue that lessons from domain-specific AI policies can contribute to more adaptive and effective AI governance at the global level. This study provides actionable recommendations for policymakers seeking to bridge the gap between local AI practices and international regulations.

large language model, machine learning, news organisation, (20 more...)

arXiv.org Artificial Intelligence

2503.05737

Country:

Europe (1.00)
North America > United States (0.67)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.87)

Industry:

Media > News (1.00)
Law > Intellectual Property & Technology Law (1.00)
Information Technology > Security & Privacy (1.00)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

Add feedback

Presumed Cultural Identity: How Names Shape LLM Responses

Pawar, Siddhesh, Arora, Arnav, Kaffee, Lucie-Aimée, Augenstein, Isabelle

arXiv.org Artificial IntelligenceFeb-17-2025

Names are deeply tied to human identity. They can serve as markers of individuality, cultural heritage, and personal history. However, using names as a core indicator of identity can lead to over-simplification of complex identities. When interacting with LLMs, user names are an important point of information for personalisation. Names can enter chatbot conversations through direct user input (requested by chatbots), as part of task contexts such as CV reviews, or as built-in memory features that store user information for personalisation. We study biases associated with names by measuring cultural presumptions in the responses generated by LLMs when presented with common suggestion-seeking queries, which might involve making assumptions about the user. Our analyses demonstrate strong assumptions about cultural identity associated with names present in LLM generations across multiple cultures. Our work has implications for designing more nuanced personalisation systems that avoid reinforcing stereotypes while maintaining meaningful customisation.

assertion, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2502.11995

Country:

North America (1.00)
Africa (0.70)
Europe > United Kingdom (0.67)
Asia > Middle East (0.47)

Genre: Research Report (1.00)

Industry:

Government > Immigration & Customs (0.46)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
Government > Regional Government (0.46)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Investigating Wit, Creativity, and Detectability of Large Language Models in Domain-Specific Writing Style Adaptation of Reddit's Showerthoughts

Buz, Tolga, Frost, Benjamin, Genchev, Nikola, Schneider, Moritz, Kaffee, Lucie-Aimée, de Melo, Gerard

arXiv.org Artificial IntelligenceMay-2-2024

Recent Large Language Models (LLMs) have shown the ability to generate content that is difficult or impossible to distinguish from human writing. We investigate the ability of differently-sized LLMs to replicate human writing style in short, creative texts in the domain of Showerthoughts, thoughts that may occur during mundane activities. We compare GPT-2 and GPT-Neo fine-tuned on Reddit data as well as GPT-3.5 invoked in a zero-shot manner, against human-authored texts. We measure human preference on the texts across the specific dimensions that account for the quality of creative, witty texts. Additionally, we compare the ability of humans versus fine-tuned RoBERTa classifiers to detect AI-generated texts. We conclude that human evaluators rate the generated texts slightly worse on average regarding their creative quality, but they are unable to reliably distinguish between human-written and AI-generated texts. We further provide a dataset for creative, witty text generation based on Reddit Showerthoughts posts.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.0166

Country: Europe > Germany (0.28)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.93)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing

Kaffee, Lucie-Aimée, Arora, Arnav, Talat, Zeerak, Augenstein, Isabelle

arXiv.org Artificial IntelligenceOct-30-2023

Dual use, the intentional, harmful reuse of technology and scientific artefacts, is a problem yet to be well-defined within the context of Natural Language Processing (NLP). However, as NLP technologies continue to advance and become increasingly widespread in society, their inner workings have become increasingly opaque. Therefore, understanding dual use concerns and potential ways of limiting them is critical to minimising the potential harms of research and development. In this paper, we conduct a survey of NLP researchers and practitioners to understand the depth and their perspective of the problem as well as to assess existing available support. Based on the results of our survey, we offer a definition of dual use that is tailored to the needs of the NLP community. The survey revealed that a majority of researchers are concerned about the potential dual use of their research but only take limited action toward it. In light of the survey results, we discuss the current state and potential means for mitigating dual use in NLP and propose a checklist that can be integrated into existing conference ethics-frameworks, e.g., the ACL ethics checklist.

artificial intelligence, dual use, natural language, (20 more...)

arXiv.org Artificial Intelligence

2304.08315

Country:

Europe (1.00)
North America > United States (0.93)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.67)

Industry:

Government (1.00)
Law (0.68)
Information Technology > Security & Privacy (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Add feedback

Probing Pre-Trained Language Models for Cross-Cultural Differences in Values

Arora, Arnav, Kaffee, Lucie-Aimée, Augenstein, Isabelle

arXiv.org Artificial IntelligenceApr-6-2023

Language embeds information about social, cultural, and political values people hold. Prior work has explored social and potentially harmful biases encoded in Pre-Trained Language models (PTLMs). However, there has been no systematic study investigating how values embedded in these models vary across cultures. In this paper, we introduce probes to study which values across cultures are embedded in these models, and whether they align with existing theories and cross-cultural value surveys. We find that PTLMs capture differences in values across cultures, but those only weakly align with established value surveys. We discuss implications of using mis-aligned models in cross-cultural settings, as well as ways of aligning PTLMs with value surveys.

computational linguistic, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2203.13722

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Assessing the quality of sources in Wikidata across languages: a hybrid approach

Amaral, Gabriel, Piscopo, Alessandro, Kaffee, Lucie-Aimée, Rodrigues, Odinaldo, Simperl, Elena

arXiv.org Artificial IntelligenceSep-20-2021

Wikidata is one of the most important sources of structured data on the web, built by a worldwide community of volunteers. As a secondary source, its contents must be backed by credible references; this is particularly important as Wikidata explicitly encourages editors to add claims for which there is no broad consensus, as long as they are corroborated by references. Nevertheless, despite this essential link between content and references, Wikidata's ability to systematically assess and assure the quality of its references remains limited. To this end, we carry out a mixed-methods study to determine the relevance, ease of access, and authoritativeness of Wikidata references, at scale and in different languages, using online crowdsourcing, descriptive statistics, and machine learning. Building on previous work of ours, we run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages. We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata. The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web. We also discuss ongoing editorial practices, which could encourage the use of higher-quality references in a more immediate way. All data and code used in the study are available on GitHub for feedback and further improvement and deployment by the research community.

crowdsourcing, neural network, wikidata, (24 more...)

arXiv.org Artificial Intelligence

2109.09405

Country:

Europe (1.00)
North America > United States > Hawaii (0.14)
North America > United States > Massachusetts (0.14)
North America > United States > Louisiana (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Media > News (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.45)
Government > Regional Government (0.45)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples

Vougiouklis, Pavlos, Elsahar, Hady, Kaffee, Lucie-Aimée, Gravier, Christoph, Laforest, Frederique, Hare, Jonathon, Simperl, Elena

arXiv.org Artificial IntelligenceOct-31-2017

Most people do not interact with Semantic Web data directly. Unless they have the expertise to understand the underlying technology, they need textual or visual interfaces to help them make sense of it. We explore the problem of generating natural language summaries for Semantic Web data. This is non-trivial, especially in an open-domain context. To address this problem, we explore the use of neural networks. Our system encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on the encoded vector. We train and evaluate our models on two corpora of loosely aligned Wikipedia snippets and DBpedia and Wikidata triples with promising results.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.websem.2018.07.002

1711.00155

Country:

North America > United States (1.00)
Europe > France (0.68)
Europe > United Kingdom > England (0.28)
(2 more...)

Genre: Research Report (0.64)

Industry:

Media > Music (0.93)
Media > Film (0.69)
Leisure & Entertainment > Sports > Soccer (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.93)
Information Technology > Communications > Web (0.87)

Add feedback