AITopics | van Dijck, Gijs

Collaborating Authors

van Dijck, Gijs


Adoption of Watermarking for Generative AI Systems in Practice and Implications under the new EU AI Act

Rijsbosch, Bram; van Dijck, Gijs; Kollnig, Konrad

arXiv.org Artificial Intelligence, Mar-23-2025

AI-generated images have become so good in recent years that individuals can no longer distinguish them from "real" images. This development creates a series of societal risks and challenges our perception of what is true and what is not, particularly with the emergence of "deep fakes" that impersonate real individuals. Watermarking, a technique that embeds identifying information within images to indicate their AI-generated nature, has emerged as a primary mechanism to address the risks posed by AI-generated images. Watermarking is now becoming a legal requirement in many jurisdictions, including under the new 2024 EU AI Act. Despite the widespread use of AI image generation systems, the current state of watermarking implementation remains largely unexamined. Moreover, the practical implications of the AI Act's watermarking requirements have not previously been studied. The present paper therefore provides an empirical analysis of 50 of the most widely used AI systems for image generation and embeds this empirical analysis in a legal analysis of the AI Act. We identify four categories of generative AI image systems relevant under the AI Act, outline the legal obligations for each category, and find that only a minority of providers currently implement adequate watermarking practices.
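To make "embedding identifying information within images" concrete, here is a toy least-significant-bit (LSB) watermark in Python. It is a hypothetical illustration only: the function names are ours, it is not one of the schemes the paper surveys, and the image is modeled as a plain list of grayscale pixel rows. LSB marks change each pixel by at most one intensity level, but they are easily destroyed by compression or resizing, which is why practical schemes aim for robustness to such edits.

```python
from typing import List

Image = List[List[int]]  # hypothetical model: grayscale pixels, values 0..255


def text_to_bits(text: str) -> List[int]:
    """Encode a UTF-8 string as a flat list of bits (most significant bit first)."""
    return [(byte >> i) & 1 for byte in text.encode("utf-8") for i in range(7, -1, -1)]


def bits_to_text(bits: List[int]) -> str:
    """Inverse of text_to_bits; assumes len(bits) is a multiple of 8."""
    data = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8")


def embed_lsb(image: Image, message: str) -> Image:
    """Return a copy of `image` whose pixel LSBs (row-major order) carry `message`."""
    bits = text_to_bits(message)
    width = len(image[0])
    if len(bits) > len(image) * width:
        raise ValueError("image too small for message")
    out = [row[:] for row in image]  # do not mutate the caller's image
    for k, bit in enumerate(bits):
        r, c = divmod(k, width)
        out[r][c] = (out[r][c] & ~1) | bit  # clear the LSB, then set it to `bit`
    return out


def extract_lsb(image: Image, n_chars: int) -> str:
    """Read back an ASCII message of n_chars characters from the pixel LSBs."""
    width = len(image[0])
    bits = [image[k // width][k % width] & 1 for k in range(8 * n_chars)]
    return bits_to_text(bits)


if __name__ == "__main__":
    img = [[(r * 16 + c) % 256 for c in range(16)] for r in range(8)]  # 8x16 toy image
    marked = embed_lsb(img, "AI-gen")
    print(extract_lsb(marked, 6))  # prints "AI-gen"
```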

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.18156

Country:

Europe (0.68)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)


ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval

Louis, Antoine; Saxena, Vageesh; van Dijck, Gijs; Spanakis, Gerasimos

arXiv.org Artificial Intelligence, Feb-22-2024

State-of-the-art neural retrievers predominantly focus on high-resource languages like English, which impedes their adoption in retrieval scenarios involving other languages. Current approaches circumvent the lack of high-quality labeled data in non-English languages by leveraging multilingual pretrained language models capable of cross-lingual transfer. However, these models require substantial task-specific fine-tuning across multiple languages, often perform poorly in languages with minimal representation in the pretraining corpus, and struggle to incorporate new languages after the pretraining phase. In this work, we present a novel modular dense retrieval model that learns from the rich data of a single high-resource language and transfers effectively, zero-shot, to a wide array of languages, thereby eliminating the need for language-specific labeled data. Our model, ColBERT-XM, demonstrates competitive performance against existing state-of-the-art multilingual retrievers trained on more extensive datasets in various languages. Further analysis reveals that our modular approach is highly data-efficient, effectively adapts to out-of-distribution data, and significantly reduces energy consumption and carbon emissions. By demonstrating its proficiency in zero-shot scenarios, ColBERT-XM marks a shift towards more sustainable and inclusive retrieval systems, enabling effective information accessibility in numerous languages. We publicly release our code and models for the community.
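As its name indicates, ColBERT-XM is a multi-vector retriever in the ColBERT family, which scores a query against a document by late interaction (the "MaxSim" operator): each query token embedding is matched with its most similar document token embedding, and those similarities are summed. The sketch below shows only that scoring step, using toy hand-written vectors; it omits the encoder and the modular language components that enable zero-shot transfer, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """L2-normalize a vector so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction relevance: for each query token embedding, take the
    best-matching document token embedding, then sum over query tokens."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(max(dot(qi, dj) for dj in d) for qi in q)


def rank(query_vecs: List[Vector], docs: Dict[str, List[Vector]]) -> List[str]:
    """Return document IDs sorted from most to least relevant."""
    return sorted(docs, key=lambda name: maxsim_score(query_vecs, docs[name]), reverse=True)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]  # two toy query-token embeddings
    docs = {"doc_a": [[1.0, 0.0], [0.0, 1.0]], "doc_b": [[1.0, 1.0]]}
    print(rank(query, docs))  # prints ['doc_a', 'doc_b']
```

Because every document token is stored as its own vector, fine-grained matches survive that a single pooled document vector would blur together; the cost is a larger index.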

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.15059

Country:

North America > United States (0.28)
Europe > Netherlands (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models

Louis, Antoine; van Dijck, Gijs; Spanakis, Gerasimos

arXiv.org Artificial Intelligence, Sep-29-2023

Many individuals are likely to face a legal dispute at some point in their lives, but their lack of understanding of how to navigate these complex issues often renders them vulnerable. The advancement of natural language processing opens new avenues for bridging this legal literacy gap through the development of automated legal aid systems. However, existing legal question answering (LQA) approaches often suffer from a narrow scope, being either confined to specific legal domains or limited to brief, uninformative responses. In this work, we propose an end-to-end methodology designed to generate long-form answers to any statutory law question, using a "retrieve-then-read" pipeline. To support this approach, we introduce and release the Long-form Legal Question Answering (LLeQA) dataset, comprising 1,868 expert-annotated legal questions in the French language, complete with detailed answers rooted in pertinent legal provisions. Our experimental results demonstrate promising performance on automatic evaluation metrics, but a qualitative analysis uncovers areas for refinement. As one of the only comprehensive, expert-annotated long-form LQA datasets, LLeQA has the potential not only to accelerate research towards resolving a significant real-world issue but also to act as a rigorous benchmark for evaluating NLP models in specialized domains. We publicly release our code, data, and models.
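A "retrieve-then-read" pipeline has two stages: a retriever selects the provisions most relevant to the question, and a reader (here, a large language model) writes an answer grounded in them. In the hypothetical sketch below, a simple word-overlap ranker stands in for the paper's neural retriever, `generate` is a placeholder for any LLM call, and the provisions are invented English examples (LLeQA itself is in French).

```python
import re
from typing import Callable, Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-cased bag of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, provisions: Dict[str, str], k: int = 2) -> List[str]:
    """Stand-in retriever: rank article IDs by word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(provisions, key=lambda aid: len(q & tokenize(provisions[aid])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, provisions: Dict[str, str], ids: List[str]) -> str:
    """Reader input: the retrieved provisions as context, followed by the question."""
    context = "\n".join(f"[{aid}] {provisions[aid]}" for aid in ids)
    return ("Answer the legal question using only the provisions below, citing article IDs.\n\n"
            f"{context}\n\nQuestion: {question}\nAnswer:")


def answer(question: str, provisions: Dict[str, str],
           generate: Callable[[str], str], k: int = 2) -> str:
    """Retrieve, then read: `generate` is a placeholder for an LLM call."""
    ids = retrieve(question, provisions, k)
    return generate(build_prompt(question, provisions, ids))


if __name__ == "__main__":
    provisions = {  # invented toy provisions
        "art-1": "A tenant must pay the rent on the agreed date.",
        "art-2": "The landlord must return the security deposit within two months.",
        "art-3": "Marriage requires the consent of both spouses.",
    }
    question = "When must the landlord return my security deposit?"
    print(retrieve(question, provisions))  # prints ['art-2', 'art-1']
    print(answer(question, provisions, generate=lambda prompt: prompt))  # echo stub
```

Grounding the reader in retrieved articles is also what makes the answers interpretable: each claim can be traced back to a cited provision.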

computational linguistic, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2309.17050

Country: Europe > Belgium (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Government > Regional Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Finding the Law: Enhancing Statutory Article Retrieval via Graph Neural Networks

Louis, Antoine; van Dijck, Gijs; Spanakis, Gerasimos

arXiv.org Artificial Intelligence, Jan-30-2023

Statutory article retrieval (SAR), the task of retrieving statute law articles relevant to a legal question, is a promising application of legal text processing. In particular, high-quality SAR systems can improve the work efficiency of legal professionals and provide basic legal assistance to citizens in need at no cost. Unlike traditional ad-hoc information retrieval, where each document is considered a complete source of information, SAR deals with texts whose full sense depends on complementary information from the topological organization of statute law. While existing works ignore these domain-specific dependencies, we propose a novel graph-augmented dense statute retriever (G-DSR) model that incorporates the structure of legislation via a graph neural network to improve dense retrieval performance. Experimental results show that our approach outperforms strong retrieval baselines on a real-world expert-annotated SAR dataset.
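The core idea of G-DSR, letting an article's representation absorb information from its position in the structure of legislation, can be illustrated with one round of message passing over the hierarchy graph (for example, chapter to article). This is a deliberately simplified, untrained stand-in: the function name, the fixed mixing weight `alpha`, and the toy two-dimensional vectors are hypothetical, whereas the actual model learns its graph neural network layers on top of dense text embeddings.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def message_pass(embeddings: Dict[str, Vector],
                 edges: List[Tuple[str, str]],
                 alpha: float = 0.5) -> Dict[str, Vector]:
    """One round of untrained message passing on an undirected graph:
    new = alpha * own embedding + (1 - alpha) * mean of neighbour embeddings."""
    neighbours: Dict[str, List[str]] = {node: [] for node in embeddings}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated: Dict[str, Vector] = {}
    for node, vec in embeddings.items():
        nbrs = neighbours[node]
        if not nbrs:  # isolated node: keep its embedding unchanged
            updated[node] = list(vec)
            continue
        mean = [sum(embeddings[n][i] for n in nbrs) / len(nbrs) for i in range(len(vec))]
        updated[node] = [alpha * v + (1 - alpha) * m for v, m in zip(vec, mean)]
    return updated


if __name__ == "__main__":
    # Toy hierarchy: one chapter heading ("leases") with two articles under it.
    # Dimension 0 stands for the "lease" topic; the articles' own text lacks it.
    emb = {"ch:lease": [1.0, 0.0], "art-1": [0.0, 1.0], "art-2": [0.0, 1.0]}
    edges = [("ch:lease", "art-1"), ("ch:lease", "art-2")]
    print(message_pass(emb, edges)["art-1"])  # prints [0.5, 0.5]
```

After the update, each article's vector carries part of its chapter's topic, so a query about leases can now match an article whose own wording never mentions the word.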

computational linguistic, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2301.12847

Country: Europe (0.67)

Genre: Research Report > New Finding (0.34)

Industry:

Law (1.00)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
