Clemencon, Stephan
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Himmi, Anas, Irurozki, Ekhine, Noiry, Nathan, Clemencon, Stephan, Colombo, Pierre
The evaluation of natural language processing (NLP) systems is crucial for advancing the field, but current benchmarking approaches often assume that all systems have scores available for all tasks, which is not always practical. In reality, several factors such as the cost of running baselines, private systems, computational limitations, or incomplete data may prevent some systems from being evaluated on entire tasks. This paper formalizes an existing problem in NLP research: benchmarking when some system scores are missing for certain tasks, and proposes a novel approach to address it. Our method uses a compatible partial ranking approach to impute missing data, which is then aggregated using the Borda count method. It includes two refinements designed specifically for scenarios where either task-level or instance-level scores are available. We also introduce an extended benchmark, which contains over 131 million scores, an order of magnitude larger than existing benchmarks. We validate our methods and demonstrate their effectiveness in addressing the challenge of systems missing evaluations on entire tasks. This work highlights the need for more comprehensive benchmarking approaches that can handle real-world scenarios where not all systems are evaluated on every task.
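As a rough illustration of the aggregation step, the sketch below ranks systems from an incomplete score matrix with Borda counts, simply skipping missing (NaN) entries per task; the paper's compatible-partial-ranking imputation and its task-level/instance-level refinements are not reproduced, and all names and the rescaling heuristic are illustrative choices:

```python
import numpy as np

def borda_aggregate(scores):
    """Rank systems from a (systems x tasks) score matrix via Borda count.

    Missing evaluations are marked NaN. Each task ranks only the systems
    it actually scored, and its Borda points are rescaled to [0, 1] so
    that sparsely evaluated tasks do not dominate -- a crude stand-in
    for the paper's partial-ranking treatment of missing scores.
    """
    n_systems, n_tasks = scores.shape
    points = np.zeros(n_systems)
    for t in range(n_tasks):
        col = scores[:, t]
        evaluated = np.flatnonzero(~np.isnan(col))
        if evaluated.size < 2:
            continue  # fewer than two scores induce no ranking
        order = evaluated[np.argsort(col[evaluated])]  # worst ... best
        for rank, sys_idx in enumerate(order):
            points[sys_idx] += rank / (order.size - 1)
    return np.argsort(-points), points  # best system first

scores = np.array([
    [0.9, 0.7, np.nan],  # system 0 was never run on task 2
    [0.8, np.nan, 0.6],  # system 1 was never run on task 1
    [0.7, 0.9, 0.5],
])
ranking, points = borda_aggregate(scores)
print(ranking, points)  # [1 0 2] [1.  1.5 1. ]
```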
What are the best systems? New perspectives on NLP Benchmarking
Colombo, Pierre, Noiry, Nathan, Irurozki, Ekhine, Clemencon, Stephan
In Machine Learning, a benchmark refers to an ensemble of datasets associated with one or multiple metrics, together with a way to aggregate the performances of different systems. Benchmarks are instrumental in (i) assessing the progress of new methods along different axes and (ii) selecting the best systems for practical use. This is particularly the case for NLP with the development of large pre-trained models (e.g. GPT, BERT) that are expected to generalize well on a variety of tasks. While the community has mainly focused on developing new datasets and metrics, there has been little interest in the aggregation procedure, which is often reduced to a simple average over various performance measures. However, this procedure can be problematic when the metrics are on different scales, which may lead to spurious conclusions. This paper proposes a new procedure to rank systems based on their performance across different tasks. Motivated by social choice theory, the final system ordering is obtained by aggregating the rankings induced by each task and is theoretically grounded. We conduct extensive numerical experiments (on over 270k scores) to assess the soundness of our approach on both synthetic and real scores (e.g. GLUE, EXTREM, SEVAL, TAC, FLICKR). In particular, we show that our method yields different conclusions on state-of-the-art systems than the mean-aggregation procedure, while being both more reliable and robust.
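A toy illustration of why averaging metrics on different scales can mislead, and how task-level rank aggregation avoids it (this conveys only the intuition; the paper's actual procedure and its theoretical guarantees are not reproduced, and the numbers are made up for the example):

```python
import numpy as np

# Two systems evaluated on two tasks whose metrics live on different
# scales (task A in [0, 1], task B unbounded); higher is better on both.
scores = np.array([
    [0.90, 10.0],  # system 0
    [0.60, 12.0],  # system 1
])

# Mean aggregation: the large-scale metric dominates the conclusion.
print(scores.mean(axis=1))  # [5.45 6.3 ] -> system 1 declared best

# Rank aggregation: each task contributes only an ordering (0 = worst).
ranks = scores.argsort(axis=0).argsort(axis=0)
print(ranks.sum(axis=1))    # [1 1] -> the two systems actually tie
```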
Autoencoding any Data through Kernel Autoencoders
Laforgue, Pierre, Clemencon, Stephan, d'Alche-Buc, Florence
This paper investigates a novel algorithmic approach to data representation based on kernel methods. Assuming the observations lie in a Hilbert space X, this work introduces a new formulation of Representation Learning, stated as a regularized empirical risk minimization problem over a class of composite functions. These functions are obtained by composing elementary mappings from vector-valued Reproducing Kernel Hilbert Spaces (vv-RKHSs), and the risk is measured by the expected distortion rate in the input space X. The proposed algorithms crucially rely on the form taken by the minimizers, revealed by a dedicated Representer Theorem. Beyond a first extension of the autoencoding scheme to possibly infinite-dimensional Hilbert spaces, an important application of the introduced Kernel Autoencoders (KAEs) arises when X is itself assumed to be an RKHS: this makes it possible to extract finite-dimensional representations from any kind of data. Numerical experiments on simulated data as well as real labeled graphs (molecules) provide empirical evidence of the performance attained by KAEs.
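A rough numpy sketch of the idea in the finite-dimensional case, under strong simplifying assumptions: a single encoding layer whose encoder is a kernel expansion over the sample (the form a representer theorem suggests), a kernel ridge decoder solved in closed form, and plain gradient steps on the encoder coefficients with the decoder held fixed per step. The paper's formulation (compositions of vv-RKHS mappings, possibly infinite-dimensional X) is substantially more general; every constant below is an arbitrary illustrative choice.

```python
import numpy as np

def gauss_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))          # n observations (rows) in R^5
n, code_dim, gamma, lam, lr = len(X), 2, 0.5, 1e-2, 1e-3

# Representer-theorem form: the encoder is a kernel expansion over the
# sample, so the codes are Z = Kx @ W_enc with learnable coefficients.
Kx = gauss_kernel(X, X, gamma)
W_enc = rng.normal(scale=0.1, size=(n, code_dim))

for step in range(300):
    Z = Kx @ W_enc                                   # encode
    Kz = gauss_kernel(Z, Z, gamma)
    # Decoder: kernel ridge regression from codes back to inputs.
    W_dec = np.linalg.solve(Kz + lam * np.eye(n), X)
    R = Kz @ W_dec                                   # reconstructions
    # Gradient of the distortion ||X - R||^2 w.r.t. W_enc, treating
    # W_dec as fixed (an alternating-minimization approximation).
    G = -2.0 * (X - R) @ W_dec.T                     # dL/dKz
    M = (G + G.T) * Kz
    dZ = -2.0 * gamma * (np.diag(M.sum(1)) - M) @ Z  # dL/dZ
    W_enc -= lr * Kx.T @ dZ                          # encoder gradient step
    if step % 100 == 0:
        print(step, np.mean((X - R) ** 2))           # distortion in input space
```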
AUC Optimisation and Collaborative Filtering
Dhanjal, Charanpal, Gaudel, Romaric, Clemencon, Stephan
In recommendation systems, one is interested in the ranking of the predicted items rather than in other losses such as the mean squared error. Although a variety of ways to evaluate rankings exist in the literature, here we focus on the Area Under the ROC Curve (AUC), as it is widely used and has a strong theoretical underpinning. In practical recommendation settings, only items at the top of the ranked list are presented to the users. With this in mind, we propose a class of objective functions over matrix factorisations which primarily represent a smooth surrogate for the true AUC, and in a special case we show how to prioritise the top of the list. The objectives are differentiable and optimised through a carefully designed stochastic gradient-descent-based algorithm which scales linearly with the size of the data. In the special case of the square loss, we show how to improve computational complexity by leveraging previously computed quantities. To understand the underlying matrix factorisation approaches theoretically, we study both the consistency of the loss functions with respect to AUC and generalisation using Rademacher theory. The resulting generalisation analysis provides strong motivation for the optimisation under study. Finally, we provide computational results demonstrating the efficacy of the proposed method on synthetic and real data.
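A hedged sketch of the core idea, not the paper's exact objective: replace the per-user AUC (a sum of indicators over positive/negative item pairs) with a differentiable logistic surrogate over a matrix factorisation, and optimise it by sampling one pair per stochastic gradient step. The toy data, hyperparameters, and uniform negative sampling are illustrative choices; the paper's top-of-list weighting and square-loss speedups are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 200, 16
lr, reg = 0.05, 1e-4
U = rng.normal(scale=0.1, size=(n_users, k))  # user factors
V = rng.normal(scale=0.1, size=(n_items, k))  # item factors

# Toy implicit feedback: the set of relevant items for each user.
positives = [set(rng.choice(n_items, size=10, replace=False))
             for _ in range(n_users)]

def sgd_step(u, p, q):
    """One stochastic step on a logistic surrogate of per-user AUC:
    push the score of positive item p above that of negative item q."""
    u_vec = U[u].copy()
    diff = u_vec @ (V[p] - V[q])           # score margin s_up - s_uq
    g = -1.0 / (1.0 + np.exp(diff))        # d/d(diff) of log(1 + e^{-diff})
    U[u] -= lr * (g * (V[p] - V[q]) + reg * U[u])
    V[p] -= lr * (g * u_vec + reg * V[p])
    V[q] -= lr * (-g * u_vec + reg * V[q])

for _ in range(20000):
    u = rng.integers(n_users)
    p = rng.choice(list(positives[u]))     # observed (positive) item
    q = rng.integers(n_items)              # uniformly sampled candidate
    if q not in positives[u]:              # keep only true negatives
        sgd_step(u, p, q)
```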