Goto

Collaborating Authors

 incertitude


Towards a rigorous evaluation of RAG systems: the challenge of due diligence

Martinon, Grégoire, de Brionne, Alexandra Lorenzo, Bohard, Jérôme, Lojou, Antoine, Hervault, Damien, Brunel, Nicolas J-B.

arXiv.org Artificial Intelligence

The rise of generative AI, has driven significant advancements in high-risk sectors like healthcare and finance. The Retrieval-Augmented Generation (RAG) architecture, combining language models (LLMs) with search engines, is particularly notable for its ability to generate responses from document corpora. Despite its potential, the reliability of RAG systems in critical contexts remains a concern, with issues such as hallucinations persisting. This study evaluates a RAG system used in due diligence for an investment fund. We propose a robust evaluation protocol combining human annotations and LLM-Judge annotations to identify system failures, like hallucinations, off-topic, failed citations, and abstentions. Inspired by the Prediction Powered Inference (PPI) method, we achieve precise performance measurements with statistical guarantees. We provide a comprehensive dataset for further analysis. Our contributions aim to enhance the reliability and scalability of RAG systems evaluation protocols in industrial applications.


Inductive randomness predictors

Vovk, Vladimir

arXiv.org Artificial Intelligence

This paper introduces inductive randomness predictors, which form a superset of inductive conformal predictors. Its focus is on a very simple special case, binary inductive randomness predictors. It is interesting that binary inductive randomness predictors have an advantage over inductive conformal predictors, although they also have a serious disadvantage. This advantage will allow us to reach the surprising conclusion that non-trivial inductive conformal predictors are inadmissible in the sense of statistical decision theory.


Classification des S{\'e}ries Temporelles Incertaines par Transformation Shapelet

Mbouopda, Michael, Nguifo, Engelbert Mephu

arXiv.org Artificial Intelligence

Time serie classification is used in a diverse range of domain such as meteorology, medicine and physics. It aims to classify chronological data. Many accurate approaches have been built during the last decade and shapelet transformation is one of them. However, none of these approaches does take data uncertainty into account. Using uncertainty propagation techiniques, we propose a new dissimilarity measure based on euclidean distance. We also show how to use this new measure to adapt shapelet transformation to uncertain time series classification. An experimental assessment of our contribution is done on some state of the art datasets.


Learning by Transduction

Gammerman, Alex, Vovk, Volodya, Vapnik, Vladimir

arXiv.org Machine Learning

We describe a method for predicting a classification of an object given classifications of the objects in the training set, assuming that the pairs object/classification are generated by an i.i.d. process from a continuous probability distribution. Our method is a modification of Vapnik's support-vector machine; its main novelty is that it gives not only the prediction itself but also a practicable measure of the evidence found in support of that prediction. We also describe a procedure for assigning degrees of confidence to predictions made by the support vector machine. Some experimental results are presented, and possible extensions of the algorithms are discussed.