Rossi, Fabrice
Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code
Cohen, Roxane, David, Robin, Yger, Florian, Rossi, Fabrice
Protecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most widely used techniques to ensure such protection. Consequently, attackers must first detect and characterize obfuscation before launching any attack against it. This paper investigates the problem of function-level obfuscation detection using graph-based approaches, comparing algorithms ranging from elementary baselines to promising techniques such as Graph Neural Networks (GNNs), across different feature choices. We consider various obfuscation types and obfuscators, resulting in two complex datasets. Our findings demonstrate that GNNs need meaningful features that capture aspects of function semantics to outperform the baselines. Our approach shows satisfactory results, especially in a challenging 11-class classification task and in a practical malware analysis example.
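For illustration, below is a minimal sketch of the kind of graph-level classifier such an approach could rely on, assuming PyTorch Geometric is available; the feature dimension, layer sizes and pooling choice are illustrative and are not the configuration used in the paper.

# Minimal sketch of a graph-level classifier over per-function graphs
# (e.g. control-flow or call graphs). Layer sizes are illustrative only.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool


class FunctionGNN(torch.nn.Module):
    def __init__(self, num_node_features: int, num_classes: int, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(num_node_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.classifier = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        # Two rounds of message passing over the function graph.
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        # Pool node embeddings into a single function-level embedding.
        x = global_mean_pool(x, batch)
        # One logit per obfuscation class (e.g. 11 classes in the hardest task).
        return self.classifier(x)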
Meta-survey on outlier and anomaly detection
Olteanu, Madalina, Rossi, Fabrice, Yger, Florian
The impact of outliers and anomalies on model estimation and data processing is of paramount importance, as evidenced by the extensive body of research spanning various fields over several decades: thousands of research papers have been published on the subject. As a consequence, numerous reviews, surveys, and textbooks have sought to summarize the existing literature, encompassing a wide range of methods from both the statistical and data mining communities. While these endeavors to organize and summarize the research are invaluable, they face inherent challenges due to the pervasive nature of outliers and anomalies in all data-intensive applications, irrespective of the specific application field or scientific discipline. As a result, the resulting collection of papers remains voluminous and somewhat heterogeneous. To address the need for knowledge organization in this domain, this paper implements the first systematic meta-survey of general surveys and reviews on outlier and anomaly detection. Employing a classical systematic survey approach, the study collects nearly 500 papers using two specialized scientific search engines. From this comprehensive collection, a subset of 56 papers that claim to be general surveys on outlier detection is selected using a snowball search technique to enhance field coverage. A meticulous quality assessment phase further refines the selection to a subset of 25 high-quality general surveys. Using this curated collection, the paper investigates the evolution of the outlier detection field over a 20-year period, revealing emerging themes and methods. Furthermore, an analysis of the surveys sheds light on the survey writing practices adopted by scholars from different communities who have contributed to this field. Finally, the paper delves into several topics where consensus has emerged from the literature. These include taxonomies of outlier types, challenges posed by high-dimensional data, the importance of anomaly scores, the impact of learning conditions, difficulties in benchmarking, and the significance of neural networks. Non-consensual aspects are also discussed, particularly the distinction between local and global outliers and the challenges in organizing detection methods into meaningful taxonomies.
Mixture of von Mises-Fisher distribution with sparse prototypes
Rossi, Fabrice, Barbaro, Florian
Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly adapted to high-dimensional directional data such as texts. We propose in this article to estimate a von Mises-Fisher mixture using an l1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on a real data benchmark. We also introduce a new data set of financial reports and exhibit the benefits of our method for exploratory analysis.
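As a rough sketch (with notation chosen here, not necessarily the paper's), the penalized objective combines the von Mises-Fisher mixture log-likelihood with an l1 penalty on the mean directions (the prototypes), where C_p(kappa) is the vMF normalizing constant:

% Sketch of an l1-penalized vMF mixture log-likelihood (illustrative notation):
% x_i on the unit sphere, mixture weights pi_k, vMF parameters (mu_k, kappa_k).
\log \mathcal{L}_\lambda(\theta)
  = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, C_p(\kappa_k)\,
      \exp\!\left(\kappa_k \,\mu_k^{\top} x_i\right)
  \; - \; \lambda \sum_{k=1}^{K} \lVert \mu_k \rVert_1 ,
\qquad \lVert \mu_k \rVert_2 = 1 .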
Fast and fully-automated histograms for large-scale data sets
Mendizábal, Valentina Zelaya, Boullé, Marc, Rossi, Fabrice
G-Enum histograms are a new fast and fully automated method for irregular histogram construction. By framing histogram construction as a density estimation problem and its automation as a model selection task, these histograms leverage the Minimum Description Length (MDL) principle to derive two different model selection criteria. Several proven theoretical results about these criteria give insights into their asymptotic behavior and are used to speed up their optimisation. These insights, combined with a greedy search heuristic, are used to construct histograms in linearithmic time rather than the polynomial time incurred by previous works. The capabilities of the proposed MDL density estimation method are illustrated with reference to other fully automated methods in the literature, on both synthetic and large real-world data sets.
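The sketch below illustrates the general flavour of greedy, bottom-up histogram construction in Python: start from a fine partition and merge adjacent bins while a model-selection cost improves. The cost function here is a generic penalized-likelihood placeholder, not the G-Enum MDL criteria, and this naive O(B^2)-per-pass search does not reach the linearithmic complexity of the paper.

# Illustrative greedy bottom-up histogram construction with a placeholder cost.
import numpy as np


def histogram_cost(counts, widths, n):
    # Negative log-likelihood of a piecewise-constant density plus a crude
    # penalty proportional to the number of bins (placeholder criterion).
    mask = counts > 0
    nll = -np.sum(counts[mask] * np.log(counts[mask] / (n * widths[mask])))
    return nll + len(counts) * np.log(n)


def greedy_histogram(x, n_init=256):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # Fine initial partition with (roughly) equal-frequency cut points.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_init + 1)))
    counts, edges = np.histogram(x, bins=edges)
    while len(counts) > 1:
        widths = np.diff(edges)
        best_cost, best_i = histogram_cost(counts, widths, n), None
        for i in range(len(counts) - 1):
            # Tentatively merge bins i and i+1 and evaluate the cost.
            c = np.concatenate([counts[:i], [counts[i] + counts[i + 1]], counts[i + 2:]])
            e = np.concatenate([edges[:i + 1], edges[i + 2:]])
            cost = histogram_cost(c, np.diff(e), n)
            if cost < best_cost:
                best_cost, best_i = cost, i
        if best_i is None:
            break  # no merge improves the criterion
        counts = np.concatenate([counts[:best_i], [counts[best_i] + counts[best_i + 1]], counts[best_i + 2:]])
        edges = np.concatenate([edges[:best_i + 1], edges[best_i + 2:]])
    return counts, edges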
Challenges in anomaly and change point detection
Olteanu, Madalina, Rossi, Fabrice, Yger, Florian
This paper presents an introduction to the state-of-the-art in anomaly and change-point detection. On the one hand, the main concepts needed to understand the vast scientific literature on those subjects are introduced. On the other, a selection of important surveys and books, as well as two selected active research topics in the field, are presented.
Co-clustering based exploratory analysis of mixed-type data tables
Bouchareb, Aichetou, Boullé, Marc, Clérot, Fabrice, Rossi, Fabrice
Co-clustering is a class of unsupervised data analysis techniques that extract the underlying dependency structure between the instances and variables of a data table as homogeneous blocks. Most of these techniques are limited to variables of the same type. In this paper, we propose a mixed-data co-clustering method based on a two-step methodology. In the first step, all the variables are binarized according to a number of bins chosen by the analyst, by equal-frequency discretization in the numerical case, or by keeping the most frequent values in the categorical case. The second step applies a co-clustering to the instances and the binary variables, leading to groups of instances and groups of variable parts. We apply this methodology to several data sets and compare the results with those of a Multiple Correspondence Analysis applied to the same data.
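A minimal sketch of the first (binarization) step is given below, assuming the data sits in a pandas DataFrame; the helper name and the number of parts per variable are illustrative, not the paper's defaults.

# Sketch of the binarization step: equal-frequency bins for numerical columns,
# most frequent values (plus an "other" part) for categorical columns.
import pandas as pd


def binarize_mixed(df: pd.DataFrame, n_parts: int = 5) -> pd.DataFrame:
    binary_cols = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Equal-frequency discretization into n_parts bins.
            bins = pd.qcut(df[col], q=n_parts, duplicates="drop")
            dummies = pd.get_dummies(bins, prefix=col)
        else:
            # Keep the n_parts most frequent values, group the rest as "other".
            top = df[col].value_counts().nlargest(n_parts).index
            reduced = df[col].where(df[col].isin(top), other="other")
            dummies = pd.get_dummies(reduced, prefix=col)
        for name, values in dummies.items():
            binary_cols[name] = values.astype(int)
    # Instances x binary variable parts table, ready for co-clustering.
    return pd.DataFrame(binary_cols, index=df.index)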
Model Based Co-clustering of Mixed Numerical and Binary Data
Bouchareb, Aichetou, Boullé, Marc, Clérot, Fabrice, Rossi, Fabrice
The goal of co-clustering is to jointly perform a clustering of rows and a clustering of columns of a data table. Proposed by [Good, 1965] and then by [Hartigan, 1975], co-clustering is an extension of standard clustering that extracts the underlying structure of the data in the form of clusters of rows and clusters of columns. The advantage of this technique over standard clustering lies in the joint (simultaneous) analysis of the rows and columns, which enables extracting the maximum of information about the interdependence between the two entities. The utility of co-clustering lies in its capacity to create easily interpretable clusters and its capability to reduce a large data table into a significantly smaller matrix having the same structure as the original.
Federated Learning -- Methods, Applications and beyond
Heusinger, Moritz, Raab, Christoph, Rossi, Fabrice, Schleif, Frank-Michael
In recent years, the applications of machine learning models have increased rapidly, due to the large amount of available data and to technological progress. While some domains like web analysis can benefit from this with only minor restrictions, other fields like medicine, with patient data, are more strongly regulated. In particular, \emph{data privacy} plays an important role, as recently highlighted by the trustworthy AI initiative of the EU or general privacy regulations in legislation. Another major challenge is that the required training \emph{data} is often \emph{distributed} in terms of features or samples and unavailable for classical batch learning approaches. In 2016, Google introduced a framework called \emph{Federated Learning} to solve both of these problems. We provide a brief overview of existing methods and applications in the field of vertical and horizontal \emph{Federated Learning}, as well as \emph{Federated Transfer Learning}.
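As an illustration of horizontal federated learning, the sketch below performs a FedAvg-style aggregation round on a toy logistic-regression model; the client data, local update rule and weighting are purely illustrative and do not correspond to any specific method surveyed in the paper.

# Toy FedAvg-style round: clients train locally, only weights are shared.
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    # Plain logistic-regression gradient steps on one client's private data.
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w


def federated_round(global_w, clients):
    # Each client trains locally; raw data never leaves the client.
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    # Server aggregates the local models, weighted by client data size.
    sizes = np.asarray(sizes, dtype=float)
    return np.average(np.stack(updates), axis=0, weights=sizes)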
The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations
Chatzimparmpas, Angelos, Martins, Rafael M., Jusufi, Ilir, Kucher, Kostiantyn, Rossi, Fabrice, Kerren, Andreas
Machine learning (ML) models are nowadays used in complex applications in various domains, such as medicine, bioinformatics, and other sciences. Due to their black box nature, however, it may sometimes be hard to understand and trust the results they provide. This has increased the demand for reliable visualization tools related to enhancing trust in ML models, which has become a prominent topic of research in the visualization community over the past decades. To provide an overview and present the frontiers of current research on the topic, we present a State-of-the-Art Report (STAR) on enhancing trust in ML models with the use of interactive visualization. We define and describe the background of the topic, introduce a categorization for visualization techniques that aim to accomplish this goal, and discuss insights and opportunities for future research directions. Among our contributions is a categorization of trust against different facets of interactive ML, expanded and improved from previous research. Our results are investigated from different analytical perspectives: (a) providing a statistical overview, (b) summarizing key findings, (c) performing topic analyses, and (d) exploring the data sets used in the individual papers, all with the support of an interactive web-based survey browser. We intend this survey to be beneficial for visualization researchers whose interests involve making ML models more trustworthy, as well as researchers and practitioners from other disciplines in their search for effective visualization techniques suitable for solving their tasks with confidence and conveying meaning to their data.
Improved Algorithm for the Network Alignment Problem with Application to Binary Diffing
Mengin, Elie, Rossi, Fabrice
In this paper, we present a novel algorithm to address the Network Alignment problem. It is inspired by a previous message passing framework of Bayati et al. [2] and includes several modifications designed to significantly speed up the message updates as well as to enforce their convergence. Experiments show that our proposed model outperforms other state-of-the-art solvers. Finally, we propose an application of our method to the Binary Diffing problem. We show that our solution provides better assignments than the reference diffing tools in almost all submitted instances, and we outline the importance of leveraging the graphical structure of binary programs.
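For reference, a common formulation of the Network Alignment objective (in the spirit of Bayati et al.) can be written as follows; the notation is chosen here for illustration and may differ from the paper's:

% Network alignment over candidate links L: x_{ii'} = 1 if node i of graph A
% is matched to node i' of graph B, w_{ii'} is a node similarity, and alpha
% trades off matched link weights against overlapped edges (A, B adjacency).
\max_{x \in \{0,1\}^{|L|}} \;
  \sum_{(i,i') \in L} w_{ii'}\, x_{ii'}
  \; + \; \frac{\alpha}{2} \sum_{(i,i') \in L} \sum_{(j,j') \in L}
          x_{ii'}\, x_{jj'}\, A_{ij}\, B_{i'j'}
\quad \text{s.t.} \quad
  \sum_{i'} x_{ii'} \le 1 \;\; \forall i, \qquad
  \sum_{i} x_{ii'} \le 1 \;\; \forall i'.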