Vaiter, Samuel
Learning Theory for Kernel Bilevel Optimization
Khoury, Fares El, Pauwels, Edouard, Vaiter, Samuel, Arbel, Michael
Bilevel optimization has emerged as a technique for addressing a wide range of machine learning problems that involve an outer objective implicitly determined by the minimizer of an inner problem. In this paper, we investigate the generalization properties of kernel bilevel optimization problems, where the inner objective is optimized over a Reproducing Kernel Hilbert Space. This setting enables rich function approximation while providing a foundation for rigorous theoretical analysis. In this context, we establish novel generalization error bounds for the bilevel problem under finite-sample approximation. Our approach adopts a functional perspective, inspired by (Petrulionyte et al., 2024), and leverages tools from empirical process theory and maximal inequalities for degenerate $U$-processes to derive uniform error bounds. These generalization error estimates allow us to characterize the statistical accuracy of gradient-based methods applied to the empirical discretization of the bilevel problem.
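To fix ideas, a schematic version of such a kernel bilevel problem can be written as follows (the symbols $F$, $\ell_{\mathrm{out}}$, $\ell_{\mathrm{in}}$, $\mathcal{H}_k$, and $\lambda$ are illustrative choices and not the exact notation of the paper):
$$
\min_{\theta}\; F(\theta) := \mathbb{E}_{(x,y)}\big[\ell_{\mathrm{out}}\big(h^\star_\theta(x),\,y\big)\big]
\quad \text{s.t.} \quad
h^\star_\theta \in \arg\min_{h \in \mathcal{H}_k}\; \mathbb{E}_{(x,y)}\big[\ell_{\mathrm{in}}\big(h(x),\,y;\,\theta\big)\big] + \lambda\,\|h\|_{\mathcal{H}_k}^2,
$$
where $\mathcal{H}_k$ is the RKHS associated with a kernel $k$, and the empirical discretization replaces the expectations by finite-sample averages.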
CHANI: Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration
Jaffard, Sophie, Vaiter, Samuel, Reynaud-Bouret, Patricia
The present work aims at proving mathematically that a neural network inspired by biology can learn a classification task using local transformations only. To this end, we propose a spiking neural network named CHANI (Correlation-based Hawkes Aggregation of Neurons with bio-Inspiration), whose neurons' activity is modeled by Hawkes processes. Synaptic weights are updated by an expert aggregation algorithm, providing a local and simple learning rule. We prove that our network can learn on average and asymptotically. Moreover, we demonstrate that it automatically produces neuronal assemblies, in the sense that the network can encode several classes and that the same neuron in the intermediate layers may be activated by more than one class, and we provide numerical simulations on synthetic datasets. This theoretical approach contrasts with the traditional empirical validation of biologically inspired networks and paves the way for understanding how local learning rules enable neurons to form assemblies able to represent complex concepts.
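As a rough illustration of the family of local rules involved, the sketch below implements a generic exponentially weighted expert aggregation step; the actual CHANI update, its losses, and its learning rate are those defined in the paper, not the toy choices here.
```python
import numpy as np

# Generic exponentially weighted expert aggregation step (illustrative only;
# not the exact CHANI rule). Experts with smaller loss gain weight.
def ewa_update(weights, losses, eta=0.1):
    w = weights * np.exp(-eta * losses)  # exponential reweighting
    return w / w.sum()                   # renormalize to a probability vector

# Toy usage: three presynaptic "experts"; the second one incurs the smallest loss.
w = np.ones(3) / 3
for _ in range(50):
    w = ewa_update(w, np.array([0.9, 0.1, 0.6]))
print(w)  # mass concentrates on the second expert
```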
Derivatives of Stochastic Gradient Descent
Iutzeler, Franck, Pauwels, Edouard, Vaiter, Samuel
The differentiation of iterative algorithms has been a subject of research since the 1990s (Gilbert, 1992; Christianson, 1994; Beck, 1994), and was succinctly described as "piggyback differentiation" by Griewank and Faure (2003). This idea has gained renewed interest within the machine learning community, particularly for applications such as hyperparameter optimization (Maclaurin et al., 2015; Franceschi et al., 2017), meta-learning (Finn et al., 2017; Rajeswaran et al., 2019), and learning discretizations of total variation (Chambolle and Pock, 2021; Bogensperger et al., 2022). When applied to an optimization problem, an important theoretical concern is the convergence of the derivatives of the iterates to the derivatives of the solution. Traditional guarantees focus on asymptotic convergence to the solution derivative, as described by the implicit function theorem (Gilbert, 1992; Christianson, 1994; Beck, 1994). This issue has inspired recent works for smooth optimization algorithms (Mehmood and Ochs, 2020, 2022), generic nonsmooth iterations (Bolte et al., 2022), and second-order methods (Bolte et al., 2023).
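A minimal sketch of piggyback differentiation on deterministic gradient descent (not the stochastic setting analyzed in the paper) is given below; the toy objective $f(x,\theta) = \tfrac12\|x-\theta\|^2$, step size, and iteration count are illustrative choices.
```python
import numpy as np

# Piggyback differentiation of gradient descent on f(x, theta) = 0.5*||x - theta||^2,
# for which x*(theta) = theta and dx*/dtheta = I. The iterate x_k and its Jacobian
# J_k = dx_k/dtheta are propagated jointly along the iterations.
d, gamma, theta = 3, 0.4, np.array([1.0, -2.0, 0.5])
x, J = np.zeros(d), np.zeros((d, d))
for _ in range(100):
    grad = x - theta                              # nabla_x f(x, theta)
    H, C = np.eye(d), -np.eye(d)                  # nabla^2_xx f and nabla^2_xtheta f
    x = x - gamma * grad                          # usual gradient step
    J = (np.eye(d) - gamma * H) @ J - gamma * C   # piggyback Jacobian step
print(np.allclose(x, theta), np.allclose(J, np.eye(d)))  # both True
```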
Convergence of Message Passing Graph Neural Networks with Generic Aggregation On Large Random Graphs
Cordonnier, Matthieu, Keriven, Nicolas, Tremblay, Nicolas, Vaiter, Samuel
We study the convergence of message passing graph neural networks on random graph models to their continuous counterpart as the number of nodes tends to infinity. Until now, this convergence was only known for architectures with aggregation functions in the form of normalized means, or, equivalently, as applications of classical operators such as the adjacency matrix or the graph Laplacian. We extend these results to a large class of aggregation functions that encompasses all classically used message passing graph neural networks, such as attention-based message passing, max convolutional message passing, and (degree-normalized) convolutional message passing. Under mild assumptions, we give non-asymptotic, high-probability bounds to quantify this convergence. Our main result is based on the McDiarmid inequality. Interestingly, this result does not apply to the case where the aggregation is a coordinate-wise maximum. We treat this case separately and obtain a different convergence rate.
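The sketch below illustrates what "generic aggregation" means in practice: one message passing layer whose aggregation can be swapped between a degree-normalized mean and a coordinate-wise maximum. The dense adjacency matrix and plain linear maps are simplifications for illustration, not the architectures studied in the paper.
```python
import numpy as np

# One message passing layer with an interchangeable aggregation function.
def mp_layer(A, X, W_msg, W_upd, aggregation="mean"):
    messages = X @ W_msg
    if aggregation == "mean":            # degree-normalized mean aggregation
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
        agg = (A @ messages) / deg
    elif aggregation == "max":           # coordinate-wise maximum over neighbors
        agg = np.stack([
            messages[A[i] > 0].max(axis=0) if A[i].any()
            else np.zeros(messages.shape[1])
            for i in range(A.shape[0])
        ])
    else:
        raise ValueError("unknown aggregation")
    return np.tanh(X @ W_upd + agg)

# Toy usage on a 4-node path graph with 2-dimensional features.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).standard_normal((4, 2))
out = mp_layer(A, X, np.eye(2), np.eye(2), aggregation="max")
```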
On the Robustness of Text Vectorizers
Catellier, Rémi, Vaiter, Samuel, Garreau, Damien
A fundamental issue in machine learning is the robustness of the model with respect to changes in the input. In natural language processing, models typically contain a first embedding layer, transforming a sequence of tokens into vector representations. While robustness with respect to changes of continuous inputs is well understood, the situation is less clear when considering discrete changes, for instance replacing a word by another in an input sentence. Our work formally proves that popular embedding schemes, such as concatenation, TF-IDF, and Paragraph Vector (a.k.a. doc2vec), exhibit robustness in the Hölder or Lipschitz sense with respect to the Hamming distance. We provide quantitative bounds for these schemes and demonstrate how the constants involved are affected by the length of the document. These findings are exemplified through a series of numerical examples.
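Robustness statements of this type take roughly the following schematic form (the constant $C(T)$ and exponent $\alpha$ below are placeholders; the paper gives the precise values for each embedding):
$$
\big\|\Phi(x) - \Phi(x')\big\| \;\le\; C(T)\, d_{\mathrm{H}}(x, x')^{\alpha},
$$
where $\Phi$ is the embedding map (e.g., TF-IDF or doc2vec), $d_{\mathrm{H}}$ is the Hamming distance between token sequences of length $T$, and $\alpha = 1$ corresponds to the Lipschitz case.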
What functions can Graph Neural Networks compute on random graphs? The role of Positional Encoding
Keriven, Nicolas, Vaiter, Samuel
We aim to deepen the theoretical understanding of Graph Neural Networks (GNNs) on large graphs, with a focus on their expressive power. Existing analyses relate this notion to the graph isomorphism problem, which is mostly relevant for graphs of small size, or study graph classification and regression tasks, while prediction tasks on nodes are far more relevant on large graphs. Recently, several works showed that, on very general random graph models, GNNs converge to certain functions as the number of nodes grows. In this paper, we provide a more complete and intuitive description of the function space generated by equivariant GNNs for node tasks, through general notions of convergence that encompass several previous examples. We emphasize the role of input node features, and study the impact of node Positional Encodings (PEs), a recent line of work that has been shown to yield state-of-the-art results in practice. Through the study of several examples of PEs on large random graphs, we extend previously known universality results to significantly more general models. Our theoretical results hint at some normalization tricks, which are shown numerically to have a positive impact on GNN generalization on synthetic and real data. Our proofs contain new concentration inequalities of independent interest.
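As one concrete example of the kind of positional encoding discussed here, a common choice appends leading eigenvectors of the normalized Laplacian to the node features; the sketch below is generic and the PEs and normalizations studied in the paper may differ.
```python
import numpy as np

# Append the k leading non-trivial eigenvectors of the normalized Laplacian
# to the node features, a common positional encoding for GNNs.
def laplacian_pe(A, X, k=4):
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigval, eigvec = np.linalg.eigh(L)   # eigenvalues in ascending order
    pe = eigvec[:, 1:k + 1]              # skip the trivial constant eigenvector
    return np.concatenate([X, pe], axis=1)
```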
One-step differentiation of iterative algorithms
Bolte, Jérôme, Pauwels, Edouard, Vaiter, Samuel
Differentiating the solution of a machine learning problem is an important task, e.g., in hyperparameter optimization [9], in neural architecture search [26], and when using convex layers [3]. There are two main ways to achieve this goal: automatic differentiation (AD) and implicit differentiation (ID). Automatic differentiation implements the idea of evaluating derivatives through the compositional rules of differential calculus in a user-transparent way. It is a mature concept [23] implemented in several machine learning frameworks [31, 16, 1]. However, the time and memory complexity incurred may become prohibitive as soon as the computational graph grows large, a typical example being the unrolling of iterative optimization algorithms such as gradient descent [5]. The alternative, implicit differentiation, is not always accessible: it does not rely solely on the compositional rules of differential calculus and usually requires solving a linear system. The user needs to implement custom rules in an automatic differentiation framework (as done, for example, in [4]) or use dedicated libraries such as [11, 3, 10] implementing these rules for given models. Provided that the implementation is carefully done, this is most often the gold standard for differentiating problem solutions.
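The contrast between the two strategies can be made concrete on a toy problem; below, the inner solution is $x^\star(\theta) = (A + \theta I)^{-1} b$, the unrolled derivative is propagated alongside gradient descent, and implicit differentiation amounts to a single linear solve at the solution. The problem, step size, and iteration count are illustrative assumptions, not taken from the paper.
```python
import numpy as np

# Toy comparison of unrolled differentiation vs implicit differentiation for
# x*(theta) = argmin_x 0.5*x'Ax - b'x + 0.5*theta*||x||^2 = (A + theta I)^{-1} b.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); A = A @ A.T + np.eye(3)
b, theta, gamma = rng.standard_normal(3), 0.5, 0.05
H = A + theta * np.eye(3)

# Unrolled (piggyback) differentiation: propagate x_k and dx_k/dtheta jointly.
x, dx = np.zeros(3), np.zeros(3)
for _ in range(2000):
    grad = H @ x - b
    x, dx = x - gamma * grad, dx - gamma * (H @ dx + x)

# Implicit differentiation: one linear solve at the (approximate) solution.
x_star = np.linalg.solve(H, b)
dx_implicit = np.linalg.solve(H, -x_star)
print(np.allclose(dx, dx_implicit))  # True: both recover dx*/dtheta
```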
A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization
Dagréou, Mathieu, Moreau, Thomas, Vaiter, Samuel, Ablin, Pierre
Bilevel optimization problems, in which two optimization problems are nested, have an increasing number of applications in machine learning. In many practical cases, the upper and lower objectives correspond to empirical risk minimization problems and therefore have a sum structure. In this context, we propose a bilevel extension of the celebrated SARAH algorithm. We demonstrate that the algorithm requires $\mathcal{O}((n+m)^{\frac12}\varepsilon^{-1})$ gradient computations to achieve $\varepsilon$-stationarity, where $n+m$ is the total number of samples, which improves over all previous bilevel algorithms. Moreover, we provide a lower bound on the number of oracle calls required to get an approximate stationary point of the objective function of the bilevel problem. This lower bound is attained by our algorithm, which is therefore optimal in terms of sample complexity.
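Schematically, the bilevel empirical risk minimization problem with sum structure can be written as follows (the notation $h$, $f_j$, $g_i$ below is chosen here for illustration):
$$
\min_{x}\; h(x) := \frac{1}{m} \sum_{j=1}^{m} f_j\big(x, y^\star(x)\big)
\quad \text{with} \quad
y^\star(x) \in \arg\min_{y}\; \frac{1}{n} \sum_{i=1}^{n} g_i(x, y),
$$
so that both levels are finite sums over $m$ outer and $n$ inner samples, and the $\mathcal{O}((n+m)^{\frac12}\varepsilon^{-1})$ rate counts oracle calls to the $f_j$ and $g_i$.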
Gradient scarcity with Bilevel Optimization for Graph Learning
Ghanem, Hashem, Vaiter, Samuel, Keriven, Nicolas
A common issue in graph learning under the semi-supervised setting is referred to as gradient scarcity: learning graphs by minimizing a loss on a subset of nodes causes edges between unlabelled nodes that are far from labelled ones to receive zero gradients. The phenomenon was first described when optimizing the graph and the weights of a Graph Neural Network (GNN) with a joint optimization algorithm. In this work, we give a precise mathematical characterization of this phenomenon, and prove that it also emerges in bilevel optimization, where additional dependency exists between the parameters of the problem. While for GNNs gradient scarcity occurs due to their finite receptive field, we show that it also occurs with the Laplacian regularization model, in the sense that gradient amplitudes decrease exponentially with the distance to labelled nodes. To alleviate this issue, we study several solutions: resorting to latent graph learning using a Graph-to-Graph model (G2G), adding graph regularization to impose a prior structure on the graph, or optimizing on a larger graph than the original one with a reduced diameter. Our experiments on synthetic and real datasets validate our analysis and demonstrate the efficiency of the proposed solutions.
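For concreteness, a schematic version of the Laplacian regularization instance reads as follows (notation chosen here for illustration: $w$ denotes the edge weights being learned, $L(w)$ the corresponding graph Laplacian, and $V_{\mathrm{lab}}$ the set of labelled nodes):
$$
\min_{w}\; \sum_{i \in V_{\mathrm{lab}}} \ell\big(z^\star_i(w),\, y_i\big)
\quad \text{with} \quad
z^\star(w) \in \arg\min_{z}\; \sum_{i \in V_{\mathrm{lab}}} \big(z_i - y_i\big)^2 + \lambda\, z^\top L(w)\, z,
$$
and gradient scarcity refers to the decay of the outer gradient with respect to $w_{uv}$ as the graph distance from edge $(u,v)$ to $V_{\mathrm{lab}}$ grows.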
The derivatives of Sinkhorn-Knopp converge
Pauwels, Edouard, Vaiter, Samuel