
Collaborating Authors: Sangnier, Maxime


Nonparametric estimation of Hawkes processes with RKHSs

arXiv.org Machine Learning

Hawkes processes are a class of past-dependent point processes, widely used in many applications such as seismology [Ogata, 1988], criminology [Olinde and Short, 2020] and neuroscience [Reynaud-Bouret et al., 2013] for their ability to capture complex dependence structures. In their multidimensional version [Ogata, 1988], Hawkes processes can model pairwise interactions between different types of events, making it possible to recover a connectivity graph between different features. Hawkes processes were originally developed by Hawkes [1971] to model self-exciting phenomena, where each event increases the probability of a new event occurring, and many extensions have been proposed since. In particular, nonlinear Hawkes processes have been introduced notably to detect inhibiting interactions, in which an event can decrease the probability of another one occurring. Hawkes processes with inhibition are notoriously more difficult to handle because many properties of linear Hawkes processes, such as the cluster representation and the branching structure of the process [Hawkes and Oakes, 1974], are lost. Since the first article on nonlinear Hawkes processes [Brémaud and Massoulié, 1996], which in particular proved their existence, many works have focused on inhibition in the past few years. Among them, limit theorems have been established in [Costa et al., 2020], while Duval et al. [2022] obtained mean-field results on the behaviour of two neuronal populations. Regarding statistical inference, in the frequentist setting we can mention the exact maximum likelihood procedure of Bonnet et al. [2023], the least-squares approach of Bacry et al. [2020] and the nonparametric approach based on Bernstein-type polynomials of Lemonnier and Vayatis [2014]. While the first proposes an exact inference procedure, it is restricted to exponential kernels.
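
As a rough illustration of the objects at stake, the short sketch below evaluates the conditional intensity of a nonlinear multivariate Hawkes process. The exponential kernels and the ReLU link function are illustrative choices only, not the RKHS estimator studied in the paper.

import numpy as np

def intensity(t, events, mu, alpha, beta):
    """Conditional intensity lambda_i(t) of a nonlinear multivariate Hawkes
    process with exponential kernels h_ij(s) = alpha[i, j] * exp(-beta * s).
    A negative alpha[i, j] encodes inhibition of type i by type j; the ReLU
    link keeps the intensity nonnegative (illustrative choice)."""
    lam = np.array(mu, dtype=float)
    for j, times_j in enumerate(events):          # events[j]: past event times of type j
        past = np.asarray(times_j, dtype=float)
        past = past[past < t]
        lam += alpha[:, j] * np.exp(-beta * (t - past)).sum()
    return np.maximum(lam, 0.0)                   # nonlinear link phi(x) = max(x, 0)

# toy example with two event types, where type 1 inhibits type 0
events = [[0.5, 1.2], [0.9]]
mu = [0.3, 0.2]
alpha = np.array([[0.8, -0.5], [0.1, 0.4]])
print(intensity(t=1.5, events=events, mu=mu, alpha=alpha, beta=1.0))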


Maximum Likelihood Estimation for Hawkes Processes with self-excitation or inhibition

arXiv.org Machine Learning

The Hawkes model is a point process observed on the real line, which generally corresponds to time, where any previously encountered event has a direct influence on the chances of future events occurring. This past-dependent mathematical model was introduced in [1] and its first application was to model earthquake occurrences [2, 3]. Since then, Hawkes processes have been widely used in various fields, for instance finance [4], social media [5, 6], epidemiology [7], sociology [8] and neuroscience [9]. The main advantage of Hawkes processes is their ability to model different kinds of relationships between phenomena through an unknown kernel or transfer function. The Hawkes model was originally introduced as a self-exciting point process, where the appearance of an event increases the chances of another one being triggered. Several estimation procedures have been proposed for the kernel function, both in parametric [2, 10, 11] and nonparametric [9, 12] frameworks. However, the inhibition setting, where the presence of an event decreases the chance of another occurring, has drawn less attention in the literature, although it can be of great interest in several fields, in particular in neuroscience [13]. In this inhibition context, the cluster representation [14], on which the construction of a self-exciting Hawkes process is based, is no longer valid.
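
A minimal numerical sketch of the likelihood being maximized, assuming a univariate process with exponential kernel h(s) = alpha * exp(-beta * s) and a purely self-exciting regime (alpha >= 0); the exact handling of inhibition (a thresholded intensity and a piecewise compensator) developed in the paper is not reproduced here.

import numpy as np

def loglik_exp_hawkes(times, T, mu, alpha, beta):
    """Log-likelihood of a univariate self-exciting Hawkes process on [0, T]:
    log L = sum_i log lambda(t_i) - int_0^T lambda(s) ds,
    with lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
    times = np.asarray(times, dtype=float)
    loglik, r, prev = 0.0, 0.0, None
    for t in times:
        if prev is not None:
            # recursion for sum_{t_j < t} exp(-beta * (t - t_j))
            r = np.exp(-beta * (t - prev)) * (r + 1.0)
        loglik += np.log(mu + alpha * r)
        prev = t
    # closed-form compensator (integrated intensity) for the exponential kernel
    compensator = mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - times)))
    return loglik - compensator

print(loglik_exp_hawkes([0.4, 1.1, 1.3, 2.7], T=3.0, mu=0.5, alpha=0.8, beta=1.2))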


Approximating Lipschitz continuous functions with GroupSort neural networks

arXiv.org Machine Learning

Recent advances in adversarial attacks and Wasserstein GANs have advocated for the use of neural networks with restricted Lipschitz constants. Motivated by these observations, we study the recently introduced GroupSort neural networks, with constraints on the weights, and make a theoretical step towards a better understanding of their expressive power. We show in particular how these networks can represent any Lipschitz continuous piecewise linear function. We also prove that they are well-suited for approximating Lipschitz continuous functions and exhibit upper bounds on both their depth and size. To conclude, the efficiency of GroupSort networks compared with more standard ReLU networks is illustrated in a set of synthetic experiments.
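
A minimal sketch of the GroupSort activation (with groups of size two, this is the MaxMin activation) together with an illustrative norm-constrained layer; the precise weight constraints analysed in the paper may differ from the spectral normalisation used below.

import numpy as np

def group_sort(x, group_size=2):
    """GroupSort activation: split the pre-activations into consecutive groups
    of `group_size` units and sort each group in increasing order. It permutes
    its input, hence it is 1-Lipschitz and gradient-norm preserving."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    assert n % group_size == 0, "layer width must be divisible by the group size"
    groups = x.reshape(*x.shape[:-1], n // group_size, group_size)
    return np.sort(groups, axis=-1).reshape(x.shape)

# one norm-constrained GroupSort layer (illustrative constraint)
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
W /= np.linalg.norm(W, ord=2)      # spectral norm <= 1 keeps the affine map 1-Lipschitz
x = rng.normal(size=3)
print(group_sort(W @ x))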


Some Theoretical Insights into Wasserstein GANs

arXiv.org Machine Learning

Generative Adversarial Networks (GANs) have been successful in producing outstanding results in areas as diverse as image, video, and text generation. Building on these successes, a large number of empirical studies have validated the benefits of the cousin approach called Wasserstein GANs (WGANs), which brings stabilization in the training process. In the present paper, we add a new stone to the edifice by proposing some theoretical advances in the properties of WGANs. First, we properly define the architecture of WGANs in the context of integral probability metrics parameterized by neural networks and highlight some of their basic mathematical features. We stress in particular interesting optimization properties arising from the use of a parametric 1-Lipschitz discriminator. Then, in a statistically driven approach, we study the convergence of empirical WGANs as the sample size tends to infinity, and clarify the adversarial effects of the generator and the discriminator by underlining some trade-off properties. These features are finally illustrated with experiments using both synthetic and real-world datasets.
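
A bare-bones sketch of the quantity WGANs are built around: the empirical integral probability metric obtained by maximising E_real[D] - E_fake[D] over a class of 1-Lipschitz discriminators. The snippet only evaluates the objective for one fixed candidate discriminator; the parametric neural-network classes and the trade-offs studied in the paper are not reproduced.

import numpy as np

def wgan_objective(real, fake, discriminator):
    """Empirical WGAN/IPM objective for a given discriminator D:
    mean of D over real samples minus mean of D over generated samples.
    The WGAN distance is the supremum of this quantity over 1-Lipschitz D."""
    real, fake = np.asarray(real), np.asarray(fake)
    return discriminator(real).mean() - discriminator(fake).mean()

# toy 1-d example: D(x) = x is 1-Lipschitz, so the objective lower-bounds the
# Wasserstein-1 distance (here, roughly the difference of the two means)
real = np.random.default_rng(1).normal(loc=1.0, size=500)
fake = np.random.default_rng(2).normal(loc=0.0, size=500)
print(wgan_objective(real, fake, discriminator=lambda x: x))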


Joint quantile regression in vector-valued RKHSs

Neural Information Processing Systems

Addressing the need for a more complete picture than the average relationship provided by standard regression, a novel framework for estimating and predicting several conditional quantiles simultaneously is introduced. The proposed methodology leverages kernel-based multi-task learning to curb the embarrassing phenomenon of quantile crossing, with a one-step estimation procedure and no post-processing. Moreover, this framework comes with theoretical guarantees and an efficient coordinate descent learning algorithm. Numerical experiments on benchmark and real datasets highlight the enhancements of our approach regarding the prediction error, the crossing occurrences and the training time.
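
The two ingredients the abstract refers to can be stated in a few lines: the pinball loss minimised at each quantile level, and the crossing phenomenon that arises when several levels are estimated independently. This only illustrates the criterion and the defect the joint vector-valued-RKHS estimator is designed to avoid, not the estimator itself.

import numpy as np

def pinball_loss(y, pred, tau):
    """Pinball (quantile) loss at level tau in (0, 1):
    mean of max(tau * (y - pred), (tau - 1) * (y - pred))."""
    diff = y - pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

def crossing_rate(pred_by_level):
    """Fraction of points where predicted quantile curves cross, i.e. where a
    higher quantile level receives a lower prediction than a lower level.
    `pred_by_level` has shape (n_levels, n_points), levels sorted increasingly."""
    pred = np.asarray(pred_by_level, dtype=float)
    return np.mean(np.any(np.diff(pred, axis=0) < 0.0, axis=0))

rng = np.random.default_rng(0)
y = rng.normal(size=1000)
print(pinball_loss(y, pred=np.quantile(y, 0.9), tau=0.9))
print(crossing_rate([[0.1, 0.2], [0.3, 0.15]]))   # the second point crosses -> 0.5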


Accelerated proximal boosting

arXiv.org Machine Learning

Gradient boosting is a prediction method that iteratively combines weak learners to produce a complex and accurate model. From an optimization point of view, the learning procedure of gradient boosting mimics a gradient descent on a functional variable. This paper proposes to build upon the proximal point algorithm when the empirical risk to minimize is not differentiable. In addition, the novel boosting approach, called accelerated proximal boosting, benefits from Nesterov's acceleration in the same way as gradient boosting [Biau et al., 2018]. The advantages of leveraging proximal methods for boosting are illustrated by numerical experiments on simulated and real-world data. In particular, the proposed method compares favorably with gradient boosting in terms of convergence rate and prediction accuracy.
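
To make the functional picture concrete, here is a small sketch of Nesterov-accelerated gradient boosting on the (differentiable) squared loss with regression stumps as weak learners; in the non-differentiable case targeted by the paper, the negative gradient would be replaced by a proximal residual, which is not reproduced here.

import numpy as np

def fit_stump(x, residual):
    """Fit a 1-d regression stump (the weak learner) to the residual by
    scanning every observed value as a candidate split point."""
    best_err, best_rule = np.inf, None
    for s in np.unique(x):
        left, right = residual[x <= s], residual[x > s]
        if left.size == 0 or right.size == 0:
            continue
        pred = np.where(x <= s, left.mean(), right.mean())
        err = np.mean((residual - pred) ** 2)
        if err < best_err:
            best_err, best_rule = err, (s, left.mean(), right.mean())
    s, a, b = best_rule
    return lambda z: np.where(z <= s, a, b)

def accelerated_boosting(x, y, n_rounds=50, lr=0.1):
    """Nesterov-accelerated boosting for the squared loss: the weak learner is
    fitted to the negative gradient evaluated at a look-ahead (momentum)
    combination of the two most recent models, as in accelerated gradient descent."""
    f_prev = np.zeros_like(y, dtype=float)
    f_curr = np.zeros_like(y, dtype=float)
    for t in range(1, n_rounds + 1):
        momentum = (t - 1.0) / (t + 2.0)
        g = f_curr + momentum * (f_curr - f_prev)   # look-ahead predictions
        neg_grad = y - g                            # negative gradient of the squared loss at g
        h = fit_stump(x, neg_grad)
        f_prev, f_curr = f_curr, g + lr * h(x)
    return f_curr

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = np.sign(x) + 0.1 * rng.normal(size=200)
print(np.mean((accelerated_boosting(x, y) - y) ** 2))   # training error of the boosted model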


Infinite-Task Learning with Vector-Valued RKHSs

arXiv.org Machine Learning

Machine learning has witnessed tremendous success in solving tasks that depend on a hyperparameter. While multi-task learning is celebrated for its capacity to solve jointly a finite number of tasks, learning a continuum of tasks for various loss functions is still a challenge. A promising approach, called Parametric Task Learning, has paved the way in the case of piecewise-linear loss functions. We propose a generic approach, called Infinite-Task Learning, to solve jointly a continuum of tasks via vector-valued RKHSs. We provide generalization guarantees for the proposed scheme and illustrate its efficiency in cost-sensitive classification, quantile regression and density level set estimation.
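
A toy illustration of the idea of a continuum of tasks: the task index is a hyperparameter (here a quantile level tau) sampled at training time, a single model consumes both the input and tau, and the risk is integrated over the sampled levels. The operator-valued-kernel machinery of the actual Infinite-Task Learning estimator is not reproduced; the model below is a deliberately crude stand-in.

import numpy as np

def infinite_task_pinball_risk(model, x, y, rng, n_hyper=16):
    """Monte-Carlo estimate of the risk integrated over a continuum of tasks,
    each task being the quantile regression problem at a sampled level tau."""
    risks = []
    for tau in rng.uniform(0.05, 0.95, size=n_hyper):
        pred = model(x, tau)                      # one model, parameterised by the task
        diff = y - pred
        risks.append(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))
    return float(np.mean(risks))

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
y = rng.normal(size=500)
# crude toy "model": ignores x and returns the empirical tau-quantile of y
model = lambda inputs, tau: np.full(len(inputs), np.quantile(y, tau))
print(infinite_task_pinball_risk(model, x, y, rng))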

