AITopics | Venturi, Daniele

Plotting

Venturi, Daniele

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Uncertainty propagation in feed-forward neural network models

Diamzon, Jeremy, Venturi, Daniele

arXiv.org Machine LearningMar-29-2025

We develop new uncertainty propagation methods for feed-forward neural network architectures with leaky ReLU activation functions subject to random perturbations in the input vectors. In particular, we derive analytical expressions for the probability density function (PDF) of the neural network output and its statistical moments as a function of the input uncertainty and the parameters of the network, i.e., weights and biases. A key finding is that an appropriate linearization of the leaky ReLU activation function yields accurate statistical results even for large perturbations in the input vectors. This can be attributed to the way information propagates through the network. We also propose new analytically tractable Gaussian copula surrogate models to approximate the full joint PDF of the neural network output. To validate our theoretical results, we conduct Monte Carlo simulations and a thorough error analysis on a multi-layer neural network representing a nonlinear integro-differential operator between two polynomial function spaces. Our findings demonstrate excellent agreement between the theoretical predictions and Monte Carlo simulations.

artificial intelligence, machine learning, neural network, (14 more...)

arXiv.org Machine Learning

2503.21059

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Malliavin-Bismut Score-based Diffusion Models

Mirafzali, Ehsan, Gupta, Utkarsh, Wyrod, Patrick, Proske, Frank, Venturi, Daniele, Marinescu, Razvan

arXiv.org Artificial IntelligenceMar-21-2025

We introduce a new framework that employs Malliavin calculus to derive explicit expressions for the score function -- i.e., the gradient of the log-density -- associated with solutions to stochastic differential equations (SDEs). Our approach integrates classical integration-by-parts techniques with modern tools, such as Bismut's formula and Malliavin calculus, to address linear and nonlinear SDEs. In doing so, we establish a rigorous connection between the Malliavin derivative, its adjoint (the Malliavin divergence or the Skorokhod integral), Bismut's formula, and diffusion generative models, thus providing a systematic method for computing $\nabla \log p_t(x)$. For the linear case, we present a detailed study proving that our formula is equivalent to the actual score function derived from the solution of the Fokker--Planck equation for linear SDEs. Additionally, we derive a closed-form expression for $\nabla \log p_t(x)$ for nonlinear SDEs with state-independent diffusion coefficients. These advancements provide fresh theoretical insights into the smoothness and structure of probability densities and practical implications for score-based generative modelling, including the design and analysis of new diffusion models. Moreover, our findings promote the adoption of the robust Malliavin calculus framework in machine learning research. These results directly apply to various pure and applied mathematics fields, such as generative modelling, the study of SDEs driven by fractional Brownian motion, and the Fokker--Planck equations associated with nonlinear SDEs.

artificial intelligence, machine learning, malliavin derivative, (17 more...)

arXiv.org Artificial Intelligence

2503.16917

Country:

Europe (0.67)
North America > United States > California (0.14)

Genre:

Research Report (1.00)
Workflow (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)

Add feedback

Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models

Zhang, Hanlin, Edelman, Benjamin L., Francati, Danilo, Venturi, Daniele, Ateniese, Giuseppe, Barak, Boaz

arXiv.org Artificial IntelligenceNov-14-2023

Watermarking generative models consists of planting a statistical signal (watermark) in a model's output so that it can be later verified that the output was generated by the given model. A strong watermarking scheme satisfies the property that a computationally bounded attacker cannot erase the watermark without causing significant quality degradation. In this paper, we study the (im)possibility of strong watermarking schemes. We prove that, under well-specified and natural assumptions, strong watermarking is impossible to achieve. This holds even in the private detection algorithm setting, where the watermark insertion and detection algorithms share a secret key, unknown to the attacker. To prove this result, we introduce a generic efficient watermark attack; the attacker is not required to know the private key of the scheme or even which scheme is used. Our attack is based on two assumptions: (1) The attacker has access to a "quality oracle" that can evaluate whether a candidate output is a high-quality response to a prompt, and (2) The attacker has access to a "perturbation oracle" which can modify an output with a nontrivial probability of maintaining quality, and which induces an efficiently mixing random walk on high-quality outputs. We argue that both assumptions can be satisfied in practice by an attacker with weaker computational capabilities than the watermarked model itself, to which the attacker has only black-box access. Furthermore, our assumptions will likely only be easier to satisfy over time as models grow in capabilities and modalities. We demonstrate the feasibility of our attack by instantiating it to attack three existing watermarking schemes for large language models: Kirchenbauer et al. (2023), Kuditipudi et al. (2023), and Zhao et al. (2023). The same attack successfully removes the watermarks planted by all three schemes, with only minor quality degradation.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2311.04378

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

Add feedback

The Mori-Zwanzig formulation of deep learning

Venturi, Daniele, Li, Xiantao

arXiv.org Artificial IntelligenceMay-19-2023

We develop a new formulation of deep learning based on the Mori-Zwanzig (MZ) formalism of irreversible statistical mechanics. The new formulation is built upon the well-known duality between deep neural networks and discrete dynamical systems, and it allows us to directly propagate quantities of interest (conditional expectations and probability density functions) forward and backward through the network by means of exact linear operator equations. Such new equations can be used as a starting point to develop new effective parameterizations of deep neural networks, and provide a new framework to study deep-learning via operator theoretic methods. The proposed MZ formulation of deep learning naturally introduces a new concept, i.e., the memory of the neural network, which plays a fundamental role in low-dimensional modeling and parameterization. By using the theory of contraction mappings, we develop sufficient conditions for the memory of the neural network to decay with the number of layers. This allows us to rigorously transform deep networks into shallow ones, e.g., by reducing the number of neurons per layer (using projection operators), or by reducing the total number of layers (using the decay property of the memory operator).

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Artificial Intelligence

2209.05544

Country: North America > United States (0.67)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback