Rademacher Complexity of Neural ODEs via Chen-Fliess Series
Joshua Hanson, Maxim Raginsky
Several recent works have examined continuous-depth idealizations of deep neural nets, viewing them as continuous-time ordinary differential equation (ODE) models with either fixed or time-varying parameters. Traditional discrete-layer nets can be recovered by applying an appropriate temporal discretization scheme, e.g., the Euler or Runge-Kutta methods. In applications, this perspective has yielded advantages concerning regularization (Kelly et al., 2020; Kobyzev et al., 2021; Pal et al., 2021), efficient parameterization (Queiruga et al., 2020), convergence speed (Chen et al., 2023), and applicability to non-uniform data (Sahin and Kozat, 2019), among others. As a theoretical tool, continuous-depth idealizations have led to a better understanding of the contribution of depth to model expressiveness and generalizability (Marion, 2023; Massaroli et al., 2020), to new or improved training strategies via framing as an optimal control problem (Corbett and Kangin, 2022), and to novel model variations (Jia and Benson, 2019; Peluchetti and Favaro, 2020). Considered as generic control systems, continuous-depth nets can admit a number of distinct input-output configurations depending on how the roles in the control system's "anatomy" are assigned. Controlled neural ODEs (Kidger et al., 2020) and continuous-time recurrent neural nets (Fermanian et al., 2021) treat the (time-varying) control signal as the input to the model; the initial condition is either fixed or treated as a trainable parameter; the (time-varying) output signal is the model output; and any free parameters of the vector fields (weights) are held constant in time.
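To make this configuration concrete, the following minimal NumPy sketch (not taken from the paper; the tanh vector field and all names are illustrative assumptions) discretizes a controlled neural ODE dx/dt = f(x, u(t)) with the explicit Euler method, so that each integration step plays the role of one discrete residual layer: the sampled control path is the model input, the initial condition is fixed, the weights are held constant in time, and a linear readout produces the time-varying output signal.

    import numpy as np

    def vector_field(x, u, W, B, b):
        # Hypothetical state dynamics f(x, u) = tanh(W x + B u + b);
        # the weights (W, B, b) do not vary with time.
        return np.tanh(W @ x + B @ u + b)

    def controlled_neural_ode(u_signal, x0, W, B, b, C, T=1.0):
        # u_signal: array of shape (n_steps, m), the time-varying control/input.
        # Explicit Euler with step h: x_{k+1} = x_k + h * f(x_k, u_k);
        # each step corresponds to one discrete residual layer.
        n_steps = len(u_signal)
        h = T / n_steps
        x = x0
        ys = []
        for u in u_signal:
            x = x + h * vector_field(x, u, W, B, b)  # residual update
            ys.append(C @ x)                         # time-varying readout
        return np.array(ys)

    rng = np.random.default_rng(0)
    d, m, p, n_steps = 4, 2, 1, 20
    x0 = np.zeros(d)                        # fixed initial condition
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    B = rng.standard_normal((d, m))
    b = np.zeros(d)
    C = rng.standard_normal((p, d))
    u = rng.standard_normal((n_steps, m))   # input: a sampled control path
    y = controlled_neural_ode(u, x0, W, B, b, C)
    print(y.shape)  # (20, 1): the output signal on the discretization grid

Refining the step size (increasing n_steps) moves this discrete-layer net toward its continuous-depth idealization, which is the limit the abstract describes.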
arXiv.org Artificial Intelligence
Jan-30-2024