Generalized Resubstitution for Regression Error Estimation
Marcondes, Diego, Braga-Neto, Ulisses
We propose generalized resubstitution error estimators for regression, a broad family of estimators, each corresponding to a choice of empirical probability measure and loss function. The usual sum-of-squares criterion is a special case, corresponding to the standard empirical probability measure and the quadratic loss. Other choices of empirical probability measure lead to more general estimators with superior bias and variance properties. We prove that these error estimators are consistent under broad assumptions. In addition, procedures for choosing the empirical measure based on the method of moments and on maximum pseudo-likelihood are proposed and investigated. Detailed experimental results using polynomial regression demonstrate empirically the superior finite-sample bias and variance properties of the proposed estimators. The R code for the experiments is provided.
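To make the family concrete, the sketch below contrasts standard resubstitution with one possible generalized variant: a Gaussian-kernel-smoothed empirical measure approximated by Monte Carlo. The smoothing parameters and function names are illustrative assumptions, not the paper's estimators (the paper's experiments are in R; Python is used here only for a self-contained sketch).

    import numpy as np

    def resubstitution(model, X, y):
        # Standard resubstitution: empirical measure + quadratic loss.
        return np.mean((y - model(X)) ** 2)

    def smoothed_resubstitution(model, X, y, sigma_x, sigma_y, n_mc=10000, seed=0):
        # One generalized resubstitution estimator (illustrative choice):
        # a Gaussian-kernel-smoothed empirical measure, approximated by
        # Monte Carlo resampling, still under the quadratic loss.
        rng = np.random.default_rng(seed)
        idx = rng.integers(0, len(y), size=n_mc)                   # resample the data
        Xs = X[idx] + rng.normal(0.0, sigma_x, size=X[idx].shape)  # perturb inputs
        ys = y[idx] + rng.normal(0.0, sigma_y, size=n_mc)          # perturb responses
        return np.mean((ys - model(Xs)) ** 2)

Setting sigma_x = sigma_y = 0 recovers the standard estimator, so this family strictly generalizes the sum-of-squares criterion.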
The Lattice Overparametrization Paradigm for the Machine Learning of Lattice Operators
Marcondes, Diego, Barrera, Junior
The machine learning of lattice operators has three possible bottlenecks. From a statistical standpoint, it is necessary to design a constrained class of operators based on prior information, with low bias and low complexity relative to the sample size. From a computational perspective, there should be an efficient algorithm to minimize an empirical error over the class. From an understanding point of view, the properties of the learned operator need to be derived, so that its behavior can be theoretically understood. The statistical bottleneck can be overcome thanks to the rich literature on the representation of lattice operators, but there is no general learning algorithm for them. In this paper, we discuss a learning paradigm in which, by overparametrizing a constrained class via the elements of a lattice, an algorithm for minimizing functions over lattices can be applied to learn it. We present the stochastic lattice descent algorithm as a general algorithm to learn on constrained classes of operators once a lattice overparametrization of the class is fixed, and we discuss previous works that serve as proofs of concept. Moreover, if there are algorithms to compute the basis of an operator from its overparametrization, then its properties can be deduced and the understanding bottleneck is also overcome. This learning paradigm has three properties that modern methods based on neural networks lack: control, transparency and interpretability. Nowadays, there is an increasing demand for methods with these characteristics, and we believe that mathematical morphology is in a unique position to supply them. The lattice overparametrization paradigm could be a missing piece for it to achieve its full potential within modern machine learning.
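As a rough illustration of the kind of algorithm meant by stochastic lattice descent, the sketch below performs a randomized descent on the Boolean lattice {0,1}^n, used here as a simple stand-in for a lattice overparametrization; the neighbor-sampling scheme, parameter names and stand-in lattice are assumptions for illustration, not the paper's exact algorithm.

    import numpy as np

    def stochastic_lattice_descent(err, n, n_neighbors=8, epochs=100, seed=0):
        # Randomized descent of an empirical error err(.) over the Boolean
        # lattice {0,1}^n: at each epoch, sample a few lattice neighbors
        # (single-bit flips) and move to the best of them.
        rng = np.random.default_rng(seed)
        point = rng.integers(0, 2, size=n)        # random initial lattice point
        best, best_err = point.copy(), err(point)
        for _ in range(epochs):
            bits = rng.choice(n, size=min(n_neighbors, n), replace=False)
            candidates = []
            for b in bits:
                nb = point.copy()
                nb[b] ^= 1                        # flip one bit: a lattice neighbor
                candidates.append((err(nb), nb))
            e, point = min(candidates, key=lambda c: c[0])
            if e < best_err:                      # track the best point seen so far
                best, best_err = point.copy(), e
        return best, best_err

Moving to the best sampled neighbor even when it is worse than the current point gives the procedure its stochastic flavor and helps it escape shallow local minima of the empirical error.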
Distribution-free Deviation Bounds of Learning via Model Selection with Cross-validation Risk Estimation
Marcondes, Diego, Peixoto, Clรกudia
Cross-validation techniques for risk estimation and model selection are widely used in statistics and machine learning. However, the theoretical properties of learning via model selection with cross-validation risk estimation remain poorly understood in the face of its widespread use. In this context, this paper presents learning via model selection with cross-validation risk estimation as a general systematic learning framework within classical statistical learning theory, and establishes distribution-free deviation bounds in terms of VC dimension, giving detailed proofs of the results and considering both bounded and unbounded loss functions. We also deduce conditions under which the deviation bounds of learning via model selection are tighter than those of learning via empirical risk minimization in the whole hypotheses space, supporting the better performance of model selection frameworks observed empirically in some instances.
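For reference, the framework covered by the bounds is the familiar pipeline sketched below: estimate each candidate model class's risk by k-fold cross-validation, select the minimizer, and retrain it on the full sample. The helper names are hypothetical, and the sketch glosses over the paper's formal setup.

    import numpy as np

    def kfold_risk(fit, loss, X, y, k=5, seed=0):
        # k-fold cross-validation estimate of the risk of the model class
        # trained by fit(X, y), under the pointwise loss loss(y, y_hat).
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(y)), k)
        risks = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            model = fit(X[trn], y[trn])           # empirical risk minimization per fold
            risks.append(np.mean(loss(y[val], model(X[val]))))
        return float(np.mean(risks))

    def select_model(fits, loss, X, y, k=5):
        # Learning via model selection: choose the candidate class with the
        # smallest cross-validation risk, then retrain it on all the data.
        cv_risks = [kfold_risk(fit, loss, X, y, k=k) for fit in fits]
        best = int(np.argmin(cv_risks))
        return fits[best](X, y), cv_risks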
Learning the hypotheses space from data through a U-curve algorithm: a statistically consistent complexity regularizer for Model Selection
Marcondes, Diego, Simonis, Adilson, Barrera, Junior
This paper proposes a data-driven, systematic, consistent and non-exhaustive approach to Model Selection that extends the classical agnostic PAC learning model. In this approach, learning problems are modeled not only by a hypothesis space $\mathcal{H}$, but also by a Learning Space $\mathbb{L}(\mathcal{H})$, a poset of subspaces of $\mathcal{H}$ which covers $\mathcal{H}$ and satisfies a property regarding the VC dimension of related subspaces, making it a suitable algebraic search space for Model Selection algorithms. Our main contributions are a data-driven general learning algorithm to perform regularized Model Selection on $\mathbb{L}(\mathcal{H})$, and a framework under which one can, theoretically, better estimate a target hypothesis with a given sample size by properly modeling $\mathbb{L}(\mathcal{H})$ and employing high computational power. A remarkable consequence of this approach is a set of conditions under which a non-exhaustive search of $\mathbb{L}(\mathcal{H})$ can return an optimal solution. The results of this paper lead to a practical property of Machine Learning: a lack of experimental data may be mitigated by high computational capacity. In a context in which computational power is increasingly available, this property may help explain why Machine Learning has become so important, even where data is expensive and hard to obtain.
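To see why a non-exhaustive search can be lossless, consider a single chain of nested subspaces in $\mathbb{L}(\mathcal{H})$ whose estimated error along the chain is a strict U-curve; under that assumption, a ternary search finds the minimizer while evaluating only O(log n) of the n candidate subspaces. The sketch below is a toy illustration under a strict-unimodality assumption, not the paper's algorithm.

    from functools import lru_cache

    def u_curve_search(chain_err, n):
        # Ternary search for the minimizer of a strictly U-shaped sequence
        # chain_err(0), ..., chain_err(n - 1) along a chain of nested model
        # spaces, evaluating O(log n) models instead of all n of them.
        chain_err = lru_cache(maxsize=None)(chain_err)  # avoid refitting models
        lo, hi = 0, n - 1
        while hi - lo > 2:
            m1 = lo + (hi - lo) // 3
            m2 = hi - (hi - lo) // 3
            if chain_err(m1) < chain_err(m2):
                hi = m2 - 1     # the minimum lies strictly to the left of m2
            else:
                lo = m1 + 1     # the minimum lies strictly to the right of m1
        return min(range(lo, hi + 1), key=chain_err)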
Robust parameter estimation in dynamical systems via Statistical Learning with an application to epidemiological models
Marcondes, Diego
We propose a robust parameter estimation method for dynamical systems, based on Statistical Learning techniques, which aims to estimate a set of parameters that fit the dynamics well, in order to obtain robust evidence about the qualitative behaviour of the system's trajectory. The method is quite general and flexible, since it does not rely on any specific property of the dynamical system, and it mathematically formalises the procedure of sampling and testing parameters, in which evolutions generated by candidate parameters are tested against observed data to assess goodness of fit. The Statistical Learning framework introduces a mathematically rigorous scheme for this general approach to parameter estimation, contributing to the broad field of parameter estimation in dynamical systems. The method is especially useful for estimating parameters in epidemiological compartmental models. We illustrate it on simulated and real data on the spread of COVID-19 in the US, in order to qualitatively assess the peak of deaths from the disease.
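The core sample-and-test loop can be sketched in a few lines; here it is instantiated on a standard SIR compartmental model, with a uniform prior over the parameters and a fixed acceptance threshold, all of which are illustrative assumptions rather than the paper's specification.

    import numpy as np
    from scipy.integrate import solve_ivp

    def sir(t, u, beta, gamma):
        # Classical SIR compartmental dynamics (normalized population).
        S, I, R = u
        return [-beta * S * I, beta * S * I - gamma * I, gamma * I]

    def sample_and_test(t_obs, I_obs, n_samples=5000, tol=0.05, seed=0):
        # Sample candidate parameters, simulate the resulting evolution and
        # keep every candidate whose trajectory fits the observed data well.
        rng = np.random.default_rng(seed)
        accepted = []
        for _ in range(n_samples):
            beta, gamma = rng.uniform(0.0, 1.0, size=2)  # illustrative prior
            sol = solve_ivp(sir, (t_obs[0], t_obs[-1]), [0.99, 0.01, 0.0],
                            t_eval=t_obs, args=(beta, gamma))
            err = np.mean((sol.y[1] - I_obs) ** 2)       # goodness-of-fit test
            if err < tol:
                accepted.append((beta, gamma))
        return np.array(accepted)  # the set of well-fitting parameters

The accepted set, rather than a single point estimate, is what supports robust qualitative conclusions, e.g. the range of epidemic peaks predicted across all well-fitting parameters.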
Learning the Hypotheses Space from data Part I: Learning Space and U-curve Property
Marcondes, Diego, Simonis, Adilson, Barrera, Junior
The agnostic PAC learning model consists of: a Hypothesis Space $\mathcal{H}$, a probability distribution $P$, a sample complexity function $m_{\mathcal{H}}: [0,1]^{2} \to \mathbb{Z}_{+}$ of precision $\epsilon$ and confidence $1 - \delta$, a finite i.i.d. sample $\mathcal{D}_{N}$, a cost function $\ell$ and a learning algorithm $\mathbb{A}(\mathcal{H},\mathcal{D}_{N})$, which estimates $\hat{h} \in \mathcal{H}$, an approximation of a target function $h^{\star} \in \mathcal{H}$, by seeking to minimize the out-of-sample error. In this model, prior information is represented by $\mathcal{H}$ and $\ell$, while problems are solved by instantiating them in several applied learning models, with specific algebraic structures for $\mathcal{H}$ and corresponding learning algorithms. However, these applied models rely on additional important concepts not covered by classic PAC learning theory: model selection and regularization. This paper presents an extension of the model which covers these concepts. The main principle added is the selection, based solely on data, of a subspace of $\mathcal{H}$ with a VC dimension compatible with the available sample. In order to formalize this principle, the concept of Learning Space $\mathbb{L}(\mathcal{H})$, a poset of subsets of $\mathcal{H}$ that covers $\mathcal{H}$ and satisfies a property regarding the VC dimension of related subspaces, is presented as the natural search space for model selection algorithms. A remarkable result obtained in this new framework is a set of conditions on $\mathbb{L}(\mathcal{H})$ and $\ell$ under which the estimated out-of-sample error surfaces are true U-curves on $\mathbb{L}(\mathcal{H})$ chains, enabling a more efficient search of $\mathbb{L}(\mathcal{H})$. Hence, in this new framework, the U-curve optimization problem becomes a natural component of model selection algorithms.
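To make the U-curve behaviour on a chain tangible, the toy experiment below estimates the out-of-sample error along a chain of nested polynomial hypothesis spaces of increasing degree (hence increasing VC dimension); with synthetic noisy data the error typically falls and then rises again. This is an assumed illustration, not the paper's formal U-curve conditions.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, 60)
    y = np.sin(3 * x) + rng.normal(0, 0.3, 60)           # small training sample
    x_val = rng.uniform(-1, 1, 200)                       # held-out sample used to
    y_val = np.sin(3 * x_val) + rng.normal(0, 0.3, 200)   # estimate out-of-sample error

    # Chain H_1 < H_2 < ... of nested polynomial hypothesis spaces.
    for degree in range(1, 13):
        p = np.poly1d(np.polyfit(x, y, degree))
        err = np.mean((y_val - p(x_val)) ** 2)
        print(degree, round(err, 4))  # typically traces a U-curve along the chain:
                                      # underfitting, sweet spot, overfitting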