The Sample Complexity of Parameter-Free Stochastic Convex Optimization
Lawrence, Jared, Kalinsky, Ari, Bradfield, Hannah, Carmon, Yair, Hinder, Oliver
We study the sample complexity of stochastic convex optimization when problem parameters, e.g., the distance to optimality, are unknown. We pursue two strategies. First, we develop a reliable model selection method that avoids overfitting the validation set. This method allows us to generically tune the learning rate of stochastic optimization methods to match the optimal known-parameter sample complexity up to $\log\log$ factors. Second, we develop a regularization-based method that is specialized to the case that only the distance to optimality is unknown. This method provides perfect adaptability to unknown distance to optimality, demonstrating a separation between the sample and computational complexity of parameter-free stochastic convex optimization. Combining these two methods allows us to simultaneously adapt to multiple problem structures. Experiments performing few-shot learning on CIFAR-10 by fine-tuning CLIP models and prompt engineering Gemini to count shapes indicate that our reliable model selection method can help mitigate overfitting to small validation sets.
EmotioNet Challenge
This track requires the identification of 12 action units (AUs). The AUs included in the challenge are: 1, 2, 4, 5, 6, 9, 12, 17, 20, 25, 26, 43. Training data: The EmotioNet database includes 950,000 images with annotated AUs. These were annotated with the algorithm described in [1]. You can train your system using this set.
Tutorial #5: variational autoencoders
The goal of the variational autoencoder (VAE) is to learn a probability distribution $Pr(\mathbf{x})$ over a multi-dimensional variable $\mathbf{x}$. There are two main reasons for modelling distributions. First, we might want to draw samples (generate) from the distribution to create new plausible values of $\mathbf{x}$. Second, we might want to measure the likelihood that a new vector $\mathbf{x}^{*}$ was created by this probability distribution. In fact, it turns out that the variational autoencoder is well-suited to the former task but not for the latter. It is common to talk about the variational autoencoder as if it is the model of $Pr(\mathbf{x})$. However, this is misleading; the variational autoencoder is a neural architecture that is designed to help learn the model for $Pr(\mathbf{x})$.
random kitchen sinks as approximation to kernel machine
The dimension of $w$ does not make sense to me. To approximate the kernel function with sufficient accuracy, we need a large $D$. That seems to carry a high risk of overfitting, and my model does appear to overfit when I try to use it. Shouldn't the true approximation to the kernel machine be … I think the author is trying to fit a linear model in the feature space (as $z(x)$ is the feature map) rather than using the standard kernel trick, which does not need to evaluate the feature map. But I don't understand why the author does not need to compute the sample average of $K$ (or do something similar with $z$). The implementation here also fits a model with $D$ parameters, with no averaging step, which I find quite confusing.
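For reference, here is a minimal sketch of the random Fourier feature construction, assuming an RBF kernel $k(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$. The kernel choice, the value of `gamma`, and all function names are assumptions made for illustration and are not taken from the implementation being asked about. In this construction each feature carries a $\sqrt{2/D}$ factor, so the inner product $z(x)^\top z(y)$ is itself a sample average over the $D$ random draws, which may be where the averaging step is hiding.

```python
import math
import random

def make_rff(d, D, gamma, seed=0):
    """Draw D random frequencies and phases for k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)  # std of the Gaussian spectral density of this kernel
    W = [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(D)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    return W, b

def z(x, W, b):
    """Feature map z(x) in R^D; the sqrt(2/D) factor carries the averaging."""
    D = len(W)
    scale = math.sqrt(2.0 / D)
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
            for w, bj in zip(W, b)]

def rbf(x, y, gamma):
    """Exact RBF kernel value, for comparison."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, y)))

def approx_kernel(x, y, W, b):
    # z(x).z(y) = (1/D) * sum_j 2*cos(w_j.x + b_j)*cos(w_j.y + b_j),
    # i.e. a Monte Carlo average over the D random features.
    return sum(p * q for p, q in zip(z(x, W, b), z(y, W, b)))

gamma = 0.5  # hypothetical kernel width
W, b = make_rff(d=2, D=5000, gamma=gamma)
x, y = [0.3, -1.0], [1.1, 0.4]
print("exact :", rbf(x, y, gamma))
print("approx:", approx_kernel(x, y, W, b))
```

A linear model fitted on $z(x)$ does have $D$ weights, so it is usually paired with regularization (for example ridge regression); the sketch above only checks the kernel approximation and does not fit a model.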
Chapter 29 Smoothing Introduction to Data Science
Before continuing learning about machine learning algorithms, we introduce the important concept of smoothing. Smoothing is a very powerful technique used all across data analysis. Other names given to this technique are curve fitting and low pass filtering. It is designed to detect trends in the presence of noisy data in cases in which the shape of the trend is unknown. The smoothing name comes from the fact that to accomplish this feat, we assume that the trend is smooth, as in a smooth surface.
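As a rough illustration of the idea, the sketch below recovers a smooth trend from noisy observations with a centered moving average. The moving average is just one simple smoother chosen for this example, and the sine trend, noise level, and window size are all made-up assumptions rather than anything from the chapter.

```python
import math
import random

def moving_average(y, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

rng = random.Random(1)
xs = [i / 100 for i in range(200)]
trend = [math.sin(math.pi * x) for x in xs]          # the unknown smooth trend
noisy = [t + rng.gauss(0.0, 0.5) for t in trend]     # what we actually observe
smooth = moving_average(noisy, window=21)

print("error of raw data     :", mse(noisy, trend))
print("error after smoothing :", mse(smooth, trend))
```

The smoothed series tracks the trend much more closely than the raw data because averaging neighbouring points cancels much of the noise while a smooth trend changes little within the window.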
Lessons from Bayesian disease diagnosis: Don't over-interpret the Bayes factor, VERSION 2
[This revision has corrected derivations, new R/JAGS code, and new diagrams.] Overview "Captain, the prior probability of this character dying and leaving the show is infinitesimal." A primary example of Bayes' rule is for disease diagnosis (or illicit drug screening). The example is invoked routinely to explain the importance of prior probabilities. Here's one version of it: Suppose a diagnostic test has a 97% detection rate and a 5% false alarm rate.
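To see why the prior matters, here is a small sketch of the Bayes' rule arithmetic for a positive test result. The 97% detection rate and 5% false alarm rate come from the text above; the 1% prior (base rate) is a hypothetical value assumed only for this illustration, and the function name is made up.

```python
def posterior_given_positive(prior, hit_rate, false_alarm_rate):
    """P(disease | positive test) by Bayes' rule."""
    numer = hit_rate * prior
    denom = numer + false_alarm_rate * (1.0 - prior)
    return numer / denom

hit_rate = 0.97          # detection rate, from the text
false_alarm_rate = 0.05  # false alarm rate, from the text
prior = 0.01             # ASSUMED base rate, not given in the text

bayes_factor = hit_rate / false_alarm_rate           # 19.4 in favor of disease
posterior = posterior_given_positive(prior, hit_rate, false_alarm_rate)

print("Bayes factor:", bayes_factor)
print("posterior   :", posterior)                    # about 0.164
```

Under the assumed 1% prior, a Bayes factor of 19.4 still leaves the posterior probability of disease at only about 16%, which is the sense in which the Bayes factor alone can be over-interpreted.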
Automating Quantified Multimodal Logics in Simple Type Theory -- A Case Study
This paper presents a case study in quantified multimodal logics. An interesting aspect of this case study is that off-the-shelf theorem provers and model generators for simple type theory, that is, classical higher-order logic, are employed to automate problems in quantified multimodal logics, that is, nonclassical logics. This is enabled by our recent embedding of normal quantified multimodal logics in simple type theory [8, 10], which is sound and complete [10]. Interestingly, not only can reasoning within various nonclassical logics be automated this way, but so can reasoning about them. For example, the equivalence between different properties of accessibility relations and their associated multimodal axioms can be proved automatically.