We thank all reviewers for their valuable time and feedback. Note that multiple recent works (offline and online) simply assume a linear MDP with known features in their analysis. The KL-divergence formulation allows us to impose different distribution priors when available.

R2, thank you very much for the suggestions. We agree about Section 3.3 and, in retrospect, should have saved the space. We will remove it and move some of the Appendix into the paper.
Beyond Optimism: Exploration With Partially Observable Rewards
Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration and reward discovery, popular algorithms rely on optimism. But what if rewards are sometimes unobservable, as in partial monitoring for bandits and the recent formalism of monitored Markov decision processes? In this case, optimism can lead to suboptimal behavior in which the agent fails to explore further and collapse its uncertainty. In this paper, we present a novel exploration strategy that overcomes the limitations of existing methods and guarantees convergence to an optimal policy even when rewards are not always observable. We further propose a collection of tabular environments for benchmarking exploration in RL (with and without unobservable rewards) and show that our method outperforms existing ones.
Appendices

A. Masking distribution
When choosing which time-steps to mask, each latent speech representation in an utterance is considered a candidate starting time-step with probability p, where M is the length of each masked span starting from the respective time-step; both are hyper-parameters. Sampled starting time-steps are expanded to length M, and spans can overlap. For a 15-second audio sample, the average mask length is 14.7 time-steps, corresponding to 299ms of audio, with a median of 10 time-steps and a maximum of about 100 time-steps; about 49% of all time-steps in the sample will be masked. A plot of the corresponding mask length distribution is shown in Figure 2, and an ablation of M and p as well as the effect of other masking strategies is shown in Table 5. Reducing M increases prediction accuracy on the self-supervised task, but the task becomes trivial when spans of length one are masked, leading to poor performance on downstream speech recognition. When masking without overlap, we choose starting time-steps with p = 0.037, which makes the total number of masked tokens match the baseline.
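As a rough sketch, the span-masking procedure above can be simulated as follows. The values p = 0.065 and M = 10 are assumed here (they reproduce the roughly 49% masked fraction quoted above, since 1 - (1 - p)^M ≈ 0.49), and the 20 ms frame rate is likewise an assumption:

```python
import numpy as np

def span_mask(T, p=0.065, M=10, rng=None):
    # Each time-step is a candidate span start with probability p;
    # starts are expanded to spans of length M, and spans may overlap.
    # p and M here are assumed values, not stated in this appendix.
    rng = rng or np.random.default_rng(0)
    starts = rng.random(T) < p
    mask = np.zeros(T, dtype=bool)
    for s in np.flatnonzero(starts):
        mask[s:s + M] = True
    return mask

# 15 s of audio at an assumed 20 ms frame rate -> 750 frames
mask = span_mask(750)
print(f"fraction masked: {mask.mean():.2f}")  # close to 1 - (1 - p)^M ~ 0.49
```

Averaged over many utterances, the masked fraction concentrates around 1 - (1 - p)^M, which is where the quoted 49% figure comes from.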
92d1e1eb1cd6f9fba3227870bb6d7f07-AuthorFeedback.pdf
We thank the reviewers for their fruitful comments!

Response to Reviewer 2: We predict characters for LibriSpeech/Libri-light. Thank you for the pointer!

"when the official LibriSpeech LM ... is incorporated into decoding, it is not clear whether the experiments still represent ..." - We will also try to make it more self-contained given the space restrictions.

"I'm not convinced that this training works well conceptually." "... for ASR, we have a lot of transcribed data, and we can make a strong ASR model and perform transfer learning."

"... how to extract K detractors." - The distractors are quantized latent speech representations sampled from masked time-steps. If another masked time-step uses the same quantized latent, then it won't be sampled.

"The paper would have been significantly different in terms of quality had you applied your approach to some standard ..." - This follows other recent work on semi-supervised methods for speech, such as "Improved Noisy Student Training" and Synnaeve et al., 2020, which achieve some of the strongest results.
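The distractor-sampling rule described in the response can be sketched as follows; the function name, the toy codebook indices, and the choice of K are illustrative, not taken from the paper:

```python
import random

def sample_distractors(quantized, masked_idx, target_t, K, seed=0):
    # Candidate distractors are quantized latents at *other* masked
    # time-steps; any time-step carrying the same quantized latent as the
    # target is excluded, per the response above. Illustrative sketch only.
    rng = random.Random(seed)
    target_code = quantized[target_t]
    candidates = [t for t in masked_idx
                  if t != target_t and quantized[t] != target_code]
    return [quantized[t] for t in rng.sample(candidates, min(K, len(candidates)))]

codes = [3, 7, 3, 9, 1, 7, 4]   # toy codebook indices per time-step
masked = [0, 1, 2, 3, 5]        # masked time-steps
print(sample_distractors(codes, masked, target_t=0, K=3))
```

Note that time-step 2, which shares the target's code (3), can never be drawn as a distractor here.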
Listenable Maps for Zero-Shot Audio Classifiers
Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthiness of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Zero-Shot Audio Classifiers), which, to the best of our knowledge, is the first decoder-based post-hoc explanation method for explaining the decisions of zero-shot audio classifiers. The proposed method utilizes a novel loss function that aims to closely reproduce the original similarity patterns between text-and-audio pairs in the generated explanations. We provide an extensive evaluation using the Contrastive Language-Audio Pretraining (CLAP) model to showcase that our interpreter remains faithful to the decisions in a zero-shot classification context. Moreover, we qualitatively show that our method produces meaningful explanations that correlate well with different text prompts.
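One hedged reading of the proposed objective, matching the text-audio similarity pattern of the original clip with that of the explanation-masked clip, is a mean-squared error over cosine similarities. The function below is a minimal sketch under that assumption, not the paper's exact loss:

```python
import numpy as np

def sim_preservation_loss(text_emb, audio_emb, explained_emb):
    # MSE between the text-audio cosine-similarity pattern of the original
    # clip and that of the explanation-masked clip. A hypothetical reading
    # of the "reproduce similarity patterns" objective described above.
    def cos(A, b):
        return A @ b / (np.linalg.norm(A, axis=1) * np.linalg.norm(b) + 1e-8)
    return np.mean((cos(text_emb, audio_emb) - cos(text_emb, explained_emb)) ** 2)

rng = np.random.default_rng(0)
texts = rng.standard_normal((4, 16))  # 4 toy text-prompt embeddings
audio = rng.standard_normal(16)       # toy audio embedding
print(sim_preservation_loss(texts, audio, audio))  # identical inputs -> 0.0
```

Minimizing such a loss pushes the explanation-masked audio to elicit the same zero-shot similarity scores as the original, which is the faithfulness property the abstract describes.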
The Limits of Differential Privacy in Online Learning
Differential privacy (DP) is a formal notion that restricts the privacy leakage of an algorithm running on sensitive data, and the privacy-utility trade-off is one of the central problems in private data analysis. In this work, we investigate the fundamental limits of differential privacy in online learning algorithms and present evidence that separates three types of constraints: no DP, pure DP, and approximate DP. We first describe a hypothesis class that is online learnable under approximate DP but not under pure DP in the adaptive adversarial setting. This indicates that approximate DP must be adopted when dealing with adaptive adversaries. We then prove that any private online learner must make an infinite number of mistakes for almost all hypothesis classes. This essentially generalizes previous results and shows a strong separation between private and non-private settings, since a finite mistake bound is always attainable (as long as the class is online learnable) when there is no privacy requirement.
Improved Techniques for Training Score-Based Generative Models
Score-based generative models can produce high quality image samples comparable to GANs, without requiring adversarial optimization. However, existing training procedures are limited to images of low resolution (typically below 32×32), and can be unstable under some settings. We provide a new theoretical analysis of learning and sampling from score-based models in high dimensional spaces, explaining existing failure modes and motivating new solutions that generalize across datasets. To enhance stability, we also propose to maintain an exponential moving average of model weights. With these improvements, we can scale score-based generative models to various image datasets, with diverse resolutions ranging from 64×64 to 256×256. Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets, including CelebA, FFHQ, and several LSUN categories.
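The exponential moving average of model weights mentioned above can be sketched in a few lines; the decay value 0.999 is an assumed hyper-parameter, not taken from the paper:

```python
import numpy as np

def ema_update(ema_params, params, decay=0.999):
    # In-place exponential moving average of model weights; samples are then
    # drawn from the EMA weights rather than the raw training weights.
    # decay=0.999 is an assumed value for illustration.
    for k in params:
        ema_params[k] = decay * ema_params[k] + (1.0 - decay) * params[k]
    return ema_params

params = {"w": np.array([1.0])}   # stand-in for the trained weights
ema = {"w": np.array([0.0])}
for _ in range(5000):
    ema_update(ema, params)       # a real loop would update `params` each step
print(float(ema["w"][0]))         # approaches 1.0 as updates accumulate
```

Because the EMA averages over many recent checkpoints, sampling from it damps the step-to-step weight fluctuations that destabilize sample quality.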
Symmetry-Informed Governing Equation Discovery
Despite the advancements in learning governing differential equations from observations of dynamical systems, data-driven methods are often unaware of fundamental physical laws, such as frame invariance. As a result, these algorithms may search an unnecessarily large space and discover less accurate or overly complex equations. In this paper, we propose to leverage symmetry in automated equation discovery to compress the equation search space and improve the accuracy and simplicity of the learned equations. Specifically, we derive equivariance constraints from the time-independent symmetries of ODEs. Depending on the types of symmetries, we develop a pipeline for incorporating symmetry constraints into various equation discovery algorithms, including sparse regression and genetic programming. In experiments across diverse dynamical systems, our approach demonstrates better robustness against noise and recovers governing equations with significantly higher probability than baselines without symmetry.
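A minimal sketch of symmetry-constrained sparse regression, assuming a toy system dx/dt = -x³ with parity symmetry x → -x: the equivariance constraint restricts the candidate library to odd monomials before fitting. The library, data, and noise level here are all illustrative, not the paper's pipeline:

```python
import numpy as np

# Toy system with parity symmetry x -> -x: dx/dt = -x**3.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
dxdt = -x**3 + 0.01 * rng.standard_normal(200)

full_library = {"1": x**0, "x": x, "x^2": x**2, "x^3": x**3}
# Equivariance under parity requires f(-x) = -f(x), so only odd
# monomials survive the constraint; the even terms are pruned a priori.
odd_library = {k: v for k, v in full_library.items() if k in ("x", "x^3")}

Theta = np.column_stack(list(odd_library.values()))
coef, *_ = np.linalg.lstsq(Theta, dxdt, rcond=None)
print(dict(zip(odd_library, np.round(coef, 2))))  # x^3 coefficient near -1
```

Pruning the library before regression is what compresses the search space: the fit never has to rule out the even terms from noisy data, which is how the symmetry constraint improves both robustness and simplicity.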