Masked Pre-training Enables Universal Zero-shot Denoiser

Neural Information Processing Systems

In this work, we observe that a model trained on vast general images via a masking strategy naturally embeds knowledge of their distribution, and thus spontaneously attains the underlying potential for strong image denoising. Based on this observation, we propose a novel zero-shot denoising paradigm, i.e., Masked Pre-train then Iterative fill (MPI). MPI first trains a model via masking and then employs the pre-trained weights for high-quality zero-shot image denoising on a single noisy image. Concretely, MPI comprises two key procedures: 1) Masked Pre-training involves training the model to reconstruct massive natural images under random masking to learn generalizable representations, gathering the potential for valid zero-shot denoising on images with varying noise degradation and even of distinct image types.
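The "iterative fill" inference step can be illustrated with a minimal sketch: repeatedly mask random pixels, let a masked-reconstruction model predict them, and average the predictions. The `predict` interface and the `mean_fill` stand-in model are hypothetical placeholders for the pre-trained network, not the paper's implementation.

```python
import random

def iterative_fill_denoise(noisy, predict, n_iters=50, mask_ratio=0.3, seed=0):
    """Zero-shot denoising sketch in the spirit of 'iterative fill':
    mask random pixels, predict them with a masked-reconstruction
    model, and average predictions across iterations."""
    rng = random.Random(seed)
    n = len(noisy)
    acc = [0.0] * n
    cnt = [0] * n
    for _ in range(n_iters):
        mask = [rng.random() < mask_ratio for _ in range(n)]
        pred = predict(noisy, mask)
        for i in range(n):
            if mask[i]:
                acc[i] += pred[i]
                cnt[i] += 1
    # fall back to the noisy value where a pixel was never masked
    return [acc[i] / cnt[i] if cnt[i] else noisy[i] for i in range(n)]

def mean_fill(img, mask):
    """Toy stand-in 'model': predict each masked pixel as the mean of
    the visible pixels (a real model would be the pre-trained net)."""
    visible = [v for v, m in zip(img, mask) if not m]
    mu = sum(visible) / len(visible) if visible else 0.0
    return [mu if m else v for v, m in zip(img, mask)]
```

Averaging over many random masks is what lets a single pre-trained model denoise a single noisy image without any clean reference.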


Learning dynamic polynomial proofs

Neural Information Processing Systems

Polynomial inequalities lie at the heart of many mathematical disciplines. In this paper, we consider the fundamental computational task of automatically searching for proofs of polynomial inequalities. We adopt the framework of semi-algebraic proof systems that manipulate polynomial inequalities via elementary inference rules that infer new inequalities from the premises. These proof systems are known to be very powerful, but searching for proofs remains a major difficulty. In this work, we introduce a machine learning based method to search for a dynamic proof within these proof systems. We propose a deep reinforcement learning framework that learns an embedding of the polynomials and guides the choice of inference rules, taking the inherent symmetries of the problem as an inductive bias. We compare our approach with powerful and widely-studied linear programming hierarchies based on static proof systems, and show that our method reduces the size of the linear program by several orders of magnitude while also improving performance. These results hence pave the way towards augmenting powerful and well-studied semi-algebraic proof systems with machine learning guiding strategies for enhancing the expressivity of such proof systems.
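To make the idea of "elementary inference rules" concrete, here is a minimal sketch of one such rule in a semi-algebraic proof system: from the premises p >= 0 and q >= 0, derive p*q >= 0. The sparse-dict polynomial encoding is an illustrative choice, not the paper's representation.

```python
def poly_mul(p, q):
    """Multiply two polynomials in sparse dict form
    {exponent_tuple: coeff}, e.g. {(1, 0): 2.0} encodes 2*x."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0.0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0.0}

def infer_product(fact1, fact2):
    """Elementary inference rule: from p >= 0 and q >= 0,
    infer p*q >= 0 (the derived fact is the product polynomial)."""
    return poly_mul(fact1, fact2)
```

A dynamic proof is a sequence of such rule applications; the learned policy's job is to pick which facts to combine at each step.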


A Appendix

Neural Information Processing Systems

A.1 Dataset Samples

We show the different kinds of perturbations in our benchmarks in Fig. 5. Specifically, our benchmarks include 9 basic types of perturbations, including Gaussian blur, Gaussian noise, radial distortion, and perturbations of the RGB and HSV channels. Another type of dataset includes multiple perturbations, where we create multiple random combinations of the basic perturbations. We also include 7 types of perturbations unseen during training from ImageNet-C [20]: snow, fog, frost, motion blur, zoom blur, pixelate, and JPEG compression. For each type of perturbation, we generate 5 or 10 levels of varying intensity based on a sensitivity analysis in the FID-MA space.
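The per-type intensity levels can be sketched for one basic perturbation (Gaussian noise); the linear sigma schedule below is an assumption for illustration, since the appendix derives levels from a sensitivity analysis rather than a fixed formula.

```python
import random

def gaussian_noise_levels(image, n_levels=5, max_sigma=0.2, seed=0):
    """Generate one basic perturbation (Gaussian noise) at several
    intensity levels, as done for each perturbation type.
    Level k uses sigma = max_sigma * k / n_levels (assumed schedule).
    `image` is a flat list of pixel values in [0, 1]."""
    rng = random.Random(seed)
    levels = []
    for k in range(1, n_levels + 1):
        sigma = max_sigma * k / n_levels
        levels.append([min(1.0, max(0.0, v + rng.gauss(0.0, sigma)))
                       for v in image])
    return levels
```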


A Max-Min Entropy Framework for Reinforcement Learning

Neural Information Processing Systems

In this paper, we propose a max-min entropy framework for reinforcement learning (RL) to overcome a limitation of the soft actor-critic (SAC) algorithm, which implements maximum entropy RL in model-free, sample-based learning. Whereas maximum entropy RL guides learning so that policies reach states with high entropy in the future, the proposed max-min entropy framework aims to learn to visit states with low entropy and to maximize the entropy of these low-entropy states to promote better exploration. For general Markov decision processes (MDPs), an efficient algorithm is constructed under the proposed max-min entropy framework based on the disentanglement of exploration and exploitation. Numerical results show that the proposed algorithm yields drastic performance improvement over current state-of-the-art RL algorithms.
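The quantity at the center of the framework is the entropy of the policy's action distribution at a state; a minimal sketch of computing it and identifying the low-entropy state to target is below. The dict-of-states interface is illustrative, not the paper's implementation.

```python
import math

def policy_entropy(probs):
    """Shannon entropy of a discrete action distribution: the quantity
    the max-min framework seeks out where it is low, then maximizes."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def lowest_entropy_state(state_policies):
    """Pick the state whose action distribution has minimum entropy,
    i.e., the state the framework would target for exploration."""
    return min(state_policies, key=lambda s: policy_entropy(state_policies[s]))
```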


Reformulating Zero-shot Action Recognition for Multi-label Actions (Supplementary Material)

Neural Information Processing Systems

Standard video models expect frames with equal height and width, so we crop a square region around the actor and resize it to the network-specific input dimensions (112 × 112). We present some examples of AVA video frames with their annotations, as well as the generated crops, in Figure 1. This square crop can cause multiple actors to appear within one clip, as seen in the second example, but it ensures that the aspect ratio of the person is not altered, which is necessary since this matches how the video model was trained. Figure 1: Examples of original ground-truth bounding boxes (left) in the AVA dataset, with the cropped actors on the right. For PS-ZSAR, prediction confidences are obtained from the softmax probabilities output by our pair-wise similarity function.
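The square-crop step can be sketched as follows: take the longer side of the actor box as the square's side, center it on the box, and shift it back inside the frame rather than shrinking it (so the aspect ratio is preserved before resizing). This is a plausible reading of the described procedure, not the authors' exact code.

```python
def square_crop_box(box, frame_w, frame_h):
    """Compute a square region around an actor bounding box
    (x1, y1, x2, y2), clamped to the frame. The square is later
    resized to the network input size (e.g., 112 x 112)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    side = max(x2 - x1, y2 - y1)      # longer side keeps aspect ratio
    half = side / 2.0
    # shift the square back inside the frame instead of shrinking it
    left = min(max(cx - half, 0), frame_w - side)
    top = min(max(cy - half, 0), frame_h - side)
    return (left, top, left + side, top + side)
```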


Bringing Image Structure to Video via Frame-Clip Consistency of Object Tokens

Neural Information Processing Systems

Recent action recognition models have achieved impressive results by integrating objects, their locations, and their interactions. However, obtaining dense structured annotations for each frame is tedious and time-consuming, making these methods expensive to train and less scalable. On the other hand, one does often have access to a small set of annotated images, either within or outside the domain of interest. Here we ask how such images can be leveraged for downstream video understanding tasks. We propose a learning framework, StructureViT (SViT for short), which demonstrates how utilizing the structure of a small number of images, available only during training, can improve a video model.


We clarify that B-RAI [24] is a recently proposed algorithm for estimating the posterior probability of causal relations among observed variables

Neural Information Processing Systems

We would like to sincerely thank you for your important ideas and constructive comments. It is not related to the deep learning domain. We will clearly state these contributions in the paper. As you suggest, we will define B2N, RAI, and GGT in the paper. An ensemble of 15 (the last point on the curve, Figure 1) has a total of 3.6M parameters. Optimizing for a specific loss hinders other objectives, e.g., accuracy and calibration.


Efficient Algorithms for Smooth Minimax Optimization

Neural Information Processing Systems

In terms of g(·, y), we consider two settings, strongly convex and nonconvex, and improve upon the best-known rates in both. For g(·, y) strongly convex for every y, we propose a new direct optimal algorithm combining Mirror-Prox and Nesterov's AGD, and show that it can find the global optimum in Õ(1/k²) iterations.
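One of the paper's two building blocks, Mirror-Prox, reduces in the Euclidean setting to the extragradient step sketched below: a prediction step followed by a correction step using gradients at the midpoint. This is a sketch of that building block only, not the paper's combined algorithm.

```python
def extragradient_step(x, y, grad_x, grad_y, eta):
    """One Euclidean Mirror-Prox (extragradient) step for
    min_x max_y g(x, y): descend in x, ascend in y."""
    # prediction step at the current point
    x_half = x - eta * grad_x(x, y)
    y_half = y + eta * grad_y(x, y)
    # correction step using gradients at the midpoint
    x_new = x - eta * grad_x(x_half, y_half)
    y_new = y + eta * grad_y(x_half, y_half)
    return x_new, y_new
```

On the bilinear saddle g(x, y) = x·y, where plain gradient descent-ascent cycles, repeated extragradient steps converge to the saddle point (0, 0), which is why Mirror-Prox is the natural base method here.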


strongly-convex-concave minimax problems first, which we will add in the final revision

Neural Information Processing Systems

We thank all the reviewers for their constructive comments. Conceptual DIAG: the intuition behind Algorithm 1 stems from a "conceptual" version of DIAG (also specified in Algorithm 1, Step 4), which is inspired by the conceptual version of Mirror-Prox (MP). Thus the overall speed of Imp-STEP is O() steps. Response to Reviewer 1: we agree with, and will include, the reviewer's comment on the non-smoothness. We will devote more space to explaining the DIAG algorithm and discussing more related works. We will add a precise justification (omitted due to lack of space) in the next revision.