Why Normalizing Flows Fail to Detect Out-of-Distribution Data

Neural Information Processing Systems

Detecting out-of-distribution (OOD) data is crucial for robust machine learning systems. Normalizing flows are flexible deep generative models that often, surprisingly, fail to distinguish between in- and out-of-distribution data: a flow trained on pictures of clothing assigns higher likelihood to handwritten digits. We investigate why normalizing flows perform poorly for OOD detection. Focusing on flows based on coupling layers, we demonstrate that flows learn local pixel correlations and generic image-to-latent-space transformations that are not specific to the target image dataset. We show that by modifying the architecture of flow coupling layers we can bias the flow towards learning the semantic structure of the target data, improving OOD detection.
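Since the abstract centers on coupling layers, a minimal sketch of a RealNVP-style affine coupling layer may help. The PyTorch code below is our own illustration (the module names, conditioner size, and tanh-bounded scale are assumptions), not the architecture analyzed or proposed in the paper.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Minimal affine coupling layer (RealNVP-style), for illustration only.

    The input is split in half; one half conditions an affine transform
    (scale and shift) applied to the other half. The conditioner below is
    a hypothetical two-layer MLP, not the paper's architecture.
    """

    def __init__(self, dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        # st_net predicts log-scale and shift for the transformed half.
        self.st_net = nn.Sequential(
            nn.Linear(self.half, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x):                      # x: (batch, dim) flattened input
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.st_net(x1).chunk(2, dim=1)
        log_s = torch.tanh(log_s)              # keep scales bounded for stability
        y2 = x2 * torch.exp(log_s) + t         # invertible affine transform
        log_det = log_s.sum(dim=1)             # contribution to the log-likelihood
        return torch.cat([x1, y2], dim=1), log_det
```

In this design the conditioner (st_net here) is free to exploit local pixel correlations in x1 when predicting the transform of x2, which is exactly the dataset-agnostic behavior the paper identifies; the proposed architectural modification biases this conditioner toward the semantic structure of the target data.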


Learning threshold neurons via the "edge of stability"

Neural Information Processing Systems

Large step sizes are necessary to learn the "threshold neuron" of a ReLU network (2) for a simple binary classification task (1). We choose d = 200, n = 300, λ = 3, and run gradient descent with the logistic loss.
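The network (2) and task (1) are defined in the paper; the sketch below only illustrates the experimental recipe stated in the caption, i.e. full-batch gradient descent with the logistic loss at a large step size. The data model, network shape, and step size are our own placeholder assumptions (λ = 3 enters the paper's setup and is not modeled here).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 200, 300                   # dimensions from the caption
step_size = 2.0                   # a "large" step size; illustrative value only

# Placeholder binary classification data; the paper's task (1) may differ.
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

# Placeholder two-layer ReLU network; the paper's network (2) may differ.
m = 50
W = rng.standard_normal((m, d)) / np.sqrt(d)      # trained first-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second-layer signs

def forward(X):
    return np.maximum(X @ W.T, 0.0) @ a           # f(x) = sum_j a_j ReLU(w_j . x)

for step in range(500):
    margins = y * forward(X)
    sigma = 1.0 / (1.0 + np.exp(margins))         # -d/dm log(1 + exp(-m))
    act = (X @ W.T > 0).astype(float)             # ReLU derivative, shape (n, m)
    grad_W = -(((sigma * y)[:, None] * act).T @ X) * a[:, None] / n
    W -= step_size * grad_W                       # full-batch gradient descent
```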



Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing

Neural Information Processing Systems

Transformers have achieved excellent performance on the knowledge tracing (KT) task, but they are criticized for relying on manually selected input features for fusion and for the weakness of purely global context modelling in capturing students' forgetting behavior when the related records are distant in time from the current record. To address these issues, this paper first adds convolution operations to the Transformer to strengthen the local context modelling used for students' forgetting behavior, and then proposes an evolutionary neural architecture search approach that automates input feature selection and automatically determines where to apply which operation so as to balance local and global context modelling. In the search-space design, the original global path containing the attention module in the Transformer is replaced with the sum of a global path and a local path that can contain different convolutions, and the selection of input features is also included. To find the best architecture, we employ an effective evolutionary algorithm to explore the search space and propose a search-space reduction strategy to accelerate its convergence. Experimental results on the two largest and most challenging education datasets demonstrate the effectiveness of the architecture found by the proposed approach.
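To make the search-space modification concrete, here is a minimal PyTorch sketch of the modified block, in which the attention-only global path is replaced by the sum of a global attention path and a local convolutional path. The module names, kernel size, head count, and residual/normalization wiring are our illustrative assumptions; in the paper, the evolutionary search decides which local operation appears where rather than fixing these choices.

```python
import torch
import torch.nn as nn

class GlobalLocalBlock(nn.Module):
    """Illustrative sketch: sum of a global (attention) path and a local
    (convolution) path, replacing the attention-only path. Placeholder
    hyperparameters; the evolutionary search selects the actual operations."""

    def __init__(self, d_model=128, n_heads=4, kernel_size=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        global_out, _ = self.attn(x, x, x)     # global context path
        local_out = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local path
        return self.norm(x + global_out + local_out)  # sum of the two paths
```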



Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems (Zhichao Huang)

Neural Information Processing Systems

We focus on the stochastic setting, where we can only access an unbiased stochastic gradient estimate of f at each iteration. This formulation covers many machine learning applications as special cases, such as robust optimization and adversarial training.
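For reference, the problem class named in the title can be written, in our (standard) notation, as

    \min_{x \in \mathbb{R}^{d_1}} \max_{y \in \mathbb{R}^{d_2}} f(x, y) = \mathbb{E}_{\xi}\big[ F(x, y; \xi) \big],

where f(., y) is nonconvex in x, f(x, .) is strongly concave in y, and each iteration observes only the unbiased stochastic gradient \nabla F(x, y; \xi).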


Reply to Reviewers 2, 3 and 4. Novelty of the analysis: Besides keeping the estimator of f(x_k, y...

Neural Information Processing Systems

We are happy to cite this paper and compare it with SREDA. A more reasonable approach is to solve the problem by accessing the (stochastic) gradient of f, as SREDA does. We will follow the reviewer's suggestion and include this comparison in the main text if the paper is accepted.


Alien, Amidar, Assault, Asterix, Asteroids, Atlantis (Atari game names; residue of a per-game results table)

Neural Information Processing Systems

For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?

If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Code provided as supplemental.

If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation?

A.1 Implementation, Hyperparameters and Evaluation Details
The implementation of our main agent, Tandem DQN, is based on the Double-DQN [van Hasselt et al., 2016] agent provided in the DQN Zoo open-source agent collection [Quan and Ostrovski, 2020].

Figure 12: Tandem DQN: Active vs. passive performance on four selected Classic Control domains.
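The appendix excerpt describes Tandem DQN only as a Double-DQN-based agent, while Figure 12 refers to an active/passive comparison. As a hedged illustration of what such a tandem loop could look like, here is a short Python sketch; env, replay, select_action, and dqn_update are hypothetical placeholders, and the actual coupling of the two learners in the paper may differ.

```python
# Illustrative sketch of a tandem training loop (our assumption of the setup):
# an active agent both acts and learns, while a passive agent learns from the
# exact same stream of transitions without ever controlling behavior.

def tandem_training(env, active, passive, replay, num_steps):
    obs = env.reset()
    for step in range(num_steps):
        # Only the active agent's policy generates experience.
        action = select_action(active, obs)
        next_obs, reward, done, _ = env.step(action)
        replay.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs

        # Both agents apply the same (Double-DQN-style) update rule to
        # identical batches; only the data-generation roles differ.
        batch = replay.sample()
        dqn_update(active, batch)
        dqn_update(passive, batch)
```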