First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Mallows models are a classically studied class of distributions over permutations that can be viewed as a sequential model in which items are inserted one by one into a ranking. This paper proposes an interesting hierarchical generalization of Mallows models in which groups of items are sequentially ``merged'' together (as they would be in mergesort). The model can also be viewed as a special case of a recently proposed class of ``riffle independent'' models by Huang/Guestrin, but with a more tractable number of parameters in general and better computational properties. There are several nice contributions in this paper, including a simple and elegant characterization of identifiability of the structure, as well as an interesting structure estimation algorithm based on the inside-outside parsing algorithm for stochastic context free grammars.
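The "items are inserted one by one" view of a Mallows model that the review refers to is the standard repeated-insertion construction, which can be sketched in a few lines. The dispersion parameter `theta` and the exponential insertion weights below are the textbook formulation, not anything specific to the reviewed paper:

```python
import math
import random

def sample_mallows(n, theta, rng=random):
    """Sample a permutation of range(n) from a Mallows model via
    repeated insertion: item i is inserted into one of i+1 slots,
    with slot j weighted exp(-theta * (i - j)), so positions far
    from the end are exponentially penalized. theta = 0 gives a
    uniform random permutation; large theta concentrates on the
    identity ranking."""
    ranking = []
    for i in range(n):
        weights = [math.exp(-theta * (i - j)) for j in range(i + 1)]
        total = sum(weights)
        r = rng.random() * total
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w
            if r <= acc:
                ranking.insert(j, i)
                break
    return ranking
```

The hierarchical model in the paper generalizes this by merging whole groups of items rather than inserting single items, but the single-item sampler above is the base case.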
Uncertainty Estimation using Variance-Gated Distributions
Gillis, H. Martin, Xu, Isaac, Trappenberg, Thomas
Evaluation of per-sample uncertainty quantification from neural networks is essential for decision-making involving high-risk applications. A common approach is to use the predictive distribution from Bayesian or approximation models and decompose the corresponding predictive uncertainty into epistemic (model-related) and aleatoric (data-related) components. However, additive decomposition has recently been questioned. In this work, we propose an intuitive framework for uncertainty estimation and decomposition based on the signal-to-noise ratio of class probability distributions across different model predictions. We introduce a variance-gated measure that scales predictions by a confidence factor derived from ensembles. We use this measure to discuss the existence of a collapse in the diversity of committee machines.
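The abstract does not give the exact formula, but one plausible reading of a "variance-gated" measure built from a signal-to-noise ratio is the following sketch: scale the ensemble-mean class probabilities by a per-class factor mu / (mu + sigma), then renormalize. The gating function and the epsilon are illustrative assumptions, not the authors' definition:

```python
import numpy as np

def variance_gated(probs):
    """probs: (M, K) array of class probabilities from M ensemble members.
    Illustrative gating (an assumption, not the paper's exact measure):
    scale the mean prediction mu by a signal-to-noise-style confidence
    factor mu / (mu + sigma), where sigma is the across-member standard
    deviation, then renormalize to a distribution. Classes the committee
    disagrees on (large sigma) are down-weighted."""
    probs = np.asarray(probs, dtype=float)
    mu = probs.mean(axis=0)
    sigma = probs.std(axis=0)
    gate = mu / (mu + sigma + 1e-12)  # equals 1 when members agree exactly
    gated = mu * gate
    return gated / gated.sum()
```

When all committee members produce identical predictions (a collapse in diversity, as discussed in the abstract), sigma is zero everywhere, the gate is 1, and the measure reduces to the plain ensemble mean.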
Reviews: Sequence Modeling with Unconstrained Generation Order
Updated review: The authors have indicated that they will run additional experiments and make the clarifications I requested, so I will raise my score to 7, in agreement with the other reviews leading to an "accept" consensus. However, I do note that in their rebuttal the authors describe Gu et al., Stern et al., and Welleck et al. as "concurrent work". To be totally clear, all three of those papers were posted to arXiv in early February; the NeurIPS deadline was over 3 months later, and it is now 6 months after the papers appeared online. I would argue that 3 (or 6) months is long enough to provide a more direct comparison, and I would not consider this submission "concurrent work". I don't think this warrants rejecting the paper, but I do want to note that I disagree with the authors here and still believe that a more direct comparison is appropriate.
Reviews: End-to-End Kernel Learning with Supervised Convolutional Kernel Networks
This paper proposes an original idea and theoretically appealing solutions to realize it. Quality: The first part (Sections 1 and 2) is excellent, but the latter part may be a little weak. Section 3 is somewhat dense, with numerous details and heuristics; pseudo-code showing the overall framework would be helpful for readers. Section 4 is a little short to validate the potential effectiveness of the proposed method. A direct comparison between a CKN and a CNN with the same architecture would be more informative for readers.
A self-organizing multiple-view representation of 3D objects
The form in which these models are best stored depends on the kind of information available in the input, and on the trade-off between the amount of memory allocated for the storage and the degree of sophistication required of the recognition process. In computer vision, a distinction can be made between representation schemes that use 3D object-centered coordinate systems and schemes that store viewpoint-specific information such as 2D views of objects.
The Odious Comparisons Of GPU Inference Performance And Value
While AI training dims the lights at hyperscalers and cloud builders and costs billions of dollars a year, in the long run there will be a whole lot more aggregate processing done on AI inference than on AI training. Inference might require 2X to 3X more compute capacity than training soon, and anywhere from 10X to 100X more within a decade. What we all do suspect, however, is that there will be relatively few heavy-duty AI training devices, and platforms that use them, and myriad AI inference devices. And so the relative performance and price/performance of compute engines that run inference are going to be important as they are deployed at scale. Meta Platforms helped invent many of the machine learning techniques and technologies that are being deployed in production these days, and it was no surprise to us that the company had created a unified inference framework, called AITemplate, which it open sourced and described earlier this month in a Meta AI engineering blog post.
Toward A Formalized Approach for Spike Sorting Algorithms and Hardware Evaluation
Zhang, Tim, Lammie, Corey, Azghadi, Mostafa Rahimi, Amirsoleimani, Amirali, Ahmadi, Majid, Genov, Roman
Spike sorting algorithms are used to separate extracellular recordings of neuronal populations into single-unit spike activities. The development of customized hardware implementing spike sorting algorithms is burgeoning. However, there is a lack of a systematic approach and of standardized evaluation criteria to facilitate direct comparison of both software and hardware implementations. In this paper, we formalize a set of standardized criteria and present a publicly available synthetic dataset entitled Synthetic Simulations Of Extracellular Recordings (SSOER), which was constructed by aggregating existing synthetic datasets with varying Signal-To-Noise Ratios (SNRs). Furthermore, we present a benchmark for future comparison, and use our criteria to evaluate a simulated Resistive Random-Access Memory (RRAM) In-Memory Computing (IMC) system using the Discrete Wavelet Transform (DWT) for feature extraction. Our system consumes approximately 10.72 mW per channel and occupies an area of 0.66 mm$^2$ in a 22 nm FDSOI Complementary Metal-Oxide-Semiconductor (CMOS) process.
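The DWT feature-extraction step can be illustrated with a plain-numpy multi-level Haar transform; the choice of the Haar wavelet and of two decomposition levels here is an assumption for illustration, not necessarily what the evaluated system uses:

```python
import numpy as np

def haar_dwt_features(waveform, levels=2):
    """Illustrative multi-level Haar DWT: at each level, split the signal
    into approximation coefficients (scaled pairwise sums) and detail
    coefficients (scaled pairwise differences). The concatenated detail
    coefficients plus the final approximation serve as spike features.
    The 1/sqrt(2) scaling makes the transform orthonormal, so signal
    energy is preserved."""
    a = np.asarray(waveform, dtype=float)
    feats = []
    s = 1.0 / np.sqrt(2.0)
    for _ in range(levels):
        if len(a) % 2:          # drop a trailing sample if the length is odd
            a = a[:-1]
        approx = (a[0::2] + a[1::2]) * s
        detail = (a[0::2] - a[1::2]) * s
        feats.append(detail)
        a = approx
    feats.append(a)             # final low-frequency approximation
    return np.concatenate(feats)
```

In a spike-sorting pipeline, features like these would be computed per detected spike waveform and then clustered into single units.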
An AGM Approach to Revising Preferences
Haret, Adrian, Wallner, Johannes P.
We look at preference change arising out of an interaction between two elements: the first is an initial preference ranking encoding a pre-existing attitude; the second element is new preference information signaling input from an authoritative source, which may come into conflict with the initial preference. The aim is to adjust the initial preference and bring it in line with the new preference, without having to give up more information than necessary. We model this process using the formal machinery of belief change, along the lines of the well-known AGM approach. We propose a set of fundamental rationality postulates, and derive the main results of the paper: a set of representation theorems showing that preference change according to these postulates can be rationalized as a choice function guided by a ranking on the comparisons in the initial preference order. We conclude by presenting operators satisfying our proposed postulates. Our approach thus allows us to situate preference revision within the larger family of belief change operators.
MaxDropout: Deep Neural Network Regularization Based on Maximum Output Values
Santos, Claudio Filipi Goncalves do, Colombo, Danilo, Roder, Mateus, Papa, João Paulo
Different techniques have emerged in the deep learning scenario, such as Convolutional Neural Networks, Deep Belief Networks, and Long Short-Term Memory Networks, to cite a few. In lockstep, regularization methods, which aim to prevent overfitting by penalizing the weight connections or turning off some units, have also been widely studied. In this paper, we present a novel approach called MaxDropout, a regularizer for deep neural network models that works in a supervised fashion by removing (shutting off) the most prominent (i.e., most active) neurons in each hidden layer. By forcing fewer activated units to learn more representative information, the model provides sparsity. In our experiments, we show that it is possible to improve existing neural networks and obtain better results when Dropout is replaced by MaxDropout. The proposed method was evaluated on image classification, achieving results comparable to existing regularizers such as Cutout and RandomErasing, and also improving the accuracy of neural networks that use Dropout, by replacing the existing layer with MaxDropout.
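The core idea — shut off the most active units rather than random ones — can be sketched as a framework-agnostic function. Details such as the min-max normalization and the absence of rescaling are illustrative choices and may differ from the authors' implementation:

```python
import numpy as np

def max_dropout(activations, drop_rate, training=True):
    """Illustrative MaxDropout sketch: normalize the layer's activations
    to [0, 1] and zero those above (1 - drop_rate), i.e. the most active
    units. Unlike standard Dropout, the dropped units are chosen by
    magnitude, not at random. At inference time the layer is a no-op."""
    a = np.asarray(activations, dtype=float)
    if not training:
        return a
    lo, hi = a.min(), a.max()
    if hi == lo:                      # constant layer: nothing to rank
        return a
    norm = (a - lo) / (hi - lo)       # min-max normalize to [0, 1]
    mask = norm <= (1.0 - drop_rate)  # keep only the less-active units
    return a * mask
```

Note that `drop_rate` here thresholds the normalized activation *range*, so the number of units dropped depends on how the activations are distributed, not on a fixed fraction of units.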
Simple coarse graining and sampling strategies for image recognition
A conceptually simple way to recognize images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data covers the required configuration space. Here we show that this coverage can be substantially increased using simple strategies of coarse graining (replacing groups of images by their centroids) and sampling (using distinct sets of centroids in combination). We use the MNIST data set to show that coarse graining can be used to convert a subset of training images into about an order of magnitude fewer image centroids, with no loss of accuracy of classification of test-set images by direct (nearest-neighbor) classification. Distinct batches of centroids can be used in combination as a means of sampling configuration space, and can classify test-set data more accurately than can the unaltered training set. The approach works most naturally with multiple processors in parallel.
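The coarse-graining strategy described above — replace groups of training images with their centroids, then classify test points by nearest centroid — can be sketched with a small per-class k-means. Function names and the initialization scheme are illustrative:

```python
import numpy as np

def class_centroids(X, y, per_class, n_iter=20, seed=0):
    """Coarse-grain a training set: run a small k-means within each class
    and return the centroid vectors with their class labels. A minimal
    sketch of the strategy described above (random-subset initialization
    is an illustrative choice)."""
    rng = np.random.default_rng(seed)
    cents, labels = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(per_class, len(Xc))
        C = Xc[rng.choice(len(Xc), size=k, replace=False)].astype(float)
        for _ in range(n_iter):
            d = ((Xc[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(1)
            for j in range(k):
                pts = Xc[assign == j]
                if len(pts):
                    C[j] = pts.mean(0)
        cents.append(C)
        labels.extend([c] * k)
    return np.vstack(cents), np.array(labels)

def nearest_centroid_predict(cents, labels, X):
    """Direct (nearest-neighbor) classification against the centroids."""
    d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
    return labels[d.argmin(1)]
```

The sampling strategy in the abstract then amounts to building several such centroid batches with different seeds and combining their predictions.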