
Self-Supervised Few-Shot Learning on Point Clouds

Neural Information Processing Systems

Visualization of ball covers. The cover-tree approach of grouping the points of a point cloud into balls is visualized in Figure 1. The visualization shows balls, drawn as transparent spheres, at different scales and packing densities in a cover-tree. Fig. 1a shows the top level (root) of the cover-tree, which covers the entire point cloud with a single ball, i.e., at level i. Figs. 1b and 1c show the balls at lower levels, with smaller radii, as the tree is descended. Thus, we learn local features using balls at various levels with different packing densities; a hypothetical sketch of this grouping follows below.

A.1 3D Object Classification Training

This section provides the implementation details of our proposed self-supervised network.
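As a rough illustration of the multi-scale ball grouping described above, here is a minimal sketch of a greedy ball cover at radii that halve per level, in the spirit of a cover-tree. This is an assumption-laden illustration, not the authors' implementation; the point cloud, radii, and greedy center selection are all placeholders.

```python
# Hypothetical sketch: group a point cloud with balls at decreasing radii,
# cover-tree style (greedy cover; not the paper's code).
import numpy as np

def ball_cover(points, radius):
    """Greedily pick centers so every point lies in some ball of the given radius."""
    remaining = np.arange(len(points))
    covers = []  # list of (center_index, member_indices)
    while remaining.size > 0:
        c = remaining[0]
        dists = np.linalg.norm(points[remaining] - points[c], axis=1)
        covers.append((c, remaining[dists <= radius]))
        remaining = remaining[dists > radius]
    return covers

rng = np.random.default_rng(0)
cloud = rng.standard_normal((1024, 3))
root_radius = np.linalg.norm(cloud - cloud.mean(0), axis=1).max()
for level in range(3):  # radius halves at each level, as in a cover-tree
    covers = ball_cover(cloud, root_radius / 2**level)
    print(f"level {level}: {len(covers)} balls of radius {root_radius / 2**level:.2f}")
```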


Multiview Human Body Reconstruction from Uncalibrated Cameras

Neural Information Processing Systems

We present a new method to reconstruct 3D human body pose and shape by fusing visual features from multiview images captured by uncalibrated cameras. Existing multiview approaches often use spatial camera calibration (intrinsic and extrinsic parameters) to geometrically align and fuse visual features. Despite remarkable performance, the requirement of camera calibration restricts their applicability to real-world scenarios, e.g., reconstruction from social videos with wide-baseline cameras. We address this challenge by leveraging the commonly observed human body as a semantic calibration target, which eliminates the requirement of camera calibration. Specifically, we map per-pixel image features to a canonical body surface coordinate system agnostic to views and poses using dense keypoints (correspondences). This feature mapping allows us to semantically, instead of geometrically, align and fuse visual features from multiview images. We learn a self-attention mechanism to reason about the confidence of visual features across and within views. With fused visual features, a regressor is learned to predict the parameters of a body model. We demonstrate that our calibration-free multiview fusion method reliably reconstructs 3D body pose and shape, outperforming state-of-the-art single-view methods with post-hoc multiview fusion, particularly in the presence of non-trivial occlusion, and showing accuracy comparable to multiview methods that require calibration.
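To make the fusion step concrete, here is a minimal sketch of attention-based fusion of per-view features that are assumed to already sit in a shared canonical surface coordinate system. The module shape, feature dimension, and the SMPL-style output size (10 shape + 72 pose parameters) are assumptions for illustration, not the paper's architecture.

```python
# Hypothetical sketch (not the paper's code): fuse per-view features aligned
# on a canonical body surface, weighting views via learned self-attention.
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.regressor = nn.Linear(dim, 10 + 72)  # e.g., SMPL-style shape + pose

    def forward(self, canon_feats):
        # canon_feats: (batch, n_views, dim), already mapped to the canonical
        # surface, so fusion is semantic rather than geometric.
        fused, _ = self.attn(canon_feats, canon_feats, canon_feats)
        pooled = fused.mean(dim=1)        # aggregate over views
        return self.regressor(pooled)     # body-model parameters

params = CrossViewFusion()(torch.randn(2, 4, 256))
print(params.shape)  # torch.Size([2, 82])
```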


42cd63cb189c30ed03e42ce2c069566c-AuthorFeedback.pdf

Neural Information Processing Systems

We sincerely thank all reviewers for their constructive comments. We hope this work sheds light on a better understanding of parameter sharing in NAS. We sincerely appreciate your recognition of our technical contributions (Line 181). Meanwhile, as you pointed out, exploring different optimizations of APS would be interesting future work.


DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving

Neural Information Processing Systems

Solving mathematical problems requires advanced reasoning abilities and presents notable challenges for large language models. Previous works usually synthesize data from proprietary models to augment existing datasets, followed by instruction tuning to achieve top-tier results. However, our analysis of these datasets reveals severe biases towards easy queries, with frequent failures to generate any correct response for the most challenging queries. Hypothesizing that difficult queries are crucial to learning complex reasoning, we propose Difficulty-Aware Rejection Tuning (DART), a method that allocates more trials to difficult queries during the synthesis phase, enabling more extensive training on difficult samples. Utilizing DART, we have created new datasets for mathematical problem-solving that focus more on difficult queries and are substantially smaller than previous ones. Remarkably, our synthesis process relies solely on a 7B-sized open-weight model, without reliance on the commonly used proprietary GPT-4. We fine-tune various base models, ranging from 7B to 70B in size, on our datasets, resulting in a series of strong models called DART-Math. In comprehensive in-domain and out-of-domain evaluation on 6 mathematical benchmarks, DART-Math outperforms vanilla rejection tuning significantly, and is superior or comparable to prior art, despite using much smaller datasets and no proprietary models.
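A minimal sketch of the difficulty-aware allocation idea, as we read it from the abstract: queries with a lower estimated pass rate receive more synthesis trials, so the rejection-sampled dataset keeps coverage of hard problems. The function names `generate` and `is_correct`, and the budget constants, are stand-ins, not the paper's implementation.

```python
# Hedged sketch of difficulty-aware rejection sampling in the spirit of DART.
import math

def dart_collect(queries, pass_rates, generate, is_correct,
                 target_per_query=4, max_trials=256):
    """Collect (query, response) pairs, biasing synthesis toward hard queries."""
    dataset = []
    for q in queries:
        p = max(pass_rates[q], 1e-3)             # estimated pass rate; avoid /0
        # Expected trials to obtain `target_per_query` correct responses is
        # target/p, so low-pass-rate (hard) queries are allocated more attempts.
        n_trials = min(max_trials, math.ceil(target_per_query / p))
        kept = 0
        for _ in range(n_trials):
            response = generate(q)
            if is_correct(q, response):          # rejection sampling step
                dataset.append((q, response))
                kept += 1
                if kept == target_per_query:
                    break
    return dataset
```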


A Spatial Conditioning Without Bubble Artifacts

Neural Information Processing Systems

Let us begin by recalling how SPADE works, and study where its defects come from. The normalization statistics (per-channel mean and standard deviation) are calculated via averages over examples and all spatial dimensions. To clarify, the subtraction and division in (3) are broadcast over non-channel dimensions, and the pointwise multiplication and addition are broadcast over examples. SPADE layers are remarkably similar to the Adaptive Instance Normalization (AdaIN) layers that are used in StyleGAN to condition on z. Finally, the conditioning of the generator's output y = g(z) (StyleGAN itself is an unconditional generative model) is done via AdaIN layers conditioned on s(z).
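A minimal sketch of the broadcasting just described, assuming SPADE-style modulation: activations are normalized with per-channel statistics averaged over examples and space, then modulated by spatially varying scale/shift maps. The shapes and epsilon are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of SPADE-style spatial conditioning.
import torch

def spade_like(x, gamma, beta, eps=1e-5):
    # x:           (N, C, H, W) activations
    # gamma, beta: (N, C, H, W) scale/shift maps predicted from the condition
    mean = x.mean(dim=(0, 2, 3), keepdim=True)   # stats over examples and space
    std = x.std(dim=(0, 2, 3), keepdim=True)     # one value per channel
    x_hat = (x - mean) / (std + eps)             # broadcast over non-channel dims
    return gamma * x_hat + beta                  # pointwise modulation per example

x = torch.randn(2, 8, 16, 16)
gamma, beta = torch.ones(2, 8, 16, 16), torch.zeros(2, 8, 16, 16)
print(spade_like(x, gamma, beta).shape)          # torch.Size([2, 8, 16, 16])
```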


Approximate Gaussian process inference for the drift function in stochastic differential equations

Neural Information Processing Systems

We introduce a nonparametric approach for estimating drift functions in systems of stochastic differential equations from sparse observations of the state vector. Using a Gaussian process prior over the drift as a function of the state vector, we develop an approximate EM algorithm to deal with the unobserved, latent dynamics between observations. The posterior over states is approximated by a piecewise linearized process of the Ornstein-Uhlenbeck type, and MAP estimation of the drift is facilitated by sparse Gaussian process regression.
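For intuition, here is a hedged sketch of the dense-observation limit of this idea (not the paper's EM algorithm): with an Euler discretization, increments give noisy drift observations f(x_t) ≈ (x_{t+Δt} − x_t)/Δt, which can be smoothed by plain GP regression. Kernel, noise level, and the simulated SDE are all assumptions.

```python
# Hedged sketch: GP regression on Euler-discretized increments as a crude
# drift estimator (dense observations; not the paper's approximate EM).
import numpy as np

def rbf(a, b, ell=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

dt, n = 0.02, 500
rng = np.random.default_rng(1)
x = np.zeros(n)
for t in range(n - 1):                          # simulate dX = -X dt + 0.5 dW
    x[t + 1] = x[t] - x[t] * dt + 0.5 * np.sqrt(dt) * rng.standard_normal()

xs, ys = x[:-1], np.diff(x) / dt                # noisy drift observations
grid = np.linspace(-1.0, 1.0, 50)
noise_var = 0.25 / dt                           # Var[(noise term)/dt] = sigma^2/dt
K = rbf(xs, xs) + noise_var * np.eye(n - 1)
f_hat = rbf(grid, xs) @ np.linalg.solve(K, ys)  # GP posterior mean of the drift
print(f_hat[[0, 25, 49]])                       # roughly -grid at those points
```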


From Stochastic Mixability to Fast Rates

Neural Information Processing Systems

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems in which the data are generated according to some unknown distribution P; it returns a hypothesis f chosen from a fixed class F with small loss l. In the parametric setting, depending upon (l, F, P), ERM can have slow (1/√n) or fast (1/n) rates of convergence of the excess risk as a function of the sample size n. There exist several results that give sufficient conditions for fast rates in terms of joint properties of l, F, and P, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss l (there being no role there for F or P). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of (l, F, P), and in so doing provides new insight into the fast-rates phenomenon.
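For context, the stochastic mixability condition can be sketched as follows. This is our paraphrase of the usual statement, not the paper's exact definition, and should be read as a sketch:

```latex
% Paraphrased sketch of stochastic mixability (see the paper for the precise
% statement): (l, F, P) is eta-stochastically mixable, for some eta > 0, if
% there exists f^* in F such that, for all f in F,
\[
  \mathbb{E}_{Z \sim P}\!\left[ \exp\big( \eta \, ( l(f^{*}, Z) - l(f, Z) ) \big) \right] \;\le\; 1 .
\]
```

Per the abstract, the paper's contribution is a direct proof that this condition on (l, F, P) yields the fast 1/n rate for ERM.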


Iterative Methods via Locally Evolving Set Process

Neural Information Processing Systems

Given the damping factor α and precision tolerance ϵ, Andersen et al. [2] introduced Approximate Personalized PageRank (APPR), the de facto local method for approximating the PPR vector, with runtime bounded by Θ(1/(αϵ)), independent of the graph size. Recently, Fountoulakis & Yang [12] asked whether faster local algorithms could be developed using Õ(1/√(αϵ)) operations. Noticing that APPR is a local variant of Gauss-Seidel, this paper explores whether standard iterative solvers can be effectively localized. We propose the locally evolving set process, a novel framework for characterizing algorithm locality, and demonstrate that many standard solvers can be effectively localized.
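As background, here is a hedged sketch of the APPR push method of Andersen et al., the baseline this paper localizes beyond. It follows the standard lazy-walk statement of the algorithm (not this paper's solvers); the graph representation and termination threshold are illustrative.

```python
# Hedged sketch of the APPR push method (standard statement, not this paper's code).
from collections import deque

def appr(graph, source, alpha=0.15, eps=1e-4):
    # graph: adjacency dict {node: [neighbors]}
    p = {}                      # PPR estimate
    r = {source: 1.0}           # residual mass
    queue = deque([source])
    while queue:
        u = queue.popleft()
        deg = len(graph[u])
        if r.get(u, 0.0) < eps * deg:
            continue            # residual per degree already below tolerance
        mass = r[u]
        p[u] = p.get(u, 0.0) + alpha * mass      # keep alpha fraction locally
        r[u] = (1 - alpha) * mass / 2            # lazy step: half stays put
        for v in graph[u]:                       # spread the other half
            r[v] = r.get(v, 0.0) + (1 - alpha) * mass / (2 * deg)
            if r[v] >= eps * len(graph[v]):
                queue.append(v)
        if r[u] >= eps * deg:
            queue.append(u)
    return p

g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(appr(g, source=0))
```

The runtime bound Θ(1/(αϵ)) comes from the fact that each push removes at least an αϵ fraction of residual mass per touched edge, independent of the graph size.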


Neural Embeddings Rank: Aligning 3D latent dynamics with movements

Neural Information Processing Systems

Aligning neural dynamics with movements is a fundamental goal in neuroscience and brain-machine interfaces. However, dimensionality reduction methods that can effectively align low-dimensional latent dynamics with movements are still lacking. To address this gap, we propose Neural Embeddings Rank (NER), a technique that embeds neural dynamics into a 3D latent space and contrasts the embeddings based on movement ranks. NER learns to regress continuous representations of neural dynamics (i.e., embeddings) on continuous movements. We apply NER and six other dimensionality reduction techniques to activity recorded from neurons in the primary motor cortex (M1), dorsal premotor cortex (PMd), and primary somatosensory cortex (S1) as monkeys perform reaching tasks.
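A speculative sketch of what a rank-based contrastive objective could look like, inferred from the abstract alone (this is our guess at the idea, not the authors' loss): embedding distances are encouraged to preserve the rank order of movement differences, so nearby movements map to nearby points in the 3D latent space.

```python
# Speculative sketch of a rank-contrastive loss in the spirit of NER.
import torch

def rank_contrastive_loss(emb, movement, margin=0.1):
    # emb: (batch, 3) latent embeddings; movement: (batch,) scalar movement values
    d_emb = torch.cdist(emb, emb)                        # pairwise embedding distances
    d_mov = torch.cdist(movement[:, None], movement[:, None])
    # For each anchor i and pair (j, k): if movement j is closer to i than k is,
    # its embedding should also be closer (a margin ranking constraint).
    closer = (d_mov[:, :, None] < d_mov[:, None, :]).float()
    violation = torch.relu(d_emb[:, :, None] - d_emb[:, None, :] + margin)
    return (closer * violation).mean()

emb = torch.randn(32, 3, requires_grad=True)
movement = torch.randn(32)
loss = rank_contrastive_loss(emb, movement)
loss.backward()
print(loss.item())
```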