Consent in Crisis: The Rapid Decline of the AI Data Commons, Ariel Lee
General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites' expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI.
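To make the robots.txt side of such an audit concrete, the following is a minimal sketch of how one might check whether a given crawler is permitted on a URL, using Python's standard `urllib.robotparser`. The robots.txt content, the agent name `SomeOtherBot`, the URL, and the helper `allowed_agents` are all invented for illustration; this is not the audit pipeline used in the paper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed_agents(robots_txt, agents, url):
    """Return {agent: bool} saying whether each agent may fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    SAMPLE_ROBOTS_TXT,
    ["GPTBot", "SomeOtherBot"],  # second name is a made-up generic crawler
    "https://example.com/articles/1",
)
print(result)
```

In this toy file the AI-specific agent is blocked site-wide while other crawlers are only kept out of `/private/`, which is the kind of agent-dependent restriction the abstract describes.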
Transformers Can Do Arithmetic with the Right Embeddings
The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that, by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks including sorting and multiplication.
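As a rough illustration of "position relative to the start of the number", the sketch below computes, for each character of an input string, its index within the current run of digits (0 for non-digit characters). It covers only the index computation; in a model these indices would select rows of a learned embedding table. The function name `digit_positions` and the convention of using 0 for non-digits are assumptions made here, not details taken from the paper.

```python
def digit_positions(text):
    """Return one index per character: 1, 2, 3, ... within each run of
    digits, counted from the start of the number, and 0 for non-digits."""
    positions = []
    run = 0
    for ch in text:
        if ch in "0123456789":
            run += 1  # next digit of the current number
        else:
            run = 0   # any non-digit ends the current number
        positions.append(run)
    return positions


print(digit_positions("12+345="))  # [1, 2, 0, 1, 2, 3, 0]
```

Because the index restarts at every number, the same digit position receives the same index no matter where the number appears in the sequence.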
YOLOv10: Real-Time End-to-End Object Detection
Ao Wang, Hui Chen, Kai Chen
Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and other aspects of YOLOs, achieving notable progress. However, the reliance on non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs and adversely impacts the inference latency. Besides, the design of various components in YOLOs lacks comprehensive and thorough inspection, resulting in noticeable computational redundancy and limiting the model's capability.
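For readers unfamiliar with the post-processing step mentioned above, here is a minimal sketch of standard greedy NMS in plain Python. It is the textbook procedure, not code from YOLOv10; the box format `(x1, y1, x2, y2)`, the default threshold of 0.5, and the function names are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Each candidate is compared against every box kept so far, so the cost depends on how many detections survive; this sequential, data-dependent step is what the abstract identifies as an obstacle to end-to-end deployment.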
Score-Optimal Diffusion Schedules
Christopher Williams (Department of Statistics, University of Oxford), Andrew Campbell (Department of Statistics, University of Oxford), Arnaud Doucet
Denoising diffusion models (DDMs) generate a path of probability distributions interpolating between a reference Gaussian distribution and a data distribution by incrementally injecting noise into the data. To numerically simulate the sampling process, a discretisation schedule from the reference back towards clean data must be chosen. An appropriate discretisation schedule is crucial to obtain high quality samples. However, beyond hand-crafted heuristics, a general method for choosing this schedule remains elusive. This paper presents a novel algorithm for adaptively selecting an optimal discretisation schedule with respect to a cost that we derive. Our cost measures the work done by the simulation procedure to transport samples from one point in the diffusion path to the next. Our method does not require hyperparameter tuning and adapts to the dynamics and geometry of the diffusion path. Our algorithm only involves the evaluation of the estimated Stein score, making it scalable to existing pre-trained models at inference time and online during training. We find that our learned schedule recovers performant schedules previously only discovered through manual search and obtains competitive FID scores on image datasets.
Reviewer 1 of these papers investigate the relationship between regret and stability of an online learning algorithm and a comparison
We thank all reviewers for their comments. Minor comments will be addressed in the final version. Comparison with related work Thanks for the references to work of Ross & Bagnell, Saha et al., and Arora et al. Your questioning of the dimension dependence in Theorem 3.2 and Corollary 3.3 is valid. OGD/FTRL algorithms in these settings will not incur the dimension dependence. Further, this dimension dependence only arises in Theorem 3.2 and Corollary 3.3.
Fast Channel Simulation via Error-Correcting Codes
We consider the design of practically-implementable schemes for the task of channel simulation. Existing methods do not scale with the number of simultaneous uses of the channel and are therefore unable to harness the amortization gains associated with simulating many uses of the channel at once. We show how techniques from the theory of error-correcting codes can be applied to achieve scalability and hence improved performance. As an exemplar, we focus on how polar codes can be used to efficiently simulate i.i.d.
c3266c14d7eb9715d9fad4306133aa4e-Paper-Conference.pdf
Heterogeneity, e.g., due to different types of layers or multiple sub-models, poses key challenges in analyzing the generalization behavior of several modern architectures. For instance, descriptors based on Persistent Homology (PH) are being increasingly integrated into Graph Neural Networks (GNNs) to augment them with rich topological features; however, the generalization of such PH schemes remains unexplored. We introduce a novel compositional PAC-Bayes framework that provides a general recipe to analyze a broad spectrum of models including those with heterogeneous layers. Specifically, we provide the first data-dependent generalization bounds for a widely adopted PH vectorization scheme (that subsumes persistence landscapes, images, and silhouettes) as well as PH-augmented GNNs. Using our framework, we also obtain bounds for GNNs and neural nets with ease. Our bounds also inform the design of novel regularizers. Empirical evaluations on several standard real-world datasets demonstrate that our theoretical bounds highly correlate with empirical generalization performance, leading to improved classifier design via our regularizers.
Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction
Difan Zou, Pan Xu, Quanquan Gu
Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) algorithms have received increasing attention in both theory and practice. In this paper, we propose a Stochastic Recursive Variance-Reduced gradient HMC (SRVR-HMC) algorithm. It makes use of a semi-stochastic gradient estimator that recursively accumulates the gradient information to reduce the variance of the stochastic gradient. We provide a convergence analysis of SRVR-HMC for sampling from a class of non-log-concave distributions and show that SRVR-HMC converges faster than all existing HMC-type algorithms based on underdamped Langevin dynamics. Thorough experiments on synthetic and real-world datasets validate our theory and demonstrate the superiority of SRVR-HMC.
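The phrase "recursively accumulates the gradient information" can be illustrated with a small sketch of a recursive (SARAH/SPIDER-style) gradient estimator, which updates the previous estimate by a mini-batch estimate of the gradient difference between consecutive iterates. The abstract does not spell out the estimator, so treat this as an assumed form. The toy 1-D quadratic potential, the data values, the step size, and the plain gradient step standing in for the HMC dynamics are all placeholders chosen for illustration.

```python
import random


def component_grad(x, a_i):
    """Gradient of the toy component potential 0.5 * (x - a_i)**2."""
    return x - a_i


def full_grad(x, data):
    """Exact gradient of the averaged potential."""
    return sum(component_grad(x, a) for a in data) / len(data)


def recursive_grad(prev_estimate, x_new, x_old, batch):
    """Previous estimate plus a mini-batch estimate of the gradient change."""
    correction = sum(
        component_grad(x_new, a) - component_grad(x_old, a) for a in batch
    ) / len(batch)
    return prev_estimate + correction


data = [1.0, 2.0, 4.0, 9.0]       # made-up data
rng = random.Random(0)
x_old = 0.0
estimate = full_grad(x_old, data)  # start from one full-gradient evaluation
for _ in range(5):
    x_new = x_old - 0.1 * estimate  # placeholder update, NOT the HMC dynamics
    batch = rng.sample(data, 2)
    estimate = recursive_grad(estimate, x_new, x_old, batch)
    x_old = x_new

print(estimate, full_grad(x_old, data))
```

On this toy potential every component has the same curvature, so the recursive estimate tracks the full gradient exactly; in general it is only an approximation whose variance shrinks when consecutive iterates are close.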
Fisher Efficient Inference of Intractable Models
Song Liu, Takafumi Kanamori, Wittawat Jitkrittum, Yu Chen
The Maximum Likelihood Estimator (MLE) has many good properties. For example, the asymptotic variance of the MLE solution attains the asymptotic Cramér-Rao lower bound (efficiency bound), which is the minimum possible variance for an unbiased estimator. However, obtaining such an MLE solution requires calculating the likelihood function, which may not be tractable due to the normalization term of the density model. In this paper, we derive a Discriminative Likelihood Estimator (DLE) from the Kullback-Leibler divergence minimization criterion implemented via density ratio estimation and a Stein operator. We study the problem of model inference using DLE. We prove its consistency and show that the asymptotic variance of its solution can attain the efficiency bound under mild regularity conditions. We also propose a dual formulation of DLE which can be easily optimized. Numerical studies validate our asymptotic theorems and we give an example where DLE successfully estimates an intractable model constructed using a pre-trained deep neural network.
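For reference, the efficiency bound mentioned above can be written out as follows. This is the standard textbook statement for $n$ i.i.d. samples from a density $p(x;\theta)$ under the usual regularity conditions; the notation is chosen here and is not taken from the paper.

```latex
% Fisher information of a single observation
I(\theta) = \mathbb{E}_{x \sim p(\cdot;\theta)}
  \left[ \nabla_\theta \log p(x;\theta)\, \nabla_\theta \log p(x;\theta)^{\top} \right]

% Cramér-Rao lower bound for any unbiased estimator \hat{\theta}_n
\operatorname{Cov}\big(\hat{\theta}_n\big) \succeq \frac{1}{n}\, I(\theta)^{-1}

% Asymptotic efficiency of the MLE at the true parameter \theta^{*}
\sqrt{n}\,\big(\hat{\theta}_{\mathrm{MLE}} - \theta^{*}\big)
  \xrightarrow{d} \mathcal{N}\big(0,\; I(\theta^{*})^{-1}\big)
```

An estimator is called efficient when its asymptotic covariance matches $I(\theta^{*})^{-1}$, which is the property the abstract claims for DLE under mild regularity conditions.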
c2e06e9a80370952f6ec5463c77cbace-AuthorFeedback.pdf
We thank the reviewers for their insightful comments; here we address some of the questions raised in the reviews.
R1: "Consistency results are given but these assume the parameter space is compact (and other not so simple
R1: "... though this further assumes a (fairly strong) condition of uniform convergence..." The uniform
R1: "it would be good to compare DLE to for example KSD on a
We run the same typical/outlier image detection task in Section 6.2 on the Fashion MNIST dataset and compare DLE and KSD (see the figure).
R2: "... but it didn't compare to other methods like Contrastive Divergence
However, we consider a much wider family of models. Those results will be presented in revision.
R2: "... how a practitioner could select the Stein features to use?" R3: "...some guidelines or heuristics for how to select the feature.." Section 4.3 provides an information-criterion based model selection method.