
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Yuanzhi Li, Yingyu Liang

Neural Information Processing Systems

Neural networks have many successful applications, but far less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with small generalization error, even though the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and is supported by empirical studies on synthetic data and on the MNIST dataset.
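The setting the abstract describes can be reproduced in a toy experiment. The sketch below is illustrative only, not the paper's exact construction: it uses two well-separated Gaussian clusters as the "mixture", a wide two-layer ReLU network with a fixed random output layer (a common simplification in this line of theory), and plain SGD on a hinge loss. All sizes and the loss choice are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: two well-separated clusters (binary labels), a wide
# two-layer ReLU network, SGD from random initialization.
n, d, m = 200, 10, 512                          # samples, input dim, hidden width
centers = np.stack([np.ones(d), -np.ones(d)])   # well-separated cluster means
y = rng.integers(0, 2, n)
X = centers[y] + 0.1 * rng.standard_normal((n, d))

W = rng.standard_normal((m, d)) / np.sqrt(d)        # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # fixed random output layer

def forward(x):
    return a @ np.maximum(W @ x, 0.0)

lr = 0.5
for _ in range(5):                               # a few SGD passes over the data
    for i in rng.permutation(n):
        x, t = X[i], 2.0 * y[i] - 1.0            # labels in {-1, +1}
        h = np.maximum(W @ x, 0.0)
        if t * (a @ h) < 1.0:                    # hinge-loss subgradient step
            W += lr * t * np.outer(a * (h > 0), x)

acc = np.mean([(forward(x) > 0) == bool(t) for x, t in zip(X, y)])
```

With 512 hidden units and 200 samples the network could memorize arbitrary labels, yet on clustered data SGD quickly finds a classifier that separates the two components, matching the flavor of the result.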


Number's up: Calculators hold out against AI

The Japan Times

The Casio Mini, the world's first personal calculator, is seen at the Toshio Kashio Memorial Museum of Invention in Tokyo on Nov. 25. Tokyo/Bangkok - The humble pocket calculator may not be able to keep up with the mathematical capabilities of new technology, but it will never hallucinate. The device's enduring reliability equates to millions of sales each year for Japan's Casio, which is even eyeing expansion in certain regions. Despite lightning-speed advances in artificial intelligence, chatbots still sometimes stumble on basic addition.



A Histological

Neural Information Processing Systems

These images were evenly split between cases diagnosed with adenocarcinoma of the lung and squamous cell carcinoma, representing the two most common subtypes of lung cancer. The images were scanned on an Aperio scanner at a resolution of 0 . Different classes used for conditioning were annotated digitally by a pathologist using an Apple Pencil, with the instruction to clearly demarcate boundaries between tissue regions. The pathologist could choose from a list of 40 distinct annotation categories, aiming to cover all possible annotation requirements. All data handling was performed in strict accordance with privacy regulations and ethical standards, ensuring the protection of patient information at all times.


PolyGraph Discrepancy: a classifier-based metric for graph generation

Krimmel, Markus, Hartout, Philip, Borgwardt, Karsten, Chen, Dexiong

arXiv.org Machine Learning

Existing methods for evaluating graph generative models primarily rely on Maximum Mean Discrepancy (MMD) metrics based on graph descriptors. While these metrics can rank generative models, they do not provide an absolute measure of performance. Their values are also highly sensitive to extrinsic parameters, namely kernel and descriptor parametrization, making them incomparable across different graph descriptors. We introduce PolyGraph Discrepancy (PGD), a new evaluation framework that addresses these limitations. It approximates the Jensen-Shannon distance of graph distributions by fitting binary classifiers to distinguish between real and generated graphs, featurized by these descriptors. The data log-likelihood of these classifiers approximates a variational lower bound on the JS distance between the two distributions. Resulting metrics are constrained to the unit interval [0,1] and are comparable across different graph descriptors. We further derive a theoretically grounded summary metric that combines these individual metrics to provide a maximally tight lower bound on the distance for the given descriptors. Thorough experiments demonstrate that PGD provides a more robust and insightful evaluation compared to MMD metrics. The PolyGraph framework for benchmarking graph generative models is made publicly available at https://github.com/BorgwardtLab/polygraph-benchmark.
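The core recipe (fit a real-vs-generated classifier on descriptor features; its data log-likelihood lower-bounds the JS divergence) can be sketched in a few lines. Everything below is an assumption-laden stand-in, not the PolyGraph implementation: the "descriptors" are synthetic Gaussian feature vectors rather than real graph descriptors, and the discriminator is a hand-rolled logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for descriptor features: in the real framework these vectors
# would come from graph descriptors (degree histograms, spectra, ...) of
# real vs. generated graphs. The 0.7 mean shift is an arbitrary choice.
real = rng.normal(0.0, 1.0, size=(500, 8))
fake = rng.normal(0.7, 1.0, size=(500, 8))

X = np.vstack([real, fake])
y = np.concatenate([np.ones(500), np.zeros(500)])   # 1 = real, 0 = generated
Xb = np.hstack([X, np.ones((len(X), 1))])           # add a bias column

# Fit a logistic-regression "discriminator" by full-batch gradient descent.
w = np.zeros(Xb.shape[1])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    w -= 0.1 * Xb.T @ (p - y) / len(y)

p = 1.0 / (1.0 + np.exp(-Xb @ w))
eps = 1e-12
# The classifier's mean log-likelihood (in bits) gives a variational lower
# bound on the Jensen-Shannon divergence (at most 1 bit for balanced classes):
#   JSD >= 1 + 0.5*E_real[log2 D] + 0.5*E_fake[log2(1 - D)]
jsd_lower = (1.0
             + 0.5 * np.log2(np.clip(p[y == 1], eps, 1)).mean()
             + 0.5 * np.log2(np.clip(1 - p[y == 0], eps, 1)).mean())
js_distance = np.sqrt(max(jsd_lower, 0.0))          # JS distance, in [0, 1]
```

Because the bound is always in [0, 1] regardless of which descriptor produced the features, scores from different descriptors are directly comparable, which is the property MMD-based metrics lack.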



Supplementary Material: Discovering Reinforcement Learning Algorithms

Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver (DeepMind)

Neural Information Processing Systems

In tabular grid worlds, object locations are randomised across lifetimes but fixed within a lifetime. There are two different action spaces. The other version has only 9 movement actions. The episode terminates after a fixed number of steps (i.e., the chain length). There is no state aliasing because all states are distinct. We trained LPGs by simulating 960 parallel lifetimes (i.e., the batch size for meta-gradients). The rectified linear unit (ReLU) was used as the activation function throughout the experiments.


On the Number of Linear Regions of Deep Neural Networks

Neural Information Processing Systems

We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep networks are able to sequentially map portions of each layer's input space to the same output. In this way, deep models compute functions that react equally to complicated patterns of different inputs. The compositional structure of these functions enables them to re-use pieces of computation exponentially often in terms of the network's depth. This paper investigates the complexity of such compositional maps and contributes new theoretical results regarding the advantage of depth for neural networks with piecewise linear activation functions. In particular, our analysis is not specific to a single family of models, and as an example, we employ it for rectifier and maxout networks. We improve complexity bounds from prior work and investigate the behavior of units in higher layers.
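The exponential re-use of computation with depth can be seen concretely in the classic tent-map (folding) construction, which is in the spirit of the paper's argument though the snippet below is only an illustrative sketch, not taken from it: each layer of two ReLU units folds the interval once, so depth k produces 2^k linear pieces, while a single hidden layer would need on the order of 2^k units to match.

```python
import numpy as np

def tent(x):
    # One "fold": t(x) = 2*relu(x) - 4*relu(x - 0.5) maps [0, 1] onto [0, 1]
    # and is computable by a single hidden layer with two ReLU units.
    return 2 * np.maximum(x, 0.0) - 4 * np.maximum(x - 0.5, 0.0)

def iterated_tent(x, k):
    # Depth-k network: compose k tent layers.
    for _ in range(k):
        x = tent(x)
    return x

def count_pieces(f, n=2**16 + 1):
    # Sample on a dyadic grid (which contains every breakpoint of tent^k
    # for k <= 16), then count maximal runs of constant slope.
    x = np.linspace(0.0, 1.0, n)
    s = np.diff(f(x)) / np.diff(x)
    return 1 + int(np.sum(~np.isclose(s[1:], s[:-1])))

# Linear pieces double with every layer: 2, 4, 8, 16, 32, ...
pieces = [count_pieces(lambda x, k=k: iterated_tent(x, k)) for k in range(1, 6)]
```

The doubling per layer is exactly the kind of lower bound on the number of linear regions that separates deep from shallow piecewise linear networks.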



Adaptive political surveys and GPT-4: Tackling the cold start problem with simulated user interactions

Bachmann, Fynn, van der Weijden, Daan, Heitz, Lucien, Sarasua, Cristina, Bernstein, Abraham

arXiv.org Artificial Intelligence

Adaptive questionnaires dynamically select the next question for a survey participant based on their previous answers. Due to digitalisation, they have become a viable alternative to traditional surveys in application areas such as political science. One limitation, however, is their dependency on data to train the model for question selection. Often, such training data (i.e., user interactions) are unavailable a priori. To address this problem, we (i) test whether Large Language Models (LLMs) can accurately generate such interaction data and (ii) explore if these synthetic data can be used to pre-train the statistical model of an adaptive political survey. To evaluate this approach, we utilise existing data from the Swiss Voting Advice Application (VAA) Smartvote in two ways: First, we compare the distribution of LLM-generated synthetic data to the real distribution to assess its similarity. Second, we compare the performance of an adaptive questionnaire that is randomly initialised with one pre-trained on synthetic data to assess their suitability for training. We benchmark these results against an "oracle" questionnaire with perfect prior knowledge. We find that an off-the-shelf LLM (GPT-4) accurately generates answers to the Smartvote questionnaire from the perspective of different Swiss parties. Furthermore, we demonstrate that initialising the statistical model with synthetic data can (i) significantly reduce the error in predicting user responses and (ii) increase the candidate recommendation accuracy of the VAA. Our work emphasises the considerable potential of LLMs to create training data to improve the data collection process in adaptive questionnaires in areas well suited to LLMs, such as political surveys.