

Adversarial Robustness is at Odds with Lazy Training

Neural Information Processing Systems

Recent works show that adversarial examples exist for random neural networks [Daniely and Schacham, 2020] and that these examples can be found using a single step of gradient ascent [Bubeck et al., 2021]. In this work, we extend this line of work to "lazy training" of neural networks -- a dominant model in deep learning theory in which neural networks are provably efficiently learnable. We show that over-parametrized neural networks that are guaranteed to generalize well and enjoy strong computational guarantees remain vulnerable to attacks generated using a single step of gradient ascent.
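The single-step gradient-ascent attack the abstract refers to can be sketched on a random two-layer ReLU network. This is a minimal NumPy illustration of the idea (the width, step size, and normalization here are toy choices, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random two-layer ReLU network f(x) = a . relu(W x), the kind of model
# studied in random-network / lazy-training analyses.
d, m = 20, 500
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def f(x):
    return a @ np.maximum(W @ x, 0.0)

def grad_f(x):
    # df/dx = W^T (a * 1[Wx > 0])
    return W.T @ (a * (W @ x > 0))

x = rng.normal(size=d)
x /= np.linalg.norm(x)

# One normalized gradient step pushing the output toward the opposite sign.
g = -np.sign(f(x)) * grad_f(x)
eta = 0.5
x_adv = x + eta * g / np.linalg.norm(g)

print(f(x), f(x_adv))  # small perturbation, output moved toward the opposite sign
```

The key point mirrored here is that no iterative optimization is needed: a single step along the (sign-corrected) input gradient already shifts the network's output substantially.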


OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step

Neural Information Processing Systems

Language model systems often enable LLMs to generate code for arithmetic operations to achieve accurate calculations. However, this approach compromises speed and security, and fine-tuning risks the language model losing prior capabilities. We propose a framework that enables exact arithmetic in *a single autoregressive step*, providing faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture that performs arithmetic. Furthermore, OccamLlama outperforms GPT-4o with and without a code interpreter on average across a range of mathematical problem solving benchmarks, demonstrating that OccamLLMs can excel in arithmetic tasks, even surpassing much larger models.
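The core idea of routing hidden states to an exact symbolic unit can be sketched as follows. Everything here is hypothetical scaffolding (the operation scores would come from a probe on the LLM's hidden state, and the real OccamLLM architecture differs in detail):

```python
import operator

# Exact symbolic operations the controller can dispatch to.
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "truediv": operator.truediv}

def occam_step(op_scores, a, b):
    """Pick the highest-scoring operation and apply it exactly, in one step.
    `op_scores` stands in for logits produced from an LLM hidden state."""
    op_name = max(op_scores, key=op_scores.get)
    return OPS[op_name](a, b)

# The symbolic unit returns the exact product, with no token-by-token decoding.
result = occam_step({"add": 0.1, "mul": 2.3, "sub": -1.0, "truediv": 0.0}, 12, 34)
print(result)  # 408
```

Because the arithmetic itself is symbolic rather than generated text, the answer is exact by construction, which is the property the abstract contrasts with code-generation pipelines.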


Graph neural networks extrapolate out-of-distribution for shortest paths

Nerem, Robert R., Chen, Samantha, Dasgupta, Sanjoy, Wang, Yusu

arXiv.org Artificial Intelligence

Neural networks (NNs), despite their success and wide adoption, still struggle to extrapolate out-of-distribution (OOD), i.e., to inputs that are not well-represented by their training dataset. Addressing the OOD generalization gap is crucial when models are deployed in environments significantly different from the training set, such as applying Graph Neural Networks (GNNs) trained on small graphs to large, real-world graphs. One promising approach for achieving robust OOD generalization is the framework of neural algorithmic alignment, which incorporates ideas from classical algorithms by designing neural architectures that resemble specific algorithmic paradigms (e.g. dynamic programming). The hope is that trained models of this form would have superior OOD capabilities, in much the same way that classical algorithms work for all instances. We rigorously analyze the role of algorithmic alignment in achieving OOD generalization, focusing on GNNs applied to the canonical shortest path problem. We prove that GNNs, trained to minimize a sparsity-regularized loss over a small set of shortest path instances, exactly implement the Bellman-Ford (BF) algorithm for shortest paths. In fact, if a GNN minimizes this loss within an error of $\epsilon$, it implements the BF algorithm with an error of $O(\epsilon)$. Consequently, despite limited training data, these GNNs are guaranteed to extrapolate to arbitrary shortest-path problems, including instances of any size. Our empirical results support our theory by showing that NNs trained by gradient descent are able to minimize this loss and extrapolate in practice.
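The Bellman-Ford update that the trained GNNs are shown to implement is a min-aggregation message-passing step. A minimal illustration of that correspondence (the graph and weights below are made up for the example):

```python
# One "GNN layer" as a Bellman-Ford relaxation:
#   h'[v] = min(h[v], min over edges (u, v) of h[u] + w[u, v])
def bellman_ford_layer(h, edges, weights):
    h_new = h.copy()
    for (u, v), wt in zip(edges, weights):
        h_new[v] = min(h_new[v], h[u] + wt)
    return h_new

# Shortest paths from node 0 on a small 4-node graph.
INF = float("inf")
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
weights = [4.0, 1.0, 1.0, 5.0]
h = [0.0, INF, INF, INF]        # node features = current distance estimates
for _ in range(3):               # n - 1 relaxation rounds
    h = bellman_ford_layer(h, edges, weights)
print(h)  # [0.0, 4.0, 1.0, 6.0]
```

Stacking such layers is exactly iterated BF relaxation, which is why a GNN whose layers match this form extrapolates to shortest-path instances of any size.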


Rome was Not Built in a Single Step: Hierarchical Prompting for LLM-based Chip Design

Nakkab, Andre, Zhang, Sai Qian, Karri, Ramesh, Garg, Siddharth

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are effective in computer hardware synthesis via hardware description language (HDL) generation. However, LLM-assisted approaches for HDL generation struggle when handling complex tasks. We introduce a suite of hierarchical prompting techniques which facilitate efficient stepwise design methods, and develop a generalizable automation pipeline for the process. To evaluate these techniques, we present a benchmark set of hardware designs which have solutions with or without architectural hierarchy. Using these benchmarks, we compare various open-source and proprietary LLMs, including our own fine-tuned Code Llama-Verilog model. Our hierarchical methods automatically produce successful designs for complex hardware modules that standard flat prompting methods cannot achieve, allowing smaller open-source LLMs to compete with large proprietary models. Hierarchical prompting reduces HDL generation time and yields savings on LLM costs. Our experiments detail which LLMs are capable of which applications, and how to apply hierarchical methods in various modes. We explore case studies of generating complex cores using automatic scripted hierarchical prompts, including the first-ever LLM-designed processor with no human feedback. Tools for the Recurrent Optimization via Machine Editing (ROME) method can be found at https://github.com/ajn313/ROME-LLM.
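The hierarchical prompting loop described above can be sketched as recursive decomposition: generate leaf submodules with focused prompts, then prompt for a top-level module that instantiates them. The interface below is hypothetical (`llm` is any prompt-to-text callable; the actual ROME pipeline differs in detail):

```python
def generate_hdl(spec, llm, depth=0, max_depth=2):
    """Hierarchical HDL generation sketch: split a complex spec into
    submodule specs, generate each separately, then stitch them together
    under a top-level module."""
    if depth == max_depth or spec.get("simple"):
        return llm(f"Write Verilog for: {spec['name']}")
    sub_names = [s["name"] for s in spec.get("submodules", [])]
    parts = [generate_hdl(sub, llm, depth + 1, max_depth)
             for sub in spec.get("submodules", [])]
    top = llm(f"Write a top-level Verilog module {spec['name']} "
              f"instantiating: {sub_names}")
    return "\n\n".join(parts + [top])

# Stub LLM for illustration only.
fake_llm = lambda prompt: f"// {prompt}\nmodule m(); endmodule"

spec = {"name": "alu",
        "submodules": [{"name": "adder", "simple": True},
                       {"name": "shifter", "simple": True}]}
print(generate_hdl(spec, fake_llm))
```

Each prompt stays small and focused, which is the mechanism by which hierarchical prompting lets smaller models handle designs that defeat a single flat prompt.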


Training trajectories, mini-batch losses and the curious role of the learning rate

Sandler, Mark, Zhmoginov, Andrey, Vladymyrov, Max, Miller, Nolan

arXiv.org Artificial Intelligence

Stochastic gradient descent plays a fundamental role in nearly all applications of deep learning. However, its ability to converge to a global minimum remains shrouded in mystery. In this paper we propose to study the behavior of the loss function on fixed mini-batches along SGD trajectories. We show that the loss function on a fixed batch appears to be remarkably convex-like. In particular, for ResNet the loss for any fixed mini-batch can be accurately modeled by a quadratic function, and a very low loss value can be reached in just one step of gradient descent with a sufficiently large learning rate. We propose a simple model that allows us to analyze the relationship between the gradients of stochastic mini-batches and the full batch. Our analysis allows us to discover the equivalency between iterate aggregates and specific learning rate schedules. In particular, for Exponential Moving Average (EMA) and Stochastic Weight Averaging we show that our proposed model matches the observed training trajectories on ImageNet. Our theoretical model predicts that an even simpler averaging technique, averaging just two points many steps apart, significantly improves accuracy compared to the baseline. We validated our findings on ImageNet and other datasets using the ResNet architecture.
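The "one large step on a quadratic-like batch loss" observation has a clean closed form: on an exact quadratic, a single gradient step with the line-search learning rate already minimizes the loss along the gradient direction. A toy NumPy sketch (the curvature matrix and minimizer below are synthetic stand-ins for a mini-batch loss):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
A = rng.normal(size=(d, d))
H = A @ A.T + np.eye(d)          # positive-definite curvature of the batch loss
w_star = rng.normal(size=d)       # minimizer of this fixed-batch loss

def loss(w):
    # Quadratic model L(w) = 0.5 (w - w*)^T H (w - w*)
    return 0.5 * (w - w_star) @ H @ (w - w_star)

def grad(w):
    return H @ (w - w_star)

w0 = rng.normal(size=d)
g = grad(w0)
eta = (g @ g) / (g @ H @ g)       # exact line-search step size for a quadratic
w1 = w0 - eta * g                 # one gradient step with a "large enough" lr
print(loss(w0), loss(w1))         # single step gives a much lower loss
```

After this step the gradient at `w1` is orthogonal to the step direction, which is the sense in which one sufficiently large gradient step exhausts the progress available along the mini-batch gradient.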


New 3-D printing technique can make autonomous robots in a single step

Los Angeles Times

Building a robot is hard. Building one that can sense its environment and learn how to get around on its own is even harder. But UCLA engineers took on an even bigger challenge. Not only did they create autonomous robots, they 3-D printed them in a single step. Each robot is about the size of a fingertip.


Researchers Demonstrate AI Can Be Fooled

#artificialintelligence

The artificial intelligence systems used by image recognition tools, such as those that certain connected cars use to identify street signs, can be tricked to make an incorrect identification by a low-cost but effective attack using a camera, a projector and a PC, according to Purdue University researchers. A research paper describes an Optical Adversarial Attack, or OPAD, which uses a projector to project calculated patterns that alter the appearance of the 3D objects to AI-based image recognition systems. The paper will be presented in October at an ICCV 2021 Workshop. In an experiment, a pattern was projected onto a stop sign, causing the image recognition to read the sign as a speed limit sign instead. The researchers say this attack method could also work with image recognition tools in applications ranging from military drones to facial recognition systems, potentially undermining their reliability.