kolter
Improvedtechniquesfordeterministicl2robustness
Gradient NormPreserving (GNP) architectures where each layer preserves the gradient norm during backpropagation. For 1-Lipschitz Convolutional Neural Networks (CNNs), this involves using orthogonal convolutions (convolution layers with an orthogonal Jacobian matrix) [Li et al., 2019b, Trockman and Kolter,
Small Language Models Are the New Rage, Researchers Say
The original version of this story appeared in Quanta Magazine. Large language models work well because they're so large. The latest models from OpenAI, Meta, and DeepSeek use hundreds of billions of "parameters"--the adjustable knobs that determine connections among data and get tweaked during the training process. With more parameters, the models are better able to identify patterns and connections, which in turn makes them more powerful and accurate. But this power comes at a cost.