Goto

Collaborating Authors

Kolter


Improved Techniques for Deterministic l2 Robustness

Neural Information Processing Systems

Gradient Norm Preserving (GNP) architectures where each layer preserves the gradient norm during backpropagation. For 1-Lipschitz Convolutional Neural Networks (CNNs), this involves using orthogonal convolutions (convolution layers with an orthogonal Jacobian matrix) [Li et al., 2019b, Trockman and Kolter, 2021].
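The orthogonality requirement can be made concrete with a small sketch. Below is a minimal NumPy demonstration, with a dense matrix standing in for a 1x1 convolution (an assumption for brevity; sizes and seed are arbitrary), of the Cayley-transform parameterization associated with Trockman and Kolter: a skew-symmetric matrix is mapped to an orthogonal one, and the resulting layer preserves the l2 norm of both activations and gradients.

```python
# Minimal sketch, assuming a dense layer stands in for a 1x1 orthogonal
# convolution. The Cayley transform maps a skew-symmetric A to an
# orthogonal Q = (I - A)(I + A)^{-1}.
import numpy as np

rng = np.random.default_rng(0)
n = 8

W = rng.standard_normal((n, n))
A = W - W.T                      # skew-symmetric: A^T = -A
I = np.eye(n)
Q = (I - A) @ np.linalg.inv(I + A)   # orthogonal: Q Q^T = I

x = rng.standard_normal(n)       # input activation
y = Q @ x                        # forward pass through the layer

g_y = rng.standard_normal(n)     # upstream gradient dL/dy
g_x = Q.T @ g_y                  # backprop: dL/dx = Q^T dL/dy

print(np.allclose(Q @ Q.T, I))                               # Jacobian is orthogonal
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))      # norm preserved forward
print(np.isclose(np.linalg.norm(g_x), np.linalg.norm(g_y)))  # norm preserved backward
```

Because the layer's Jacobian is exactly Q, the same orthogonality that keeps the forward activations norm-preserving also keeps the backpropagated gradient norm-preserving, which is the GNP property the snippet above describes.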


Nonsmooth Implicit Differentiation for Machine-Learning and Optimization

Neural Information Processing Systems

This approach allows for formal subdifferentiation: for instance, replacing derivatives by Clarke Jacobians in the usual differentiation formulas is fully justified for a wide class of nonsmooth problems.
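As a concrete illustration of that claim, the sketch below differentiates a nonsmooth fixed-point equation by formally substituting an element of the Clarke Jacobian of relu into the implicit-function formula, then checks the result against a finite difference. The scalar problem, constants, and the chosen Clarke element are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch: nonsmooth implicit differentiation of the scalar fixed
# point z = relu(a*z + b), with |a| < 1 so the iteration contracts.
import numpy as np

relu = lambda u: np.maximum(u, 0.0)

def clarke_relu(u, t=0.5):
    # Clarke Jacobian of relu: {0} for u < 0, {1} for u > 0, [0, 1] at u = 0;
    # t selects one element at the kink.
    return 0.0 if u < 0 else (1.0 if u > 0 else t)

a, b = 0.5, 1.0

# Solve the fixed point by iteration.
z = 0.0
for _ in range(200):
    z = relu(a * z + b)

# Implicit function theorem, formally applied with the Clarke element:
# z = relu(a z + b)  =>  dz/db = s / (1 - s * a),  s in Clarke relu'(a z + b).
s = clarke_relu(a * z + b)
dz_db = s / (1.0 - s * a)

# Finite-difference check (valid here because a*z + b > 0, away from the kink).
eps = 1e-6
z_eps = 0.0
for _ in range(200):
    z_eps = relu(a * z_eps + (b + eps))
print(dz_db, (z_eps - z) / eps)   # both approximately 2.0
```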







Certified Robustness for Deep Equilibrium Models via Serialized Random Smoothing

Neural Information Processing Systems

Implicit models such as Deep Equilibrium Models (DEQs) have emerged as promising alternative approaches for building deep neural networks. Their certified robustness has gained increasing research attention due to security concerns.
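For orientation, randomized smoothing certifies an l2 radius R = sigma * Phi^{-1}(p_A) around an input, where p_A lower-bounds the probability that the noisy base classifier returns the top class. The sketch below applies a generic Cohen-et-al.-style certificate to a toy DEQ-like classifier; the serialized acceleration that gives the paper its name is not reproduced, and all model weights, dimensions, and sample counts are illustrative assumptions.

```python
# Hedged sketch of randomized-smoothing certification around a toy
# DEQ-like classifier (fixed-point forward pass, then a linear readout).
import numpy as np
from scipy.stats import norm, beta

rng = np.random.default_rng(0)
d, h, k = 4, 8, 3                      # input dim, state dim, classes
W = 0.1 * rng.standard_normal((h, h))  # small scale so the fixed point contracts
U = rng.standard_normal((h, d))
V = rng.standard_normal((k, h))

def deq_predict(x):
    # DEQ forward: iterate z = tanh(W z + U x) to a fixed point, then classify.
    z = np.zeros(h)
    for _ in range(50):
        z = np.tanh(W @ z + U @ x)
    return int(np.argmax(V @ z))

def certify(x, sigma=0.5, n=1000, alpha=0.001):
    # Monte Carlo vote of the base classifier under Gaussian input noise.
    votes = np.bincount(
        [deq_predict(x + sigma * rng.standard_normal(d)) for _ in range(n)],
        minlength=k)
    top = int(np.argmax(votes))
    # Clopper-Pearson lower confidence bound on the top-class probability.
    p_lower = beta.ppf(alpha, votes[top], n - votes[top] + 1)
    if p_lower <= 0.5:
        return top, 0.0                    # abstain: no certificate
    return top, sigma * norm.ppf(p_lower)  # certified l2 radius

x = rng.standard_normal(d)
print(certify(x))                          # (predicted class, certified radius)
```

The certificate treats the DEQ as a black box, which is what makes smoothing attractive for implicit models; the cost is the repeated fixed-point solves per noise sample, the bottleneck the serialized scheme in the paper targets.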



Small Language Models Are the New Rage, Researchers Say

WIRED

The original version of this story appeared in Quanta Magazine. Large language models work well because they're so large. The latest models from OpenAI, Meta, and DeepSeek use hundreds of billions of "parameters"--the adjustable knobs that determine connections among data and get tweaked during the training process. With more parameters, the models are better able to identify patterns and connections, which in turn makes them more powerful and accurate. But this power comes at a cost.