
Neural Information Processing Systems

We find that the inconsistency between training and inference of BN is the leading cause of BN's failure in NLP. We define Training Inference Discrepancy (TID) to quantitatively measure this inconsistency and reveal that TID can indicate BN's performance, supported by extensive experiments, including image classification, neural machine translation, language modeling, sequence labeling, and text classification tasks.
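The discrepancy the abstract describes can be illustrated by comparing a BatchNorm layer's batch statistics (used in training) against its running statistics (used at inference). The formula below is a minimal sketch of that idea, not the paper's exact TID definition; the function name and the particular normalization are assumptions for illustration.

```python
import numpy as np

def training_inference_discrepancy(batch_stats, running_stats, eps=1e-8):
    """Illustrative gap between a BN layer's batch statistics (training mode)
    and its running statistics (inference mode). A large value means the
    layer normalizes differently at train vs. test time."""
    mu_b, var_b = batch_stats
    mu_r, var_r = running_stats
    mean_gap = np.linalg.norm(mu_b - mu_r) / (np.linalg.norm(mu_r) + eps)
    var_gap = np.linalg.norm(var_b - var_r) / (np.linalg.norm(var_r) + eps)
    return mean_gap + var_gap

# Identical statistics give zero discrepancy.
mu, var = np.ones(4), np.full(4, 2.0)
print(training_inference_discrepancy((mu, var), (mu, var)))  # 0.0
```

In this sketch, a small-batch NLP setting would produce noisy `mu_b`/`var_b` far from the running averages, yielding a large discrepancy, which is the failure mode the paper attributes to BN.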



Gradient-based Editing of Memory Examples for Online Task-free Continual Learning

Neural Information Processing Systems

GMED-edited examples remain similar to their unedited forms, but can yield increased loss in the upcoming model updates, thereby making the future replays more effective in overcoming catastrophic forgetting.
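The core move, editing a stored example by gradient ascent on the current model's loss so it stays similar but becomes harder, can be sketched on a toy logistic-regression model. This is a simplified illustration of the idea, not GMED's actual procedure; the function name and step size are assumptions.

```python
import numpy as np

def gmed_edit(x, y, w, alpha=0.1):
    """One illustrative gradient-ascent edit of a replay-buffer example:
    nudge x slightly in the direction that increases the model's loss on it,
    so replaying x later exerts more pressure against forgetting."""
    p = 1.0 / (1.0 + np.exp(-w @ x))   # sigmoid prediction of a toy model
    grad_x = (p - y) * w               # d(log-loss)/dx for logistic regression
    return x + alpha * grad_x          # small ascent step on the loss surface

x = np.array([1.0, -0.5])
w = np.array([0.8, 0.3])
x_edited = gmed_edit(x, y=1.0, w=w)    # close to x, but higher-loss under w
```

With `y = 1`, the edit moves `x` so the model's score `w @ x` drops, raising the log-loss while keeping the example near its original form.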




Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization

Neural Information Processing Systems

To our knowledge, ADASPIDER is the first parameter-free non-convex variance-reduction method in the sense that it does not require knowledge of problem-dependent parameters, such as the smoothness constant L, the target accuracy ϵ, or any bound on gradient norms.


high

Neural Information Processing Systems

We show it depends on the precise way in which the limit is taken, and in particular on how the quantity of data, the hidden layer width, and the learning rate scale as d grows.


An online passive-aggressive algorithm for difference-of-squares classification

Neural Information Processing Systems

For such models, one particularly elegant approach is that of passive-aggressive learning [3]. In this framework, a model is only updated when it fails to classify an example correctly with high confidence.
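The update rule described here can be sketched with the standard passive-aggressive algorithm for linear classifiers (Crammer et al.): stay passive when the example already has margin at least 1, otherwise make the minimal weight change that satisfies the margin constraint. This is the classic PA rule, not this paper's difference-of-squares variant.

```python
import numpy as np

def pa_update(w, x, y):
    """Classic passive-aggressive step: no change if y * (w @ x) >= 1,
    otherwise the smallest update to w that restores margin 1."""
    loss = max(0.0, 1.0 - y * (w @ x))  # hinge loss on this example
    if loss == 0.0:
        return w                        # passive: correct with high confidence
    tau = loss / (x @ x)                # aggressive: closed-form step size
    return w + tau * y * x

w = np.zeros(2)
w = pa_update(w, np.array([1.0, 2.0]), 1.0)  # margin violated -> update
```

After an aggressive step, the updated weights classify the triggering example with margin exactly 1, which is why a repeat presentation of the same example leaves the model untouched.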