Label Delay in Online Continual Learning

Neural Information Processing Systems

We introduce a new continual learning framework with explicit modeling of the label delay between data and label streams over time steps.
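As a rough illustration of what a delay between the data stream and the label stream means, the sketch below simulates a stream in which the label for the input seen at step t only becomes available at step t + delay. The function name `delayed_label_stream` and the fixed integer `delay` are assumptions made for this example and are not taken from the paper.

```python
from collections import deque


def delayed_label_stream(samples, delay):
    """Simulate label delay (hypothetical helper, not the paper's code).

    samples: iterable of (x, y) pairs in arrival order.
    delay:   number of time steps a label lags behind its input.

    Yields (t, x_t, labeled), where `labeled` is the (x, y) pair whose
    label has just arrived, or None while no label is available yet.
    """
    pending = deque()  # inputs whose labels have not arrived yet
    for t, (x, y) in enumerate(samples):
        pending.append((x, y))
        # The pair from step t - delay is revealed once enough steps have passed.
        labeled = pending.popleft() if len(pending) > delay else None
        yield t, x, labeled


if __name__ == "__main__":
    stream = [("a", 0), ("b", 1), ("c", 0), ("d", 1)]
    for t, x, labeled in delayed_label_stream(stream, delay=2):
        print(t, x, labeled)
```

With `delay=2`, the first two steps deliver unlabeled inputs only; from the third step on, each new input arrives together with the label of the input seen two steps earlier.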



ALPINE: Unveiling The Planning Capability of Autoregressive Learning in Language Models

Siwei Wang

Neural Information Processing Systems

Our mathematical characterization shows that Transformer architectures can execute path-finding by embedding the adjacency and reachability matrices within their weights. Furthermore, our theoretical analysis of gradient-based learning dynamics reveals that LLMs can learn both the adjacency and a limited form of the reachability matrices.
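To make the role of the two matrices concrete, here is a minimal sketch of path-finding that uses only an adjacency matrix and a reachability matrix: at each step it picks a node that is adjacent to the current node and from which the target is reachable. This is an illustrative reading of the abstract, not the paper's construction; the helper names `transitive_closure` and `find_path`, the boolean-matrix encoding, and the restriction to directed acyclic graphs are assumptions for the example.

```python
def transitive_closure(adj):
    """Reachability matrix from an adjacency matrix (Warshall's algorithm)."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def find_path(adj, reach, source, target):
    """Greedy path-finding on a DAG (assumed acyclic).

    The next node must be adjacent to the current node AND either be the
    target or be able to reach the target. Returns a list of nodes, or
    None if no path exists.
    """
    n = len(adj)
    path = [source]
    current = source
    for _ in range(n):  # a simple path in a DAG visits at most n nodes
        if current == target:
            return path
        nxt = next(
            (v for v in range(n)
             if adj[current][v] and (v == target or reach[v][target])),
            None,
        )
        if nxt is None:
            return None
        path.append(nxt)
        current = nxt
    return None


# Example graph with edges 0->1, 0->2, 2->3; node 1 is a dead end.
edges = [(0, 1), (0, 2), (2, 3)]
adj = [[False] * 4 for _ in range(4)]
for u, v in edges:
    adj[u][v] = True
reach = transitive_closure(adj)

if __name__ == "__main__":
    print(find_path(adj, reach, 0, 3))  # [0, 2, 3]
```

In the example, node 1 is adjacent to node 0 but cannot reach node 3, so the reachability check rules it out and the search proceeds through node 2.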






Inductive biases of multi-task learning and finetuning: multiple regimes of feature reuse

Neural Information Processing Systems

Neural networks are often trained on multiple tasks, either simultaneously (multi-task learning, MTL) or sequentially (pretraining and subsequent finetuning, PT+FT). In particular, it is common practice to pretrain neural networks on a large auxiliary task before finetuning on a downstream task with fewer samples. Despite the prevalence of this approach, the inductive biases that arise from learning multiple tasks are poorly characterized. In this work, we address this gap.