Goto

Collaborating Authors

 Statistical Learning








Linear Time Algorithms for k-means with Multi-Swap Local Search Junyu Huang

Neural Information Processing Systems

The local search methods have been widely used to solve the clustering problems. In practice, local search algorithms for clustering problems mainly adapt the single-swap strategy, which enables them to handle large-scale datasets and achieve linear running time in the data size.


Transformers learn to implement preconditioned gradient descent for in-context learning

Neural Information Processing Systems

Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of gradient descent.



726ab29b61a749b36d2593648716ae3c-Paper-Conference.pdf

Neural Information Processing Systems

Hence, the performance of LLMs in various NLP tasks depends significantly onthecrucial roleplayedbytheattention mechanism with thesoftmaxunit.