Efficient Training of Retrieval Models Using Negative Cache

Neural Information Processing Systems 

A popular paradigm for such learning tasks involves training two separate neural networks (often called two-towers or dual-encoders), each representing a query and a document.