r/MachineLearning - [Project] pgANN Fast Approximate Nearest Neighbor (ANN) searches with a PostgreSQL database.


Hi, we did experiment with ES: we tried range queries on the vectors, boolean queries over them, and LSH/MinHash to store a signature for each vector. Did you have a different approach in mind? Also, you're correct that L1 and L2 distances are poor metrics at this dimensionality, but our goal was to fetch a subset of (say) a few thousand "good enough" results from a pool of tens of millions, which can then be re-ranked with cosine or a similar metric. Unfortunately, there are no easy wins in ANN, and this works well enough for us. We hope others can benefit as well.
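To make the "coarse fetch, then re-rank" idea concrete, here is a rough sketch in plain Python. It is not the pgANN code: the function names, the pool size, and the brute-force L2 scan in stage 1 are all assumptions for illustration (in practice stage 1 would be an indexed database query that returns the few thousand candidates).

```python
import math
import random


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def two_stage_search(query, pool, n_candidates, top_k):
    """Hypothetical helper: coarse L2 filter, then cosine re-rank.

    Stage 1 here is a brute-force scan standing in for the cheap,
    index-backed fetch; it only needs to be "good enough".
    """
    # Stage 1: keep the n_candidates vectors closest to the query by L2.
    candidates = sorted(
        range(len(pool)), key=lambda i: l2_distance(query, pool[i])
    )[:n_candidates]
    # Stage 2: re-rank only those candidates by cosine similarity.
    reranked = sorted(
        candidates, key=lambda i: cosine_similarity(query, pool[i]), reverse=True
    )
    return reranked[:top_k]


# Toy data: 500 random 16-dimensional vectors (made-up sizes).
random.seed(0)
pool = [[random.gauss(0, 1) for _ in range(16)] for _ in range(500)]
query = pool[42]
result = two_stage_search(query, pool, n_candidates=50, top_k=5)
```

The point of the split is that the expensive metric only ever runs on the small candidate set, so the first stage can afford to be imprecise as long as the true neighbors mostly survive it.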