Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models