Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models