FLEX: Expert-level False-Less EXecution Metric for Reliable Text-to-SQL Benchmark
