TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring