Human Behavioral Benchmarking: Numeric Magnitude Comparison Effects in Large Language Models
