CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis