1 Supplementary Material

Neural Information Processing Systems 

To investigate this further, we first observe that Claude-3.7-Sonnet Figure 1 shows the average pass rate under budgets of 12,000, 10 14,000, 16,000, and 17,000 tokens. As the data demonstrate, enlarging the thinking budget yields no 11 appreciable improvement in performance. This finding underscores 14 the challenging nature of ENGDESIGN and suggests its value as a rigorous testbed for future efforts 15 to enhance LLMs' engineering design proficiency. Figure 1: Average pass rate (%) of Claude-3.7-Thinking

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found