LAB-Bench: Measuring Capabilities of Language Models for Biology Research