Testing Language Model Agents Safely in the Wild