How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts