Joint speech and overlap detection: a benchmark over multiple audio setups and speech domains